Loss function for the algorithm?
A loss function in machine learning is simply a measure of how different the predicted value is from the actual value.
The quadratic loss function measures the loss, or error, in our model. For a line ŷ = m·x + c fitted to n points (xᵢ, yᵢ), it can be defined as:

L = (1/n) · Σᵢ (yᵢ − (m·xᵢ + c))²
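As a quick illustration, here is a minimal NumPy sketch of this loss; the data points are made up for the example:

```python
import numpy as np

def quadratic_loss(x, y, m, c):
    """Mean squared error of the line y_hat = m*x + c."""
    y_hat = m * x + c
    return np.mean((y - y_hat) ** 2)

# Hypothetical points lying near the line y = 2x + 1
x = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([1.1, 2.9, 5.2, 6.8])
print(quadratic_loss(x, y, m=2.0, c=1.0))  # small value for a good fit
```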
We minimize this loss by taking the partial derivatives of L with respect to m and c, setting them to 0, and solving for m and c. After we do the math, we are left with these equations:

m = Σᵢ (xᵢ − x̄)(yᵢ − ȳ) / Σᵢ (xᵢ − x̄)²

c = ȳ − m·x̄

where x̄ and ȳ are the means of x and y.
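A minimal sketch of these closed-form expressions, reusing the hypothetical data from above:

```python
import numpy as np

def fit_line(x, y):
    """Closed-form least-squares slope m and intercept c for one feature."""
    x_bar, y_bar = x.mean(), y.mean()
    m = np.sum((x - x_bar) * (y - y_bar)) / np.sum((x - x_bar) ** 2)
    c = y_bar - m * x_bar
    return m, c

x = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([1.1, 2.9, 5.2, 6.8])
m, c = fit_line(x, y)
print(m, c)  # ~1.94 and ~1.09 for this data
```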
Least Squares using matrices
The idea is to find the set of parameters Θ such that:

XΘ = y

where:
- X: input data with dimensions (n, m)
- Θ: parameters with dimensions (m, 1)
- y: output data with dimensions (n, 1)
- n: number of samples
- m: number of features
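For concreteness, a sketch of how these matrices might be set up in NumPy; note that one common convention (assumed here, not stated above) is to make a column of ones one of the m features of X, so the intercept becomes one of the parameters in Θ:

```python
import numpy as np

# n = 4 samples, m = 2 "features": the raw input plus a column of ones.
# The ones column folds the intercept into Theta (a common convention).
X = np.array([[0.0, 1.0],
              [1.0, 1.0],
              [2.0, 1.0],
              [3.0, 1.0]])                  # shape (n, m) = (4, 2)
y = np.array([[1.1], [2.9], [5.2], [6.8]])  # shape (n, 1) = (4, 1)
# Theta will then have shape (m, 1): one weight per feature
```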
The parameter matrix Θ could, in principle, be determined directly by multiplying both sides of the equation with the inverse of X, as shown below:

Θ = X⁻¹y
But because X might be a non-square matrix, its inverse may not be defined.
To resolve this, we first multiply both sides by the transpose of X, as shown below:

XᵀXΘ = Xᵀy

This makes the product XᵀX a square matrix with dimensions (m, m).
The obtained matrix is square and, provided the columns of X are linearly independent, invertible. Taking the inverse of XᵀX on both sides, we get the following:

Θ = (XᵀX)⁻¹Xᵀy
There is no randomness involved: this closed-form solution always returns the same answer, and that answer is optimal, since it minimizes the quadratic loss exactly.
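Continuing the sketch above, the normal equation can be evaluated directly in NumPy; np.linalg.solve is used here instead of an explicit inverse, which is the numerically safer choice:

```python
import numpy as np

X = np.array([[0.0, 1.0],
              [1.0, 1.0],
              [2.0, 1.0],
              [3.0, 1.0]])
y = np.array([[1.1], [2.9], [5.2], [6.8]])

# Theta = (X^T X)^(-1) X^T y, computed via a linear solve
Theta = np.linalg.solve(X.T @ X, X.T @ y)
print(Theta.ravel())  # ~[1.94, 1.09]: the same slope and intercept as before
```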
This is precisely what scikit-learn's LinearRegression class implements, instead of gradient descent.
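A quick cross-check against scikit-learn with the same hypothetical data (LinearRegression fits the intercept itself, so the ones column is dropped):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

x = np.array([[0.0], [1.0], [2.0], [3.0]])  # shape (n, 1): one feature
y = np.array([1.1, 2.9, 5.2, 6.8])

model = LinearRegression().fit(x, y)
print(model.coef_[0], model.intercept_)  # ~1.94 and ~1.09, as above
```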
Head over here to see the projects being mentored over at BSoC:
https://bitbytesummerofcode.netlify.app/