
Least Squares Regression

Least Squares Regression is a fundamental technique in statistical modeling and data analysis used for fitting a model to observed data. The primary goal is to find a set of parameters that minimize the discrepancies (residuals) between the model’s predictions and the actual observed data. The "least squares" criterion is chosen because it leads to convenient mathematical properties and closed-form solutions, particularly for linear models.

In its simplest form, least squares regression is applied to linear regression, where we assume a linear relationship between a set of input variables (features) X and an output variable Y. More generally, it can be extended to polynomial regression and multiple linear regression with multiple input variables. Because of its simplicity, transparency, and relative mathematical convenience, least squares remains one of the most widely used techniques in data analysis.


Mathematical Formulation

Given a matrix of features $X \in \mathbb{R}^{m \times n}$ (with $m$ observations and $n$ features) and a vector of target variables $Y \in \mathbb{R}^m$, we seek a coefficient vector $\beta \in \mathbb{R}^n$ that best fits the data in the sense of minimizing the sum of squared residuals. If we include a column of ones in $X$ to represent the intercept term, $\beta$ naturally includes the intercept as well.

We model:

$$\hat{Y} = X\beta.$$

Objective: Minimize the Residual Sum of Squares (RSS):

$$\text{RSS}(\beta) = \|Y - X\beta\|_2^2 = (Y - X\beta)^\top (Y - X\beta).$$

The goal is to find β that solves:

$$\min_{\beta} \; \|Y - X\beta\|_2^2.$$

By setting the gradient of this objective with respect to β to zero, we obtain the Normal Equation:

$$X^\top X \beta = X^\top Y.$$

Provided $X^\top X$ is invertible, we have a closed-form solution:

$$\beta = (X^\top X)^{-1} X^\top Y.$$

This β is the least squares estimate of the coefficient vector, ensuring that the fitted line (or hyperplane, in the multi-dimensional case) is the best fit in the least squares sense.
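To make the closed-form solution concrete, here is a minimal NumPy sketch on made-up data (the arrays and values below are illustrative assumptions, not taken from the text); it computes $\beta$ from the normal equation and cross-checks it against `np.linalg.lstsq`:

```python
import numpy as np

# Synthetic data: 5 observations, a column of ones for the intercept plus one feature
X = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0],
              [1.0, 4.0]])
Y = np.array([1.1, 1.9, 3.2, 3.8, 5.1])

# Closed-form least squares estimate: beta = (X^T X)^{-1} X^T Y
beta = np.linalg.inv(X.T @ X) @ (X.T @ Y)

# Cross-check with NumPy's built-in least squares solver
beta_lstsq, *_ = np.linalg.lstsq(X, Y, rcond=None)

print(beta)        # [intercept, slope]
print(beta_lstsq)  # should match the closed-form result
```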

Derivation

I. Set up the Problem:

Suppose we have observations $(x_i, y_i)_{i=1}^m$, where $x_i \in \mathbb{R}^n$ is a vector of features for the $i$-th observation and $y_i$ is the response. We assume a linear model:

$$\hat{y}_i = \sum_{j=1}^n \beta_j x_{ij} = x_i^\top \beta,$$

or in matrix form:

$$\hat{Y} = X\beta,$$

where $X$ is the $m \times n$ matrix with rows $x_i^\top$, and $Y \in \mathbb{R}^m$ is the vector of observed responses.

II. Defining the Error to Minimize:

We define the residuals as:

$$r = Y - X\beta.$$

The objective is to minimize:

$$\text{RSS}(\beta) = r^\top r = (Y - X\beta)^\top (Y - X\beta).$$

III. Finding the Minimum:

To minimize with respect to β, take the gradient and set it to zero:

$$\frac{\partial \, \text{RSS}}{\partial \beta} = -2 X^\top (Y - X\beta) = 0.$$

This implies:

$$X^\top Y - X^\top X \beta = 0 \quad\Longrightarrow\quad X^\top X \beta = X^\top Y.$$

IV. Solving the Normal Equation:

If $X^\top X$ is invertible:

$$\beta = (X^\top X)^{-1} X^\top Y.$$

This formula provides a closed-form solution for the ordinary least squares estimator β.
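As a sanity check on the derivation, the following sketch (with an assumed synthetic data set) compares the analytic gradient $-2X^\top(Y - X\beta)$ to a finite-difference approximation and confirms that the gradient vanishes at the normal-equation solution:

```python
import numpy as np

rng = np.random.default_rng(0)
X = np.column_stack([np.ones(20), rng.normal(size=20)])   # intercept + one feature
Y = 2.0 + 3.0 * X[:, 1] + rng.normal(scale=0.1, size=20)  # noisy linear response

def rss(beta):
    r = Y - X @ beta
    return r @ r

def grad_analytic(beta):
    # Gradient derived above: -2 X^T (Y - X beta)
    return -2.0 * X.T @ (Y - X @ beta)

def grad_numeric(beta, eps=1e-6):
    # Central finite differences, one coordinate at a time
    g = np.zeros_like(beta)
    for j in range(len(beta)):
        e = np.zeros_like(beta)
        e[j] = eps
        g[j] = (rss(beta + e) - rss(beta - e)) / (2 * eps)
    return g

beta0 = np.array([0.5, -1.0])  # arbitrary point
print(np.allclose(grad_analytic(beta0), grad_numeric(beta0), atol=1e-3))  # True

beta_hat = np.linalg.solve(X.T @ X, X.T @ Y)  # normal-equation solution
print(np.allclose(grad_analytic(beta_hat), 0.0, atol=1e-8))  # gradient ~ 0 at the minimum
```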

Algorithm Steps

I. Data Preparation:

Assemble the feature matrix $X$, including a column of ones if an intercept term is desired, and the response vector $Y$.

II. Compute Matrices:

Compute $X^\top X$ and $X^\top Y$.

III. Check Invertibility:

Verify that $X^\top X$ is invertible, for example by checking its rank or condition number; if it is singular or nearly so, the features are collinear and the closed-form solution is unreliable.

IV. Solve for β:

$$\beta = (X^\top X)^{-1} X^\top Y.$$

V. Use the Model for Prediction:

For a new input $x_{\text{new}}$, predict (see the sketch after these steps):

$$\hat{y}_{\text{new}} = x_{\text{new}}^\top \beta.$$
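The sketch below walks through steps I through V on placeholder data (the raw values and the new input are assumptions for illustration): it assembles the design matrix with an intercept column, forms $X^\top X$ and $X^\top Y$, checks conditioning before solving, and finally predicts for a new input.

```python
import numpy as np

# Step I: data preparation -- raw feature values and responses (placeholder data),
# with a column of ones prepended for the intercept
x_raw = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
Y = np.array([2.1, 2.9, 3.7, 5.2, 5.8])
X = np.column_stack([np.ones_like(x_raw), x_raw])

# Step II: compute the matrices used by the normal equation
XtX = X.T @ X
XtY = X.T @ Y

# Step III: check invertibility (a huge condition number signals near-singularity)
if np.linalg.cond(XtX) > 1e12:
    raise ValueError("X^T X is singular or ill-conditioned; remove collinear features or regularize.")

# Step IV: solve X^T X beta = X^T Y (solving is more stable than forming the inverse)
beta = np.linalg.solve(XtX, XtY)

# Step V: predict for a new input, remembering the leading 1 for the intercept
x_new = np.array([1.0, 6.0])
y_new = x_new @ beta
print(beta, y_new)
```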

Example

Given Data Points: (x,y) = (1,1),(2,2),(3,2).

Step-by-step:

I. Add an intercept term:

$$X = \begin{bmatrix} 1 & 1 \\ 1 & 2 \\ 1 & 3 \end{bmatrix}$$

$$Y = \begin{bmatrix} 1 \\ 2 \\ 2 \end{bmatrix}$$

II. Compute:

$$X^\top X = \begin{bmatrix} 3 & 6 \\ 6 & 14 \end{bmatrix}$$

$$X^\top Y = \begin{bmatrix} 5 \\ 11 \end{bmatrix}$$

III. Invert $X^\top X$:

$$(X^\top X)^{-1} = \frac{1}{6}\begin{bmatrix} 14 & -6 \\ -6 & 3 \end{bmatrix} = \begin{bmatrix} 7/3 & -1 \\ -1 & 1/2 \end{bmatrix}$$

IV. Compute β:

$$\beta = (X^\top X)^{-1} X^\top Y = \begin{bmatrix} 2/3 \\ 1/2 \end{bmatrix}$$

Thus, the fitted line is:

$$\hat{y} = \tfrac{2}{3} + 0.5\,x \approx 0.667 + 0.5\,x$$
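The arithmetic above can be reproduced in a few lines of NumPy (a minimal check using only the three data points from the example):

```python
import numpy as np

# Data points (1,1), (2,2), (3,2) with an intercept column
X = np.array([[1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0]])
Y = np.array([1.0, 2.0, 2.0])

beta = np.linalg.solve(X.T @ X, X.T @ Y)
print(X.T @ X)  # expect [[3, 6], [6, 14]]
print(X.T @ Y)  # expect [5, 11]
print(beta)     # expect approximately [0.667, 0.5]  ->  y_hat = 2/3 + 0.5 x
```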

Advantages

- The estimator has a closed-form solution via the normal equation, so no iterative optimization or tuning is needed for problems of moderate size.
- The coefficients are directly interpretable as the estimated effect of each feature on the response.
- Under the standard Gauss-Markov assumptions, ordinary least squares is the best linear unbiased estimator.
- It is computationally cheap and available in virtually every statistics and machine learning library.

Limitations

- Squaring the residuals makes the fit highly sensitive to outliers.
- The model assumes a linear relationship between the features (or a chosen basis of them) and the response.
- When features are highly correlated, $X^\top X$ becomes ill-conditioned or singular, so the closed-form solution is unstable or undefined; regularization or a pseudoinverse is then required (see the sketch below).
- Forming and inverting $X^\top X$ scales poorly when the number of features is very large.
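To illustrate the multicollinearity point, the sketch below uses a deliberately collinear toy design matrix (an assumption for demonstration); because $X^\top X$ is singular, the explicit inverse does not exist, and the Moore-Penrose pseudoinverse (`np.linalg.pinv`) is used instead to obtain the minimum-norm least squares solution.

```python
import numpy as np

# Deliberately collinear design: the third column duplicates the second,
# so X^T X is singular and the closed-form inverse does not exist
X = np.array([[1.0, 1.0, 1.0],
              [1.0, 2.0, 2.0],
              [1.0, 3.0, 3.0],
              [1.0, 4.0, 4.0]])
Y = np.array([1.0, 2.0, 2.0, 4.0])

# The pseudoinverse returns the minimum-norm least squares solution,
# splitting the weight between the duplicated columns
beta = np.linalg.pinv(X) @ Y
print(beta)
print(X @ beta)  # fitted values; residuals are still minimized in the least squares sense
```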
