Last modified: February 07, 2025
Logistic regression is a statistical method used for modeling the probability of a binary outcome based on one or more predictor variables. It is widely used in various fields such as medicine, social sciences, and machine learning for classification problems where the dependent variable is dichotomous (e.g., success/failure, yes/no, positive/negative).
In logistic regression, the dependent variable $y$ takes on binary values:
$$ y_i = \begin{cases} 1 & \text{if event occurs (e.g., success)} \\ 0 & \text{if event does not occur (e.g., failure)} \end{cases} $$
The logistic function models the probability that $y = 1$ given the predictor variables $x_1, x_2, \dots, x_p$:
$$ p(x) = P(y = 1 \mid x) = \frac{1}{1 + e^{-(\beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_p x_p)}} $$
This function maps any real-valued number into a value between 0 and 1, making it suitable for probability estimation.
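As a minimal sketch of this mapping in NumPy (the helper name `sigmoid` is ours, not from the text):

```python
import numpy as np

def sigmoid(z):
    """Logistic function: maps any real number into the interval (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

# The linear predictor beta_0 + beta_1 x_1 + ... + beta_p x_p is squashed
# into a probability, whatever its sign or magnitude:
print(sigmoid(-5.0))  # ~0.0067
print(sigmoid(0.0))   # 0.5
print(sigmoid(5.0))   # ~0.9933
```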
The odds of an event occurring are the ratio of the probability that the event occurs to the probability that it does not:
$$ \text{Odds}(x) = \frac{p(x)}{1 - p(x)} $$
The natural logarithm of the odds is called the logit function.
$$ \text{logit}(p(x)) = \ln \left( \frac{p(x)}{1 - p(x)} \right) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_p x_p $$
The logit function establishes a linear relationship between the predictor variables and the log-odds of the outcome.
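A small sketch of the logit transform (the helper `logit` is ours):

```python
import numpy as np

def logit(p):
    """Log-odds: the inverse of the logistic function."""
    return np.log(p / (1.0 - p))

print(logit(0.5))   # 0.0  (even odds)
print(logit(0.9))   # ~2.197
print(logit(0.1))   # ~-2.197
```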
Exponentiating the coefficients provides the odds ratios:
$$ \text{Odds Ratio for } x_j = e^{\beta_j} $$
An odds ratio greater than 1 indicates that the odds of the outcome increase as $x_j$ increases, while an odds ratio less than 1 indicates that they decrease.
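In code, the relationship between probabilities, odds, and odds ratios looks like this (the coefficient value 1.23 is purely illustrative):

```python
import numpy as np

def odds(p):
    """Odds of an event with probability p."""
    return p / (1.0 - p)

# Exponentiating a coefficient gives the odds ratio for a one-unit
# increase in the corresponding predictor:
beta_j = 1.23
print(np.exp(beta_j))          # ~3.42: odds are multiplied by ~3.42 per unit
print(odds(0.8) / odds(0.5))   # 4.0: odds ratio between p = 0.8 and p = 0.5
```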
Unlike linear regression, logistic regression does not have a closed-form solution for estimating the coefficients. Instead, coefficients are estimated using Maximum Likelihood Estimation (MLE).
Given $n$ independent observations, the likelihood function $L(\boldsymbol{\beta})$ is the product of the probabilities of observing the data:
$$ L(\boldsymbol{\beta}) = \prod_{i=1}^{n} [p(x_i)]^{y_i} [1 - p(x_i)]^{1 - y_i} $$
Taking the natural logarithm simplifies the product into a sum:
$$ \ell(\boldsymbol{\beta}) = \ln L(\boldsymbol{\beta}) = \sum_{i=1}^{n} \left[ y_i \ln p(x_i) + (1 - y_i) \ln (1 - p(x_i)) \right] $$
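This log-likelihood translates directly into code; a minimal sketch, assuming a design matrix with a leading column of ones:

```python
import numpy as np

def log_likelihood(beta, X, y):
    """Log-likelihood of a logistic regression model.

    beta: (p+1,) coefficient vector (intercept first).
    X:    (n, p+1) design matrix with a leading column of ones.
    y:    (n,) vector of 0/1 outcomes.
    """
    p = 1.0 / (1.0 + np.exp(-X @ beta))   # p(x_i) for every observation
    eps = 1e-12                           # guards log(0) for extreme fits
    return np.sum(y * np.log(p + eps) + (1.0 - y) * np.log(1.0 - p + eps))
```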
The goal is to find the coefficients $\boldsymbol{\beta}$ that maximize $\ell(\boldsymbol{\beta})$. This is typically done with iterative numerical methods such as the Newton-Raphson method, which updates the coefficient estimates using:
$$ \boldsymbol{\beta}^{(k+1)} = \boldsymbol{\beta}^{(k)} - [\mathbf{H}(\boldsymbol{\beta}^{(k)})]^{-1} \mathbf{g}(\boldsymbol{\beta}^{(k)}) $$
where $\mathbf{g}(\boldsymbol{\beta}^{(k)})$ is the gradient (score vector) of the log-likelihood and $\mathbf{H}(\boldsymbol{\beta}^{(k)})$ is the Hessian matrix of its second partial derivatives, both evaluated at $\boldsymbol{\beta}^{(k)}$.
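A compact sketch of this update, using the standard closed forms $\mathbf{g} = \mathbf{X}^\top(\mathbf{y} - \mathbf{p})$ and $\mathbf{H} = -\mathbf{X}^\top \mathbf{W} \mathbf{X}$ with $\mathbf{W} = \operatorname{diag}(p_i(1 - p_i))$ (the function name is ours):

```python
import numpy as np

def fit_logistic_newton(X, y, n_iter=20, tol=1e-8):
    """Fit logistic regression by Newton-Raphson.

    Uses g = X^T (y - p) and H = -X^T W X with W = diag(p_i (1 - p_i)).
    Iterations are capped because on perfectly separable data the MLE
    does not exist and the coefficients grow without bound.
    """
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        g = X.T @ (y - p)                  # gradient (score vector)
        W = np.diag(p * (1.0 - p))
        H = -X.T @ W @ X                   # Hessian
        step = np.linalg.solve(H, g)       # H^{-1} g without explicit inverse
        beta = beta - step
        if np.max(np.abs(step)) < tol:     # stop when updates are negligible
            break
    return beta
```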
We aim to predict whether a student will pass a test ($y = 1$) or fail ($y = 0$) based on two predictors: hours studied ($x_1$) and practice exams taken ($x_2$). The observed data are:
| Student ($i$) | Hours Studied ($x_{1i}$) | Practice Exams ($x_{2i}$) | Pass ($y_i$) |
|---------------|--------------------------|---------------------------|--------------|
| 1             | 2                        | 1                         | 0            |
| 2             | 4                        | 2                         | 0            |
| 3             | 5                        | 2                         | 1            |
| 4             | 6                        | 3                         | 1            |
| 5             | 8                        | 4                         | 1            |
We need to estimate $\beta_0$, $\beta_1$, and $\beta_2$ in the logistic regression model:
$$ p(x) = \frac{1}{1 + e^{-(\beta_0 + \beta_1 x_1 + \beta_2 x_2)}} $$
Let $\mathbf{X}$ be the design matrix including the intercept term:
$$ \mathbf{X} = \begin{bmatrix} 1 & x_{11} & x_{21} \\ 1 & x_{12} & x_{22} \\ 1 & x_{13} & x_{23} \\ 1 & x_{14} & x_{24} \\ 1 & x_{15} & x_{25} \\ \end{bmatrix} = \begin{bmatrix} 1 & 2 & 1 \\ 1 & 4 & 2 \\ 1 & 5 & 2 \\ 1 & 6 & 3 \\ 1 & 8 & 4 \\ \end{bmatrix} $$
Response vector $\mathbf{y}$:
$$ \mathbf{y} = \begin{bmatrix} 0 \\ 0 \\ 1 \\ 1 \\ 1 \\ \end{bmatrix} $$
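In NumPy this setup is simply:

```python
import numpy as np

# Design matrix: intercept column, hours studied, practice exams
X = np.array([
    [1, 2, 1],
    [1, 4, 2],
    [1, 5, 2],
    [1, 6, 3],
    [1, 8, 4],
], dtype=float)

# Observed outcomes: fail, fail, pass, pass, pass
y = np.array([0, 0, 1, 1, 1], dtype=float)
```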
Start with initial guesses for $\boldsymbol{\beta}$, say:
$$ \boldsymbol{\beta}^{(0)} = \begin{bmatrix} \beta_0^{(0)} \\ \beta_1^{(0)} \\ \beta_2^{(0)} \\ \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \\ 0 \\ \end{bmatrix} $$
Due to the complexity of the MLE calculations, we only sketch the iteration here; in practice, statistical software performs them.
At each iteration $k$, first compute the predicted probabilities
$$ p_i^{(k)} = \frac{1}{1 + e^{-(\beta_0^{(k)} + \beta_1^{(k)} x_{1i} + \beta_2^{(k)} x_{2i})}} $$
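then evaluate the gradient and Hessian at $\boldsymbol{\beta}^{(k)}$ and apply the Newton-Raphson update until the change in the estimates is negligible. As a self-contained sketch, the very first step from $\boldsymbol{\beta}^{(0)} = \mathbf{0}$ looks like this:

```python
import numpy as np

X = np.array([[1, 2, 1], [1, 4, 2], [1, 5, 2], [1, 6, 3], [1, 8, 4]], dtype=float)
y = np.array([0, 0, 1, 1, 1], dtype=float)

beta = np.zeros(3)                    # beta^(0) = (0, 0, 0)
p = 1.0 / (1.0 + np.exp(-X @ beta))   # every p_i^(0) is 0.5 when beta is zero
g = X.T @ (y - p)                     # first gradient X^T (y - p)

print(p)  # [0.5 0.5 0.5 0.5 0.5]
print(g)  # [0.5 6.5 3. ]
```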
Assuming the optimization converges, we obtain estimated coefficients (using statistical software):
$$ \hat{\beta}_0 = -9.28, \quad \hat{\beta}_1 = 1.23, \quad \hat{\beta}_2 = 0.98 $$
The fitted model is:
$$ p(x) = \frac{1}{1 + e^{-(-9.28 + 1.23 x_1 + 0.98 x_2)}} $$
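For reference, a hedged sketch of fitting this with scikit-learn. Note that this tiny dataset happens to be perfectly separable (every student with $x_1 \geq 5$ passed), so an unpenalized fit can diverge; scikit-learn's default L2 penalty keeps the coefficients finite, which is why its estimates will differ somewhat from the illustrative values above:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

X_feat = np.array([[2, 1], [4, 2], [5, 2], [6, 3], [8, 4]], dtype=float)
y = np.array([0, 0, 1, 1, 1])

# Default L2 regularization (C=1.0) keeps coefficients finite on
# separable data; the fit is therefore not the pure MLE of the text.
model = LogisticRegression()
model.fit(X_feat, y)
print(model.intercept_, model.coef_)
```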
Intercept ($\beta_0 = -9.28$): The log-odds of passing when $x_1 = 0$ and $x_2 = 0$, corresponding to a near-zero probability of passing with no studying and no practice exams.
Coefficient for Hours Studied ($\beta_1 = 1.23$): For each additional hour studied, the log-odds of passing increase by 1.23 units, holding $x_2$ constant.
Odds Ratio:
$$ e^{1.23} \approx 3.42 $$
The odds of passing are approximately 3.42 times higher for each additional hour studied.
Coefficient for Practice Exams ($\beta_2 = 0.98$): For each additional practice exam taken, the log-odds of passing increase by 0.98 units, holding $x_1$ constant.
Odds Ratio:
$$ e^{0.98} \approx 2.66 $$
The odds of passing are approximately 2.66 times higher for each additional practice exam taken.
For a student who studied 5 hours and took 2 practice exams ($x_1 = 5$, $x_2 = 2$), compute the probability of passing:
$$ p = \frac{1}{1 + e^{-(-9.28 + 1.23 \times 5 + 0.98 \times 2)}} = \frac{1}{1 + e^{-(-9.28 + 6.15 + 1.96)}} = \frac{1}{1 + e^{-(-1.17)}} = \frac{1}{1 + e^{1.17}} \approx 0.24 $$
The probability of passing is approximately 24%.
For a student who studied 7 hours and took 3 practice exams ($x_1 = 7$, $x_2 = 3$):
$$ p = \frac{1}{1 + e^{-(-9.28 + 1.23 \times 7 + 0.98 \times 3)}} = \frac{1}{1 + e^{-(-9.28 + 8.61 + 2.94)}} = \frac{1}{1 + e^{-2.27}} \approx 0.906 $$
The probability of passing is approximately 90.6%.
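Both predictions in code, using the fitted coefficients (the helper name `predict_proba` is ours, borrowed from common convention):

```python
import numpy as np

beta = np.array([-9.28, 1.23, 0.98])   # fitted: intercept, hours, exams

def predict_proba(hours, exams):
    """Probability of passing given hours studied and practice exams taken."""
    z = beta[0] + beta[1] * hours + beta[2] * exams
    return 1.0 / (1.0 + np.exp(-z))

print(predict_proba(5, 2))  # ~0.237
print(predict_proba(7, 3))  # ~0.906
```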
Using a threshold of 0.5 to classify predictions:
| Student ($i$) | $y_i$ | Predicted $p_i$     | Predicted Class ($\hat{y}_i$) |
|---------------|-------|---------------------|-------------------------------|
| 1             | 0     | $p_1 \approx 0.005$ | 0                             |
| 2             | 0     | $p_2 \approx 0.046$ | 0                             |
| 3             | 1     | $p_3 \approx 0.24$  | 0                             |
| 4             | 1     | $p_4 \approx 0.82$  | 1                             |
| 5             | 1     | $p_5 \approx 0.99$  | 1                             |
With TP = 2, TN = 2, FP = 0, and FN = 1 (student 3 is the one misclassified case), the usual metrics follow:
$$ \text{Accuracy} = \frac{\text{TP} + \text{TN}}{\text{Total}} = \frac{2 + 2}{5} = 0.8 $$
$$ \text{Precision} = \frac{\text{TP}}{\text{TP} + \text{FP}} = \frac{2}{2 + 0} = 1.0 $$
$$ \text{Recall} = \frac{\text{TP}}{\text{TP} + \text{FN}} = \frac{2}{2 + 1} = \frac{2}{3} \approx 0.667 $$
$$ \text{F1 Score} = 2 \times \frac{\text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}} = 2 \times \frac{1.0 \times 0.667}{1.0 + 0.667} \approx 0.8 $$
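These metrics can be checked with a few lines of NumPy, using the predicted probabilities from the table above:

```python
import numpy as np

y_true = np.array([0, 0, 1, 1, 1])
p_hat  = np.array([0.005, 0.046, 0.24, 0.82, 0.99])
y_pred = (p_hat >= 0.5).astype(int)          # classify at the 0.5 threshold

tp = np.sum((y_pred == 1) & (y_true == 1))   # 2
tn = np.sum((y_pred == 0) & (y_true == 0))   # 2
fp = np.sum((y_pred == 1) & (y_true == 0))   # 0
fn = np.sum((y_pred == 0) & (y_true == 1))   # 1 (student 3)

accuracy  = (tp + tn) / len(y_true)                   # 0.8
precision = tp / (tp + fp)                            # 1.0
recall    = tp / (tp + fn)                            # ~0.667
f1 = 2 * precision * recall / (precision + recall)    # ~0.8
print(accuracy, precision, recall, f1)
```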