Last modified: December 17, 2022


Logistic Regression

Logistic regression is a statistical method used for modeling the probability of a binary outcome based on one or more predictor variables. It is widely used in various fields such as medicine, social sciences, and machine learning for classification problems where the dependent variable is dichotomous (e.g., success/failure, yes/no, positive/negative).

The Logistic Regression Model

Binary Outcomes

In logistic regression, the dependent variable $y$ takes on binary values:

$$
y_i =
\begin{cases}
1 & \text{if the event occurs (e.g., success)} \\
0 & \text{if the event does not occur (e.g., failure)}
\end{cases}
$$

Logistic Function (Sigmoid Function)

The logistic function models the probability that $y = 1$ given the predictor variables $x_1, x_2, \ldots, x_p$:

$$p(x) = P(y = 1 \mid x) = \frac{1}{1 + e^{-(\beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_p x_p)}}$$

This function maps any real-valued number into a value between 0 and 1, making it suitable for probability estimation.
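As a quick numerical check of this mapping, here is a minimal Python sketch of the sigmoid:

```python
import math

def sigmoid(z):
    """Logistic (sigmoid) function: maps any real z into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

# Large negative inputs approach 0, large positive inputs approach 1.
print(sigmoid(-10))  # β‰ˆ 4.54e-05
print(sigmoid(0))    # 0.5
print(sigmoid(10))   # β‰ˆ 0.99995
```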

Odds and Logit Function

The odds of an event occurring are the ratio of the probability that the event occurs to the probability that it does not:

$$\text{Odds}(x) = \frac{p(x)}{1 - p(x)}$$

The natural logarithm of the odds is called the logit function:

$$\text{logit}(p(x)) = \ln\left(\frac{p(x)}{1 - p(x)}\right) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_p x_p$$

The logit function establishes a linear relationship between the predictor variables and the log-odds of the outcome.
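Because the logit is the inverse of the sigmoid on $(0, 1)$, applying it to a predicted probability recovers the linear predictor. A small sketch:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def logit(p):
    """Log-odds: the inverse of the sigmoid on (0, 1)."""
    return math.log(p / (1.0 - p))

# logit undoes the sigmoid, recovering the linear predictor z.
z = 1.7
p = sigmoid(z)
print(round(logit(p), 6))  # 1.7
```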

Interpretation of Coefficients

Exponentiating the coefficients provides the odds ratios:

$$\text{Odds Ratio for } x_j = e^{\beta_j}$$

An odds ratio greater than 1 indicates that as $x_j$ increases, the odds of the outcome occurring increase, holding the other predictors constant; an odds ratio less than 1 indicates that the odds decrease.

Estimation of Coefficients

Unlike linear regression, logistic regression does not have a closed-form solution for estimating the coefficients. Instead, coefficients are estimated using Maximum Likelihood Estimation (MLE).

Likelihood Function

Given n independent observations, the likelihood function L(Ξ²) is the product of the probabilities of observing the data:

$$L(\beta) = \prod_{i=1}^{n} \left[ p(x_i) \right]^{y_i} \left[ 1 - p(x_i) \right]^{1 - y_i}$$

Log-Likelihood Function

Taking the natural logarithm simplifies the product into a sum:

$$\ell(\beta) = \ln L(\beta) = \sum_{i=1}^{n} \left[ y_i \ln p(x_i) + (1 - y_i) \ln(1 - p(x_i)) \right]$$
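The log-likelihood can be evaluated directly from this formula. A minimal Python sketch (the toy data here is illustrative only; note that at $\beta = 0$ every $p_i = 0.5$, so $\ell = n \ln 0.5$):

```python
import math

def log_likelihood(beta, X, y):
    """β„“(Ξ²) = Ξ£ [y_i ln p_i + (1 - y_i) ln(1 - p_i)], p_i = sigmoid(x_i Β· Ξ²)."""
    total = 0.0
    for xi, yi in zip(X, y):
        z = sum(b * x for b, x in zip(beta, xi))
        p = 1.0 / (1.0 + math.exp(-z))
        total += yi * math.log(p) + (1 - yi) * math.log(1 - p)
    return total

# Toy data; each row of X includes a leading 1 for the intercept.
X = [[1, 2], [1, 4], [1, 6]]
y = [0, 0, 1]
print(log_likelihood([0.0, 0.0], X, y))  # 3Β·ln(0.5) β‰ˆ -2.0794
```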

Maximizing the Log-Likelihood

The goal is to find the coefficients $\beta$ that maximize $\ell(\beta)$. This is typically done with iterative numerical methods; a standard choice is the Newton-Raphson method.

Newton-Raphson Method

An iterative method that updates the coefficient estimates using:

$$\beta^{(k+1)} = \beta^{(k)} - \left[ H(\beta^{(k)}) \right]^{-1} g(\beta^{(k)})$$

Where:

- $g(\beta^{(k)})$ is the gradient of the log-likelihood, the vector of first partial derivatives $\partial \ell / \partial \beta_j$ evaluated at $\beta^{(k)}$.
- $H(\beta^{(k)})$ is the Hessian, the matrix of second partial derivatives of the log-likelihood, evaluated at $\beta^{(k)}$.
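A compact NumPy sketch of this update rule. For logistic regression the gradient is $X^\top(y - p)$ and the Hessian is $-X^\top W X$ with $W = \operatorname{diag}(p_i(1 - p_i))$; the toy data below is illustrative, and in practice statistical software performs this fitting:

```python
import numpy as np

def newton_logistic(X, y, n_iter=20):
    """Newton-Raphson for logistic regression: beta <- beta - H^{-1} g."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))  # fitted probabilities p_i
        g = X.T @ (y - p)                    # gradient of the log-likelihood
        W = np.diag(p * (1 - p))             # weights p_i (1 - p_i)
        H = -X.T @ W @ X                     # Hessian of the log-likelihood
        beta = beta - np.linalg.solve(H, g)  # Newton-Raphson update
    return beta

# Non-separable toy data: an intercept column plus one predictor.
X = np.array([[1.0, 1], [1.0, 2], [1.0, 3], [1.0, 4], [1.0, 5], [1.0, 6]])
y = np.array([0, 0, 1, 0, 1, 1])
print(newton_logistic(X, y))
```

At the maximum the gradient is (numerically) zero, which is a convenient convergence check.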

Assumptions of Logistic Regression

  1. The binary outcome indicates that the dependent variable has two possible values.
  2. Independent observations ensure that each data point is independent of the others.
  3. Linearity in log-odds means that the logit of the outcome is a linear function of the predictor variables.
  4. No multicollinearity ensures that the predictor variables are not highly correlated with each other.
  5. A large sample size is preferred, as maximum likelihood estimation (MLE) performs better with more data.

Example

Problem Statement

We aim to predict whether a student will pass a test ($y = 1$) or fail ($y = 0$) based on two predictors:

- Hours studied ($x_1$)
- Number of practice exams taken ($x_2$)

Data

| Student ($i$) | Hours Studied ($x_{1i}$) | Practice Exams ($x_{2i}$) | Pass ($y_i$) |
|---------------|--------------------------|---------------------------|--------------|
| 1             | 2                        | 1                         | 0            |
| 2             | 4                        | 2                         | 0            |
| 3             | 5                        | 2                         | 1            |
| 4             | 6                        | 3                         | 1            |
| 5             | 8                        | 4                         | 1            |

Step-by-Step Estimation

1. Initial Setup

We need to estimate $\beta_0$, $\beta_1$, and $\beta_2$ in the logistic regression model:

$$p(x) = \frac{1}{1 + e^{-(\beta_0 + \beta_1 x_1 + \beta_2 x_2)}}$$

2. Construct the Design Matrix and Response Vector

Let X be the design matrix including the intercept term:

$$X =
\begin{bmatrix}
1 & x_{11} & x_{21} \\
1 & x_{12} & x_{22} \\
1 & x_{13} & x_{23} \\
1 & x_{14} & x_{24} \\
1 & x_{15} & x_{25}
\end{bmatrix}
=
\begin{bmatrix}
1 & 2 & 1 \\
1 & 4 & 2 \\
1 & 5 & 2 \\
1 & 6 & 3 \\
1 & 8 & 4
\end{bmatrix}$$

Response vector y:

$$y = \begin{bmatrix} 0 \\ 0 \\ 1 \\ 1 \\ 1 \end{bmatrix}$$
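These arrays can be set up directly in NumPy; a small sketch:

```python
import numpy as np

# Design matrix with an intercept column of ones, matching the data table.
X = np.array([
    [1, 2, 1],
    [1, 4, 2],
    [1, 5, 2],
    [1, 6, 3],
    [1, 8, 4],
])
# Response vector: pass (1) or fail (0).
y = np.array([0, 0, 1, 1, 1])
print(X.shape, y.shape)  # (5, 3) (5,)
```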

3. Initialize Coefficients

Start with initial guesses for Ξ², say:

$$\beta^{(0)} = \begin{bmatrix} \beta_0^{(0)} \\ \beta_1^{(0)} \\ \beta_2^{(0)} \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix}$$

4. Iterative Optimization (Simplified Explanation)

Due to the complexity of MLE calculations, we will provide a simplified explanation. In practice, statistical software performs these calculations.

Iteration Steps: at each iteration $k$,

1. Compute the fitted probabilities:

$$p_i^{(k)} = \frac{1}{1 + e^{-(\beta_0^{(k)} + \beta_1^{(k)} x_{1i} + \beta_2^{(k)} x_{2i})}}$$

2. Compute the gradient and Hessian of the log-likelihood at $\beta^{(k)}$.
3. Apply the Newton-Raphson update to obtain $\beta^{(k+1)}$, and repeat until the estimates converge.

5. Estimated Coefficients

Assuming the optimization converges, we obtain estimated coefficients (using statistical software):

$$\hat{\beta}_0 = -9.28, \quad \hat{\beta}_1 = 1.23, \quad \hat{\beta}_2 = 0.98$$

6. Logistic Regression Equation

$$p(x) = \frac{1}{1 + e^{-(-9.28 + 1.23 x_1 + 0.98 x_2)}}$$

Interpretation of Coefficients

Intercept ($\hat{\beta}_0 = -9.28$): The log-odds of passing when $x_1 = 0$ and $x_2 = 0$.

Coefficient for Hours Studied ($\hat{\beta}_1 = 1.23$): For each additional hour studied, the log-odds of passing increase by 1.23 units, holding $x_2$ constant.

Odds Ratio:

$$e^{1.23} \approx 3.42$$

The odds of passing are approximately 3.42 times higher for each additional hour studied.

Coefficient for Practice Exams ($\hat{\beta}_2 = 0.98$): For each additional practice exam taken, the log-odds of passing increase by 0.98 units, holding $x_1$ constant.

Odds Ratio:

$$e^{0.98} \approx 2.66$$

The odds of passing are approximately 2.66 times higher for each additional practice exam taken.

Predicting Probabilities

Predict for a Student Who Studied 5 Hours and Took 2 Practice Exams

Compute the probability of passing:

$$p = \frac{1}{1 + e^{-(-9.28 + 1.23 \times 5 + 0.98 \times 2)}} = \frac{1}{1 + e^{-(-9.28 + 6.15 + 1.96)}} = \frac{1}{1 + e^{1.17}} \approx 0.24$$

The probability of passing is approximately 24%.

Predict for a Student Who Studied 7 Hours and Took 3 Practice Exams

$$p = \frac{1}{1 + e^{-(-9.28 + 1.23 \times 7 + 0.98 \times 3)}} = \frac{1}{1 + e^{-(-9.28 + 8.61 + 2.94)}} = \frac{1}{1 + e^{-2.27}} \approx 0.906$$

The probability of passing is approximately 90.6%.
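Both predictions can be reproduced from the fitted equation; a minimal sketch:

```python
import math

def predict_prob(x1, x2, b0=-9.28, b1=1.23, b2=0.98):
    """Predicted pass probability from the fitted model above."""
    z = b0 + b1 * x1 + b2 * x2
    return 1.0 / (1.0 + math.exp(-z))

print(round(predict_prob(5, 2), 3))  # 0.237
print(round(predict_prob(7, 3), 3))  # 0.906
```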


Model Evaluation

Confusion Matrix

Using a threshold of 0.5 to classify predictions ($\hat{y}_i = 1$ if $p_i \geq 0.5$):

| Student ($i$) | $y_i$ | Predicted $p_i$ | Predicted Class ($\hat{y}_i$) |
|---------------|-------|------------------|-------------------------------|
| 1             | 0     | $\approx 0.005$  | 0                             |
| 2             | 0     | $\approx 0.046$  | 0                             |
| 3             | 1     | $\approx 0.24$   | 0                             |
| 4             | 1     | $\approx 0.82$   | 1                             |
| 5             | 1     | $\approx 0.99$   | 1                             |

From the table: $TP = 2$, $TN = 2$, $FP = 0$, and $FN = 1$ (student 3 is a false negative).

Accuracy

$$\text{Accuracy} = \frac{TP + TN}{\text{Total}} = \frac{2 + 2}{5} = 0.8$$

Precision

$$\text{Precision} = \frac{TP}{TP + FP} = \frac{2}{2 + 0} = 1.0$$

Recall (Sensitivity)

$$\text{Recall} = \frac{TP}{TP + FN} = \frac{2}{2 + 1} = \frac{2}{3} \approx 0.667$$

F1 Score

$$F_1 \text{ Score} = \frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}} = \frac{2 \times 1.0 \times 0.667}{1.0 + 0.667} \approx 0.8$$
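All four metrics can be recomputed from the predicted classes in the table; a minimal sketch:

```python
# Recomputing the evaluation metrics from the classified predictions.
y_true = [0, 0, 1, 1, 1]
y_pred = [0, 0, 0, 1, 1]  # threshold 0.5 applied to the fitted probabilities

tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))

accuracy = (tp + tn) / len(y_true)
precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1 = 2 * precision * recall / (precision + recall)
print(accuracy, precision, round(recall, 3), round(f1, 3))  # 0.8 1.0 0.667 0.8
```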

Receiver Operating Characteristic (ROC) Curve
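The ROC curve plots the true positive rate against the false positive rate as the classification threshold is varied from 0 to 1; the area under the curve (AUC) summarizes how well the model ranks positives above negatives. For the five students above, every passer receives a higher predicted probability than every failer, so the AUC is 1.0. A minimal sketch computing AUC as the fraction of correctly ranked (positive, negative) pairs:

```python
# AUC as the fraction of (positive, negative) pairs ranked correctly,
# counting ties as half-correct.
y_true = [0, 0, 1, 1, 1]
scores = [0.005, 0.046, 0.24, 0.82, 0.99]  # fitted probabilities from above

pos = [s for s, t in zip(scores, y_true) if t == 1]
neg = [s for s, t in zip(scores, y_true) if t == 0]
concordant = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
auc = concordant / (len(pos) * len(neg))
print(auc)  # 1.0
```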
