Last modified: February 07, 2025

Simulations in Statistical Inference

Statistical inference often involves estimating population parameters and constructing confidence intervals based on sample data. Traditional methods rely on assumptions about the sampling distribution of estimators, such as normality and known standard errors. However, these assumptions may not hold, especially with small sample sizes or complex estimators. Simulations, like the Monte Carlo method and bootstrap techniques, offer powerful alternatives to traditional inference by using computational methods to approximate sampling distributions and estimate standard errors.

Confidence Intervals for the Population Mean

Traditional Confidence Interval

For a population mean $\mu$, the traditional confidence interval is:

$$ \bar{x} \pm z_{\alpha/2} \cdot SE(\bar{x}) $$

where:

- $\bar{x}$ is the sample mean,
- $z_{\alpha/2}$ is the standard normal critical value for confidence level $1 - \alpha$,
- $SE(\bar{x}) = \sigma / \sqrt{n}$ is the standard error of the mean, estimated by $s / \sqrt{n}$ when $\sigma$ is unknown.

Limitations of Traditional Methods

These intervals rely on approximate normality of $\bar{x}$ and a reliable estimate of its standard error. With small samples, strongly skewed populations, or estimators more complex than the mean, neither assumption is guaranteed to hold.

Simulations in Statistical Inference

Simulations provide a way to estimate the sampling distribution of an estimator $\hat{\theta}$ without relying on strict theoretical assumptions.

The Monte Carlo Method

The Monte Carlo method uses random sampling from a known distribution to approximate numerical results, particularly for estimating parameters and their variability.

Estimating an Unknown Parameter $\theta$

Generate $B$ independent samples from the assumed model, compute the estimate $\hat{\theta}_b$ on each, and use the average of these estimates, $\bar{\hat{\theta}}$, as a Monte Carlo approximation of $\theta$.

Estimating the Standard Error $SE(\hat{\theta})$

$$ SE(\hat{\theta}) \approx \sqrt{\frac{1}{B - 1} \sum_{b=1}^{B} (\hat{\theta}_b - \bar{\hat{\theta}})^2} $$

where $\bar{\hat{\theta}} = \frac{1}{B} \sum_{b=1}^{B} \hat{\theta}_b$.

Advantages

- No closed-form expression for the sampling distribution of $\hat{\theta}$ is required.
- The method applies to complex estimators for which analytical standard errors are unavailable.
- Accuracy improves as the number of replications $B$ grows.

Example: Estimating Standard Error Using Monte Carlo Simulation

In this example, we estimate the standard error of the sample mean $\hat{\theta}$ using a Monte Carlo approach. This method involves generating multiple independent samples from a population, calculating the mean $\hat{\theta}_b$ for each sample, and using these means to estimate the standard error.

Steps:

- Draw $B$ independent samples of size $n$ from a normal population.
- Calculate the sample mean $\hat{\theta}_b$ for each sample.
- Estimate the standard error $SE(\hat{\theta})$ using the formula:

$$ SE(\hat{\theta}) \approx \sqrt{\frac{1}{B - 1} \sum_{b = 1}^{B} (\hat{\theta}_b - \bar{\theta})^2} $$

where $\bar{\theta} = \frac{1}{B} \sum_{b = 1}^{B} \hat{\theta}_b$ is the average of all sample means.

Parameters:

Monte Carlo Simulation:

Estimate of Standard Error:

- Calculate the overall mean of the sample means $\bar{\theta}$.
- Use the formula for standard error estimation from the sample means.

Figure: Sample means distribution with the Monte Carlo estimated standard error.

Estimated Standard Error (Monte Carlo): 2.7208
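The procedure above can be sketched in Python with NumPy. The population parameters, sample size, and replication count below are assumed for illustration; the article does not state the values behind the 2.7208 figure, so this sketch produces a different number:

```python
import numpy as np

rng = np.random.default_rng(42)

# Assumed illustration parameters (not taken from the article)
mu, sigma = 50, 12   # hypothetical population mean and standard deviation
n = 20               # sample size
B = 1000             # number of Monte Carlo replications

# Draw B independent samples of size n and compute each sample mean
theta_hats = rng.normal(mu, sigma, size=(B, n)).mean(axis=1)

# Monte Carlo estimate of SE(theta_hat), matching the formula above
theta_bar = theta_hats.mean()
se_mc = np.sqrt(np.sum((theta_hats - theta_bar) ** 2) / (B - 1))

print(f"Monte Carlo SE estimate: {se_mc:.4f}")
print(f"Theoretical SE sigma/sqrt(n): {sigma / np.sqrt(n):.4f}")
```

For the sample mean the theoretical value $\sigma/\sqrt{n}$ is known, which gives a direct check on the simulation; for estimators without a closed-form standard error, only the simulated value is available.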

The Bootstrap Principle

Unlike the Monte Carlo method, which samples from a known distribution, the bootstrap approximates the sampling distribution of $\hat{\theta}$ by resampling from the observed data itself.

The Plug-in Principle

The unknown population distribution is replaced ("plugged in") by the empirical distribution of the sample, so any quantity defined in terms of the population is estimated by the same quantity computed from the sample.

Bootstrap Procedure

  1. Start with the original sample, denoted as $X = \{ X_1, X_2, \dots, X_n \}$.
  2. Generate bootstrap samples by sampling with replacement from $X$ to create $B$ bootstrap samples $X^{*b}$, each of size $n$.
  3. For each bootstrap sample $X^{*b}$, compute the bootstrap estimate $\hat{\theta}^*_b$.
  4. Finally, estimate the standard error using the formula:

$$ SE_{boot}(\hat{\theta}) = \sqrt{\frac{1}{B-1}\sum_{b=1}^{B}\left(\hat{\theta}^*_b - \bar{\hat{\theta}}^*\right)^2} $$

with

$$ \bar{\hat{\theta}}^* = \frac{1}{B}\sum_{b=1}^{B}\hat{\theta}^*_b. $$
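A minimal sketch of the four steps, assuming a small hypothetical sample and taking the sample mean as the statistic $\hat{\theta}$:

```python
import numpy as np

rng = np.random.default_rng(0)

# Step 1: the original sample (hypothetical data for illustration)
X = rng.exponential(scale=2.0, size=30)
n = len(X)
B = 2000  # number of bootstrap samples

# Steps 2-3: resample with replacement and compute the statistic each time
boot_estimates = np.array(
    [rng.choice(X, size=n, replace=True).mean() for _ in range(B)]
)

# Step 4: bootstrap standard error
se_boot = boot_estimates.std(ddof=1)
print(f"SE_boot(theta_hat) = {se_boot:.4f}")
```

Here `se_boot` should be close to the plug-in value $s/\sqrt{n}$, since the statistic is the mean; the same loop works unchanged for medians, trimmed means, or other statistics.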

Bootstrap Confidence Intervals

Bootstrapping allows construction of confidence intervals without relying on normality or known standard errors.

1. Normal Approximation Interval

$$ [\hat{\theta} - z_{\alpha/2} \cdot SE_{\text{boot}}(\hat{\theta}), \quad \hat{\theta} + z_{\alpha/2} \cdot SE_{\text{boot}}(\hat{\theta})] $$

This approach is appropriate for symmetric distributions and large sample sizes.

2. Bootstrap Percentile Interval

$$ [\hat{\theta}^*_{\alpha/2},\quad \hat{\theta}^*_{1-\alpha/2}] $$

where $\hat{\theta}^*_{\alpha/2}$ and $\hat{\theta}^*_{1-\alpha/2}$ are the empirical $\alpha/2$ and $1-\alpha/2$ quantiles of the bootstrap estimates.

3. Bootstrap Pivotal Interval

$$ [2\hat{\theta} - \hat{\theta}^*_{1-\alpha/2},\quad 2\hat{\theta} - \hat{\theta}^*_{\alpha/2}] $$

This interval is more accurate for skewed distributions.
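The three intervals can be computed side by side. The gamma sample below is a hypothetical, deliberately skewed example; $z_{\alpha/2} = 1.96$ is hard-coded for a 95% interval:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical skewed sample; theta is the population mean
X = rng.gamma(shape=2.0, scale=3.0, size=40)
theta_hat = X.mean()

B, alpha, z = 4000, 0.05, 1.96
boot = np.array([rng.choice(X, size=len(X), replace=True).mean() for _ in range(B)])
se_boot = boot.std(ddof=1)

# 1. Normal approximation interval
normal_ci = (theta_hat - z * se_boot, theta_hat + z * se_boot)

# 2. Percentile interval: quantiles of the bootstrap distribution
lo, hi = np.quantile(boot, [alpha / 2, 1 - alpha / 2])
percentile_ci = (lo, hi)

# 3. Pivotal interval: reflect the bootstrap quantiles around theta_hat
pivotal_ci = (2 * theta_hat - hi, 2 * theta_hat - lo)

print("normal:    ", normal_ci)
print("percentile:", percentile_ci)
print("pivotal:   ", pivotal_ci)
```

On skewed data the percentile and pivotal intervals are typically asymmetric around $\hat{\theta}$, while the normal interval is symmetric by construction.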

Bootstrapping for Regression

Bootstrapping can estimate the variability of regression coefficients when traditional assumptions (like normality of errors) may not hold.

Simple Linear Regression Model

$$ Y_i = a + b X_i + e_i, \quad i = 1, 2, \dots, n $$

Residual Resampling

Fit the model by least squares and compute the residuals

$$ \hat{e}_i = Y_i - \hat{a} - \hat{b}\, X_i. $$

Resample these residuals with replacement to obtain $\hat{e}_i^*$, then form bootstrap responses

$$ Y_i^* = \hat{a} + \hat{b}\, X_i + \hat{e}_i^*. $$

Case Resampling (Pairs Method)

Resample the pairs $(X_i, Y_i)$ with replacement and refit the model on each resample. Because each pair is kept intact, this method remains valid when the error variance depends on $X_i$.

Wild Bootstrap

When the error variance differs across observations (heteroscedasticity), each residual is multiplied by an independent random weight $\eta_i$ with mean zero and unit variance (for example, Rademacher variables taking the values $\pm 1$):

$$ Y_i^* = \hat{a} + \hat{b}\, X_i + \hat{e}_i\, \eta_i $$

to account for this variability.
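A sketch of the wild bootstrap using Rademacher multipliers $\eta_i \in \{-1, +1\}$ on hypothetical data whose error spread grows with $x$:

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical heteroscedastic data: error spread grows with x
n = 100
x = rng.uniform(0.1, 10, n)
y = 1.0 + 2.0 * x + rng.normal(0, 0.5 * x)

# Least-squares fit and residuals e_hat_i
b_hat, a_hat = np.polyfit(x, y, 1)
resid = y - (a_hat + b_hat * x)

B = 1000
slopes = np.empty(B)
for i in range(B):
    eta = rng.choice([-1.0, 1.0], size=n)   # Rademacher weights
    y_star = a_hat + b_hat * x + resid * eta
    slopes[i] = np.polyfit(x, y_star, 1)[0]

print(f"wild-bootstrap SE(b_hat) = {slopes.std(ddof=1):.4f}")
```

Because each $\eta_i$ multiplies its own residual in place, the resampled errors inherit the observation-specific variance, which plain residual resampling would destroy by shuffling residuals across observations.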

Example: Estimating Variability of Regression Slope Using Residual Resampling

In this example, we demonstrate the Residual Resampling method to estimate the variability of the regression slope $b$.

Steps:

- Fit the model by least squares to obtain $\hat{a}$ and $\hat{b}$.
- Compute the residuals

$$ \hat{e}_i = Y_i - \hat{a} - \hat{b}\, X_i. $$

- Resample the residuals with replacement, form bootstrap responses $Y_i^* = \hat{a} + \hat{b}\, X_i + \hat{e}_i^*$, and refit the model to obtain a bootstrap slope $\hat{b}^*$.
- Repeat $B$ times and use the spread of the $\hat{b}^*$ values to measure the variability of $\hat{b}$.

Figure: Distribution of slope estimates from residual resampling.

The initial linear regression model provided an estimate for the slope of approximately $\hat{b} = 1.954$.

Bootstrap Resampling:
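The residual-resampling steps can be sketched as follows. The data here are hypothetical (generated with a true slope of 2), so the fitted slope will not match the article's 1.954:

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical data from Y = a + b X + e with a = 3, b = 2
n = 50
x = rng.uniform(0, 10, n)
y = 3.0 + 2.0 * x + rng.normal(0, 2.0, n)

# Fit by least squares and form residuals e_hat_i = Y_i - a_hat - b_hat X_i
b_hat, a_hat = np.polyfit(x, y, 1)
resid = y - (a_hat + b_hat * x)

B = 2000
slopes = np.empty(B)
for i in range(B):
    # Resample residuals with replacement and rebuild the responses
    e_star = rng.choice(resid, size=n, replace=True)
    y_star = a_hat + b_hat * x + e_star
    slopes[i] = np.polyfit(x, y_star, 1)[0]

print(f"b_hat = {b_hat:.3f}, bootstrap SE(b_hat) = {slopes.std(ddof=1):.4f}")
```

The standard deviation of the `slopes` array is the bootstrap estimate of $SE(\hat{b})$, and a histogram of `slopes` gives the distribution shown in the figure above.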

Practical Considerations

Number of Bootstrap Samples $B$

For standard-error estimation, a few hundred resamples are usually adequate. Percentile-based confidence intervals depend on tail quantiles and typically require $B$ in the thousands; larger $B$ reduces simulation error at the cost of computation time.

Assumptions and Limitations

The bootstrap assumes the observed sample is representative of the population. It can perform poorly with very small samples, heavy-tailed distributions, or statistics driven by a few extreme observations (such as the sample maximum).
