Last modified: February 05, 2025


Confidence Intervals

Confidence intervals (CIs) provide a range of values which are believed, with a certain degree of confidence, to contain a population parameter, like the mean or proportion. They are constructed from a sampled data set and offer an interval estimate for the parameter of interest.

Definition and Components

A confidence interval (CI) provides a range of values within which we expect a population parameter (such as a mean or proportion) to lie, based on sample data. It is typically expressed as:

$$(\text{Point Estimate} - \text{Margin of Error},\ \text{Point Estimate} + \text{Margin of Error})$$

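Where:

- The point estimate is the sample statistic (such as the sample mean or sample proportion) used to estimate the population parameter.
- The margin of error is the amount added to and subtracted from the point estimate to form the interval.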

The formula for the margin of error is:

$$\text{Margin of Error} = \text{Critical Value} \times \text{Standard Error}$$

The confidence level (e.g., 90%, 95%, or 99%) reflects how confident we are that the interval contains the true population parameter. A higher confidence level implies a wider interval, offering greater certainty but less precision.

Example: Confidence Interval for the Mean

Suppose we generate sample data from a population with a true mean of 50 and a standard deviation of 5. Based on this sample, we calculate the sample mean and construct a confidence interval around it.

[Figure: confidence_interval_mean_example]

In this example:

This interval suggests that we can be 95% confident that the true mean lies between the lower and upper bounds. Note that the true population mean may or may not fall within the interval computed from any particular sample, but over many repeated samples, about 95% of such intervals will capture the true mean.
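The calculation described in this example can be sketched in a few lines of Python. This is only an illustration: the sample size of 100, the seed, and all variable names are assumptions, and the numbers it prints are not the ones behind the original figure.

```python
import math
import random
import statistics

# Assumed setup: 100 draws from a normal population with mean 50 and SD 5.
rng = random.Random(42)          # fixed seed so the run is repeatable
sample = [rng.gauss(50, 5) for _ in range(100)]

n = len(sample)
sample_mean = statistics.fmean(sample)
sample_sd = statistics.stdev(sample)          # sample standard deviation (n - 1)
standard_error = sample_sd / math.sqrt(n)

z = statistics.NormalDist().inv_cdf(0.975)    # about 1.96 for a 95% level
margin_of_error = z * standard_error
ci_lower = sample_mean - margin_of_error
ci_upper = sample_mean + margin_of_error

print(f"sample mean = {sample_mean:.2f}")
print(f"95% CI = ({ci_lower:.2f}, {ci_upper:.2f})")
```

Changing the seed gives a different sample and therefore a different interval, which is exactly why the 95% statement is about the procedure rather than about one interval.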

Example: Confidence Intervals for Simulated Stock Returns

[Figure: confidence_interval_stock_returns]

The plot above shows confidence intervals for simulated stock returns at various confidence levels (90%, 95%, and 99%). Here’s what the visualization shows:

The horizontal line at zero helps indicate whether the confidence intervals capture positive or negative stock returns.

Key Insights:

This demonstrates the key trade-off in confidence intervals: as you increase the confidence level, the interval becomes wider, sacrificing precision to gain certainty.
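A rough sketch of how such intervals might be computed is shown below. The simulated returns (250 daily values with a mean of 0.05% and a standard deviation of 1%) are hypothetical and are not the data behind the plot.

```python
import math
import random
import statistics

# Hypothetical data: 250 simulated daily returns (mean 0.05%, SD 1%).
rng = random.Random(7)
returns = [rng.gauss(0.0005, 0.01) for _ in range(250)]

mean_return = statistics.fmean(returns)
standard_error = statistics.stdev(returns) / math.sqrt(len(returns))

intervals = {}
for level in (0.90, 0.95, 0.99):
    # Two-sided critical value: leave (1 - level) / 2 in each tail.
    z = statistics.NormalDist().inv_cdf(0.5 + level / 2)
    intervals[level] = (mean_return - z * standard_error,
                        mean_return + z * standard_error)

for level, (low, high) in intervals.items():
    contains_zero = low <= 0 <= high
    print(f"{level:.0%}: ({low:.5f}, {high:.5f}) "
          f"width={high - low:.5f} contains zero: {contains_zero}")
```

The same mean and standard error are used at every level, so only the critical value changes and the intervals are nested.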

Confidence Interval Construction

Suppose 62% of 150 million likely voters approve of the President. A Gallup poll surveys 1,200 voters to estimate this percentage, but the result includes some sampling error. The standard error (SE) quantifies this error.

Step 1: Use the Central Limit Theorem (CLT)

According to the CLT, the sample proportion (or mean) is approximately normally distributed when the sample is sufficiently large. In this case, where 62% of likely voters approve of the President's performance, we can calculate the standard error (SE) for the sample proportion as follows:

$$SE = \sqrt{\frac{p(1-p)}{n}}$$

Where:

- $p$ is the population proportion (0.62).
- $n$ is the sample size (1,200).

Substitute the values into the formula:

$$SE = \sqrt{\frac{0.62 \times 0.38}{1{,}200}} = \sqrt{\frac{0.2356}{1{,}200}} = \sqrt{0.0001963} \approx 0.014 \text{ or } 1.4\%$$

This means the standard error is approximately 1.4%.

Step 2: Construct the 95% Confidence Interval

According to the empirical rule (or 68-95-99.7 rule), there is approximately a 95% chance that the sample proportion is within 2 standard errors (SEs) of the true population proportion.

For example, if the sample result from the Gallup poll is 60%, we can calculate the 95% confidence interval as:

$$\text{Confidence Interval} = \hat{p} \pm 2 \times SE$$

Where:

- $\hat{p}$ is the sample proportion (60%).
- $SE$ is the standard error (1.4%).

Thus, the confidence interval is:

$$60\% \pm 2 \times 1.4\% = 60\% \pm 2.8\% = 57.2\% \text{ to } 62.8\%$$

This means we are 95% confident that the true population approval percentage falls between 57.2% and 62.8%.
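As a quick check of the arithmetic in Steps 1 and 2, here is a minimal sketch. The helper name `proportion_se` is made up for illustration; the inputs are the figures quoted in the text.

```python
import math

def proportion_se(p, n):
    """Standard error of a sample proportion: sqrt(p * (1 - p) / n)."""
    return math.sqrt(p * (1 - p) / n)

se = proportion_se(0.62, 1200)     # population proportion and sample size from the text
p_hat = 0.60                       # observed poll result
low, high = p_hat - 2 * se, p_hat + 2 * se

print(f"SE = {se:.4f}")
print(f"95% CI = ({low:.3f}, {high:.3f})")
```

The result agrees with the hand calculation up to rounding.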

Step 3: Understanding "Confidence" vs. "Probability"

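The term "95% confidence" refers to the long-run success of this sampling method. Specifically, if the poll were repeated many times and an interval computed each time, about 95% of those intervals would contain the true population proportion. Any single interval either contains the true proportion or it does not.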
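One way to see what "long-run success" means is to simulate many polls and count how often the interval captures the true value. The sketch below assumes the 62% approval rate and 1,200-voter sample from the text; the number of repeated polls (1,000) and the seed are arbitrary choices.

```python
import math
import random

TRUE_P = 0.62      # population approval rate from the text
N = 1200           # voters per poll
POLLS = 1000       # number of repeated polls (an arbitrary choice)

rng = random.Random(0)
se = math.sqrt(TRUE_P * (1 - TRUE_P) / N)

covered = 0
for _ in range(POLLS):
    # Each simulated voter approves with probability TRUE_P.
    approvals = sum(rng.random() < TRUE_P for _ in range(N))
    p_hat = approvals / N
    if p_hat - 2 * se <= TRUE_P <= p_hat + 2 * se:
        covered += 1

coverage = covered / POLLS
print(f"{coverage:.1%} of the intervals contain the true proportion")
```

The printed share should land close to 95%, even though each individual interval simply contains 0.62 or misses it.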

Confidence Level and Z-Scores

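The confidence level represents how confident we are that a calculated interval contains the true population parameter (e.g., mean or proportion). Common confidence levels and their corresponding Z-scores are:

- 90% confidence level: $z \approx 1.645$
- 95% confidence level: $z \approx 1.96$
- 99% confidence level: $z \approx 2.576$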

The formula for constructing a confidence interval is:

$$\text{Confidence Interval} = \text{estimate} \pm z \times SE$$

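Where:

- The estimate is the sample statistic (such as the sample mean or proportion).
- $z$ is the Z-score for the chosen confidence level.
- $SE$ is the standard error of the estimate.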
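The Z-score for any confidence level can be looked up from the standard normal distribution. The sketch below uses `statistics.NormalDist` from the Python standard library; the function name `z_score` is an illustrative choice.

```python
from statistics import NormalDist

def z_score(confidence_level):
    """Two-sided critical value of the standard normal distribution."""
    tail = (1 - confidence_level) / 2      # probability left in each tail
    return NormalDist().inv_cdf(1 - tail)

for level in (0.90, 0.95, 0.99):
    print(f"{level:.0%} -> z = {z_score(level):.3f}")
```

For the three common levels this reproduces the familiar values 1.645, 1.960, and 2.576.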

Bootstrap Principle for Estimating Standard Error (SE)

When the population standard deviation (σ) is unknown, we can use the bootstrap principle to estimate it. The bootstrap principle uses the sample standard deviation (s) as an approximation for the unknown population standard deviation (σ).

Example: In a survey where 60% of the sample approves of the President’s job, the standard error (SE) can be estimated as follows:

$$SE = \sqrt{\frac{p(1-p)}{n}}$$

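Where:

- $p$ is the sample proportion (0.60), used in place of the unknown population proportion.
- $n$ is the sample size (1,000).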

Substitute the values:

$$SE = \sqrt{\frac{0.60 \times 0.40}{1{,}000}} = \sqrt{\frac{0.24}{1{,}000}} = \sqrt{0.00024} \approx 0.0155 \text{ or } 1.55\%$$

Constructing the Confidence Interval

Using the 95% confidence level (z=1.96), the confidence interval is calculated as:

$$\text{Confidence Interval} = 60\% \pm 1.96 \times 1.55\%$$

$$\text{Confidence Interval} = 60\% \pm 3.04\% = 56.96\% \text{ to } 63.04\%$$

This means we are 95% confident that the true approval rating lies between 56.96% and 63.04%.
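The plug-in calculation above can be reproduced with a short sketch. Variable names are illustrative, and the sample size of 1,000 is taken from the substitution step.

```python
import math

p_hat = 0.60      # sample proportion from the survey
n = 1000          # sample size used in the calculation above
z = 1.96          # critical value for a 95% level

# Plug-in estimate: p_hat stands in for the unknown population proportion.
se = math.sqrt(p_hat * (1 - p_hat) / n)
margin = z * se

print(f"SE = {se:.4f}, margin = {margin:.4f}")
print(f"95% CI = ({p_hat - margin:.4f}, {p_hat + margin:.4f})")
```

The only difference from the earlier poll calculation is that the sample proportion, not the true proportion, goes into the SE formula.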

Width of Confidence Intervals and Margin of Error

The width of a confidence interval is determined by the margin of error, which is calculated as:

$$\text{Margin of Error} = z \times SE$$

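Where:

- $z$ is the critical value for the chosen confidence level.
- $SE$ is the standard error of the estimate.

Impact:

- A higher confidence level means a larger $z$, and therefore a wider interval.
- A larger sample size means a smaller SE, and therefore a narrower interval.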
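To make the effect of both factors concrete, the following sketch tabulates the margin of error for a sample proportion. The proportion of 0.6 and the sample sizes are illustrative values, and `margin_of_error` is a hypothetical helper.

```python
import math
from statistics import NormalDist

def margin_of_error(p, n, confidence_level):
    """z * SE for a sample proportion."""
    z = NormalDist().inv_cdf(0.5 + confidence_level / 2)
    return z * math.sqrt(p * (1 - p) / n)

# Illustrative values only: p = 0.6 with three sample sizes and three levels.
for n in (250, 1000, 4000):
    row = [f"{margin_of_error(0.6, n, level):.4f}" for level in (0.90, 0.95, 0.99)]
    print(n, row)
```

Because SE shrinks with the square root of the sample size, quadrupling the sample only halves the margin of error.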

Confidence Level and Interpretation

The confidence level indicates the proportion of confidence intervals, constructed from repeated samples, that would capture the true population parameter. Common confidence levels are 90%, 95%, and 99%. For example, a 95% confidence level means that if we were to take 100 different samples and calculate confidence intervals (CIs) for each, we expect about 95 of those intervals to contain the true population parameter.

Example: Health App Steps Count

Consider two health apps estimating daily steps:

App Y, by including a confidence interval, acknowledges the inherent variability in the data and offers a more reliable range, assuming its data collection methods are sound.

95% Confidence Interval for a Parameter

When the Central Limit Theorem applies (i.e., when sample sizes are sufficiently large), the point estimate approximately follows a normal distribution. In this case, we can construct a 95% confidence interval using the formula:

$$\text{Point Estimate} \pm 1.96 \times SE$$

The value 1.96 comes from the standard normal distribution, where about 95% of the probability lies within 1.96 standard deviations of the mean.
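Written out, with $Z$ denoting a standard normal variable and $\Phi$ its cumulative distribution function (notation introduced here for illustration), the reasoning is roughly:

$$P(-1.96 \le Z \le 1.96) = \Phi(1.96) - \Phi(-1.96) \approx 0.975 - 0.025 = 0.95$$

Each tail beyond $\pm 1.96$ holds about 2.5% of the probability, which leaves about 95% in the middle.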

Example: Carry-on Baggage Weight

Suppose the average weight of a sample of carry-on bags is 3.2 kg with a standard error (SE) of 0.053 kg. To calculate the 95% CI:

$$\text{Confidence Interval} = \text{Sample Mean} \pm (\text{Critical Value} \times \text{Standard Error})$$

For a 95% confidence level, the critical value (Z-score) is approximately 1.96. Therefore:

$$\text{Confidence Interval} = 3.2 \pm (1.96 \times 0.053) = (3.09612,\ 3.30388)$$

This interval suggests that the true mean weight of all carry-on bags is likely between 3.09612 kg and 3.30388 kg (roughly 3.10 kg to 3.30 kg), with 95% confidence.

Example: Calibrating a Digital Thermometer

When calibrating a digital thermometer, it's important to determine an error range that captures the actual temperature with 95% confidence. For a normal distribution, about 95% of values lie within 1.96 standard deviations of the mean. Thus, the calibration range should extend 1.96 times the standard error (SE) on either side of the mean reading.

The formula for a 95% Confidence Interval (CI) for the thermometer’s temperature reading (T) is:

$$CI = T \pm Z \times SE$$

Where:

- $T$ is the mean temperature reading.
- $Z$ is the Z-score for the 95% confidence level, which is 1.96.
- $SE$ is the standard error, calculated as:

$$SE = \frac{\sigma}{\sqrt{n}}$$

Where σ is the standard deviation of the thermometer’s readings and n is the number of trials.

Suppose the thermometer was tested over 30 trials, yielding a mean reading (T) of 37.0°C with a standard deviation (σ) of 0.5°C. We calculate the standard error (SE) as:

$$SE = \frac{0.5}{\sqrt{30}} \approx 0.091$$

Now, using this SE to find the 95% CI:

$$CI = 37.0 \pm (1.96 \times 0.091) \approx 37.0 \pm 0.178 = (36.822,\ 37.178)$$

Thus, the acceptable calibration range for the thermometer is from 36.822°C to 37.178°C, meaning we can be 95% confident that the true temperature lies within this interval.
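The thermometer calculation can be wrapped in a small reusable function. This is a sketch; `mean_confidence_interval` is a hypothetical helper, not part of any library.

```python
import math

def mean_confidence_interval(mean, sd, n, z=1.96):
    """Return (lower, upper) for mean ± z * sd / sqrt(n)."""
    se = sd / math.sqrt(n)
    return mean - z * se, mean + z * se

low, high = mean_confidence_interval(mean=37.0, sd=0.5, n=30)
print(f"95% CI = ({low:.3f}, {high:.3f})")
```

Because the code does not round SE to 0.091 before multiplying, its bounds can differ from the hand calculation in the third decimal place.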

Misconceptions and Clarifications

A common misconception about confidence intervals (CIs) is to interpret them as saying, "There is a 95% chance that the true mean lies within this specific interval." This is incorrect because the true population parameter (like the mean) is fixed, not random. The interval calculated from the sample data either contains the true mean, or it does not—there's no probability attached to this fact for a single interval.

The correct interpretation is that if we were to repeatedly sample from the population and construct a confidence interval for each sample, 95% of those intervals would contain the true mean (for a 95% confidence level). This reflects the long-term reliability of the method used to create the intervals, not the probability that a specific interval from a single sample contains the true mean.

Changing the Confidence Level

If the point estimate follows a normal distribution with a known standard error (SE), the confidence interval for the population parameter is given by:

$$\text{Point Estimate} \pm z^* \times SE$$

Where:

- $z^*$ is the critical value corresponding to the selected confidence level (e.g., 1.645 for 90%, 1.96 for 95%, 2.576 for 99%).
- $z^* \times SE$ is the margin of error, which determines the width of the confidence interval.

Example: Blood Pressure Readings

Consider a case where the blood pressure readings of a group of patients are normally distributed, but the population mean is unknown. The population standard deviation is 12 mmHg, and a random sample of 50 patients has a sample mean of 130 mmHg. We will calculate the 90% and 99% confidence intervals for the population mean blood pressure.

Step 1: Calculate the Standard Error (SE)

The standard error is calculated as:

$$SE = \frac{\sigma}{\sqrt{n}} = \frac{12}{\sqrt{50}} \approx 1.697$$

Step 2: Determine Critical Values (z)

- For a 90% confidence level, $z \approx 1.645$.
- For a 99% confidence level, $z \approx 2.576$.

Step 3: Calculate the Confidence Intervals

90% Confidence Interval:

$$130 \pm 1.645 \times 1.697 = 130 \pm 2.79 = (127.21,\ 132.79)$$

99% Confidence Interval:

$$130 \pm 2.576 \times 1.697 = 130 \pm 4.37 = (125.63,\ 134.37)$$

Interpretation of Results

Note how the 99% confidence interval is wider than the 90% confidence interval. This reflects the trade-off between certainty (higher confidence levels) and precision (narrower intervals).
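The two intervals can be recomputed side by side with a short sketch that uses the values from the example. Critical values come from `statistics.NormalDist` rather than a lookup table, and the variable names are illustrative.

```python
import math
from statistics import NormalDist

sigma, n, sample_mean = 12, 50, 130     # values from the example
se = sigma / math.sqrt(n)

results = {}
for level in (0.90, 0.99):
    z = NormalDist().inv_cdf(0.5 + level / 2)
    results[level] = (sample_mean - z * se, sample_mean + z * se)
    low, high = results[level]
    print(f"{level:.0%} CI: ({low:.2f}, {high:.2f}), width {high - low:.2f}")
```

Both intervals share the same center and standard error, so the extra width at 99% comes entirely from the larger critical value.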
