Last modified: May 18, 2022


The Normal Curve and Z-Scores

The Normal Curve

A normal distribution (often called the normal curve or Gaussian distribution) is a continuous probability distribution that is symmetric about the mean: most observations cluster around the central peak and taper off symmetrically towards both tails. Many real-world datasets, such as human heights, IQ scores, and measurement errors, approximately follow this distribution.

The probability density function (PDF) of the normal distribution is given by:

$$ f(x | \mu, \sigma^2) = \frac{1}{\sigma \sqrt{2 \pi}} e^{ -\frac{(x - \mu)^2}{2\sigma^2} } $$

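where:

  - $x$ is the value at which the density is evaluated,
  - $\mu$ is the mean of the distribution,
  - $\sigma$ is the standard deviation (so $\sigma^2$ is the variance).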

This formula describes the shape of the curve mathematically. The mean $\mu$ determines the center of the distribution, and the standard deviation $\sigma$ determines the width of the bell curve.
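The formula can be checked numerically. The sketch below evaluates the PDF term by term using only the Python standard library. The parameter values $\mu = 68.3$ and $\sigma = 1.8$ are borrowed from the fathers' heights example later in this article, `normal_pdf` is an illustrative helper name (not part of any library), and `statistics.NormalDist` (Python 3.8+) is used only as a cross-check.

```python
import math
from statistics import NormalDist


def normal_pdf(x: float, mu: float, sigma: float) -> float:
    """Evaluate f(x | mu, sigma^2) directly from the normal PDF formula."""
    coeff = 1.0 / (sigma * math.sqrt(2.0 * math.pi))   # 1 / (sigma * sqrt(2*pi))
    exponent = -((x - mu) ** 2) / (2.0 * sigma ** 2)   # -(x - mu)^2 / (2*sigma^2)
    return coeff * math.exp(exponent)


# Illustrative parameters (assumed here, taken from the heights example).
mu, sigma = 68.3, 1.8
for x in (64.7, 68.3, 71.9):
    print(f"f({x}) = {normal_pdf(x, mu, sigma):.4f}")

# Cross-check against the standard library's implementation.
assert math.isclose(normal_pdf(70.0, mu, sigma), NormalDist(mu, sigma).pdf(70.0))
```

Because the exponent depends only on $(x - \mu)^2$, points equally far above and below the mean receive the same density, which is the symmetry described above.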


Examples

The Empirical Rule (68-95-99.7 Rule)

The Empirical Rule provides a rough estimate for the spread of data in a normal distribution. It applies to any dataset that approximately follows the normal distribution.

Example

Suppose the heights of fathers in a dataset are normally distributed with:

  - mean $\mu = 68.3$ inches,
  - standard deviation $\sigma = 1.8$ inches.

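According to the Empirical Rule:

  - approximately 68% of the data falls within one standard deviation of the mean ($\mu \pm \sigma$),
  - approximately 95% falls within two standard deviations ($\mu \pm 2\sigma$),
  - approximately 99.7% falls within three standard deviations ($\mu \pm 3\sigma$).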

These intervals can be visualized as follows:

$$
\begin{array}{lcc}
\text{Interval} & \text{Range (inches)} & \text{Proportion of data contained} \\
\hline
\mu \pm \sigma & 66.5 \text{ to } 70.1 & \approx 68\% \\
\mu \pm 2\sigma & 64.7 \text{ to } 71.9 & \approx 95\% \\
\mu \pm 3\sigma & 62.9 \text{ to } 73.7 & \approx 99.7\%
\end{array}
$$
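The rule's percentages are rounded. A minimal Python sketch, assuming the same $\mu = 68.3$ and $\sigma = 1.8$, recomputes each interval and the exact area it contains with `statistics.NormalDist.cdf` from the standard library:

```python
from statistics import NormalDist

mu, sigma = 68.3, 1.8           # fathers' heights from the example (inches)
heights = NormalDist(mu, sigma)

coverage = {}
for n in (1, 2, 3):
    low, high = mu - n * sigma, mu + n * sigma
    # Area between the two bounds = CDF(high) - CDF(low).
    coverage[n] = heights.cdf(high) - heights.cdf(low)
    print(f"mu +/- {n} sigma: {low:.1f} to {high:.1f} inches, {coverage[n]:.2%} of data")
```

The exact areas are roughly 68.27%, 95.45%, and 99.73%, and they do not depend on the particular $\mu$ and $\sigma$, which is why the rule applies to any normal distribution.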


Standardizing Data and Z-Scores

To compare values from different normal distributions or to work with a standardized form of a dataset, we can convert raw data values to z-scores.

Definition of Z-Score

The z-score is a way of describing a value in terms of how many standard deviations it is away from the mean. The formula for the z-score is:

$$ z = \frac{x - \mu}{\sigma} $$

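Where:

  - $x$ is the raw data value,
  - $\mu$ is the mean of the distribution,
  - $\sigma$ is the standard deviation of the distribution.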

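A z-score tells us:

  - how far a value lies from the mean, measured in standard deviations ($|z|$),
  - whether the value is above the mean ($z > 0$), below it ($z < 0$), or equal to it ($z = 0$).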

Example

Suppose a father is 71.9 inches tall. We want to find his z-score given that the mean height is 68.3 inches and the standard deviation is 1.8 inches.

$$ z = \frac{71.9 - 68.3}{1.8} = \frac{3.6}{1.8} = 2 $$

This means that a height of 71.9 inches is 2 standard deviations above the mean.

Conversely, if a father is 67.4 inches tall:

$$ z = \frac{67.4 - 68.3}{1.8} = \frac{-0.9}{1.8} = -0.5 $$

This means that a height of 67.4 inches is 0.5 standard deviations below the mean.
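Both calculations follow directly from the z-score formula. The sketch below wraps it in a small Python function; `z_score` is a hypothetical helper name chosen for illustration, and the guard against a non-positive $\sigma$ is an added assumption.

```python
def z_score(x: float, mu: float, sigma: float) -> float:
    """Number of standard deviations x lies above (+) or below (-) the mean."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    return (x - mu) / sigma


mu, sigma = 68.3, 1.8
for height in (71.9, 67.4):
    z = z_score(height, mu, sigma)
    position = "above" if z > 0 else "below" if z < 0 else "at"
    print(f"{height} in -> z = {z:.2f} ({abs(z):.2f} SD {position} the mean)")
```

On Python 3.9 and later, `statistics.NormalDist(mu, sigma).zscore(x)` computes the same quantity.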

Standard Normal Distribution

After converting all values in a normal distribution to z-scores, we obtain the standard normal distribution, which has:

  - a mean of 0,
  - a standard deviation of 1.

The standard normal distribution is widely used in statistics because, once values are converted to z-scores, a single table of areas (a z-table) serves every normal distribution, and z-scores from different datasets can be compared directly.
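Both properties can be illustrated in Python. The first part standardizes a simulated sample; the data are drawn with `NormalDist.samples` and a fixed seed purely as an assumption for illustration, not real measurements. The second part shows that a left-tail area on the raw height scale equals the standard normal area at the corresponding z-score.

```python
from statistics import NormalDist, fmean, pstdev

# Part 1: standardizing a dataset gives mean 0 and standard deviation 1.
# The sample is simulated (an assumption for illustration), not real data.
sample = NormalDist(mu=68.3, sigma=1.8).samples(1_000, seed=42)
m, s = fmean(sample), pstdev(sample)
z_scores = [(x - m) / s for x in sample]
print(f"mean of z-scores: {fmean(z_scores):.6f}, SD of z-scores: {pstdev(z_scores):.6f}")

# Part 2: a left-tail area on the raw scale equals the standard normal area at z.
heights = NormalDist(mu=68.3, sigma=1.8)  # original scale (inches)
standard = NormalDist()                   # mean 0, standard deviation 1
for x in (64.7, 67.4, 71.9):
    z = (x - heights.mean) / heights.stdev
    print(f"x = {x}: P(X <= x) = {heights.cdf(x):.4f}, "
          f"P(Z <= {z:.2f}) = {standard.cdf(z):.4f}")
```

The matching columns in the second loop are what make one z-table sufficient for every normal distribution.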

Finding Areas Under the Normal Curve

To find the proportion of data within a certain range, we can use z-scores to convert the raw data points and then look up the corresponding probabilities using a z-table or statistical software. The area under the normal curve between two z-scores represents the proportion of data that lies between those values.

Example

To find the proportion of fathers with heights between 67.4 inches and 71.9 inches, we first compute the z-scores for these heights:

For 67.4 inches:

$$ z_{67.4} = \frac{67.4 - 68.3}{1.8} = -0.5 $$

For 71.9 inches:

$$ z_{71.9} = \frac{71.9 - 68.3}{1.8} = 2 $$

Next, using a z-table (or software), we find the area to the left of each z-score:

  - $P(Z \leq -0.5) \approx 0.3085$,
  - $P(Z \leq 2) \approx 0.9772$.

Thus, the proportion of fathers with heights between 67.4 and 71.9 inches is:

$$ P(67.4 \leq \text{height} \leq 71.9) = 0.9772 - 0.3085 = 0.6687 \approx 66.87\% $$
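The same lookup can be done in software instead of a z-table. A minimal Python sketch using the standard library's `NormalDist().cdf` for the standard normal areas, with a direct computation on the raw scale as a cross-check:

```python
from statistics import NormalDist

mu, sigma = 68.3, 1.8
z_low = (67.4 - mu) / sigma    # about -0.5
z_high = (71.9 - mu) / sigma   # about 2

# Area to the left of each z-score under the standard normal curve
# (the values a z-table lists).
area_low = NormalDist().cdf(z_low)
area_high = NormalDist().cdf(z_high)
proportion = area_high - area_low

# Cross-check: compute the same area on the original height scale.
direct = NormalDist(mu, sigma).cdf(71.9) - NormalDist(mu, sigma).cdf(67.4)

print(f"area left of z = {z_low:.2f}: {area_low:.4f}")
print(f"area left of z = {z_high:.2f}: {area_high:.4f}")
print(f"P(67.4 <= height <= 71.9) = {proportion:.4f} (direct: {direct:.4f})")
```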


Computing Percentiles

The percentile of a value in a normal distribution is the percentage of the data less than or equal to that value. To find the value at a given percentile, we:

  1. Find the z-score corresponding to the desired percentile using a z-table or statistical software.
  2. Convert the z-score back into the raw data value.

Example

Suppose we want to compute the 30th percentile of fathers' heights. Using a z-table, we find that the z-score corresponding to the 30th percentile is approximately $z = -0.52$.

To find the corresponding height, we use the z-score formula in reverse:

$$ \text{height} = \mu + z\sigma = 68.3 + (-0.52)(1.8) = 68.3 - 0.936 = 67.364 \text{ inches}. $$

Thus, the 30th percentile corresponds to a height of approximately 67.36 inches.

This process can be applied to any percentile by finding the appropriate z-score and converting it back to the original scale using the mean and standard deviation of the dataset.
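The two steps map onto the inverse CDF of the standard normal distribution. The sketch below assumes Python's `statistics.NormalDist.inv_cdf` in place of a z-table; `height_at_percentile` is a hypothetical helper whose defaults are the example's mean and standard deviation.

```python
from statistics import NormalDist


def height_at_percentile(p: float, mu: float = 68.3, sigma: float = 1.8) -> float:
    """Return the value below which a fraction p (0 < p < 1) of the data lies."""
    z = NormalDist().inv_cdf(p)   # step 1: z-score for the desired percentile
    return mu + z * sigma         # step 2: convert back to the original scale


for pct in (30, 50, 90):
    print(f"{pct}th percentile: {height_at_percentile(pct / 100):.2f} inches")
```

With the unrounded z-score (about $-0.5244$), the 30th percentile comes out near 67.36 inches, in line with the table-based result above.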
