Last modified: May 11, 2025


Moments, Moment‐Generating Functions, and Their Connections

In both statistics and mechanics the word moment measures how much "leverage" the values of a quantity exert about a chosen reference point. In statistics the leverage is exerted by probability mass, in mechanics by physical mass, but the mathematics is identical: take a distance from the reference point, square or otherwise power it, weight it, and sum or integrate.

Statistical moments

For a real-valued random variable X with mean \mu = E[X], the nth raw moment is \mu'_n = E[X^n]. The nth central moment shifts the origin to the mean so that only the shape of the distribution enters:

\mu_n = E\bigl[(X - \mu)^n\bigr].

The first central moment is always zero. The second central moment, \mu_2, is the familiar variance \sigma^2 = E\bigl[(X - \mu)^2\bigr]; its square root is the standard deviation. Higher-order central moments refine the description: the third captures skewness (asymmetry) and the fourth describes kurtosis (tail weight).
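
As a quick numerical illustration, here is a minimal numpy sketch; the Normal(0, 2) sample and the scipy.stats cross-check are illustrative choices, not part of the text above:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
x = rng.normal(loc=0.0, scale=2.0, size=100_000)   # sample with mu = 0, sigma = 2

mu = x.mean()
raw_2 = np.mean(x**2)              # second raw moment E[X^2]
central_2 = np.mean((x - mu)**2)   # second central moment = variance
central_3 = np.mean((x - mu)**3)   # third central moment (drives skewness)
central_4 = np.mean((x - mu)**4)   # fourth central moment (drives kurtosis)

print(central_2)                           # approx sigma^2 = 4
print(central_3 / central_2**1.5)          # standardised skewness, approx 0
print(central_4 / central_2**2)            # kurtosis, approx 3 for a normal
print(stats.skew(x), stats.kurtosis(x, fisher=False))   # library cross-check
```

For a symmetric sample the standardised skewness hovers near zero and the kurtosis lands near 3, matching the descriptions above.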

The moment-generating function (MGF)

Whenever E[e^{tX}] exists in an open interval around t = 0, we may define the moment-generating function

M_X(t) = E\bigl[e^{tX}\bigr].

Expanding e^{tX} as a power series and exchanging expectation with the series term by term shows that M_X collects every raw moment:

M_X(t) = 1 + \mu'_1 t + \frac{\mu'_2}{2!} t^2 + \cdots.

Consequently

\mu'_n = M_X^{(n)}(0) = \left. \frac{d^n M_X(t)}{dt^n} \right|_{t=0}.

The mean is the first derivative at zero, and the variance is the second derivative minus the square of the first:

E[X] = M_X^{(1)}(0)

\mathrm{Var}(X) = M_X^{(2)}(0) - \bigl(M_X^{(1)}(0)\bigr)^2

Because differentiation is often easier than direct integration, the MGF is a powerful computational shortcut; it also uniquely determines the distribution whenever it exists in a neighbourhood of the origin.
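
Because the MGF reduces moment-finding to differentiation, a computer algebra system makes the shortcut tangible. A minimal sympy sketch, assuming an Exponential(1) variable (whose MGF is 1/(1 - t) for t < 1, the same example as the figure below):

```python
import sympy as sp

t = sp.symbols('t')
M = 1 / (1 - t)                      # MGF of an Exponential(1) variable, t < 1

mu1 = sp.diff(M, t, 1).subs(t, 0)    # first raw moment  -> 1
mu2 = sp.diff(M, t, 2).subs(t, 0)    # second raw moment -> 2
var = mu2 - mu1**2                   # variance via the formula above -> 1

print(mu1, mu2, var)                 # 1 2 1
```

Differentiating twice and subtracting the squared first derivative reproduces the known mean and variance of the exponential distribution without a single integral.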

[Figure: Monte Carlo estimate \hat{M}(t) of the MGF plotted against the exact curve M_X(t)]

What the figure shows

Because \hat{M}(t) is an unbiased estimator of M_X(t), the scatter hugs the curve; sampling noise widens a bit as t \to 1, where the variance of e^{tX} blows up. Zooming in near t = 0 (not shown), the slope of the curve equals the mean E[X] = 1, its curvature gives E[X^2] = 2, and so on, illustrating how successive derivatives at t = 0 generate the raw moments.
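
The figure's code is not reproduced in the source, but a sketch of the underlying experiment is easy to write; the sample size and grid of t values here are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.exponential(scale=1.0, size=50_000)   # Exponential(1): M_X(t) = 1/(1 - t)

for t in np.linspace(0.0, 0.9, 10):
    m_hat = np.mean(np.exp(t * x))            # unbiased estimator of E[e^{tX}]
    m_true = 1.0 / (1.0 - t)
    print(f"t={t:.1f}  M_hat={m_hat:8.4f}  M_true={m_true:8.4f}")
```

The estimates track 1/(1 - t) closely for small t and wobble more as t approaches 1, exactly the widening scatter described above.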

Mechanical moments: the moment of inertia

In classical mechanics the moment of inertia of a body about a fixed axis measures its resistance to rotational acceleration. For a continuous body with density \rho(\mathbf{r}) the moment of inertia about the axis is

I = \int r^2 \, dm = \int r^2 \rho(\mathbf{r}) \, dV,

where r is the perpendicular distance from the axis. For a system of point masses the integral reduces to the sum I = \sum_i m_i r_i^2.
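
For point masses the sum is a one-liner; the masses and distances below are made-up values for illustration:

```python
import numpy as np

m = np.array([2.0, 1.0, 0.5])    # point masses in kg (hypothetical)
r = np.array([0.1, 0.4, 0.8])    # perpendicular distances from the axis in m

I = np.sum(m * r**2)             # I = sum_i m_i r_i^2
print(I)                         # 0.5 kg m^2
```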

Variance and moment of inertia: a shared formula

Both variance and moment of inertia are weighted sums of squared distances. Replace mass m_i with probability weight 1/N and the distance to the axis with the deviation from the mean, and the formulas coincide. This is why early statisticians, led by Karl Pearson, borrowed the mechanical term moment to describe sums of powered deviations in probability theory.
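
One helper function can compute both quantities, which makes the correspondence literal rather than metaphorical. A sketch with made-up numbers:

```python
import numpy as np

def second_moment(weights, positions, origin):
    """Weighted sum of squared distances about a reference point."""
    return np.sum(weights * (positions - origin) ** 2)

x = np.array([1.0, 2.0, 4.0, 9.0])

# Mechanics: physical masses about an axis through the origin.
masses = np.array([3.0, 1.0, 2.0, 0.5])
I = second_moment(masses, x, origin=0.0)

# Statistics: equal probability weights 1/N about the mean.
w = np.full(x.size, 1.0 / x.size)
var = second_moment(w, x, origin=x.mean())

print(I)                                  # moment of inertia
print(var, np.isclose(var, x.var()))      # matches numpy's population variance
```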

[Figure: 2-D point cloud with a circle of radius \sigma_r centred on the centroid]

\sigma_r = \sqrt{\frac{1}{n} \sum_{i=1}^{n} r_i^2},

where r_i is the distance of the ith point from the centroid. Circles of this radius play the same geometric role that \sigma does on a 1-D number line.

Squaring the radial distances and averaging (then halving in physics) gives the statistical moment of inertia, identical in algebra to the physical moment of inertia but with probability mass instead of physical mass. Visually, you can see that most points fall within the 1\sigma_r circle, and the spread of the cloud determines the circle's radius.
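
A few lines reproduce the statistic behind the circle; the centred 2-D Gaussian cloud is an assumed stand-in for whatever data the original figure used:

```python
import numpy as np

rng = np.random.default_rng(2)
pts = rng.normal(size=(10_000, 2))                    # 2-D cloud around the origin

r = np.linalg.norm(pts - pts.mean(axis=0), axis=1)    # radial distances from centroid
sigma_r = np.sqrt(np.mean(r**2))                      # RMS radius of the cloud

inside = np.mean(r <= sigma_r)                        # fraction inside the circle
print(sigma_r, inside)   # approx sqrt(2) and approx 0.63 for this cloud
```

For a standard 2-D Gaussian roughly 63% of the points fall inside the \sigma_r circle, which is the visual crowding the caption describes.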

Principal‑component analysis (PCA) as a "rotational" problem

Principal‑component analysis diagonalises the covariance matrix

\Sigma_{jk} = E\bigl[(X_j-\mu_j)(X_k-\mu_k)\bigr],

thereby finding orthogonal directions in which the variance—and hence the statistical moment of inertia—is extremal. In mechanics the same eigen‑problem arises when one seeks the axes about which a rigid body’s physical moment of inertia is minimal or maximal. PCA is thus the statistical analogue of aligning a rigid body with its principal axes.

[Figure: 2-D scatter with arrows marking the two principal axes]

The long arrow shows where variance—and thus the statistical moment of inertia—is greatest; the short arrow shows the direction of minimal variance. Conceptually, PCA has “rotated” the coordinate system to align with these axes, mirroring how a rigid body is aligned with its principal axes of rotation to reveal its physical moments of inertia.
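
In code, PCA really is just the symmetric eigenproblem applied to \Sigma. A minimal numpy sketch, assuming a synthetic correlated cloud:

```python
import numpy as np

rng = np.random.default_rng(3)
X = rng.multivariate_normal(mean=[0, 0],
                            cov=[[3.0, 1.2], [1.2, 1.0]],   # hypothetical covariance
                            size=5_000)

S = np.cov(X, rowvar=False)              # sample covariance matrix Sigma
eigvals, eigvecs = np.linalg.eigh(S)     # eigendecomposition (ascending eigenvalues)

# Columns of eigvecs are the principal axes; eigvals are the variances
# (statistical "moments of inertia") along them.
Y = (X - X.mean(axis=0)) @ eigvecs       # rotate into the principal-axis frame
print(eigvals)
print(np.cov(Y, rowvar=False).round(6))  # approximately diagonal after rotation
```

After the rotation the covariance matrix is numerically diagonal, the statistical analogue of a rigid body spinning about its principal axes.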

Historical remarks

Pafnuty Chebyshev and his students used powered deviations in probabilistic inequalities during the 1860s–1880s but did not employ the term moment. Karl Pearson introduced that name in an 1893 Nature letter and, in 1894, formalised the method of moments for parameter estimation by equating sample moments with their theoretical counterparts. The unifying physics analogy he invoked remains the standard intuition today.
