Last modified: May 11, 2025
Moments, Moment‐Generating Functions, and Their Connections
In both statistics and mechanics the word moment measures how much "leverage" the values of a quantity exert about a chosen reference point. In statistics the leverage is exerted by probability mass, in mechanics by physical mass, but the mathematics is identical: take a distance from the reference point, raise it to a power, weight it, and sum or integrate.
Statistical moments
For a real-valued random variable $X$ with mean $\mu = E[X]$, the $n$th raw moment is $\mu'_n = E[X^n]$. The $n$th central moment shifts the origin to the mean so that only the shape of the distribution enters:

$$\mu_n = E[(X-\mu)^n].$$
The first central moment is always zero. The second central moment, $\mu_2$, is the familiar variance $\sigma^2 = E[(X-\mu)^2]$; its square root is the standard deviation. Higher-order central moments refine the description: the third captures skewness (asymmetry) and the fourth describes kurtosis (tail weight).
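As a quick numerical illustration (a minimal sketch assuming NumPy; the exponential sample is an arbitrary choice), both families of moments can be estimated directly from simulated data:

```python
# Minimal sketch: estimate raw and central moments from a sample.
# The Exp(rate=1) distribution is chosen only because its moments
# are easy to check by hand: E[X^n] = n!.
import numpy as np

rng = np.random.default_rng(seed=0)
x = rng.exponential(scale=1.0, size=100_000)  # scale = 1/rate

mu = x.mean()
raw = {n: np.mean(x**n) for n in range(1, 5)}             # mu'_n = E[X^n]
central = {n: np.mean((x - mu)**n) for n in range(1, 5)}  # mu_n = E[(X-mu)^n]

variance = central[2]
skewness = central[3] / variance**1.5  # standardised third central moment
kurtosis = central[4] / variance**2    # standardised fourth central moment

print(raw)                            # roughly 1, 2, 6, 24
print(variance, skewness, kurtosis)   # roughly 1, 2, 9 for Exp(1)
```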
The moment-generating function (MGF)
Whenever $E[e^{tX}]$ exists in an open interval around $t = 0$, we may define the moment-generating function

$$M_X(t) = E[e^{tX}].$$
Expanding $e^{tX}$ as a power series and exchanging expectation with the series term by term shows that $M_X$ collects every raw moment:

$$M_X(t) = 1 + \mu'_1 t + \frac{\mu'_2}{2!}t^2 + \cdots.$$
Consequently

$$\mu'_n = M_X^{(n)}(0) = \left.\frac{d^n M_X(t)}{dt^n}\right|_{t=0}.$$
The mean is the first derivative at zero, and the variance is the second derivative minus the square of the first:

$$E[X] = M_X^{(1)}(0), \qquad \operatorname{Var}(X) = M_X^{(2)}(0) - \bigl(M_X^{(1)}(0)\bigr)^2.$$
Because differentiation is often easier than direct integration, the MGF is a powerful computational shortcut; it also uniquely determines the distribution whenever it exists in a neighbourhood of the origin.
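A small sketch of that shortcut, using SymPy to differentiate the exponential MGF $M_X(t) = \lambda/(\lambda - t)$ (the distribution is just an example; any MGF in closed form works the same way):

```python
# Sketch: recover raw moments by differentiating an MGF at t = 0.
# The exponential MGF M(t) = lambda/(lambda - t), t < lambda, is the example.
import sympy as sp

t = sp.symbols('t')
lam = sp.symbols('lambda', positive=True)
M = lam / (lam - t)

# mu'_n = n-th derivative of M at t = 0
raw_moments = [sp.diff(M, t, n).subs(t, 0).simplify() for n in range(1, 5)]
print(raw_moments)  # [1/lambda, 2/lambda**2, 6/lambda**3, 24/lambda**4]

mean = raw_moments[0]
variance = sp.simplify(raw_moments[1] - mean**2)
print(variance)     # 1/lambda**2
```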
What the figure shows
- Solid curve – the exact MGF $M_X(t) = \lambda/(\lambda - t)$ for an exponential distribution with rate $\lambda = 1$ (valid for $t < \lambda = 1$).
- Dots – Monte-Carlo estimates $\hat{M}(t) = \frac{1}{N}\sum_{i=1}^{N} e^{tX_i}$ from $N = 10{,}000$ simulated draws.
Because $\hat{M}(t)$ is an unbiased estimator of $M_X(t)$, the scatter hugs the curve; sampling noise widens as $t$ grows, and in fact the variance of $e^{tX}$ becomes infinite once $t \geq \lambda/2$, even though the estimator stays unbiased. Zooming in near $t = 0$ (not shown), the slope of the curve equals the mean $E[X] = 1$, its curvature gives $E[X^2] = 2$, and so on, illustrating how successive derivatives at $t = 0$ generate the raw moments.
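The comparison in the figure is easy to reproduce; the sketch below (seed and grid of $t$ values are arbitrary choices) pits the exact curve against the Monte-Carlo estimator:

```python
# Sketch: exact Exp(1) MGF lambda/(lambda - t) versus the Monte-Carlo
# estimate (1/N) * sum(exp(t * X_i)), as in the figure.
import numpy as np

rng = np.random.default_rng(seed=1)
lam, N = 1.0, 10_000
x = rng.exponential(scale=1 / lam, size=N)

for t in [0.0, 0.25, 0.5, 0.75, 0.9]:  # must stay below t = lambda
    exact = lam / (lam - t)
    estimate = np.mean(np.exp(t * x))
    print(f"t={t:4.2f}  exact={exact:6.3f}  MC={estimate:6.3f}")
# The two columns agree closely for small t and drift apart as t
# approaches lambda, where e^{tX} becomes extremely heavy-tailed.
```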
Mechanical moments: the moment of inertia
In classical mechanics the moment of inertia of a body about a fixed axis measures its resistance to rotational acceleration. For a continuous body with density $\rho(\mathbf{r})$ the moment of inertia about the axis is

$$I = \int r^2\,dm = \int r^2 \rho(\mathbf{r})\,dV,$$
where $r$ is the perpendicular distance from the axis. For a system of point masses the integral reduces to the sum $I = \sum_i m_i r_i^2$.
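Both formulas are easy to check numerically; in this sketch the point masses are arbitrary, and the uniform rod rotated about one end should recover the textbook value $I = \tfrac{1}{3}mL^2$:

```python
# Sketch: moment of inertia as a sum over point masses, and as an
# integral over a continuous body (a uniform rod about one end).
import numpy as np

# point masses: (mass, perpendicular distance from the axis)
masses = np.array([2.0, 1.0, 0.5])
radii = np.array([0.3, 1.0, 2.0])
I_points = np.sum(masses * radii**2)
print(I_points)  # 2*0.09 + 1*1 + 0.5*4 = 3.18

# uniform rod of mass m, length L, axis through one end: I = m L^2 / 3
m, L, slices = 1.0, 2.0, 100_000
dr = L / slices
r = (np.arange(slices) + 0.5) * dr   # midpoints of thin slices
I_rod = np.sum(r**2 * (m / L) * dr)  # sum of r^2 dm with dm = (m/L) dr
print(I_rod, m * L**2 / 3)           # both ~1.3333
```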
Variance and moment of inertia: a shared formula
Both variance and moment of inertia are weighted sums of squared distances. Replace mass $m_i$ with probability weight $1/N$ and the distance to the axis with the deviation from the mean, and the formulas coincide. This is why early statisticians, led by Karl Pearson, borrowed the mechanical term moment to describe sums of powered deviations in probability theory.
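A short sketch makes the substitution literal (NumPy assumed; the normal sample is arbitrary): give each observation mass $1/N$, measure distances from the mean, and the "moment of inertia" that comes out is exactly the variance:

```python
# Sketch: variance as the moment of inertia of probability mass
# about the mean, via the substitution m_i -> 1/N.
import numpy as np

rng = np.random.default_rng(seed=2)
x = rng.normal(loc=3.0, scale=2.0, size=10_000)

weights = np.full(x.size, 1.0 / x.size)          # probability mass 1/N each
deviations = x - np.sum(weights * x)             # distance from the "axis"
I_statistical = np.sum(weights * deviations**2)  # same algebra as sum m_i r_i^2

print(I_statistical, x.var())  # identical up to floating-point rounding
```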
- Points – a cloud of 200 synthetic observations drawn from a bivariate normal distribution (mean ≈ (2, −1), isotropic unit variance).
- × symbol – the sample mean $\hat{\boldsymbol{\mu}}$.
- Dashed circles – one and two “standard-deviation radii,” where

$$\sigma_r = \sqrt{\frac{1}{n}\sum_{i=1}^{n} \bigl\lVert \mathbf{x}_i - \hat{\boldsymbol{\mu}} \bigr\rVert^2}.$$
They play the same geometric role that $\sigma$ does on a 1-D number line.
Squaring the radial distances and averaging gives the statistical moment of inertia, identical in algebra to the physical moment of inertia but with probability mass in place of physical mass (the factor of $\tfrac{1}{2}$ familiar from physics belongs to the rotational kinetic energy $\tfrac{1}{2}I\omega^2$, not to $I$ itself). Visually, most points fall within the $1\sigma_r$ circle, and the spread of the cloud determines the circle's radius.
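The sketch below mirrors the figure's construction on fresh synthetic data (seed and parameters are arbitrary): it computes $\sigma_r$ and the fraction of points inside the $1\sigma_r$ circle:

```python
# Sketch: radial standard deviation of a 2-D cloud, as in the figure.
import numpy as np

rng = np.random.default_rng(seed=3)
points = rng.normal(loc=(2.0, -1.0), scale=1.0, size=(200, 2))

mu_hat = points.mean(axis=0)                 # the sample mean (the figure's x)
r = np.linalg.norm(points - mu_hat, axis=1)  # radial distances from the mean
sigma_r = np.sqrt(np.mean(r**2))             # root-mean-square radius

inside = np.mean(r <= sigma_r)
print(sigma_r, inside)  # ~sqrt(2) for isotropic unit noise; ~63% inside
```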
Principal‑component analysis (PCA) as a "rotational" problem
Principal‑component analysis diagonalises the covariance matrix
$$\Sigma_{jk} = E\bigl[(X_j-\mu_j)(X_k-\mu_k)\bigr],$$
thereby finding orthogonal directions in which the variance—and hence the statistical moment of inertia—is extremal. In mechanics the same eigen‑problem arises when one seeks the axes about which a rigid body’s physical moment of inertia is minimal or maximal. PCA is thus the statistical analogue of aligning a rigid body with its principal axes.
- Crosses – 300 correlated points drawn from a 2-D normal distribution with unequal variances.
- Black arrows – the principal-component directions (eigenvectors of the sample covariance matrix) emanating from the sample mean.
- Labels – the eigenvalues $\lambda_1 \approx 3.36$ and $\lambda_2 \approx 0.28$, i.e. the variances captured along each axis.
The long arrow shows where variance—and thus the statistical moment of inertia—is greatest; the short arrow shows the direction of minimal variance. Conceptually, PCA has “rotated” the coordinate system to align with these axes, mirroring how a rigid body is aligned with its principal axes of rotation to reveal its physical moments of inertia.
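A compact sketch of the eigen-problem (the covariance below is an arbitrary positive-definite example, so its eigenvalues will not match the figure's $\lambda_1 \approx 3.36$ and $\lambda_2 \approx 0.28$ exactly):

```python
# Sketch: PCA as an eigendecomposition of the sample covariance matrix,
# the statistical analogue of finding a rigid body's principal axes.
import numpy as np

rng = np.random.default_rng(seed=4)
cov_true = np.array([[3.0, 1.2],
                     [1.2, 0.6]])  # correlated, unequal variances
points = rng.multivariate_normal(mean=[0.0, 0.0], cov=cov_true, size=300)

sigma = np.cov(points, rowvar=False)      # sample covariance matrix
eigvals, eigvecs = np.linalg.eigh(sigma)  # eigh returns ascending eigenvalues

order = np.argsort(eigvals)[::-1]         # largest-variance axis first
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

print(eigvals)   # variances along the principal axes (the figure's lambdas)
print(eigvecs)   # columns are the principal directions (the figure's arrows)
```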
Historical remarks
Pafnuty Chebyshev and his students used powered deviations in probabilistic inequalities during the 1860s–1880s but did not employ the term moment. Karl Pearson introduced that name in an 1893 Nature letter and, in 1901, formalised the method of moments for parameter estimation by equating sample moments with their theoretical counterparts. The unifying physics analogy he invoked remains the standard intuition today.