Moments, Moment-Generating Functions, and Their Connections
In both statistics and mechanics the word moment measures how much "leverage" the values of a quantity exert about a chosen reference point. In statistics the leverage is exerted by probability mass, in mechanics by physical mass, but the mathematics is identical: take a distance from the reference point, square or otherwise power it, weight it, and sum or integrate.
Statistical moments
For a real-valued random variable $X$ with mean $\mu = E[X]$, the $n$th raw moment is $\mu'_n = E[X^{n}]$. The $n$th central moment shifts the origin to the mean so that only the shape of the distribution enters:
$$ \mu_n = E[(X-\mu)^{n}]. $$
The first central moment is always zero. The second central moment, $\mu_2$, is the familiar variance $\sigma^{2} = E[(X-\mu)^2]$; its square root is the standard deviation. Higher-order central moments refine the description: the third captures skewness (asymmetry) and the fourth describes kurtosis (tail weight).
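As a quick numerical illustration (a minimal NumPy/SciPy sketch; the exponential sample and its size are arbitrary choices, not tied to any figure in this article), the first few sample moments can be computed directly from data:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
x = rng.exponential(scale=1.0, size=100_000)   # illustrative Exp(1) sample

mu = x.mean()                                  # first raw moment (mean)
var = np.mean((x - mu) ** 2)                   # second central moment (variance)
skew = np.mean((x - mu) ** 3) / var ** 1.5     # standardised third central moment
kurt = np.mean((x - mu) ** 4) / var ** 2       # standardised fourth central moment

print(mu, var, skew, kurt)                     # theory for Exp(1): 1, 1, 2, 9
print(stats.skew(x), stats.kurtosis(x, fisher=False))  # cross-check with SciPy
```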
The moment-generating function (MGF)
Whenever $E[e^{tX}]$ exists in an open interval around $t = 0$, we may define the moment-generating function
$$ M_X(t) = E[e^{tX}]. $$
Expanding $e^{tX}$ as a power series and exchanging expectation and summation term by term shows that $M_X$ collects every raw moment:
$$ M_X(t) = 1 + \mu'_1 t + \frac{\mu'_2}{2!}t^2 + \cdots. $$
Consequently
$$ \mu'_n = M_X^{(n)}(0) = \frac{d^n M_X(t)}{dt^n}\bigg|_{t=0}. $$
The mean is the first derivative at zero, and the variance is the second derivative minus the square of the first:
$$ E[X] = M_X^{\,(1)}(0) $$
$$ Var(X) = M_X^{\,(2)}(0) - \bigl(M_X^{\,(1)}(0)\bigr)^2 $$
Because differentiation is often easier than direct integration, the MGF is a powerful computational shortcut; it also uniquely determines the distribution whenever it exists in a neighbourhood of the origin.
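As a concrete example (a short SymPy sketch, using the exponential MGF $M_X(t)=\lambda/(\lambda-t)$ that also appears in the figure discussed below), differentiating at $t=0$ recovers the mean and variance:

```python
import sympy as sp

t, lam = sp.symbols("t lambda", positive=True)
M = lam / (lam - t)                      # MGF of an exponential with rate lambda

m1 = sp.diff(M, t, 1).subs(t, 0)         # first raw moment  -> 1/lambda
m2 = sp.diff(M, t, 2).subs(t, 0)         # second raw moment -> 2/lambda**2
variance = sp.simplify(m2 - m1**2)       # -> 1/lambda**2

print(m1, m2, variance)
```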
What the figure shows
- Solid curve: the exact MGF $M_X(t)=\lambda/(\lambda-t)$ for an exponential with rate $\lambda=1$ (valid for $t<\lambda=1$).
- Dots: Monte-Carlo estimates $\hat M(t)=\frac1N\sum_{i=1}^N e^{tX_i}$ from $N=10{,}000$ simulated draws.
Because $\hat M(t)$ is an unbiased estimator of $M_X(t)$, the scatter hugs the curve; the sampling noise widens as $t$ grows, since the variance of $e^{tX}$ depends on $E[e^{2tX}]$ and diverges once $t \ge \lambda/2$. Zooming in near $t=0$ (not shown), the slope of the curve equals the mean $E[X]=1$ and its curvature gives $E[X^2]=2$, illustrating how successive derivatives at $t=0$ generate the raw moments.
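A minimal sketch of how such a comparison could be produced (assuming an $\mathrm{Exp}(1)$ sample of size $N=10{,}000$ as in the figure; only the numbers are printed, no plotting):

```python
import numpy as np

rng = np.random.default_rng(42)
lam, N = 1.0, 10_000
x = rng.exponential(scale=1 / lam, size=N)      # Exp(1) draws

for t in [0.0, 0.2, 0.4, 0.6, 0.8]:             # must stay below lambda = 1
    exact = lam / (lam - t)                     # M_X(t) for the exponential
    estimate = np.mean(np.exp(t * x))           # Monte-Carlo estimate of E[e^{tX}]
    print(f"t={t:.1f}  exact={exact:.3f}  estimate={estimate:.3f}")
```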
Mechanical moments: the moment of inertia
In classical mechanics the moment of inertia of a body about a fixed axis measures its resistance to rotational acceleration. For a continuous body with density $\rho(\mathbf r)$ the moment of inertia about the axis is
$$ I = \int r^{2} \, dm = \int r^{2}\, \rho(\mathbf r)\, dV, $$
where $r$ is the perpendicular distance from the axis. For a system of point masses the integral reduces to the sum $I = \sum_i m_i r_i^{2}$.
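As a quick sanity check (a brief numerical sketch; the rod's mass, length, and the axis through one end are illustrative assumptions), integrating $r^{2}\,dm$ for a uniform rod reproduces the textbook value $\tfrac13 M L^{2}$:

```python
from scipy.integrate import quad

M, L = 2.0, 1.5                       # illustrative mass (kg) and length (m)
rho = M / L                           # uniform linear density

# I = integral of rho * r^2 dr from 0 to L (axis through one end, perpendicular to the rod)
I, _ = quad(lambda r: rho * r**2, 0, L)

print(I, M * L**2 / 3)                # numerical integral vs closed form
```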
Variance and moment of inertia: a shared formula
Both variance and moment of inertia are weighted sums of squared distances. Replace mass $m_i$ with probability weight $1/N$ and the distance to the axis with the deviation from the mean, and the formulas coincide. This is why early statisticians, led by Karl Pearson, borrowed the mechanical term moment to describe sums of powered deviations in probability theory.
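A toy sketch of the correspondence (the four positions below and the equal weights $1/N$ are illustrative assumptions):

```python
import numpy as np

x = np.array([1.0, 2.0, 4.0, 7.0])        # positions of four "point masses" / observations
w = np.full(len(x), 1 / len(x))           # equal probability weights 1/N

center = np.sum(w * x)                    # mean (centre of mass for unit total mass)
inertia_about_mean = np.sum(w * (x - center) ** 2)   # weighted sum of squared distances

print(inertia_about_mean, np.var(x))      # identical: the population variance of the sample
```

With $m_i = 1/N$ the weighted sum of squared deviations is exactly the variance, which is the substitution described above.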
- Points: a cloud of 200 synthetic observations drawn from a bivariate normal distribution (mean $\approx (2, -1)$, isotropic unit variance).
- × symbol: the sample mean $\hat\mu$.
- Dashed circles: one and two "standard-deviation radii," where
$$ \sigma_r = \sqrt{\tfrac1n\sum_{i=1}^n\|X_i-\hat\mu\|^2} $$
They play the same geometric role that $\sigma$ does on a 1-D number line.
Squaring the radial distances and averaging them with probability weights gives the statistical moment of inertia, identical in algebra to the physical moment of inertia except that probability mass replaces physical mass. Visually, most of the points fall within the $1\sigma$ circle, and the spread of the cloud determines the circle's radius.
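A compact sketch of how $\sigma_r$ could be computed for such a cloud (the mean, covariance, and sample size mirror the description above; the random seed is arbitrary):

```python
import numpy as np

rng = np.random.default_rng(7)
pts = rng.multivariate_normal(mean=[2.0, -1.0], cov=np.eye(2), size=200)

centre = pts.mean(axis=0)                                        # sample mean (the x marker)
sigma_r = np.sqrt(np.mean(np.sum((pts - centre) ** 2, axis=1)))  # radial standard deviation

inside = np.mean(np.linalg.norm(pts - centre, axis=1) <= sigma_r)
print(sigma_r, inside)   # radius of the 1-sigma circle and fraction of points inside it
```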
Principal-component analysis (PCA) as a "rotational" problem
Principal-component analysis diagonalises the covariance matrix
$$ \Sigma_{jk} = E\bigl[(X_j-\mu_j)(X_k-\mu_k)\bigr] $$
thereby finding orthogonal directions in which the variance, and hence the statistical moment of inertia, is extremal. In mechanics the same eigenproblem arises when one seeks the axes about which a rigid body's physical moment of inertia is minimal or maximal. PCA is thus the statistical analogue of aligning a rigid body with its principal axes.
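A minimal sketch of this diagonalisation (the covariance matrix below is an illustrative choice, not the one behind the figure; `numpy.linalg.eigh` is used because the sample covariance is symmetric):

```python
import numpy as np

rng = np.random.default_rng(3)
cov = np.array([[3.0, 1.2],
                [1.2, 0.5]])                       # illustrative covariance with unequal variances
data = rng.multivariate_normal(mean=[0.0, 0.0], cov=cov, size=300)

S = np.cov(data, rowvar=False)                     # sample covariance matrix
eigvals, eigvecs = np.linalg.eigh(S)               # principal variances and directions

order = np.argsort(eigvals)[::-1]                  # sort from largest to smallest variance
print(eigvals[order])                              # variance captured along each principal axis
print(eigvecs[:, order])                           # columns are the principal-component directions
```

Sorting the eigenpairs by decreasing eigenvalue orders the principal axes from the direction of largest variance to smallest, matching the long and short arrows described below.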
- Crosses: 300 correlated points drawn from a 2-D normal distribution with unequal variances.
- Black arrows: the principal-component directions (eigenvectors of the sample covariance matrix) emanating from the sample mean.
- Labels: the eigenvalues $\lambda_1 \approx 3.36$ and $\lambda_2 \approx 0.28$, i.e. the variances captured along each axis.
The long arrow shows the direction in which the variance, and thus the statistical moment of inertia, is greatest; the short arrow shows the direction of minimal variance. Conceptually, PCA has "rotated" the coordinate system to align with these axes, mirroring how a rigid body is aligned with its principal axes of rotation to reveal its physical moments of inertia.
Historical remarks
Pafnuty Chebyshev and his students used powered deviations in probabilistic inequalities during the 1860s–1880s but did not employ the term moment. Karl Pearson introduced that name in an 1893 Nature letter and, in 1894, formalised the method of moments for parameter estimation by equating sample moments with their theoretical counterparts. The unifying physics analogy he invoked remains the standard intuition today.