Last modified: December 23, 2023
This article is written in: 🇺🇸
Covariance is a fundamental statistical measure that quantifies the degree to which two random variables change together. It indicates the direction of the linear relationship between the variables: a positive covariance means they tend to increase or decrease together, a negative covariance means one tends to increase as the other decreases, and a covariance near zero suggests little or no linear relationship.
The covariance between two random variables $X$ and $Y$ is defined as the expected value (mean) of the product of their deviations from their respective means:
$$ \text{Cov}(X, Y) = \mathbb{E}\left[ (X - \mu_X)(Y - \mu_Y) \right] $$
Where:

- $X$ and $Y$ are random variables,
- $\mu_X = \mathbb{E}[X]$ and $\mu_Y = \mathbb{E}[Y]$ are their respective means,
- $\mathbb{E}[\cdot]$ denotes the expectation operator.
By expanding the definition and applying the linearity properties of expectation, covariance can also be expressed as:
$$ \text{Cov}(X, Y) = \mathbb{E}[XY] - \mathbb{E}[X] \mathbb{E}[Y] $$
Derivation:
$$ \text{Cov}(X, Y) = \mathbb{E}\left[ (X - \mu_X)(Y - \mu_Y) \right] $$
$$ \text{Cov}(X, Y) = \mathbb{E}\left[ XY - X \mu_Y - \mu_X Y + \mu_X \mu_Y \right] $$
$$ \text{Cov}(X, Y) = \mathbb{E}[XY] - \mu_Y \mathbb{E}[X] - \mu_X \mathbb{E}[Y] + \mu_X \mu_Y $$
$$ \text{Cov}(X, Y) = \mathbb{E}[XY] - \mu_Y \mu_X - \mu_X \mu_Y + \mu_X \mu_Y = \mathbb{E}[XY] - \mu_X \mu_Y $$
Thus, we arrive at:
$$ \text{Cov}(X, Y) = \mathbb{E}[XY] - \mathbb{E}[X] \mathbb{E}[Y] $$
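As a quick numerical sanity check, the following sketch (assuming Python with NumPy and an arbitrary simulated dataset) evaluates both forms of the covariance on the same sample and confirms that they agree up to floating-point error:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=100_000)
y = 0.5 * x + rng.normal(size=100_000)  # y is linearly related to x

# Definition: mean of the product of deviations from the means
cov_definition = np.mean((x - x.mean()) * (y - y.mean()))

# Shortcut formula: E[XY] - E[X] E[Y]
cov_shortcut = np.mean(x * y) - x.mean() * y.mean()

print(cov_definition, cov_shortcut)  # both approximately 0.5 for this construction
```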
Covariance has several important properties:
I. Symmetry:
$$ \text{Cov}(X, Y) = \text{Cov}(Y, X) $$
II. Linearity in Each Argument:
For constants $a$ and $b$, and random variables $X$, $Y$, and $Z$:
$$ \text{Cov}(aX + bY, Z) = a \text{Cov}(X, Z) + b \text{Cov}(Y, Z) $$
III. Covariance with Itself (Variance Relation):
The covariance of a variable with itself is the variance of that variable:
$$ \text{Cov}(X, X) = \text{Var}(X) $$
IV. Scaling:
If $a$ and $b$ are constants:
$$ \text{Cov}(aX, bY) = ab \text{Cov}(X, Y) $$
V. Addition of Constants:
Adding a constant to a variable does not affect the covariance:
$$ \text{Cov}(X + c, Y) = \text{Cov}(X, Y) $$
VI. Relationship with Correlation:
Covariance is related to the correlation coefficient $\rho_{XY}$:
$$ \rho_{XY} = \frac{\text{Cov}(X, Y)}{\sigma_X \sigma_Y} $$
Where $\sigma_X$ and $\sigma_Y$ are the standard deviations of $X$ and $Y$, respectively.
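These properties can be verified numerically. The sketch below (again assuming Python with NumPy and arbitrary simulated data) checks each identity on a finite sample, where it holds exactly up to floating-point error:

```python
import numpy as np

rng = np.random.default_rng(1)
x, y, z = rng.normal(size=(3, 10_000))
a, b, c = 2.0, -3.0, 5.0

def cov(u, v):
    """Population-style covariance: mean of the product of deviations."""
    return np.mean((u - u.mean()) * (v - v.mean()))

print(np.isclose(cov(x, y), cov(y, x)))                                      # I.   symmetry
print(np.isclose(cov(a * x + b * y, z), a * cov(x, z) + b * cov(y, z)))      # II.  linearity
print(np.isclose(cov(x, x), np.var(x)))                                      # III. Cov(X, X) = Var(X)
print(np.isclose(cov(a * x, b * y), a * b * cov(x, y)))                      # IV.  scaling
print(np.isclose(cov(x + c, y), cov(x, y)))                                  # V.   adding a constant
print(np.isclose(cov(x, y) / (x.std() * y.std()), np.corrcoef(x, y)[0, 1]))  # VI.  correlation
```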
When working with sample data, the sample covariance between two variables $X$ and $Y$ is calculated as:
$$ s_{XY} = \text{Cov}(X, Y) = \frac{1}{n - 1} \sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y}) $$
Where:

- $n$ is the number of paired observations,
- $X_i$ and $Y_i$ are the individual observations,
- $\bar{X}$ and $\bar{Y}$ are the sample means of $X$ and $Y$.
Note: The denominator $n - 1$ provides an unbiased estimate of the covariance for a sample drawn from a population.
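A minimal sketch of this estimator, assuming Python with NumPy and an arbitrary made-up dataset, compares the explicit formula with NumPy's `np.cov`, which uses the same $n - 1$ denominator by default:

```python
import numpy as np

def sample_cov(x, y):
    """Sample covariance with the unbiased n - 1 denominator."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(x)
    return np.sum((x - x.mean()) * (y - y.mean())) / (n - 1)

# Arbitrary example data
x = [2.1, 2.5, 3.6, 4.0]
y = [8.0, 10.0, 12.0, 14.0]

print(sample_cov(x, y))    # explicit formula
print(np.cov(x, y)[0, 1])  # off-diagonal entry of NumPy's covariance matrix, same value
```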
Let's calculate the covariance between two variables $X$ and $Y$ using the following dataset:
| Observation ($i$) | $X_i$ | $Y_i$ |
|---|---|---|
| 1 | 1 | 2 |
| 2 | 2 | 4 |
| 3 | 3 | 6 |
Compute the mean of $X$ and $Y$:
$$ \bar{X} = \frac{1}{n} \sum_{i=1}^{n} X_i = \frac{1 + 2 + 3}{3} = \frac{6}{3} = 2 $$
$$ \bar{Y} = \frac{1}{n} \sum_{i=1}^{n} Y_i = \frac{2 + 4 + 6}{3} = \frac{12}{3} = 4 $$
Calculate $(X_i - \bar{X})$ and $(Y_i - \bar{Y})$:
| $i$ | $X_i$ | $Y_i$ | $X_i - \bar{X}$ | $Y_i - \bar{Y}$ |
|---|---|---|---|---|
| 1 | 1 | 2 | $1 - 2 = -1$ | $2 - 4 = -2$ |
| 2 | 2 | 4 | $2 - 2 = 0$ | $4 - 4 = 0$ |
| 3 | 3 | 6 | $3 - 2 = 1$ | $6 - 4 = 2$ |
Compute $(X_i - \bar{X})(Y_i - \bar{Y})$:
| $i$ | $X_i - \bar{X}$ | $Y_i - \bar{Y}$ | $(X_i - \bar{X})(Y_i - \bar{Y})$ |
|---|---|---|---|
| 1 | $-1$ | $-2$ | $(-1)(-2) = 2$ |
| 2 | $0$ | $0$ | $(0)(0) = 0$ |
| 3 | $1$ | $2$ | $(1)(2) = 2$ |
Compute the sum:
$$ \sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y}) = 2 + 0 + 2 = 4 $$
Use the sample covariance formula:
$$ s_{XY} = \text{Cov}(X, Y) = \frac{1}{n - 1} \sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y}) $$
Since $n = 3 $:
$$ s_{XY} = \frac{1}{3 - 1} \times 4 = \frac{1}{2} \times 4 = 2 $$
Interpretation: The sample covariance $s_{XY} = 2$ is positive, indicating that $X$ and $Y$ tend to increase together, i.e., they have a positive linear relationship.
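This hand calculation can be reproduced directly with NumPy (a sketch assuming Python/NumPy is available):

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0])
y = np.array([2.0, 4.0, 6.0])

# np.cov returns the 2x2 sample covariance matrix (n - 1 denominator by default);
# the off-diagonal entry is the sample covariance of X and Y.
print(np.cov(x, y)[0, 1])  # 2.0, matching the hand calculation
```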
For completeness, calculate the variances of $X$ and $Y$:
$$ s_{XX} = \text{Var}(X) = \frac{1}{n - 1} \sum_{i=1}^{n} (X_i - \bar{X})^2 $$
Compute $(X_i - \bar{X})^2 $:
| $i$ | $X_i - \bar{X}$ | $(X_i - \bar{X})^2$ |
|---|---|---|
| 1 | $-1$ | $(-1)^2 = 1$ |
| 2 | $0$ | $(0)^2 = 0$ |
| 3 | $1$ | $(1)^2 = 1$ |
Sum:
$$ \sum_{i=1}^{n} (X_i - \bar{X})^2 = 1 + 0 + 1 = 2 $$
Compute variance:
$$ s_{XX} = \frac{1}{2} \times 2 = 1 $$
Similarly, compute $(Y_i - \bar{Y})^2 $:
| $i$ | $Y_i - \bar{Y}$ | $(Y_i - \bar{Y})^2$ |
|---|---|---|
| 1 | $-2$ | $(-2)^2 = 4$ |
| 2 | $0$ | $(0)^2 = 0$ |
| 3 | $2$ | $(2)^2 = 4$ |
Sum:
$$ \sum_{i=1}^{n} (Y_i - \bar{Y})^2 = 4 + 0 + 4 = 8 $$
Compute variance:
$$ s_{YY} = \text{Var}(Y) = \frac{1}{2} \times 8 = 4 $$
The correlation coefficient $r_{XY}$ standardizes the covariance, providing a dimensionless measure of the strength and direction of the linear relationship:
$$ r_{XY} = \frac{s_{XY}}{\sqrt{s_{XX} \times s_{YY}}} = \frac{2}{\sqrt{1 \times 4}} = \frac{2}{2} = 1 $$
Interpretation: $r_{XY} = 1$ indicates a perfect positive linear relationship between $X$ and $Y$; indeed, every observation in the dataset satisfies $Y = 2X$.
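The variances and the correlation coefficient computed above can likewise be checked with NumPy's built-in estimators (a sketch assuming Python/NumPy):

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0])
y = np.array([2.0, 4.0, 6.0])

print(np.var(x, ddof=1))        # 1.0, the sample variance s_XX
print(np.var(y, ddof=1))        # 4.0, the sample variance s_YY
print(np.corrcoef(x, y)[0, 1])  # 1.0, a perfect positive linear relationship
```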
A scatter plot of the observations $(1, 2)$, $(2, 4)$, and $(3, 6)$ shows all three points lying exactly on the line $Y = 2X$, consistent with the correlation of 1.
Despite its usefulness, covariance has several limitations:

I. Scale Dependence:

Covariance is expressed in the product of the units of $X$ and $Y$, so its magnitude changes whenever either variable is rescaled (for example, converting a variable from meters to centimeters multiplies the covariance by 100).

II. Comparison Difficulties:

Because of this scale dependence, covariances computed for pairs of variables measured in different units or on different scales cannot be compared directly.

III. Not a Measure of Strength:

The sign of the covariance indicates the direction of the linear relationship, but its magnitude alone does not convey how strong that relationship is; the correlation coefficient is the standardized measure for that purpose.

IV. Linear Relationships Only:

Covariance captures only linear association. Two variables can be strongly related in a nonlinear way and still have zero covariance, so zero covariance does not imply independence, as the sketch below illustrates.
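In the sketch below (Python with NumPy, using a small made-up dataset), $Y$ is an exact function of $X$, yet their covariance is zero because the relationship is not linear:

```python
import numpy as np

# A deterministic but nonlinear relationship: Y = X^2 with X symmetric about 0
x = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])
y = x ** 2

# The sample covariance is exactly zero even though Y is completely determined
# by X; covariance only detects linear association.
print(np.cov(x, y)[0, 1])  # 0.0
```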