Last modified: July 28, 2024
Covariance is a fundamental statistical measure that quantifies the degree to which two random variables change together. It indicates the direction of the linear relationship between variables:
- A positive covariance implies that as one variable increases, the other tends to increase as well.
- A negative covariance suggests that as one variable increases, the other tends to decrease.
- A zero covariance indicates no linear relationship between the variables.
## Definition

The covariance between two random variables $X$ and $Y$ is defined as the expected value (mean) of the product of their deviations from their respective means:

$$\text{Cov}(X, Y) = E[(X - \mu_X)(Y - \mu_Y)]$$

Where:

- $\text{Cov}(X, Y)$ is the covariance between $X$ and $Y$.
- $E$ denotes the expected value operator.
- $\mu_X = E[X]$ is the mean of $X$.
- $\mu_Y = E[Y]$ is the mean of $Y$.
## Alternative Expression

By expanding the definition and applying the linearity of expectation, covariance can also be expressed as:

$$\text{Cov}(X, Y) = E[XY] - E[X]E[Y]$$

Derivation:

- Start with the definition:

$$\text{Cov}(X, Y) = E[(X - \mu_X)(Y - \mu_Y)]$$

- Expand the product inside the expectation:

$$\text{Cov}(X, Y) = E[XY - X\mu_Y - \mu_X Y + \mu_X \mu_Y]$$

- Use the linearity of expectation:

$$\text{Cov}(X, Y) = E[XY] - \mu_Y E[X] - \mu_X E[Y] + \mu_X \mu_Y$$

- Recognize that $\mu_X = E[X]$ and $\mu_Y = E[Y]$:

$$\text{Cov}(X, Y) = E[XY] - \mu_Y \mu_X - \mu_X \mu_Y + \mu_X \mu_Y = E[XY] - \mu_X \mu_Y$$

Thus, we arrive at:

$$\text{Cov}(X, Y) = E[XY] - E[X]E[Y]$$
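As a quick numerical sanity check, the sketch below (plain Python, no third-party libraries; the data and seed are arbitrary choices) evaluates both forms on simulated data and confirms they agree:

```python
import random

random.seed(0)
n = 10_000
x = [random.gauss(0, 1) for _ in range(n)]
y = [0.5 * xi + random.gauss(0, 1) for xi in x]  # correlated with x by construction

def mean(v):
    return sum(v) / len(v)

mx, my = mean(x), mean(y)

# Definition: E[(X - mu_X)(Y - mu_Y)]
cov_def = mean([(xi - mx) * (yi - my) for xi, yi in zip(x, y)])

# Alternative form: E[XY] - E[X]E[Y]
cov_alt = mean([xi * yi for xi, yi in zip(x, y)]) - mx * my

print(abs(cov_def - cov_alt) < 1e-9)  # True: the two expressions agree
```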
## Interpretation

- Positive covariance ($\text{Cov}(X, Y) > 0$): indicates that $X$ and $Y$ tend to increase or decrease together.
- Negative covariance ($\text{Cov}(X, Y) < 0$): indicates that when $X$ increases, $Y$ tends to decrease, and vice versa.
- Zero covariance ($\text{Cov}(X, Y) = 0$): suggests no linear relationship between $X$ and $Y$.

Important note:

- If $X$ and $Y$ are independent, then $\text{Cov}(X, Y) = 0$.
- However, a covariance of zero does not necessarily imply independence: variables can be uncorrelated (zero covariance) yet still dependent in a non-linear way.
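A small made-up dataset illustrates this caveat: below, $Y$ is completely determined by $X$, yet their covariance is exactly zero because the relationship is non-linear.

```python
# X takes values symmetric around 0 and Y = X^2 is fully determined by X,
# yet their covariance is zero: there is no *linear* relationship.
x = [-2, -1, 0, 1, 2]
y = [xi ** 2 for xi in x]   # [4, 1, 0, 1, 4]

mx = sum(x) / len(x)        # 0
my = sum(y) / len(y)        # 2

cov = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / (len(x) - 1)
print(cov)  # 0.0 -- uncorrelated, but clearly dependent
```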
## Properties of Covariance

I. Symmetry:

$$\text{Cov}(X, Y) = \text{Cov}(Y, X)$$

II. Linearity in each argument. For constants $a$ and $b$, and random variables $X$, $Y$, and $Z$:

$$\text{Cov}(aX + bY, Z) = a\,\text{Cov}(X, Z) + b\,\text{Cov}(Y, Z)$$

III. Covariance with itself (variance relation). The covariance of a variable with itself is the variance of that variable:

$$\text{Cov}(X, X) = \text{Var}(X)$$

IV. Scaling. If $a$ and $b$ are constants:

$$\text{Cov}(aX, bY) = ab\,\text{Cov}(X, Y)$$

V. Addition of constants. Adding a constant to a variable does not affect the covariance:

$$\text{Cov}(X + c, Y) = \text{Cov}(X, Y)$$

VI. Relationship with correlation. Covariance is related to the correlation coefficient $\rho_{XY}$ by:

$$\rho_{XY} = \frac{\text{Cov}(X, Y)}{\sigma_X \sigma_Y}$$

where $\sigma_X$ and $\sigma_Y$ are the standard deviations of $X$ and $Y$, respectively.
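These properties can be spot-checked numerically. The sketch below (plain Python, arbitrary simulated data) verifies symmetry, the variance relation, scaling, and shift invariance with a hand-rolled sample-covariance helper:

```python
import random
import statistics

random.seed(1)
n = 1_000
x = [random.gauss(0, 1) for _ in range(n)]
y = [random.gauss(0, 1) for _ in range(n)]

def cov(a, b):
    """Sample covariance with the n-1 denominator."""
    ma = sum(a) / len(a)
    mb = sum(b) / len(b)
    return sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b)) / (len(a) - 1)

def close(p, q):
    return abs(p - q) < 1e-9

print(close(cov(x, y), cov(y, x)))                        # symmetry
print(close(cov(x, x), statistics.variance(x)))           # Cov(X, X) = Var(X)
print(close(cov([3 * xi for xi in x], [2 * yi for yi in y]),
            6 * cov(x, y)))                               # Cov(aX, bY) = ab Cov(X, Y)
print(close(cov([xi + 10 for xi in x], y), cov(x, y)))    # adding a constant
```

Each line prints `True`; the tolerance only absorbs floating-point rounding.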
## Sample Covariance

When working with sample data, the sample covariance between two variables $X$ and $Y$ is calculated as:

$$s_{XY} = \frac{1}{n-1} \sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})$$

Where:

- $n$ is the number of observations.
- $X_i$ and $Y_i$ are the $i$-th observations of $X$ and $Y$.
- $\bar{X}$ and $\bar{Y}$ are the sample means of $X$ and $Y$.

Note: the $n-1$ denominator provides an unbiased estimate of the population covariance for a sample drawn from that population.
## Example: Calculating Covariance Step by Step

Let's calculate the covariance between two variables $X$ and $Y$ using the following dataset:

| Observation ($i$) | $X_i$ | $Y_i$ |
|-------------------|-------|-------|
| 1                 | 1     | 2     |
| 2                 | 2     | 4     |
| 3                 | 3     | 6     |

Step 1: Calculate the Sample Means

Compute the means of $X$ and $Y$:

$$\bar{X} = \frac{1}{n} \sum_{i=1}^{n} X_i = \frac{1 + 2 + 3}{3} = \frac{6}{3} = 2$$

$$\bar{Y} = \frac{1}{n} \sum_{i=1}^{n} Y_i = \frac{2 + 4 + 6}{3} = \frac{12}{3} = 4$$
Step 2: Compute the Deviations from the Mean
Calculate $(X_i - \bar{X})$ and $(Y_i - \bar{Y})$:

| $i$ | $X_i$ | $Y_i$ | $X_i - \bar{X}$ | $Y_i - \bar{Y}$ |
|-----|-------|-------|-----------------|-----------------|
| 1   | 1     | 2     | $1 - 2 = -1$    | $2 - 4 = -2$    |
| 2   | 2     | 4     | $2 - 2 = 0$     | $4 - 4 = 0$     |
| 3   | 3     | 6     | $3 - 2 = 1$     | $6 - 4 = 2$     |
Step 3: Calculate the Product of Deviations
Compute $(X_i - \bar{X})(Y_i - \bar{Y})$:

| $i$ | $X_i - \bar{X}$ | $Y_i - \bar{Y}$ | $(X_i - \bar{X})(Y_i - \bar{Y})$ |
|-----|-----------------|-----------------|----------------------------------|
| 1   | $-1$            | $-2$            | $(-1)(-2) = 2$                   |
| 2   | $0$             | $0$             | $(0)(0) = 0$                     |
| 3   | $1$             | $2$             | $(1)(2) = 2$                     |
Step 4: Sum the Products of Deviations
Compute the sum:

$$\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y}) = 2 + 0 + 2 = 4$$
Step 5: Calculate the Sample Covariance
Use the sample covariance formula:

$$s_{XY} = \frac{1}{n-1} \sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})$$

Since $n = 3$:

$$s_{XY} = \frac{1}{3-1} \times 4 = \frac{1}{2} \times 4 = 2$$
Interpretation:

- The positive covariance of $2$ indicates that $X$ and $Y$ tend to increase together.
- Since the data points lie perfectly on the straight line $Y = 2X$, the covariance reflects a perfect positive linear relationship.
Step 6: Calculate the Variances (Optional)

For completeness, calculate the variances of $X$ and $Y$:

Variance of X:

$$s_{XX} = \text{Var}(X) = \frac{1}{n-1} \sum_{i=1}^{n} (X_i - \bar{X})^2$$

Compute $(X_i - \bar{X})^2$:

| $i$ | $X_i - \bar{X}$ | $(X_i - \bar{X})^2$ |
|-----|-----------------|---------------------|
| 1   | $-1$            | $(-1)^2 = 1$        |
| 2   | $0$             | $(0)^2 = 0$         |
| 3   | $1$             | $(1)^2 = 1$         |

Sum:

$$\sum_{i=1}^{n} (X_i - \bar{X})^2 = 1 + 0 + 1 = 2$$

Compute the variance:

$$s_{XX} = \frac{1}{2} \times 2 = 1$$
Variance of Y:

Similarly, compute $(Y_i - \bar{Y})^2$:

| $i$ | $Y_i - \bar{Y}$ | $(Y_i - \bar{Y})^2$ |
|-----|-----------------|---------------------|
| 1   | $-2$            | $(-2)^2 = 4$        |
| 2   | $0$             | $(0)^2 = 0$         |
| 3   | $2$             | $(2)^2 = 4$         |

Sum:

$$\sum_{i=1}^{n} (Y_i - \bar{Y})^2 = 4 + 0 + 4 = 8$$

Compute the variance:

$$s_{YY} = \text{Var}(Y) = \frac{1}{2} \times 8 = 4$$
Step 7: Calculate the Correlation Coefficient (Optional)
The correlation coefficient $r_{XY}$ standardizes the covariance, providing a dimensionless measure of the strength and direction of the linear relationship:

$$r_{XY} = \frac{s_{XY}}{\sqrt{s_{XX} \times s_{YY}}} = \frac{2}{\sqrt{1 \times 4}} = \frac{2}{2} = 1$$

Interpretation:

- A correlation coefficient of $1$ indicates a perfect positive linear relationship between $X$ and $Y$.
- This makes sense since $Y = 2X$ in the dataset.
## Limitations of Covariance
I. Scale Dependence:
- Covariance values are affected by the units of measurement of the variables.
- For example, measuring height in meters vs. centimeters will change the covariance.
II. Comparison Difficulties:
- Because covariance is not standardized, comparing covariances across different datasets or variables with different scales is challenging.
- This is why the correlation coefficient, which standardizes covariance, is often used.
III. Not a Measure of Strength:
- The magnitude of covariance does not directly indicate the strength of the relationship.
- A large covariance could be due to large variances rather than a strong relationship.
IV. Linear Relationships Only:
- Covariance measures only linear relationships.
- It does not capture non-linear dependencies between variables.