Last modified: December 23, 2023
This article is written in: 🇺🇸
Covariance is a fundamental statistical measure that quantifies the degree to which two random variables change together. It indicates the direction of the linear relationship between variables:
- A positive covariance implies that as one variable increases, the other tends to increase as well.
- A negative covariance suggests that as one variable increases, the other tends to decrease.
- A zero covariance indicates no linear relationship between the variables.
Definition
The covariance between two random variables and is defined as the expected value (mean) of the product of their deviations from their respective means:
Where:
- is the covariance between and .
- denotes the expected value operator.
- is the mean of .
- is the mean of .
Alternative Expression
By expanding the definition and applying the linearity properties of expectation, covariance can also be expressed as:
Derivation:
- Start with the definition:
- Expand the product inside the expectation:
- Use the linearity of expectation:
- Recognize that and :
Thus, we arrive at:
Interpretation
- Positive Covariance (): Indicates that and tend to increase or decrease together.
- Negative Covariance (): Indicates that when increases, tends to decrease, and vice versa.
- Zero Covariance (): Suggests no linear relationship between and .
Important Note:
- If and are independent, then .
- However, a covariance of zero does not necessarily imply independence. Variables can be uncorrelated (zero covariance) but still dependent in a non-linear way.
Properties of Covariance
I. Symmetry:
II. Linearity in Each Argument:
For constants and , and random variables , , and :
III. Covariance with Itself (Variance Relation):
The covariance of a variable with itself is the variance of that variable:
IV. Scaling:
If and are constants:
V. Addition of Constants:
Adding a constant to a variable does not affect the covariance:
VI. Relationship with Correlation:
Covariance is related to the correlation coefficient :
Where and are the standard deviations of and , respectively.
Sample Covariance
When working with sample data, the sample covariance between two variables and is calculated as:
Where:
- is the number of observations.
- and are the -th observations of variables and .
- and are the sample means of and .
Note: The denominator provides an unbiased estimate of the covariance for a sample drawn from a population.
Example: Calculating Covariance Step by Step
Let's calculate the covariance between two variables and using the following dataset:
Observation () | ||
1 | 1 | 2 |
2 | 2 | 4 |
3 | 3 | 6 |
Step 1: Calculate the Sample Means
Compute the mean of and :
Step 2: Compute the Deviations from the Mean
Calculate and :
1 | 1 | 2 | ||
2 | 2 | 4 | ||
3 | 3 | 6 |
Step 3: Calculate the Product of Deviations
Compute :
1 | -1 | -2 | |
2 | 0 | 0 | |
3 | 1 | 2 |
Step 4: Sum the Products of Deviations
Compute the sum:
Step 5: Calculate the Sample Covariance
Use the sample covariance formula:
Since :
Interpretation:
- The positive covariance of indicates that and tend to increase together.
- Since the data points lie perfectly on a straight line (), the covariance reflects a perfect positive linear relationship.
Step 6: Calculate the Variances (Optional)
For completeness, calculate the variances of and :
Variance of
Compute :
1 | -1 | |
2 | 0 | |
3 | 1 |
Sum:
Compute variance:
Variance of
Similarly, compute :
1 | -2 | |
2 | 0 | |
3 | 2 |
Sum:
Compute variance:
Step 7: Calculate the Correlation Coefficient (Optional)
The correlation coefficient standardizes the covariance, providing a dimensionless measure of the strength and direction of the linear relationship:
Interpretation:
- A correlation coefficient of indicates a perfect positive linear relationship between and .
- This makes sense since in the dataset.
Plot:
Limitations of Covariance
I. Scale Dependence:
- Covariance values are affected by the units of measurement of the variables.
- For example, measuring height in meters vs. centimeters will change the covariance.
II. Comparison Difficulties:
- Because covariance is not standardized, comparing covariances across different datasets or variables with different scales is challenging.
- This is why the correlation coefficient, which standardizes covariance, is often used.
III. Not a Measure of Strength:
- The magnitude of covariance does not directly indicate the strength of the relationship.
- A large covariance could be due to large variances rather than a strong relationship.
IV. Linear Relationships Only:
- Covariance measures only linear relationships.
- It does not capture non-linear dependencies between variables.