In time series analysis, understanding the relationships between observations at different time lags is crucial for model identification and forecasting. Two essential tools for analyzing these relationships are the Autocorrelation Function (ACF) and the Partial Autocorrelation Function (PACF).
The Autocorrelation Function (ACF) measures the correlation between a time series and its lagged values. It helps detect patterns such as trends and seasonality. The autocorrelation at lag $k$, denoted $\rho_k$, is defined as:
$$ \rho_k = \frac{\gamma_k}{\gamma_0} $$
Where:

- $\gamma_k$ is the autocovariance at lag $k$,
- $\gamma_0$ is the autocovariance at lag 0, i.e., the variance of the series.
The autocovariance at lag $k$ is the covariance between observations separated by $k$ time periods. It is given by:
$$ \gamma_k = \text{Cov}(X_t, X_{t+k}) = \mathbb{E}[(X_t - \mu)(X_{t+k} - \mu)] $$
Where:

- $X_t$ is the value of the series at time $t$,
- $\mu = \mathbb{E}[X_t]$ is the mean of the (stationary) series,
- $\text{Cov}(\cdot, \cdot)$ denotes covariance and $\mathbb{E}[\cdot]$ denotes expectation.
The autocorrelation coefficient at lag $k$ normalizes the autocovariance $\gamma_k$ by dividing it by the variance $\gamma_0$. It is a dimensionless quantity that ranges between -1 and 1, making it easier to interpret:
$$ \rho_k = \frac{\gamma_k}{\gamma_0} = \frac{\mathbb{E}[(X_t - \mu)(X_{t+k} - \mu)]}{\mathbb{E}[(X_t - \mu)^2]} $$
In practice, the ACF is estimated from the data using sample autocorrelations. The sample autocorrelation coefficient $r_k$ at lag $k$ is calculated as:
$$ r_k = \frac{\sum_{t=1}^{N-k} (x_t - \bar{x})(x_{t+k} - \bar{x})}{\sum_{t=1}^{N} (x_t - \bar{x})^2} $$
Where:

- $x_1, x_2, \dots, x_N$ are the observed values,
- $\bar{x}$ is the sample mean,
- $N$ is the number of observations.
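As a sanity check, the estimator above can be implemented directly and compared with a library routine. This is a minimal sketch: the white-noise series `x` and the helper name `sample_acf` are illustrative choices, and `statsmodels.tsa.stattools.acf` computes the same quantity.

```python
import numpy as np
from statsmodels.tsa.stattools import acf

rng = np.random.default_rng(0)
x = rng.normal(size=500)  # placeholder series (white noise)

def sample_acf(x, k):
    """Sample autocorrelation r_k, computed directly from the formula above."""
    x = np.asarray(x, dtype=float)
    xbar = x.mean()
    den = np.sum((x - xbar) ** 2)
    num = np.sum((x[:-k] - xbar) * (x[k:] - xbar)) if k > 0 else den
    return num / den

manual = [sample_acf(x, k) for k in range(11)]
library = acf(x, nlags=10)           # statsmodels uses the same estimator
print(np.allclose(manual, library))  # True (up to floating-point error)
```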
The Autocorrelation Function (ACF) plot, or correlogram, is a useful tool for understanding the structure of time series data. In Python, you can generate and interpret the ACF plot using libraries such as statsmodels and matplotlib. The ACF plot helps identify significant correlations at different lags and reveals patterns in the data.
The ACF plot can provide answers to the following questions:

- Is the observed time series white noise / random?
- Is an observation related to an adjacent observation, an observation twice-removed, and so on?
- Can the observed time series be modeled with an MA model? If yes, what is the order?
Key points for interpreting the ACF plot:

- Spikes that extend beyond the shaded confidence band are statistically significant.
- A slow, gradual decay indicates a trend and hence non-stationarity.
- Spikes repeating at regular intervals indicate seasonality.
- A sharp cutoff after lag $q$ suggests an MA($q$) process.
Below is a Python example where we generate and plot the ACF for three different types of time series: one with a trend, one with seasonal patterns, and one following a moving average process.
import numpy as np
import matplotlib.pyplot as plt
from statsmodels.graphics.tsaplots import plot_acf
# Simulate three example series: trend, seasonality, and an MA(1) process
np.random.seed(42)
N = 1000
# Example 1: Time Series with a stronger trend (Random Walk)
trend_series = np.cumsum(np.random.normal(1, 1, N)) # Random walk simulating a trend with positive drift
# Example 2: Time series with clear seasonality (a pure sine wave)
seasonal_series = np.sin(np.linspace(0, 20 * np.pi, N))  # 10 full cycles over N points
# Example 3: Moving Average process MA(1): X_t = e_t + 0.5 * e_{t-1}
noise = np.random.normal(0, 1, N + 1)
ma_series = noise[1:] + 0.5 * noise[:-1]  # each value mixes the current and previous noise term
# Plotting the time series
plt.figure(figsize=(12, 8))
plt.subplot(3, 1, 1)
plt.plot(trend_series, label="Time Series with Trend")
plt.title('Time Series with Trend')
plt.grid(True)
plt.subplot(3, 1, 2)
plt.plot(seasonal_series, label="Time Series with Seasonality")
plt.title('Time Series with Seasonality')
plt.grid(True)
plt.subplot(3, 1, 3)
plt.plot(ma_series, label="Moving Average (MA(1)) Process")
plt.title('Moving Average (MA(1)) Process')
plt.grid(True)
plt.tight_layout()
plt.show()
# Plotting ACF for each time series
plt.figure(figsize=(12, 8))
# ACF for the time series with trend
plt.subplot(3, 1, 1)
plot_acf(trend_series, lags=50, ax=plt.gca())
plt.title('ACF of Time Series with Trend')
# ACF for the time series with seasonality
plt.subplot(3, 1, 2)
plot_acf(seasonal_series, lags=50, ax=plt.gca())
plt.title('ACF of Time Series with Seasonality')
# ACF for the MA(1) process
plt.subplot(3, 1, 3)
plot_acf(ma_series, lags=50, ax=plt.gca())
plt.title('ACF of Moving Average (MA(1)) Process')
plt.tight_layout()
plt.show()
Time series data: the first figure shows the three simulated series (trend, seasonality, MA(1)).

ACF plots: the second figure shows the correlogram of each series.

Interpreting the ACF plots:

- Trend series: the ACF decays very slowly and stays positive across many lags, the classic signature of a non-stationary, trending series.
- Seasonal series: the ACF oscillates with the same period as the underlying sine wave, alternating between positive and negative values.
- MA(1) series: a single significant spike at lag 1 followed by a sharp cutoff, exactly the pattern expected from an MA(1) process.
The Partial Autocorrelation Function (PACF) measures the correlation between the time series and its lagged values, after removing the linear effects of the intermediate lags. It helps isolate the direct impact of each lag.
The PACF at lag $k$, denoted by $\phi_{kk}$, represents the correlation between $X_t$ and $X_{t+k}$, after accounting for the effect of $X_{t+1}, X_{t+2}, \dots, X_{t+k-1}$.
The Yule-Walker equations for an autoregressive (AR) process provide a recursive way to compute the PACF for different lags. For an AR(p) process:
$$ \gamma_k = \sum_{j=1}^{p} \phi_{pj} \gamma_{k-j}, \quad k = 1, \dots, p $$
Where $\phi_{pj}$ are the partial autocorrelation coefficients and $\gamma_k$ is the autocovariance at lag $k$.
The PACF at lag $k$ can be calculated recursively via the Durbin-Levinson algorithm:
I. $\phi_{11} = \rho_1$
II. For $k \geq 2$:
$$ \phi_{kk} = \frac{\rho_k - \sum_{j=1}^{k-1} \phi_{k-1,j} \rho_{k-j}}{1 - \sum_{j=1}^{k-1} \phi_{k-1,j} \rho_j} $$
III. The intermediate coefficients $\phi_{kj}$ (for $j < k$) are updated using:
$$ \phi_{kj} = \phi_{k-1,j} - \phi_{kk} \phi_{k-1,k-j} $$
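To make the recursion concrete, here is a small illustrative sketch (the function name and the coefficient $\phi = 0.6$ are arbitrary choices) that implements it and checks the AR(1) property discussed below: feeding in the theoretical autocorrelations $\rho_k = \phi^k$ of an AR(1) process yields a single spike at lag 1 and zeros at every higher lag.

```python
import numpy as np

def pacf_durbin_levinson(rho):
    """Compute phi_kk from autocorrelations rho[0..K] using the recursion above."""
    K = len(rho) - 1
    phi = np.zeros((K + 1, K + 1))  # phi[k, j] holds phi_{kj}
    pacf = np.zeros(K + 1)
    pacf[0] = 1.0                   # lag-0 PACF is 1 by convention
    phi[1, 1] = pacf[1] = rho[1]    # step I: phi_11 = rho_1
    for k in range(2, K + 1):
        # step II: phi_kk from the previous row of coefficients
        num = rho[k] - np.dot(phi[k - 1, 1:k], rho[k - 1:0:-1])
        den = 1.0 - np.dot(phi[k - 1, 1:k], rho[1:k])
        phi[k, k] = pacf[k] = num / den
        # step III: update the intermediate coefficients phi_kj for j < k
        phi[k, 1:k] = phi[k - 1, 1:k] - phi[k, k] * phi[k - 1, k - 1:0:-1]
    return pacf

# Theoretical ACF of an AR(1) with phi = 0.6: rho_k = 0.6**k
rho = 0.6 ** np.arange(6)
print(np.round(pacf_durbin_levinson(rho), 6))  # approximately [1, 0.6, 0, 0, 0, 0]
```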
The Partial Autocorrelation Function (PACF) plot is a valuable tool for understanding the relationship between a time series and its lagged values after accounting for the influence of intervening lags. Unlike the ACF, which shows the correlation between the series and its lagged values, the PACF removes the effect of any intermediate lags.
The PACF is particularly useful for identifying the order of an Autoregressive (AR) process. If you suspect your time series follows an AR model, the PACF plot can help you determine the number of lag terms to include in your model.
The PACF plot can provide answers to the following questions:

- Can the observed time series be modeled with an AR model? If yes, what is the order?
Key points for interpreting the PACF plot:

- Spikes that extend beyond the confidence band are statistically significant.
- A sharp cutoff after lag $p$ suggests an AR($p$) process.
- A gradual decay suggests an MA or mixed ARMA structure rather than a pure AR process.
In this example, we will simulate different time series data (AR, MA, and ARMA processes) and plot their PACF to see how they behave.
import numpy as np
import matplotlib.pyplot as plt
from statsmodels.tsa.arima_process import ArmaProcess
from statsmodels.graphics.tsaplots import plot_pacf
# Example 1: Simulating an AR(2) process
np.random.seed(42)
ar2 = np.array([1, -0.75, 0.25]) # AR(2) coefficients (X_t = 0.75*X_t-1 - 0.25*X_t-2 + noise)
ma0 = np.array([1]) # No MA component
AR_process = ArmaProcess(ar2, ma0)
ar_series = AR_process.generate_sample(nsample=1000)
# Example 2: Simulating a Moving Average (MA) process
ma1 = np.array([1, 0.5]) # MA(1) coefficients
MA_process = ArmaProcess([1], ma1)
ma_series = MA_process.generate_sample(nsample=1000)
# Example 3: Simulating an ARMA(1,1) process
ar1 = np.array([1, 0.5])   # AR polynomial: phi_1 = -0.5 under statsmodels' sign convention
ma1 = np.array([1, -0.5])  # MA polynomial: theta_1 = -0.5
ARMA_process = ArmaProcess(ar1, ma1)
arma_series = ARMA_process.generate_sample(nsample=1000)
# Plotting the time series
plt.figure(figsize=(12, 8))
plt.subplot(3, 1, 1)
plt.plot(ar_series, label="AR(2) Process")
plt.title('AR(2) Process')
plt.grid(True)
plt.subplot(3, 1, 2)
plt.plot(ma_series, label="MA(1) Process")
plt.title('MA(1) Process')
plt.grid(True)
plt.subplot(3, 1, 3)
plt.plot(arma_series, label="ARMA(1,1) Process")
plt.title('ARMA(1,1) Process')
plt.grid(True)
plt.tight_layout()
plt.show()
# Plotting PACF for each time series
plt.figure(figsize=(12, 8))
# PACF for the AR(2) process
plt.subplot(3, 1, 1)
plot_pacf(ar_series, lags=30, ax=plt.gca())
plt.title('PACF of AR(2) Process')
# PACF for the MA(1) process
plt.subplot(3, 1, 2)
plot_pacf(ma_series, lags=30, ax=plt.gca())
plt.title('PACF of MA(1) Process')
# PACF for the ARMA(1,1) process
plt.subplot(3, 1, 3)
plot_pacf(arma_series, lags=30, ax=plt.gca())
plt.title('PACF of ARMA(1,1) Process')
plt.tight_layout()
plt.show()
Time series data: the first figure shows the three simulated series (AR(2), MA(1), ARMA(1,1)).

PACF plots: the second figure shows the PACF of each series.

Interpreting the PACF plots:

- AR(2) series: significant spikes at lags 1 and 2 followed by a cutoff, pointing to an AR(2) model.
- MA(1) series: the PACF decays gradually instead of cutting off, which is typical of MA processes.
- ARMA(1,1) series: neither a clean cutoff nor a pure decay; both the ACF and PACF tail off, as expected for a mixed model.
In practice, the two plots are used together: the PACF suggests the order $p$ of an AR model (the lag after which it cuts off), while the ACF suggests the order $q$ of an MA model.
Consider the autoregressive process of order 1, denoted AR(1):
$$ X_t = \phi X_{t-1} + \epsilon_t $$
Where $\epsilon_t$ is white noise.
The autocorrelation function for a stationary AR(1) process (which requires $|\phi| < 1$) is:
$$ \rho_k = \phi^k $$
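This formula follows by multiplying both sides of the AR(1) equation by $X_{t-k}$ and taking expectations; because $\epsilon_t$ is uncorrelated with past values of the series, the noise term drops out:

$$ \gamma_k = \phi \gamma_{k-1} \quad\Rightarrow\quad \rho_k = \phi \rho_{k-1} = \phi^2 \rho_{k-2} = \dots = \phi^k $$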
This implies that the autocorrelation decays exponentially with increasing lag $k$, showing a gradual decay in the ACF plot.
The partial autocorrelation function for an AR(1) process shows a significant spike at lag 1, followed by zeros at higher lags. This is because, for an AR(1) process, only the first lag has a direct effect, while higher lags are indirectly related to the series.
The following example uses mock data to illustrate a time series with short-term dependencies, specifically one that could be modeled as an AR(1) process. Common data types that show this behavior include financial data (such as stock prices or returns), economic indicators, and meteorological data (like temperature series).
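A minimal sketch of such an example, assuming an arbitrary coefficient $\phi = 0.7$ and reusing the plotting utilities from the earlier examples:

```python
import numpy as np
import matplotlib.pyplot as plt
from statsmodels.tsa.arima_process import ArmaProcess
from statsmodels.graphics.tsaplots import plot_acf, plot_pacf

np.random.seed(42)
# AR(1) with phi = 0.7 (statsmodels' sign convention: polynomial [1, -phi])
ar1_series = ArmaProcess(np.array([1, -0.7]), np.array([1])).generate_sample(nsample=1000)

fig, axes = plt.subplots(1, 2, figsize=(12, 4))
plot_acf(ar1_series, lags=30, ax=axes[0])   # expect exponential decay: rho_k = 0.7**k
axes[0].set_title('ACF of AR(1), phi = 0.7')
plot_pacf(ar1_series, lags=30, ax=axes[1])  # expect a single spike at lag 1
axes[1].set_title('PACF of AR(1), phi = 0.7')
plt.tight_layout()
plt.show()
```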
In time series analysis, Auto-Regressive (AR) and Moving Average (MA) models are widely used for modeling and forecasting. Identifying the correct order of these models relies on interpreting the Autocorrelation Function (ACF) and the Partial Autocorrelation Function (PACF).
The AR model assumes that the current value of a time series $y_t$ is a linear combination of its past values:
$$ \hat{y}_t = \alpha_1 y_{t-1} + \alpha_2 y_{t-2} + \dots + \alpha_p y_{t-p} $$
The MA model assumes that the current value $y_t$ is influenced by past error terms $\epsilon_t$:
$$ \hat{y}_t = \epsilon_t + \beta_1 \epsilon_{t-1} + \beta_2 \epsilon_{t-2} + \dots + \beta_q \epsilon_{t-q} $$
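Once candidate orders have been read off the ACF/PACF plots, the models themselves can be fit and checked. Below is a minimal sketch using statsmodels' ARIMA class (an ARIMA($p$, 0, $q$) model reduces to a pure AR($p$) or MA($q$)); the simulated series mirror the AR(2) and MA(1) examples above.

```python
import numpy as np
from statsmodels.tsa.arima_process import ArmaProcess
from statsmodels.tsa.arima.model import ARIMA

np.random.seed(42)
ar_series = ArmaProcess(np.array([1, -0.75, 0.25]), np.array([1])).generate_sample(nsample=1000)
ma_series = ArmaProcess(np.array([1]), np.array([1, 0.5])).generate_sample(nsample=1000)

# PACF cuts off at lag 2 -> try AR(2), i.e. order (p, d, q) = (2, 0, 0)
ar_fit = ARIMA(ar_series, order=(2, 0, 0)).fit()
print(ar_fit.params)  # estimated AR coefficients should be near 0.75 and -0.25

# ACF cuts off at lag 1 -> try MA(1), i.e. order (0, 0, 1)
ma_fit = ARIMA(ma_series, order=(0, 0, 1)).fit()
print(ma_fit.params)  # estimated MA coefficient should be near 0.5
```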
ACF and PACF plots for the following series:

- AR(1): autoregressive process of order 1.
- AR(2): autoregressive process of order 2.
- MA(1): moving average process of order 1.
- MA(2): moving average process of order 2.
- Linear growing: a simple deterministic increasing trend.
- Constant: a flat series with a constant value.
- Sine with noise: a sinusoidal series with added noise.
- White noise: a purely random series.