Autocorrelation Function (ACF) and Partial Autocorrelation Function (PACF)

In time series analysis, understanding the relationships between observations at different time lags is crucial for model identification and forecasting. Two essential tools for analyzing these relationships are the Autocorrelation Function (ACF) and the Partial Autocorrelation Function (PACF).

Autocorrelation Function (ACF)

The Autocorrelation Function (ACF) measures the correlation between a time series and its lagged values. It helps detect patterns such as trends and seasonality. The autocorrelation at lag $k$, denoted $\rho_k$, is the autocovariance at lag $k$, $\gamma_k$, divided by the variance $\gamma_0$:

$$ \rho_k = \frac{\gamma_k}{\gamma_0} $$


Autocovariance Function

For a weakly stationary series with constant mean $\mu$, the autocovariance at lag $k$ is the covariance between observations separated by $k$ time periods. It is given by:

$$ \gamma_k = \text{Cov}(X_t, X_{t+k}) = \mathbb{E}[(X_t - \mu)(X_{t+k} - \mu)] $$


Autocorrelation Coefficient

The autocorrelation coefficient at lag $k$ normalizes the autocovariance $\gamma_k$ by dividing it by the variance $\gamma_0$. It is a dimensionless quantity that ranges between -1 and 1, making it easier to interpret:

$$ \rho_k = \frac{\gamma_k}{\gamma_0} = \frac{\mathbb{E}[(X_t - \mu)(X_{t+k} - \mu)]}{\mathbb{E}[(X_t - \mu)^2]} $$

Sample Autocorrelation Function

In practice, the ACF is estimated from the data using sample autocorrelations. The sample autocorrelation coefficient $r_k$ at lag $k$ is calculated as:

$$ r_k = \frac{\sum_{t=1}^{N-k} (x_t - \bar{x})(x_{t+k} - \bar{x})}{\sum_{t=1}^{N} (x_t - \bar{x})^2} $$
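The sample formula above can be sketched directly in NumPy. This is a minimal illustration, not a library implementation: the function name `sample_acf` and the toy data are assumptions made for this example.

```python
import numpy as np

def sample_acf(x, max_lag):
    """Return [r_0, r_1, ..., r_max_lag] using the sample autocorrelation formula."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    dev = x - x.mean()            # deviations x_t - x_bar
    denom = np.sum(dev ** 2)      # denominator: sum over all N observations
    # numerator at lag k: sum over t = 1..N-k of dev_t * dev_{t+k}
    return np.array([np.sum(dev[:n - k] * dev[k:]) / denom
                     for k in range(max_lag + 1)])

x = [2.0, 4.0, 6.0, 8.0, 10.0]    # toy data, chosen only for illustration
r = sample_acf(x, max_lag=2)
print(r)                          # r_0 is always 1 by construction
```

Note that the denominator always uses all $N$ observations, while the numerator at lag $k$ has only $N-k$ terms, so $r_k$ shrinks toward zero at large lags.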


Plotting the ACF

The Autocorrelation Function (ACF) plot, or correlogram, shows the sample autocorrelation at each lag and is a useful tool for understanding the structure of time series data. In Python, you can generate the ACF plot with statsmodels and matplotlib. It helps identify significant correlations at different lags and reveals patterns in the data.

Key Points for Interpreting the ACF Plot

  1. If the ACF values decrease slowly over many lags, this suggests the presence of a trend in the data.
  2. Repeated peaks or cyclic behavior in the ACF plot indicate seasonal patterns in the data, with regular intervals of high correlation.
  3. A rapid drop-off or sharp cutoff after a few lags suggests the data may follow a Moving Average (MA) process, where the current value is explained by a few prior error terms (shocks). For an MA($q$) process, the theoretical ACF is zero beyond lag $q$.
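To see why the ACF of a moving average process cuts off, it may help to work through the simplest case. The symbols $\theta$ (the MA coefficient) and $\sigma^2$ (the white-noise variance) are notation introduced here for illustration. For an MA(1) process $X_t = \epsilon_t + \theta \epsilon_{t-1}$:

$$ \gamma_0 = (1 + \theta^2)\sigma^2, \qquad \gamma_1 = \theta \sigma^2, \qquad \gamma_k = 0 \ \text{for } k \geq 2 $$

$$ \rho_1 = \frac{\theta}{1 + \theta^2}, \qquad \rho_k = 0 \ \text{for } k \geq 2 $$

With $\theta = 0.5$, this gives $\rho_1 = 0.5 / 1.25 = 0.4$, and every autocorrelation beyond lag 1 is zero because $X_t$ and $X_{t+k}$ share no shocks when $k \geq 2$.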

Python Example

Below is a Python example where we generate and plot the ACF for three different types of time series: one with a trend, one with seasonal patterns, and one following a moving average process.

import numpy as np
import matplotlib.pyplot as plt
from statsmodels.graphics.tsaplots import plot_acf

# Simulate three series: trend, seasonality, and an MA(1) process
N = 1000

# Example 1: Time series with a trend (random walk with positive drift)
trend_series = np.cumsum(np.random.normal(1, 1, N))  # cumulative sum of N(1, 1) steps

# Example 2: Time series with seasonality (noise-free sine wave, 10 cycles over N points)
seasonal_series = np.sin(np.linspace(0, 20 * np.pi, N))

# Example 3: MA(1) process, X_t = e_t + 0.5 * e_{t-1}
noise = np.random.normal(0, 1, N)
ma_series = noise.copy()
ma_series[1:] += 0.5 * noise[:-1]  # add 0.5 times the previous shock, not the previous value

# Plotting the time series
plt.figure(figsize=(12, 8))
plt.subplot(3, 1, 1)
plt.plot(trend_series, label="Time Series with Trend")
plt.title('Time Series with Trend')

plt.subplot(3, 1, 2)
plt.plot(seasonal_series, label="Time Series with Seasonality")
plt.title('Time Series with Seasonality')

plt.subplot(3, 1, 3)
plt.plot(ma_series, label="Moving Average (MA(1)) Process")
plt.title('Moving Average (MA(1)) Process')
plt.tight_layout()  # keep subplot titles from overlapping


# Plotting ACF for each time series
plt.figure(figsize=(12, 8))

# ACF for the time series with trend
plt.subplot(3, 1, 1)
plot_acf(trend_series, lags=50, ax=plt.gca())
plt.title('ACF of Time Series with Trend')

# ACF for the time series with seasonality
plt.subplot(3, 1, 2)
plot_acf(seasonal_series, lags=50, ax=plt.gca())
plt.title('ACF of Time Series with Seasonality')

# ACF for the MA(1) process
plt.subplot(3, 1, 3)
plot_acf(ma_series, lags=50, ax=plt.gca())
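plt.title('ACF of Moving Average (MA(1)) Process')
plt.tight_layout()  # keep subplot titles from overlapping
plt.show()  # display both figures when run as a script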


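Interpreting the ACF Plot:

  1. For the series with a trend, the autocorrelations stay close to 1 and decrease only slowly across the lags.
  2. For the seasonal series, the ACF oscillates with the same period as the sine wave (about 100 samples here), so the 50 lags shown cover roughly half a cycle.
  3. For an MA(1) process, only the lag-1 autocorrelation is nonzero in theory, so the values beyond lag 1 should fall inside the confidence band.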

Partial Autocorrelation Function (PACF)

The Partial Autocorrelation Function (PACF) measures the correlation between the time series and its lagged values, after removing the linear effects of the intermediate lags. It helps isolate the direct impact of each lag.

The PACF at lag $k$, denoted by $\phi_{kk}$, represents the correlation between $X_t$ and $X_{t+k}$, after accounting for the effect of $X_{t+1}, X_{t+2}, \dots, X_{t+k-1}$.

Yule-Walker Equations

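The Yule-Walker equations relate the coefficients of an autoregressive (AR) process to its autocovariances. Fitting an AR model of order $p$ gives, for $k = 1, \dots, p$:

$$ \gamma_k = \sum_{j=1}^{p} \phi_{pj} \gamma_{k-j} $$

where $\phi_{pj}$ is the $j$-th coefficient of the order-$p$ fit and $\gamma_k$ is the autocovariance at lag $k$ (with $\gamma_{-k} = \gamma_k$). The last coefficient, $\phi_{pp}$, is the partial autocorrelation at lag $p$. Solving this system for successive orders $p = 1, 2, \dots$ therefore yields the PACF.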
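As a rough sketch, the Yule-Walker system can be solved numerically once it is divided through by $\gamma_0$, so that it is expressed in autocorrelations $\rho_k$. The function name `yule_walker_coeffs` and the input values are assumptions chosen for illustration; the autocorrelations $\rho_1 = 0.6$ and $\rho_2 = 0.2$ are the theoretical values for the AR(2) process $X_t = 0.75 X_{t-1} - 0.25 X_{t-2} + \epsilon_t$.

```python
import numpy as np

def yule_walker_coeffs(rho):
    """Solve the order-p Yule-Walker system given rho = [rho_1, ..., rho_p]."""
    rho = np.asarray(rho, dtype=float)
    p = len(rho)
    full = np.concatenate(([1.0], rho))   # rho_0 = 1, rho_1, ..., rho_p
    # R[i, j] = rho_{|i - j|}: the p x p autocorrelation matrix
    R = np.array([[full[abs(i - j)] for j in range(p)] for i in range(p)])
    return np.linalg.solve(R, rho)        # coefficients phi_p1, ..., phi_pp

phi = yule_walker_coeffs([0.6, 0.2])
print(phi)       # order-2 coefficients
print(phi[-1])   # last coefficient = partial autocorrelation at lag 2
```

Solving the system recovers the AR coefficients, and the last entry of the solution is the partial autocorrelation at lag $p$.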

Recursive Calculation of PACF

The PACF at lag $k$ can be calculated recursively from the autocorrelations $\rho_k$, a procedure known as the Durbin-Levinson recursion:

I. $\phi_{11} = \rho_1$

II. For $k \geq 2$:

$$ \phi_{kk} = \frac{\rho_k - \sum_{j=1}^{k-1} \phi_{k-1,j} \rho_{k-j}}{1 - \sum_{j=1}^{k-1} \phi_{k-1,j} \rho_j} $$

III. The intermediate coefficients $\phi_{kj}$ (for $j < k$) are updated using:

$$ \phi_{kj} = \phi_{k-1,j} - \phi_{kk} \phi_{k-1,k-j} $$
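The three steps above can be sketched in plain Python as follows. This is an illustrative implementation under the assumption that the autocorrelations $\rho_1, \dots, \rho_K$ are already known; the function name `pacf_durbin_levinson` is hypothetical. The test input is the theoretical ACF of the AR(2) process $X_t = 0.75 X_{t-1} - 0.25 X_{t-2} + \epsilon_t$.

```python
def pacf_durbin_levinson(rho):
    """Given rho = [rho_1, ..., rho_K], return [phi_11, ..., phi_KK]."""
    K = len(rho)
    if K == 0:
        return []
    r = [1.0] + list(rho)        # r[k] = rho_k, with rho_0 = 1
    pacf = [r[1]]                # step I: phi_11 = rho_1
    prev = [r[1]]                # prev[j - 1] holds phi_{k-1, j}
    for k in range(2, K + 1):
        # step II: phi_kk from the order-(k-1) coefficients
        num = r[k] - sum(prev[j - 1] * r[k - j] for j in range(1, k))
        den = 1.0 - sum(prev[j - 1] * r[j] for j in range(1, k))
        phi_kk = num / den
        # step III: update the intermediate coefficients phi_kj for j < k
        cur = [prev[j - 1] - phi_kk * prev[k - j - 1] for j in range(1, k)]
        cur.append(phi_kk)
        pacf.append(phi_kk)
        prev = cur
    return pacf

# Theoretical ACF of the AR(2) process at lags 1 to 4
ar2_pacf = pacf_durbin_levinson([0.6, 0.2, 0.0, -0.05])
print(ar2_pacf)
```

For this input the first two partial autocorrelations are nonzero and the later ones vanish (up to floating-point error), which is the cutoff after lag 2 expected for an AR(2) process.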

Plotting the PACF

The Partial Autocorrelation Function (PACF) plot is a valuable tool for understanding the relationship between a time series and its lagged values after accounting for the influence of intervening lags. Unlike the ACF, which shows the correlation between the series and its lagged values, the PACF removes the effect of any intermediate lags.

The PACF is particularly useful for identifying the order of an Autoregressive (AR) process. If you suspect your time series follows an AR model, the PACF plot can help you determine the number of lag terms to include in your model.

Key Points for Interpreting the PACF Plot:

  1. Significant spikes at early lags indicate that those specific lags are important for modeling the time series. For an AR($p$) process, you will see significant spikes up to lag $p$, and the PACF will then cut off.
  2. A sharp drop after lag $p$ suggests that the time series follows an AR($p$) process, meaning that only $p$ past observations are needed to model the series.
  3. If the PACF plot exhibits a gradual decay rather than a sharp cutoff, this suggests a Moving Average (MA) component, since the partial autocorrelations of an MA process decrease slowly over many lags instead of dropping to zero.
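The gradual decay in the last point can be made concrete for the simplest moving average case. The symbol $\theta$ is notation introduced here for illustration. For an invertible MA(1) process $X_t = \epsilon_t + \theta \epsilon_{t-1}$ with $|\theta| < 1$, the partial autocorrelations have the closed form:

$$ \phi_{kk} = -\frac{(-\theta)^k (1 - \theta^2)}{1 - \theta^{2(k+1)}} $$

With $\theta = 0.5$:

$$ \phi_{11} = 0.4, \qquad \phi_{22} \approx -0.190, \qquad \phi_{33} \approx 0.094 $$

The values shrink roughly by a factor of $|\theta|$ per lag and, for positive $\theta$, alternate in sign, so the PACF tails off instead of cutting off.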

Python Example

In this example, we will simulate different time series data (AR, MA, and ARMA processes) and plot their PACF to see how they behave.

import numpy as np
import matplotlib.pyplot as plt
from statsmodels.tsa.arima_process import ArmaProcess
from statsmodels.graphics.tsaplots import plot_pacf

# Example 1: Simulating an AR(2) process
ar2 = np.array([1, -0.75, 0.25])  # AR(2) coefficients (X_t = 0.75*X_t-1 - 0.25*X_t-2 + noise)
ma0 = np.array([1])  # No MA component
AR_process = ArmaProcess(ar2, ma0)
ar_series = AR_process.generate_sample(nsample=1000)

# Example 2: Simulating a Moving Average (MA) process
ma1 = np.array([1, 0.5])  # MA(1) coefficients
MA_process = ArmaProcess([1], ma1)
ma_series = MA_process.generate_sample(nsample=1000)

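# Example 3: Simulating an ARMA(1,1) process
# ArmaProcess takes lag-polynomial coefficients, so AR signs are flipped (as in Example 1):
# X_t = -0.5*X_t-1 + e_t - 0.5*e_t-1
ar1 = np.array([1, 0.5])  # AR(1) lag polynomial 1 + 0.5L
ma1 = np.array([1, -0.5])  # MA(1) lag polynomial 1 - 0.5L (reuses the name ma1 from Example 2)
ARMA_process = ArmaProcess(ar1, ma1)
arma_series = ARMA_process.generate_sample(nsample=1000)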

# Plotting the time series
plt.figure(figsize=(12, 8))
plt.subplot(3, 1, 1)
plt.plot(ar_series, label="AR(2) Process")
plt.title('AR(2) Process')

plt.subplot(3, 1, 2)
plt.plot(ma_series, label="MA(1) Process")
plt.title('MA(1) Process')

plt.subplot(3, 1, 3)
plt.plot(arma_series, label="ARMA(1,1) Process")
plt.title('ARMA(1,1) Process')
plt.tight_layout()  # keep subplot titles from overlapping


# Plotting PACF for each time series
plt.figure(figsize=(12, 8))

# PACF for the AR(2) process
plt.subplot(3, 1, 1)
plot_pacf(ar_series, lags=30, ax=plt.gca())
plt.title('PACF of AR(2) Process')

# PACF for the MA(1) process
plt.subplot(3, 1, 2)
plot_pacf(ma_series, lags=30, ax=plt.gca())
plt.title('PACF of MA(1) Process')

# PACF for the ARMA(1,1) process
plt.subplot(3, 1, 3)
plot_pacf(arma_series, lags=30, ax=plt.gca())
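plt.title('PACF of ARMA(1,1) Process')
plt.tight_layout()  # keep subplot titles from overlapping
plt.show()  # display both figures when run as a script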


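Interpreting the PACF Plot:

  1. For the AR(2) process, the PACF should show significant spikes at lags 1 and 2 and then stay inside the confidence band.
  2. For the MA(1) process, the PACF has no sharp cutoff: the partial autocorrelations shrink gradually toward zero.
  3. For the ARMA(1,1) process, the PACF also decays gradually, which reflects the MA component.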

Comparing ACF and PACF

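In practice:

  1. For an AR($p$) process, the ACF decays gradually while the PACF cuts off after lag $p$, so the PACF is used to choose the AR order.
  2. For an MA($q$) process, the ACF cuts off after lag $q$ while the PACF decays gradually, so the ACF is used to choose the MA order.
  3. For a mixed ARMA process, both functions decay gradually, and neither plot alone pins down the orders.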

Example: ACF and PACF for AR(1) Process

Consider the autoregressive process of order 1, denoted AR(1):

$$ X_t = \phi X_{t-1} + \epsilon_t $$

where $\epsilon_t$ is white noise and $|\phi| < 1$, so that the process is stationary.

ACF for AR(1)

The autocorrelation function for an AR(1) process is:

$$ \rho_k = \phi^k $$

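For a stationary AR(1) process ($|\phi| < 1$), this implies that the autocorrelation decays geometrically in magnitude with increasing lag $k$, showing a gradual decay in the ACF plot. When $\phi$ is negative, the autocorrelations also alternate in sign.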
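As a quick numerical check of $\rho_k = \phi^k$, the sketch below simulates an AR(1) series and compares its sample autocorrelations with the theoretical values. The coefficient $\phi = 0.7$, the series length, and the random seed are arbitrary choices made for this illustration.

```python
import numpy as np

rng = np.random.default_rng(0)   # fixed seed so the run is reproducible
phi, n = 0.7, 50000              # illustrative AR(1) coefficient and length
eps = rng.normal(0.0, 1.0, n)    # white-noise shocks

# Simulate X_t = phi * X_{t-1} + eps_t, starting from X_0 = 0
x = np.zeros(n)
for t in range(1, n):
    x[t] = phi * x[t - 1] + eps[t]

# Sample autocorrelations r_1..r_5 from the sample ACF formula
dev = x - x.mean()
denom = np.sum(dev ** 2)
lags = np.arange(1, 6)
r = np.array([np.sum(dev[:n - k] * dev[k:]) / denom for k in lags])
theory = phi ** lags             # theoretical rho_k = phi^k

for k, rk, tk in zip(lags, r, theory):
    print(f"lag {k}: sample {rk:.3f}, theory {tk:.3f}")
```

With a long series the sample values should track the theoretical curve closely; with shorter series the agreement is looser because of sampling variability.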

PACF for AR(1)

The partial autocorrelation function for an AR(1) process shows a significant spike at lag 1, followed by zeros at higher lags (in a sample, values close to zero). This is because, for an AR(1) process, only the first lag has a direct effect on $X_t$, while higher lags influence it only indirectly through lag 1.
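The cutoff can be verified at lag 2 using the recursive PACF formula together with $\rho_k = \phi^k$. Starting from $\phi_{11} = \rho_1 = \phi$:

$$ \phi_{22} = \frac{\rho_2 - \phi_{11} \rho_1}{1 - \phi_{11} \rho_1} = \frac{\phi^2 - \phi \cdot \phi}{1 - \phi^2} = 0 $$

The same cancellation occurs at every higher lag, which is why the theoretical PACF of an AR(1) process is zero beyond lag 1.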

Visualization of ACF and PACF

The following plots use mock data for a time series with short-term dependence, specifically one that could be modeled as an AR(1) process. Data that often show this kind of behavior include financial data (such as stock prices or returns), economic indicators, and meteorological data (like temperature series).


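Left Plot: Autocorrelation Function (ACF). For an AR(1) process, the autocorrelations decay geometrically as the lag increases.

Right Plot: Partial Autocorrelation Function (PACF). For an AR(1) process, there is a single significant spike at lag 1, and the values at higher lags stay close to zero.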

