Last modified: March 23, 2026
ARMA, ARIMA, and SARIMA are models commonly used to analyze and forecast time series data. ARMA (AutoRegressive Moving Average) combines two ideas: using past values to predict current ones (autoregression) and smoothing out noise using past forecast errors (moving average). ARIMA (AutoRegressive Integrated Moving Average) builds on ARMA by adding a step to handle trends in non-stationary data through differencing. SARIMA (Seasonal ARIMA) takes it a step further by accounting for repeating seasonal patterns. These models are practical and versatile for working with time series data that show trends, noise, or seasonal effects.
ARMA models combine autoregressive (AR) and moving average (MA) components to model time series data exhibiting both autocorrelation and serial dependence.
An ARMA($p, q$) model is defined by:
$$X_t = c + \sum_{i=1}^{p} \phi_i X_{t-i} + \epsilon_t + \sum_{j=1}^{q} \theta_j \epsilon_{t-j}$$
or, equivalently, using the backshift operator $B$:
$$\phi(B) X_t = c + \theta(B) \epsilon_t$$
where:
- $X_t$ is the value of the series at time $t$,
- $c$ is a constant,
- $\phi_1, \dots, \phi_p$ are the autoregressive coefficients,
- $\theta_1, \dots, \theta_q$ are the moving average coefficients,
- $\epsilon_t$ is white noise with mean zero and variance $\sigma^2$,
- $\phi(B) = 1 - \phi_1 B - \dots - \phi_p B^p$ and $\theta(B) = 1 + \theta_1 B + \dots + \theta_q B^q$, with $B$ the backshift operator ($B X_t = X_{t-1}$).
An AR($p$) process is stationary if all the roots of the characteristic polynomial $\phi(B) = 0$ lie outside the unit circle in the complex plane. This condition ensures that the time series has a constant mean and variance over time.
An MA($q$) process is invertible if all the roots of $\theta(B) = 0$ lie outside the unit circle. Invertibility allows the MA process to be expressed as an infinite AR process, ensuring a unique representation and facilitating parameter estimation.
An ARMA process is causal if it can be written as a convergent linear filter of current and past shocks:
$$ X_t = \sum_{j=0}^{\infty} \psi_j \epsilon_{t-j} $$
This holds when the AR polynomial has no roots on or inside the unit circle, ensuring the solution depends only on present and past innovations.
AR(∞) Representation of MA Processes:
An MA process can be expressed as an infinite-order AR process:
$$X_t = \sum_{k=1}^{\infty} \pi_k X_{t-k} + \epsilon_t$$
MA(∞) Representation of AR Processes:
An AR process can be expressed as an infinite-order MA process:
$$X_t = \sum_{k=0}^{\infty} \psi_k \epsilon_{t-k}$$
Consider the ARMA(1,1) model:
$$X_t = \phi X_{t-1} + \epsilon_t + \theta \epsilon_{t-1}$$
Let $\phi = 0.7$ and $\theta = 0.2$, with $\epsilon_t$ white noise.
Existence and causality (ARMA(1,1))

When $|\phi| < 1$, the model has a unique stationary solution, and it is causal:

$$ X_t = \epsilon_t + (\theta + \phi)\sum_{j=1}^{\infty} \phi^{j-1} \epsilon_{t-j}. $$

When $|\phi| > 1$, a unique stationary solution still exists, but it is noncausal because it depends on future shocks:

$$ X_t = -\theta \phi^{-1} \epsilon_t - (\theta + \phi)\sum_{j=1}^{\infty} \phi^{-j-1} \epsilon_{t+j}. $$
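Since $\phi = 0.7$ satisfies $|\phi| < 1$ here, the causal representation applies and can be checked numerically. The sketch below (pure Python, names illustrative) builds $X_t$ from the truncated MA($\infty$) filter and verifies that it satisfies the ARMA(1,1) recursion $X_t - \phi X_{t-1} = \epsilon_t + \theta \epsilon_{t-1}$:

```python
import random

# Parameters from the example; psi_j = (theta + phi) * phi**(j-1) for j >= 1
phi, theta = 0.7, 0.2
random.seed(42)

n, burn = 500, 200          # keep a burn-in so truncation error is negligible
eps = [random.gauss(0, 1) for _ in range(n + burn)]

# Build X_t from the causal MA(infinity) representation, truncated at J lags
J = 100
def x_at(t):
    total = eps[t]
    for j in range(1, J + 1):
        total += (theta + phi) * phi ** (j - 1) * eps[t - j]
    return total

# Check that the filtered series satisfies the ARMA(1,1) recursion:
# X_t - phi * X_{t-1} = eps_t + theta * eps_{t-1}
for t in range(burn, burn + 50):
    lhs = x_at(t) - phi * x_at(t - 1)
    rhs = eps[t] + theta * eps[t - 1]
    assert abs(lhs - rhs) < 1e-8
print("causal representation satisfies the ARMA(1,1) recursion")
```

The truncation at $J = 100$ lags is safe because the weights decay as $\phi^{j}$, so the omitted tail is on the order of $0.7^{100}$.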
Invertibility (ARMA(1,1))

The model is invertible when $|\theta| < 1$. With $\theta = 0.2$ this condition holds, so the process also admits a convergent AR($\infty$) representation.
To analyze this process, we simulate a large number of observations using statistical software (e.g., R or Python) to approximate its properties.
```r
set.seed(500)
data <- arima.sim(n = 1e6, list(ar = 0.7, ma = 0.2))
```
AR(∞) Representation:

Writing the model as $(1 - \phi B) X_t = (1 + \theta B) \epsilon_t$ and solving for $\epsilon_t$:

$$\epsilon_t = (1 + \theta B)^{-1} (1 - \phi B) X_t$$

$$\epsilon_t = [1 - \theta B + \theta^2 B^2 - \dots] (1 - \phi B) X_t$$

Multiplying the series:

$$\epsilon_t = X_t - (\phi + \theta) X_{t-1} + \theta (\phi + \theta) X_{t-2} - \theta^2 (\phi + \theta) X_{t-3} + \dots$$

so the AR($\infty$) weights are $\pi_k = (\phi + \theta)(-\theta)^{k-1}$ for $k \geq 1$.
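The weights $\pi_k = (\phi + \theta)(-\theta)^{k-1}$ can be verified numerically: applying the truncated AR($\infty$) filter to a simulated ARMA(1,1) path should recover the innovations. A minimal sketch (pure Python, names illustrative):

```python
import random

phi, theta = 0.7, 0.2
random.seed(7)

# Simulate an ARMA(1,1) path directly from its recursion
n = 600
eps = [random.gauss(0, 1) for _ in range(n)]
x = [0.0] * n
for t in range(1, n):
    x[t] = phi * x[t - 1] + eps[t] + theta * eps[t - 1]

# AR(infinity) weights: pi_k = (phi + theta) * (-theta)**(k-1), k >= 1
def eps_hat(t, K=60):
    return x[t] - sum((phi + theta) * (-theta) ** (k - 1) * x[t - k]
                      for k in range(1, K + 1))

# After a burn-in, the truncated AR(infinity) filter recovers the shocks
for t in range(200, 260):
    assert abs(eps_hat(t) - eps[t]) < 1e-6
print("pi weights recover the innovations")
```

The truncation at $K = 60$ is harmless because the weights decay as $\theta^{k}$, which is vanishingly small for $\theta = 0.2$.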
MA(∞) Representation:

$$X_t = \frac{1 + \theta B}{1 - \phi B} \epsilon_t = [1 + \psi_1 B + \psi_2 B^2 + \dots] \epsilon_t$$

Expanding $(1 - \phi B)^{-1}(1 + \theta B)$ gives the $\psi$ coefficients:

$$\psi_0 = 1, \qquad \psi_k = \phi^k + \theta \phi^{k-1} = (\phi + \theta)\phi^{k-1} \quad \text{for } k \geq 1$$
The autocorrelation function (ACF) of an ARMA(1,1) process decays geometrically from lag 1 onward:

$$\rho_k = \phi^{k-1} \left( \frac{(1 + \phi \theta)(\phi + \theta)}{1 + 2 \phi \theta + \theta^2} \right), \quad k \geq 1$$

Calculations:

$$\rho_1 = \frac{(1 + 0.7 \times 0.2)(0.7 + 0.2)}{1 + 2 \times 0.7 \times 0.2 + 0.2^2} = \frac{1.026}{1.32} \approx 0.777$$
$$\rho_2 = 0.7 \times \rho_1 \approx 0.544$$
$$\rho_3 = 0.7 \times \rho_2 \approx 0.381$$
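These values can be cross-checked by computing the autocovariances $\gamma_k = \sigma^2 \sum_j \psi_j \psi_{j+k}$ directly from the MA($\infty$) weights, a quick numerical sketch:

```python
# Theoretical ACF of the ARMA(1,1) example, computed from the MA(infinity)
# weights psi_0 = 1, psi_j = (phi + theta) * phi**(j-1):
#   gamma_k = sigma^2 * sum_j psi_j * psi_{j+k}
phi, theta = 0.7, 0.2
J = 200  # truncation point; phi**200 is negligible

psi = [1.0] + [(phi + theta) * phi ** (j - 1) for j in range(1, J + 1)]

def gamma(k):
    return sum(psi[j] * psi[j + k] for j in range(J + 1 - k))

rho = [gamma(k) / gamma(0) for k in range(4)]
print([round(r, 3) for r in rho[1:]])  # -> [0.777, 0.544, 0.381]
```

The output matches the closed-form calculations above, including the geometric decay $\rho_{k+1} = \phi \rho_k$.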
ARIMA models generalize ARMA models to include differencing, allowing them to model non-stationary time series data.
An ARIMA($p, d, q$) model is defined by:
$$\phi(B) (1 - B)^d X_t = c + \theta(B) \epsilon_t$$
where:
- $d$ is the order of differencing required to make the series stationary,
- $(1 - B)^d$ applies that differencing ($d = 1$ gives $X_t - X_{t-1}$),
- $\phi(B)$, $\theta(B)$, $c$, and $\epsilon_t$ are as in the ARMA model.
Worked example (differencing a trended series):
Suppose we have a time series $X_t$ exhibiting an upward trend.
First-order differencing is applied to achieve stationarity:
$$Y_t = (1 - B) X_t = X_t - X_{t-1}$$
Analyzing the differenced series $Y_t$:
Assume ACF suggests MA(1) and PACF suggests AR(1).
Fit an ARIMA(1,1,1) model:
$$(1 - \phi B)(1 - B) X_t = c + (1 + \theta B) \epsilon_t$$
Estimate $\phi$, $\theta$, and $c$ using MLE.
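The differencing step above can be illustrated with a short sketch on a synthetic trended series (the slope 0.5 and the noise level are arbitrary choices for this example):

```python
import random

random.seed(1)

# Hypothetical trending series: linear trend (slope 0.5) plus Gaussian noise
n = 400
x = [0.5 * t + random.gauss(0, 1) for t in range(n)]

# First-order differencing: y_t = (1 - B) x_t = x_t - x_{t-1}
y = [x[t] - x[t - 1] for t in range(1, n)]

# The trend is removed: the differenced series fluctuates around the slope 0.5
mean_y = sum(y) / len(y)
print(len(y), round(mean_y, 2))
```

The differenced series is one observation shorter and has a constant mean, which is what the subsequent ARMA fit on $Y_t$ requires.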
The sample ACF and PACF plots provide characteristic patterns that help determine the orders $p$ and $q$:
| Model | ACF pattern | PACF pattern |
|-------|-------------|--------------|
| AR($p$) | Tails off (decays exponentially or oscillates) | Cuts off after lag $p$ |
| MA($q$) | Cuts off after lag $q$ | Tails off (decays exponentially or oscillates) |
| ARMA($p, q$) | Tails off | Tails off |
When both ACF and PACF tail off gradually, an ARMA model is likely needed. If neither shows a clean cutoff, iterating over candidate $(p, q)$ pairs and comparing AIC or BIC values is a practical strategy.
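A sample ACF is straightforward to compute directly. The sketch below is a minimal implementation (not a library call), sanity-checked on a simulated AR(1) with $\phi = 0.7$, whose ACF should tail off roughly as $0.7^k$:

```python
import random

def sample_acf(x, max_lag):
    """Sample autocorrelations r_1..r_{max_lag} of a series x."""
    n = len(x)
    mean = sum(x) / n
    c0 = sum((v - mean) ** 2 for v in x) / n
    acf = []
    for k in range(1, max_lag + 1):
        ck = sum((x[t] - mean) * (x[t - k] - mean) for t in range(k, n)) / n
        acf.append(ck / c0)
    return acf

# Sanity check on a simulated AR(1) with phi = 0.7: the sample ACF should
# tail off roughly geometrically (0.7, 0.49, 0.343, ...)
random.seed(3)
x, prev = [], 0.0
for _ in range(20000):
    prev = 0.7 * prev + random.gauss(0, 1)
    x.append(prev)
print([round(r, 2) for r in sample_acf(x, 3)])
```

With 20,000 observations the sampling error is small, so the estimates land close to the theoretical values; on short series the same estimator is much noisier, which is why identification from ACF/PACF plots is treated as a rough guide.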
Use the fitted model to forecast future values:
$$\hat{X}_{t+h} = c + \phi \hat{X}_{t+h-1} + \theta \hat{\epsilon}_{t+h-1}$$
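Beyond one step ahead, future shocks have zero expectation, so the MA term drops out and the forecast decays geometrically toward the mean. A minimal sketch of the recursion for the ARMA(1,1) case (function name and inputs are illustrative):

```python
def arma11_forecast(c, phi, theta, x_last, eps_last, h):
    """h-step-ahead forecasts for an ARMA(1,1) model.

    One step ahead the last residual still contributes; beyond that,
    future shocks have expectation zero and the MA term drops out.
    """
    forecasts = []
    x_hat = c + phi * x_last + theta * eps_last
    forecasts.append(x_hat)
    for _ in range(h - 1):
        x_hat = c + phi * x_hat
        forecasts.append(x_hat)
    return forecasts

# Example with the parameters used earlier (phi = 0.7, theta = 0.2, c = 0)
print([round(v, 3)
       for v in arma11_forecast(0.0, 0.7, 0.2, x_last=1.0, eps_last=0.5, h=3)])
# -> [0.8, 0.56, 0.392]
```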
Seasonal ARIMA (SARIMA) models are widely used for time series data exhibiting both trend and seasonal behaviors.
A SARIMA$(p, d, q)(P, D, Q)_s$ model incorporates both non-seasonal and seasonal factors:
$$\Phi_P(B^s) \phi_p(B) (1 - B^s)^D (1 - B)^d X_t = \Theta_Q(B^s) \theta_q(B) \epsilon_t$$
Non-seasonal components in time series models include several terms:
- $p$: order of the non-seasonal autoregressive (AR) part,
- $d$: order of non-seasonal differencing,
- $q$: order of the non-seasonal moving average (MA) part.

Seasonal components extend these concepts to capture repeating patterns:
- $P$: order of the seasonal AR part,
- $D$: order of seasonal differencing,
- $Q$: order of the seasonal MA part,
- $s$: length of the seasonal period (e.g., $s = 12$ for monthly data).

The backshift operator is used to reference previous values in the series: $B X_t = X_{t-1}$, and more generally $B^s X_t = X_{t-s}$.
Example: a SARIMA$(1,0,0)(1,0,0)_{12}$ model has the equation:
$$(1 - \phi_1 B)(1 - \Phi_1 B^{12}) X_t = \epsilon_t$$
Example: a SARIMA$(0,1,1)(0,1,1)_4$ model has the equation:
$$(1 - B)(1 - B^4) X_t = (1 + \theta_1 B)(1 + \Theta_1 B^4) \epsilon_t$$
Non-seasonal Differencing:
$$(1 - B) X_t = X_t - X_{t-1}$$
Seasonal Differencing:
$$(1 - B^s) X_t = X_t - X_{t-s}$$
Combining Differencing:
$$(1 - B)(1 - B^s) X_t = X_t - X_{t-1} - X_{t-s} + X_{t-s-1}$$
First-order Seasonal Differencing:
$$\nabla_s X_t = X_t - X_{t-s}$$
Second-order Seasonal Differencing:
$$\nabla_s^2 X_t = X_t - 2 X_{t-s} + X_{t-2s}$$
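These identities are easy to confirm in code: applying $(1 - B)$ and $(1 - B^s)$ as two passes matches the expanded four-term formula, and together they annihilate a linear trend plus an exact period-$s$ pattern. A minimal sketch:

```python
def seasonal_diff(x, lag):
    """Apply (1 - B**lag): y_t = x_t - x_{t-lag}."""
    return [x[t] - x[t - lag] for t in range(lag, len(x))]

# A deterministic series with a linear trend plus a period-4 seasonal cycle
s = 4
x = [0.3 * t + [0, 2, 1, 3][t % s] for t in range(24)]

# (1 - B)(1 - B^s) X_t, applied as two passes of differencing
y = seasonal_diff(seasonal_diff(x, 1), s)

# Expanding the product gives x_t - x_{t-1} - x_{t-s} + x_{t-s-1}
z = [x[t] - x[t - 1] - x[t - s] + x[t - s - 1] for t in range(s + 1, len(x))]
assert all(abs(a - b) < 1e-9 for a, b in zip(y, z))

# Differencing removes both the trend and the exact seasonal pattern,
# leaving (numerically) zeros
assert all(abs(v) < 1e-9 for v in y)
print("combined differencing verified")
```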
The ACF of a SARIMA model displays patterns reflecting both seasonal and non-seasonal behavior.
Consider a SARIMA$(0,0,1)(0,0,1)_{12}$ model, whose moving average representation is:
$$X_t = \epsilon_t + \theta_1 \epsilon_{t-1} + \Theta_1 \epsilon_{t-12} + \theta_1 \Theta_1 \epsilon_{t-13}$$
Parameters:
- $\theta_1$: non-seasonal MA(1) coefficient,
- $\Theta_1$: seasonal MA(1) coefficient.
Error Term:
$\epsilon_t$: White noise with mean zero and variance $\sigma^2$
I. Variance ($\gamma_0$):
$$\gamma_0 = \text{Var}(X_t) = \sigma^2 \left(1 + \theta_1^2 + \Theta_1^2 + \theta_1^2 \Theta_1^2\right)$$
II. Covariance at Lag 1 ($\gamma_1$):
$$\gamma_1 = \text{Cov}(X_t, X_{t-1}) = \sigma^2 \theta_1 \left(1 + \Theta_1^2\right)$$
III. Covariance at Lag 12 ($\gamma_{12}$):
$$\gamma_{12} = \sigma^2 \Theta_1 \left(1 + \theta_1^2\right)$$
IV. Covariance at Lag 13 ($\gamma_{13}$):

The only shock shared by $X_t$ and $X_{t-13}$ is $\epsilon_{t-13}$, so

$$\gamma_{13} = \sigma^2 \theta_1 \Theta_1$$

(The covariance at lag 11 takes the same value, producing matching spikes on either side of the seasonal lag.)
ACF at Lag $k$:
$$\rho_k = \frac{\gamma_k}{\gamma_0}$$
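These closed forms follow from enumerating which shocks overlap at each lag, which is easy to verify numerically. The sketch below uses hypothetical values $\theta_1 = 0.4$, $\Theta_1 = 0.6$, $\sigma^2 = 1$; note the matching spike at lag 11, since $\gamma_{11} = \gamma_{13} = \sigma^2 \theta_1 \Theta_1$:

```python
theta1, Theta1, sigma2 = 0.4, 0.6, 1.0  # illustrative values

# MA coefficients of X_t = eps_t + theta1*eps_{t-1}
#                         + Theta1*eps_{t-12} + theta1*Theta1*eps_{t-13}
coef = {0: 1.0, 1: theta1, 12: Theta1, 13: theta1 * Theta1}

def gamma(k):
    """Autocovariance at lag k: sigma^2 * sum_j c_j * c_{j+k}."""
    return sigma2 * sum(c * coef.get(j + k, 0.0) for j, c in coef.items())

# Check against the closed forms
assert abs(gamma(0) - sigma2 * (1 + theta1**2) * (1 + Theta1**2)) < 1e-12
assert abs(gamma(1) - sigma2 * theta1 * (1 + Theta1**2)) < 1e-12
assert abs(gamma(12) - sigma2 * Theta1 * (1 + theta1**2)) < 1e-12
assert abs(gamma(13) - sigma2 * theta1 * Theta1) < 1e-12

# All other lags vanish, except the matching spike at lag 11
assert gamma(2) == 0.0
assert abs(gamma(11) - sigma2 * theta1 * Theta1) < 1e-12
print("SARIMA(0,0,1)(0,0,1)_12 autocovariances verified")
```

Note that $1 + \theta_1^2 + \Theta_1^2 + \theta_1^2 \Theta_1^2 = (1 + \theta_1^2)(1 + \Theta_1^2)$, which is the factored form checked for $\gamma_0$ above.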
Consider a time series consisting of monthly airline passenger data. This particular time series is characterized by two distinct features: an upward trend indicating an increase in the number of passengers over time, and a seasonal pattern that repeats every 12 months, typically linked to factors such as holiday travel or seasonal tourism.
Steps to Model and Forecast the Time Series:
I. Decompose the Time Series into Trend, Seasonal, and Residual Components:
II. Detrend the Time Series Using Differencing or Transformation:
III. Fit an Appropriate Model that Accounts for Both Trend and Seasonality:
IV. Forecast Future Values Using the Fitted Model: