AR and ARMA Processes Explained: A Complete Guide

AR (autoregressive) and ARMA (autoregressive moving average) processes model how a time series variable depends on its own past values and past shocks. An AR(1) process sets Y_t = α + φ₁Y_{t-1} + ε_t. A stationary AR(1) has |φ₁| < 1, and its autocorrelation function decays as ρ(k) = φ₁^k. An ARMA(p,q) combines AR and MA components. This guide explains these processes, derives their properties, and shows how the ACF pattern differs across model types.

Process versus realisation

Before looking at any particular model, it is worth being precise about what we are modelling. A time series process Y_t (t = … −2, −1, 0, 1, 2, …) is the unknown data-generating mechanism. The sample realisation {y₁, y₂, …, y_T} is the single path of data we actually observe.

The UK's monthly CPI data, published by the Office for National Statistics, is one realisation of an unknown inflation process. Monthly IBM stock returns are one realisation of an unknown returns process. The aim of time series econometrics is to infer the properties of the underlying process from the sample realisation we observe.

This distinction matters because the same process generates different sample paths each time. If UK CPI could be "re-run" from 2000, we would observe a different sequence of monthly values, but all drawn from the same underlying process. Our one observed sample path should, under appropriate assumptions, reflect the properties of that process.
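The idea of "re-running" a process can be sketched in code. This is a minimal illustration, not a model of UK CPI: it assumes an AR(1) process with Gaussian shocks and hypothetical parameter values (α = 0.2, φ₁ = 0.5, σ = 1), and the function name simulate_ar1 is introduced here for illustration only. Two different seeds stand in for two re-runs of the same process.

```python
import random


def simulate_ar1(alpha, phi, sigma, n, seed, burn_in=200):
    """Draw one sample path of length n from Y_t = alpha + phi*Y_{t-1} + eps_t.

    Shocks are Gaussian here, which is an assumption: white noise only
    requires zero mean, constant variance and no serial correlation.
    """
    rng = random.Random(seed)
    y = alpha / (1.0 - phi)  # start at the stationary mean
    path = []
    for t in range(burn_in + n):
        y = alpha + phi * y + rng.gauss(0.0, sigma)
        if t >= burn_in:  # discard the burn-in draws
            path.append(y)
    return path


# Same process (same alpha, phi, sigma), two different realisations.
path_a = simulate_ar1(alpha=0.2, phi=0.5, sigma=1.0, n=120, seed=1)
path_b = simulate_ar1(alpha=0.2, phi=0.5, sigma=1.0, n=120, seed=2)

print("first three values, run A:", [round(v, 3) for v in path_a[:3]])
print("first three values, run B:", [round(v, 3) for v in path_b[:3]])
```

The two paths differ value by value, yet both come from the same data-generating mechanism. In practice we only ever see one of them.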

White noise and stationarity

The starting point for all ARMA models is the concept of white noise. A process ε_t is white noise with variance σ², written ε_t ~ WN(σ²), if:

E[ε_t] = 0 (zero mean)
Var[ε_t] = σ² (constant variance)
Cov(ε_t, ε_s) = 0 for t ≠ s (no serial correlation)

White noise is the "unpredictable" component, the part of each observation that cannot be forecast from past observations. AR and ARMA processes are built by taking current and lagged values of white noise shocks and combining them with lagged values of the variable itself.

A process is covariance-stationary (or weakly stationary) if its mean, variance, and autocovariances do not depend on time t, only on the lag k. Formally:

E[Y_t] = μ (constant mean)
Var[Y_t] = γ(0) < ∞ (finite, constant variance)
Cov(Y_t, Y_{t-k}) = γ(k) (depends only on lag k)

Stationarity is what makes the ACF a well-defined, interpretable object. A non-stationary process, for example one whose variance grows over time, does not have a fixed ACF to plot or estimate.
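A small simulation can make the contrast concrete. The sketch below is illustrative only: it assumes standard normal shocks, starts every path at zero, and uses a hypothetical helper called cross_section_variance. It compares the variance across many simulated paths at two dates, for a stationary case (φ₁ = 0.5) and for a random walk (φ₁ = 1).

```python
import random


def cross_section_variance(phi, t_points, n_paths=2000, seed=0):
    """Variance across simulated paths of Y_t = phi*Y_{t-1} + eps_t at chosen dates.

    Assumptions: every path starts at Y_0 = 0 and shocks are standard normal.
    """
    rng = random.Random(seed)
    horizon = max(t_points)
    values = {t: [] for t in t_points}
    for _ in range(n_paths):
        y = 0.0
        for t in range(1, horizon + 1):
            y = phi * y + rng.gauss(0.0, 1.0)
            if t in values:
                values[t].append(y)
    result = {}
    for t, ys in values.items():
        mean = sum(ys) / len(ys)
        result[t] = sum((v - mean) ** 2 for v in ys) / len(ys)
    return result


stationary = cross_section_variance(phi=0.5, t_points=[10, 100])
random_walk = cross_section_variance(phi=1.0, t_points=[10, 100])

print("phi = 0.5:", {t: round(v, 2) for t, v in stationary.items()})
print("phi = 1.0:", {t: round(v, 2) for t, v in random_walk.items()})
```

For φ₁ = 0.5 the two variances should be close to each other. For the random walk the variance keeps growing with t, so no single variance or ACF describes the process.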

The autocorrelation function (ACF)

The autocorrelation at lag k is:

ρ(k) = γ(k) / γ(0) = Cov(Y_t, Y_{t-k}) / Var[Y_t]

This is always between −1 and 1 (it is a correlation). The ACF plot, the correlogram, shows ρ(k) against k. It is the single most important diagnostic tool in time series analysis. The shape of the ACF is different for each class of process, which is what allows us to identify the appropriate model from data.
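In a sample, ρ(k) is replaced by its sample counterpart. The function below is a minimal sketch of the usual estimator (deviations from the sample mean, with the same denominator at every lag). The name sample_acf and the two toy input series are illustrative assumptions, not data from this article.

```python
def sample_acf(y, max_lag):
    """Sample autocorrelations rho_hat(0), ..., rho_hat(max_lag) of the list y."""
    n = len(y)
    mean = sum(y) / n
    dev = [v - mean for v in y]
    gamma0 = sum(d * d for d in dev)  # n times the sample variance
    acf = []
    for k in range(max_lag + 1):
        # n times the lag-k sample autocovariance
        gamma_k = sum(dev[t] * dev[t - k] for t in range(k, n))
        acf.append(gamma_k / gamma0)
    return acf


# Toy inputs: a steadily rising series and a strictly alternating series.
rising = sample_acf([1, 2, 3, 4, 5], max_lag=2)
zigzag = sample_acf([1, -1, 1, -1, 1, -1], max_lag=2)

print("rising:", [round(r, 3) for r in rising])
print("zigzag:", [round(r, 3) for r in zigzag])
```

The rising series has a positive lag-1 autocorrelation, while the alternating series has a strongly negative one. Because every lag shares the denominator γ̂(0), each value lies between −1 and 1.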

The AR(1) process

Definition. AR(1) process

Y_t = α + φ₁Y_{t-1} + ε_t, ε_t ~ WN(σ²)

The current value Y_t depends on its immediately preceding value Y_{t-1} and a white noise shock. The parameter φ₁ controls how strongly the past value influences the current one.

Stationarity condition

The AR(1) is stationary if and only if |φ₁| < 1. To see why, write the AR(1) out by substituting for lagged values repeatedly:

Y_t = α + φ₁Y_{t-1} + ε_t
    = α + φ₁(α + φ₁Y_{t-2} + ε_{t-1}) + ε_t
    = α(1 + φ₁) + φ₁²Y_{t-2} + φ₁ε_{t-1} + ε_t
    ⋮
    = α/(1 − φ₁) + Σ_{j=0}^{∞} φ₁^j ε_{t-j}   [when |φ₁| < 1]

This is the MA(∞) representation of the stationary AR(1). The condition |φ₁| < 1 is needed for the sum Σ φ₁^j ε_{t-j} to converge. If |φ₁| ≥ 1, the influence of past shocks never dies out and the variance is not finite. When φ₁ = 1 we have a random walk, which is non-stationary.
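The substitution argument can be checked numerically. The sketch below uses hypothetical parameter values (α = 1, φ₁ = 0.7, σ = 1) and Gaussian shocks, both assumptions made for illustration. It builds Y_t once by the recursion and once as the mean plus a weighted sum of shocks. Because the recursion starts at the stationary mean, the two forms agree up to floating-point error.

```python
import random

alpha, phi, sigma = 1.0, 0.7, 1.0  # illustrative values with |phi| < 1
n = 300
rng = random.Random(0)
eps = [rng.gauss(0.0, sigma) for _ in range(n)]

mu = alpha / (1.0 - phi)  # stationary mean

# Recursive form: Y_t = alpha + phi * Y_{t-1} + eps_t, started at the mean.
y_recursive = mu
for e in eps:
    y_recursive = alpha + phi * y_recursive + e

# Moving average form: mu + sum over j of phi^j * eps_{t-j}.
y_moving_average = mu + sum(phi ** j * eps[n - 1 - j] for j in range(n))

print(abs(y_recursive - y_moving_average))  # floating-point rounding only
```

The weight on a shock j periods back is φ₁^j, which shrinks geometrically only when |φ₁| < 1.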

Mean, variance and ACF

From the MA(∞) representation, the properties of the stationary AR(1) follow directly:

Mean: E[Y_t] = α / (1 − φ₁)
Variance: Var[Y_t] = σ² / (1 − φ₁²)
ACF: ρ(k) = φ₁^k for k = 0, 1, 2, …

The mean and variance are both constant (confirming stationarity). The ACF has a clean closed form: it is φ₁ raised to the power k. This is the key identifying feature of an AR(1) in data.
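These closed forms are easy to compare against a long simulated path. The following is a sketch under stated assumptions: the parameter values (α = 1, φ₁ = 0.5, σ² = 1), the Gaussian shocks, and the helper names ar1_moments and ar1_acf are all illustrative choices rather than part of the original material.

```python
import random


def ar1_moments(alpha, phi, sigma2):
    """Mean and variance of a stationary AR(1)."""
    if abs(phi) >= 1:
        raise ValueError("stationarity requires |phi| < 1")
    return alpha / (1.0 - phi), sigma2 / (1.0 - phi ** 2)


def ar1_acf(phi, max_lag):
    """Theoretical ACF of a stationary AR(1): rho(k) = phi**k."""
    return [phi ** k for k in range(max_lag + 1)]


mean, var = ar1_moments(alpha=1.0, phi=0.5, sigma2=1.0)

# Long simulated path from the same process, started at the mean.
rng = random.Random(42)
y, path = mean, []
for _ in range(50_000):
    y = 1.0 + 0.5 * y + rng.gauss(0.0, 1.0)
    path.append(y)

sample_mean = sum(path) / len(path)
sample_var = sum((v - sample_mean) ** 2 for v in path) / len(path)

print("theory:", round(mean, 3), round(var, 3), ar1_acf(0.5, 3))
print("sample:", round(sample_mean, 3), round(sample_var, 3))
```

With a path this long, the sample mean and variance should sit close to the theoretical values, though they will never match exactly.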

The four panels below show the ACF for four different values of φ₁. Each panel shows ρ(k) against lag k for k = 0 to 20. The dashed lines are 95% confidence bands at ±1.96/√100 ≈ ±0.196. Any bar outside these bands is statistically significantly different from zero at the 5% level for a sample of T = 100.

Figure 1. ACF of AR(1) for four values of φ₁. Panels: φ₁ = 0.5 (positive, moderate persistence); φ₁ = 0.9 (positive, high persistence); φ₁ = −0.5 (negative, alternating); φ₁ = −0.9 (negative, strong alternating). Copper bars = positive ACF; blue bars = negative ACF. Dashed lines: 95% confidence bands (±1.96/√100). As φ₁ increases toward 1, the ACF decays more slowly. A negative φ₁ produces an alternating sign pattern.

The four panels above are drawn directly from the lecture slides. Reading them:

φ₁ = 0.5: The ACF starts at 1 (lag 0) and decays quickly. By lag 3, the autocorrelation is 0.125, barely distinguishable from zero. The series has short memory.

φ₁ = 0.9: The ACF starts at 1 and decays very slowly: it is still 0.35 at lag 10 and 0.12 at lag 20. This is a highly persistent series. By contrast, UK GDP growth typically shows only moderate positive autocorrelation at low lags.

φ₁ = −0.5: The ACF alternates in sign: positive at even lags, negative at odd lags, with the magnitude decaying. This means consecutive observations tend to reverse direction. If this week is above average, next week tends to be below average, but the week after tends to be above again.

φ₁ = −0.9: Strong alternating pattern. The series overshoots in both directions persistently. The ACF remains large in magnitude out to lag 10, alternating between strongly positive and strongly negative values.
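The numbers quoted in these readings follow directly from ρ(k) = φ₁^k. As a quick check, the sketch below tabulates the theoretical ACF for the four values of φ₁ and finds the first lag at which it falls inside the ±1.96/√100 band. The variable names are illustrative.

```python
import math

band = 1.96 / math.sqrt(100)  # 95% band for T = 100, about 0.196

summary = {}
for phi in (0.5, 0.9, -0.5, -0.9):
    acf = [phi ** k for k in range(21)]  # rho(0), ..., rho(20)
    # First lag at which the theoretical ACF is inside the band.
    first_inside = next(k for k, r in enumerate(acf) if abs(r) < band)
    summary[phi] = {"acf": acf, "first_inside": first_inside}
    print(f"phi={phi:+.1f}  rho(3)={acf[3]:+.3f}  rho(10)={acf[10]:+.3f}  "
          f"rho(20)={acf[20]:+.3f}  first lag inside band: {first_inside}")
```

This compares theoretical autocorrelations with a sampling band, so it is only a rough guide to how many lags would look significant in a sample of 100.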

The MA(q) process

Definition. MA(q) process

Y_t = α + ε_t + θ₁ε_{t-1} + … + θ_qε_{t-q}, ε_t ~ WN(σ²)

A moving average process expresses Y_t as a linear combination of current and past white noise shocks. Unlike the AR process, there are no lagged values of Y_t itself, only lagged values of the shock ε_t.

The MA(q) process has a distinctive ACF: it can be non-zero for lags 1 through q (and is non-zero at lag q whenever θ_q ≠ 0), and it is exactly zero for all lags beyond q. This is the key diagnostic signature: an ACF that "cuts off" sharply after lag q indicates an MA(q) process.

ACF of MA(q), with θ₀ = 1:
ρ(k) = (Σ_{s=0}^{q−k} θ_s θ_{s+k}) / (Σ_{s=0}^{q} θ_s²) for k = 1, …, q
ρ(k) = 0 for k > q

For the simplest case, MA(1) with θ₁ = 0.8:

ρ(1) = θ₁ / (1 + θ₁²) = 0.8 / (1 + 0.64) = 0.488
ρ(k) = 0 for k ≥ 2

The MA(q) is always stationary, regardless of the values of θ₁, …, θ_q. This is because it is a finite linear combination of stationary white noise, so no stability condition is required.
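The MA(q) autocorrelation formula can be sketched as a short function. The name ma_acf is an illustrative choice, and the MA(2) coefficients in the second call are hypothetical values. The function prepends θ₀ = 1 to the coefficients, which the formula requires.

```python
def ma_acf(thetas, max_lag):
    """Theoretical ACF of an MA(q) with coefficients thetas = [theta_1, ..., theta_q]."""
    coeffs = [1.0] + list(thetas)  # theta_0 = 1
    q = len(thetas)
    denom = sum(c * c for c in coeffs)
    acf = []
    for k in range(max_lag + 1):
        if k > q:
            acf.append(0.0)  # exact cut-off beyond lag q
        else:
            num = sum(coeffs[s] * coeffs[s + k] for s in range(q - k + 1))
            acf.append(num / denom)
    return acf


ma1 = ma_acf([0.8], max_lag=4)
ma2 = ma_acf([0.5, 0.4], max_lag=4)  # hypothetical MA(2) coefficients

print([round(r, 3) for r in ma1])
print([round(r, 3) for r in ma2])
```

For the MA(1) the only non-zero value beyond lag 0 is ρ(1) ≈ 0.488, matching the hand calculation. The MA(2) has non-zero values at lags 1 and 2 and zeros afterwards.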

Figure: MA(1) with θ₁ = 0.8. The ACF is non-zero only at lag 1, then drops to exactly zero. This "cut-off" is the defining diagnostic of an MA process. Dashed lines show 95% confidence bands.

The ARMA(p,q) process

Definition. ARMA(p,q) process

Y_t = α + φ₁Y_{t-1} + … + φ_pY_{t-p} + ε_t + θ₁ε_{t-1} + … + θ_qε_{t-q}

The ARMA(p,q) is the general class that combines both AR and MA components. Setting q = 0 gives an AR(p). Setting p = 0 gives an MA(q). The ARMA(1,1), with one AR term and one MA term, is a widely used low-order specification in practice.

Properties of the ARMA(1,1)

For Y_t = α + φ₁Y_{t-1} + ε_t + θ₁ε_{t-1}:

Mean: E[Y_t] = α / (1 − φ₁) [same as AR(1)]
Variance: Var[Y_t] = σ² (1 + 2φ₁θ₁ + θ₁²) / (1 − φ₁²)
ACF: ρ(1) = (φ₁ + θ₁)(1 + φ₁θ₁) / (1 + 2φ₁θ₁ + θ₁²)
ρ(k) = φ₁ · ρ(k−1) for k ≥ 2

The key point: for lags k ≥ 2 the ACF of an ARMA(1,1) behaves like that of an AR(1), shrinking by a factor φ₁ at each lag, so ρ(k) = φ₁^(k−1) ρ(1). But the initial value ρ(1) is shifted by the MA component. This shift can make the ACF larger or smaller at lag 1 than a pure AR(1) would produce.

The stationarity condition is the same as for the AR(1): |φ₁| < 1. The MA component does not affect stationarity, only the AR part does.
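The ARMA(1,1) autocorrelations can be computed directly from the formulas above. The sketch below uses a hypothetical function name, arma11_acf, and illustrative parameter values. It also checks the two limiting cases: θ₁ = 0 gives the AR(1) ACF and φ₁ = 0 gives the MA(1) ACF.

```python
def arma11_acf(phi, theta, max_lag):
    """Theoretical ACF of a stationary ARMA(1,1)."""
    if abs(phi) >= 1:
        raise ValueError("stationarity requires |phi| < 1")
    rho1 = (phi + theta) * (1 + phi * theta) / (1 + 2 * phi * theta + theta ** 2)
    acf = [1.0]
    if max_lag >= 1:
        acf.append(rho1)
    for _ in range(2, max_lag + 1):
        acf.append(phi * acf[-1])  # rho(k) = phi * rho(k-1) for k >= 2
    return acf


mixed = arma11_acf(phi=0.5, theta=0.8, max_lag=4)
pure_ar = arma11_acf(phi=0.5, theta=0.0, max_lag=4)
pure_ma = arma11_acf(phi=0.0, theta=0.8, max_lag=4)

print("ARMA(1,1):", [round(r, 3) for r in mixed])
print("AR(1):    ", [round(r, 3) for r in pure_ar])
print("MA(1):    ", [round(r, 3) for r in pure_ma])
```

With φ₁ = 0.5 and θ₁ = 0.8, the lag-1 autocorrelation is well above the pure AR(1) value of 0.5, after which both series decay at the same rate.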

The Wold Decomposition

A deeper result explains why stationary AR processes can always be written as MA(∞). The Wold Decomposition Theorem states that any covariance-stationary process with no deterministic component can be written as an MA(∞):

Y_t = μ + Σ_{j=0}^{∞} θ_j ε_{t-j}, θ₀ = 1, ε_t ~ WN(σ²)

where the θ_j coefficients are square-summable (Σ θ_j² < ∞). This is why the AR(1) can be written in terms of Σ φ₁^j ε_{t-j}: that sum is its Wold representation, with θ_j = φ₁^j. The ARMA(1,1) has the MA(∞) form with coefficients that combine both the AR and MA parameters.

This result is foundational: it tells us that any stationary time series process, however complex its actual dynamics, can in principle be approximated arbitrarily well by a finite-order ARMA model.
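For the ARMA(1,1), the MA(∞) coefficients have a simple closed form: the weight on ε_t is 1 and the weight on ε_{t-j} is (φ₁ + θ₁)φ₁^(j−1) for j ≥ 1. The sketch below calls these weights psi to avoid confusion with the MA parameter θ₁; that naming, the function name and the parameter values are assumptions made for illustration. It checks that σ² times the sum of squared weights reproduces the ARMA(1,1) variance formula.

```python
def arma11_psi_weights(phi, theta, n_terms):
    """First n_terms MA(infinity) weights of a stationary ARMA(1,1)."""
    if abs(phi) >= 1:
        raise ValueError("stationarity requires |phi| < 1")
    # psi_0 = 1, psi_j = (phi + theta) * phi**(j - 1) for j >= 1
    return [1.0] + [(phi + theta) * phi ** (j - 1) for j in range(1, n_terms)]


phi, theta, sigma2 = 0.5, 0.8, 1.0  # illustrative values
psi = arma11_psi_weights(phi, theta, n_terms=200)

# Variance implied by the (truncated) MA(infinity) form.
var_from_psi = sigma2 * sum(w * w for w in psi)
# Closed-form ARMA(1,1) variance.
var_closed_form = sigma2 * (1 + 2 * phi * theta + theta ** 2) / (1 - phi ** 2)

print(round(var_from_psi, 6), round(var_closed_form, 6))
```

Setting θ₁ = 0 reduces the weights to φ₁^j, the AR(1) case. Truncating at 200 terms is harmless here because the weights decay geometrically.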

How to tell them apart: the identification table

Process | Definition | ACF pattern
White noise | ε_t ~ WN(σ²) | All zeros for k ≥ 1. All bars within confidence bands.
MA(q) | Finite weighted sum of shocks | Non-zero for lags 1 to q, then cuts off to exactly zero. Sharp drop is the tell.
AR(1) | Y_t = α + φ₁Y_{t-1} + ε_t | Decays exponentially as φ₁^k. Positive φ₁: smooth one-sided decay. Negative φ₁: alternating signs.
AR(p) | Depends on p lags of Y_t | Decays toward zero; can show oscillating patterns with complex roots. The PACF cuts off at lag p.
ARMA(p,q) | Both AR and MA components | Decays exponentially after lag q. No clean cut-off. Both ACF and PACF tail off; neither cuts off sharply.

Note on the PACF: The partial autocorrelation function (PACF) is the complement to the ACF for model identification. It measures correlation between Y_t and Y_t-k after removing the influence of the intermediate lags. An AR(p) has a PACF that cuts off after lag p, while an MA(q) has a PACF that tails off. In practice, you plot both to narrow down the model.
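One standard way to obtain the PACF from a set of autocorrelations is the Durbin-Levinson recursion, which this article does not otherwise cover. The sketch below is a minimal version of that recursion, with the illustrative function name pacf_from_acf. It is applied to the theoretical ACFs of an AR(1) with φ₁ = 0.5 and an MA(1) with θ₁ = 0.8.

```python
def pacf_from_acf(rho):
    """Partial autocorrelations from autocorrelations rho[0..K] (rho[0] = 1).

    Uses the Durbin-Levinson recursion. Index k of the result is the PACF at lag k.
    """
    pacf = [1.0]
    prev = []  # coefficients phi_{k-1,1}, ..., phi_{k-1,k-1}
    for k in range(1, len(rho)):
        num = rho[k] - sum(prev[j] * rho[k - 1 - j] for j in range(k - 1))
        den = 1.0 - sum(prev[j] * rho[j + 1] for j in range(k - 1))
        phi_kk = num / den
        # Update the remaining coefficients, then append the new last one.
        prev = [prev[j] - phi_kk * prev[k - 2 - j] for j in range(k - 1)] + [phi_kk]
        pacf.append(phi_kk)
    return pacf


ar1_rho = [0.5 ** k for k in range(6)]              # AR(1), phi = 0.5
ma1_rho = [1.0, 0.8 / 1.64, 0.0, 0.0, 0.0, 0.0]     # MA(1), theta = 0.8

pacf_ar = pacf_from_acf(ar1_rho)
pacf_ma = pacf_from_acf(ma1_rho)

print("AR(1) PACF:", [round(v, 3) for v in pacf_ar])
print("MA(1) PACF:", [round(v, 3) for v in pacf_ma])
```

The AR(1) PACF equals φ₁ at lag 1 and is zero afterwards, while the MA(1) PACF alternates in sign and shrinks gradually instead of cutting off.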

The general ARMA(p,q) stationarity condition

For an AR(p), and therefore for any ARMA(p,q), stationarity requires that the roots of the characteristic polynomial lie outside the unit circle. Writing the AR polynomial using the lag operator L (where LY_t = Y_{t-1}):

Φ(L) = 1 − φ₁L − φ₂L² − … − φ_pL^p

Stationarity holds if and only if the roots of the polynomial equation Φ(z) = 0 all lie outside the unit circle in the complex plane (i.e., |z| > 1 for all solutions z).

For AR(1): 1 - φ₁z = 0 → z = 1/φ₁. This lies outside the unit circle iff |1/φ₁| > 1 iff |φ₁| < 1. Confirmed.

For AR(2): 1 - φ₁z - φ₂z² = 0 has two roots (which may be complex). Stationarity requires both to lie outside the unit circle. An AR(2) can produce more complex ACF patterns than AR(1), including damped oscillations, because complex roots produce cyclical autocorrelation.
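For an AR(2) the roots can be found with the quadratic formula. The sketch below is one way to do this using only the standard library. The function names ar2_roots and is_stationary, and the three coefficient pairs, are illustrative assumptions.

```python
import cmath


def ar2_roots(phi1, phi2):
    """Roots of the characteristic equation 1 - phi1*z - phi2*z**2 = 0."""
    if phi2 == 0:
        # Reduces to the AR(1) case (or to white noise, which has no roots).
        return [] if phi1 == 0 else [complex(1.0 / phi1)]
    disc = cmath.sqrt(phi1 ** 2 + 4 * phi2)
    return [(-phi1 + disc) / (2 * phi2), (-phi1 - disc) / (2 * phi2)]


def is_stationary(phi1, phi2):
    """True when every root lies strictly outside the unit circle."""
    return all(abs(z) > 1 for z in ar2_roots(phi1, phi2))


for phi1, phi2 in [(0.5, 0.3), (1.0, -0.5), (0.5, 0.6)]:
    moduli = [round(abs(z), 3) for z in ar2_roots(phi1, phi2)]
    print(f"phi1={phi1}, phi2={phi2}: |roots|={moduli}, "
          f"stationary={is_stationary(phi1, phi2)}")
```

The pair (1.0, −0.5) has complex roots 1 ± i with modulus √2, so it is stationary and its ACF shows damped oscillations. The pair (0.5, 0.6) has a root inside the unit circle and is not stationary.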

A practical example: IBM monthly stock returns

Monthly IBM stock returns from 1991 to 2013 (T = 276) provide a natural example. The ACF of the raw return series shows no significant autocorrelation at any lag: the bars all lie within the ±1.96/√276 ≈ ±0.118 confidence bands. This is consistent with a white noise process: monthly stock returns are approximately unpredictable (as market efficiency would suggest).

However, the ACF of the squared returns, which measures volatility clustering, shows significant autocorrelations at many lags. High-volatility months tend to be followed by high-volatility months. This pattern motivates GARCH models, which are covered in a separate article.

If instead we were looking at the ACF of UK GDP growth, we would typically see one or two significant bars at low lags with exponential decay, consistent with a low-order AR process. This is why AR models are the standard starting point for macroeconomic forecasting.
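The band calculation for T = 276 can be reproduced in a few lines. The sketch below does not use the IBM data, which is not included here. Instead it applies the same band to simulated Gaussian white noise of the same length, purely as an illustration of what "all bars inside the bands" looks like. The seed and helper names are arbitrary.

```python
import math
import random


def confidence_band(n_obs, z=1.96):
    """Approximate 95% band for sample autocorrelations of white noise."""
    return z / math.sqrt(n_obs)


def sample_acf(y, max_lag):
    """Sample autocorrelations rho_hat(0), ..., rho_hat(max_lag)."""
    n = len(y)
    mean = sum(y) / n
    dev = [v - mean for v in y]
    gamma0 = sum(d * d for d in dev)
    return [sum(dev[t] * dev[t - k] for t in range(k, n)) / gamma0
            for k in range(max_lag + 1)]


band = confidence_band(276)

# Simulated white noise standing in for a return series (NOT real IBM data).
rng = random.Random(7)
noise = [rng.gauss(0.0, 1.0) for _ in range(276)]
acf = sample_acf(noise, max_lag=20)
outside = [k for k in range(1, 21) if abs(acf[k]) > band]

print("band:", round(band, 3))
print("lags outside the band:", outside)
```

Even for true white noise, roughly one bar in twenty is expected to fall outside a 95% band by chance, so an isolated exceedance is not strong evidence of autocorrelation.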

Summary

The key results to retain from this article:

AR(1) is stationary iff |φ₁| < 1. Its ACF decays as ρ(k) = φ₁^k.
MA(q) is always stationary. Its ACF cuts off to zero after lag q.
ARMA(p,q) is stationary iff the AR roots lie outside the unit circle. Its ACF decays but does not cut off cleanly.
The Wold Decomposition guarantees any stationary process has an MA(∞) representation. This is why stationary AR processes can always be written as infinite MA processes.
In practice: look at the ACF and PACF together to identify model order. A cut-off in the ACF suggests MA. A cut-off in the PACF suggests AR. If neither cuts off cleanly, consider an ARMA.

The next step from here is estimation: given a sample realisation, how do we estimate the parameters of an ARMA model? That involves the sample ACF, the Ljung-Box test, and information criteria such as AIC and BIC, covered in the article on how to read a correlogram and select a model.

Tuition in this topic

Time series econometrics (AR, ARMA, model selection, forecasting) is covered in depth in 1-1 online sessions with Dr Grant. The free initial consultation establishes your level and what you need.
