The correlogram is a plot of the sample autocorrelation function (sample ACF) against lag. Each bar shows the estimated correlation between Y_t and Y_{t-k}. Bars outside the approximate 95% confidence bands (±1.96/√T) are statistically significant at the 5% level. Read together with the partial autocorrelation function (PACF), the shape of the correlogram suggests candidate models: an ACF that cuts off suggests MA, an ACF that decays gradually suggests AR, and an ACF and PACF that both tail off suggest ARMA. The Ljung-Box Q statistic tests all autocorrelations up to lag m jointly. AIC and BIC then select the final model order.
From theory to data
The previous article established what AR and ARMA processes look like in theory: their mean, variance, and autocorrelation function (ACF). The bridge from theory to data is the sample ACF: the estimated autocorrelations computed from an observed time series {y₁, y₂, …, y_T}.
The process generating the data is unknown. What we observe is one realisation. Our aim is to use that realisation, through the correlogram and tests, to infer which ARMA model describes the process.
The sample autocorrelation function
ρ̂(k) = [ Σₜ₌ₖ₊₁ᵀ (yₜ − ȳ)(yₜ₋ₖ − ȳ) ] / [ Σₜ₌₁ᵀ (yₜ − ȳ)² ]
This is the sample analogue of the population autocorrelation ρ(k) = γ(k)/γ(0). It measures how strongly the series at time t co-moves with itself k periods earlier, estimated from the data. The correlogram plots ρ̂(k) for k = 1, 2, 3, … against k.
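The formula translates directly into code. Below is a minimal stdlib-only Python sketch; the function name `sample_acf` is our own hypothetical helper, not a library function. Note that the denominator always uses all T observations, while the numerator for lag k has only T − k cross-products.

```python
def sample_acf(y, max_lag):
    """Sample ACF rho_hat(k) for k = 1..max_lag.

    Numerator: cross-products of deviations over t = k+1..T.
    Denominator: sum of squared deviations over all T observations.
    """
    T = len(y)
    ybar = sum(y) / T
    dev = [v - ybar for v in y]
    denom = sum(d * d for d in dev)
    acf = []
    for k in range(1, max_lag + 1):
        # 0-based index t corresponds to period t+1, so t runs over k..T-1
        num = sum(dev[t] * dev[t - k] for t in range(k, T))
        acf.append(num / denom)
    return acf

print(sample_acf([1, 2, 3, 4, 5], 2))  # [0.4, -0.1]
```

For the toy series 1, 2, 3, 4, 5 the deviations are −2, −1, 0, 1, 2, so the denominator is 10, the lag-1 numerator is 4 and the lag-2 numerator is −1.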
Are the sample autocorrelations statistically significant?
Under the null hypothesis that the series is white noise (so the true ρ(k) = 0 at every lag k ≥ 1), the sample autocorrelation ρ̂(k) is approximately normally distributed in large samples, with mean 0 and standard deviation 1/√T:
ρ̂(k) ~ N(0, 1/T) (approximately, for large T)
This gives approximate 95% confidence bands at ±1.96/√T. Any bar in the correlogram that lies outside these bands is statistically significantly different from zero at the 5% level. EViews draws these bands automatically as dotted lines in its correlogram view, using ±2/√T as a close approximation. For T = 100, the bands are ±0.196; for T = 250, they are ±0.124. The bands narrow as T grows because ρ̂(k) is estimated more precisely, so weaker dependence can be detected.
Important: These bands are only valid for testing one specific ρ(k) at a time. When looking at the full correlogram and checking 20 lags simultaneously, we expect approximately 20 × 0.05 = 1 bar to lie outside the bands by chance even if the process is truly white noise. This is why we also use a joint test, the Ljung-Box statistic.
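The multiple-testing point can be checked by simulation. This is a hedged sketch, not a formal result: it draws Gaussian white noise (T = 250, 200 replications and the seed are arbitrary choices), counts how many of the first 20 sample autocorrelations fall outside ±1.96/√T, and averages. The helper names `acf_band`, `sample_acf` and `average_false_alarms` are our own.

```python
import math
import random

def acf_band(T, z=1.96):
    """Half-width of the approximate 95% band, z / sqrt(T)."""
    return z / math.sqrt(T)

def sample_acf(y, max_lag):
    """Sample ACF rho_hat(k), k = 1..max_lag."""
    T = len(y)
    ybar = sum(y) / T
    dev = [v - ybar for v in y]
    denom = sum(d * d for d in dev)
    return [sum(dev[t] * dev[t - k] for t in range(k, T)) / denom
            for k in range(1, max_lag + 1)]

def average_false_alarms(T=250, lags=20, reps=200, seed=1):
    """Average number of lags (out of `lags`) outside the band
    when the data are truly white noise."""
    rng = random.Random(seed)
    band = acf_band(T)
    total = 0
    for _ in range(reps):
        y = [rng.gauss(0.0, 1.0) for _ in range(T)]
        total += sum(abs(r) > band for r in sample_acf(y, lags))
    return total / reps

print(round(acf_band(100), 3), round(acf_band(250), 3))  # 0.196 0.124
print(average_false_alarms())  # roughly 1, in line with 20 x 0.05
```

The simulated average tends to sit slightly below 1, because at higher lags the sample autocorrelation of white noise has a standard deviation a little under 1/√T.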
Reading the shape: what are you looking for?
Three patterns matter when reading a correlogram:
Significant bars that cut off sharply. If bars are significant up to lag q and all bars at higher lags lie within the confidence bands, this is consistent with an MA(q) process. The cut-off is the diagnostic: it is the defining theoretical property of the MA ACF.
Bars that decay gradually. Significant bars at low lags that shrink with each successive lag, eventually falling within the bands, are consistent with an AR process. For an AR(1), ρ(k) = φ₁ᵏ, so the rate of decay reflects the AR coefficient: fast decay means a small |φ₁|, slow decay means |φ₁| close to 1.
Neither cuts off nor decays simply. If both the ACF and PACF tail off without a clean cut-off, this is the ARMA signature. Identifying an ARMA model needs both tools, and often information criteria as well.
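The cut-off and decay patterns can be reproduced by simulating two simple processes and printing their sample ACFs. This is an illustrative sketch under assumed settings (θ = 0.6, φ = 0.7, T = 5000, unit-variance Gaussian errors, arbitrary seed); the function names are our own. For comparison, theory gives ρ(1) = θ/(1 + θ²) ≈ 0.44 and ρ(k) = 0 for k ≥ 2 in the MA(1), and ρ(k) = φᵏ in the AR(1).

```python
import random

def sample_acf(y, max_lag):
    """Sample ACF rho_hat(k), k = 1..max_lag."""
    T = len(y)
    ybar = sum(y) / T
    dev = [v - ybar for v in y]
    denom = sum(d * d for d in dev)
    return [sum(dev[t] * dev[t - k] for t in range(k, T)) / denom
            for k in range(1, max_lag + 1)]

def simulate(kind, coef, T=5000, seed=0, burn=200):
    """AR(1): y_t = coef*y_{t-1} + e_t;  MA(1): y_t = e_t + coef*e_{t-1}.
    The first `burn` values are discarded so the start-up value of 0 is forgotten."""
    rng = random.Random(seed)
    y, y_prev, e_prev = [], 0.0, 0.0
    for t in range(T + burn):
        e = rng.gauss(0.0, 1.0)
        y_t = coef * y_prev + e if kind == "AR" else e + coef * e_prev
        y_prev, e_prev = y_t, e
        if t >= burn:
            y.append(y_t)
    return y

ma = sample_acf(simulate("MA", 0.6), 5)
ar = sample_acf(simulate("AR", 0.7), 5)
print("MA(1):", [round(r, 2) for r in ma])  # large at lag 1, near zero afterwards
print("AR(1):", [round(r, 2) for r in ar])  # roughly 0.7, 0.49, 0.34, ...: geometric decay
```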
The Ljung-Box portmanteau test
Rather than testing each lag individually, the Ljung-Box test asks whether the first m autocorrelations are jointly zero:
Q(m) = T(T+2) · Σₖ₌₁ᵐ ρ̂(k)² / (T−k)
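Under H₀: ρ(1) = ρ(2) = … = ρ(m) = 0, the Q(m) statistic approximately follows a chi-squared distribution with m degrees of freedom in large samples. We reject H₀ at significance level α if Q(m) exceeds the upper-α critical value of the χ²(m) distribution (for example, about 11.07 for m = 5 and α = 0.05).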
In practice: run the test for several values of m (m = 5, 10, 15 are common). If Q(m) is significant at all values of m, there is strong evidence of serial dependence in the data. If Q(m) applied to the residuals of an estimated ARMA model is not significant, there is no evidence of remaining serial correlation: the model appears to have captured the dependence in the data.
Common misuse: with m degrees of freedom, the Ljung-Box test is valid for testing a raw series for white noise, but when it is applied to the residuals of an estimated ARMA(p,q) model, the degrees of freedom must be reduced to m − p − q. Failing to do this makes the test undersized: you reject white noise less often than you should.
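A minimal stdlib-only sketch of the Ljung-Box test, including the degrees-of-freedom adjustment. The names `ljung_box`, `chi2_sf` and the `fitted_params` argument (set it to p + q for ARMA residuals) are our own, not any package's API. Because the standard library has no chi-squared distribution, `chi2_sf` computes the upper-tail probability from the regularized incomplete gamma function, using a power series for small arguments and a continued fraction otherwise; in practice a statistics package would supply this.

```python
import math
import random

def sample_acf(y, max_lag):
    """Sample ACF rho_hat(k), k = 1..max_lag."""
    T = len(y)
    ybar = sum(y) / T
    dev = [v - ybar for v in y]
    denom = sum(d * d for d in dev)
    return [sum(dev[t] * dev[t - k] for t in range(k, T)) / denom
            for k in range(1, max_lag + 1)]

def chi2_sf(x, df, eps=1e-14, max_iter=10_000):
    """P(X > x) for X ~ chi-squared(df), via Q(a, z) with a = df/2, z = x/2."""
    if x <= 0:
        return 1.0
    a, z = df / 2.0, x / 2.0
    log_pref = a * math.log(z) - z - math.lgamma(a)
    if z < a + 1:
        # Power series for the lower function P(a, z); return 1 - P.
        term = total = 1.0 / a
        n = 0
        while term > eps * total and n < max_iter:
            n += 1
            term *= z / (a + n)
            total += term
        return max(0.0, 1.0 - math.exp(log_pref) * total)
    # Continued fraction (modified Lentz) for the upper function Q(a, z).
    tiny = 1e-300
    b = z + 1.0 - a
    c, d = 1.0 / tiny, 1.0 / b
    h = d
    for i in range(1, max_iter):
        an = -i * (i - a)
        b += 2.0
        d = an * d + b
        d = tiny if abs(d) < tiny else d
        c = b + an / c
        c = tiny if abs(c) < tiny else c
        d = 1.0 / d
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < eps:
            break
    return math.exp(log_pref) * h

def ljung_box(y, m, fitted_params=0):
    """Return (Q(m), degrees of freedom, p-value).
    Q(m) = T(T+2) * sum_{k=1}^m rho_hat(k)^2 / (T-k); df = m - fitted_params."""
    T = len(y)
    df = m - fitted_params
    if df <= 0:
        raise ValueError("need m > p + q")
    rho = sample_acf(y, m)
    q = T * (T + 2) * sum(r * r / (T - k) for k, r in enumerate(rho, start=1))
    return q, df, chi2_sf(q, df)

# Usage on simulated white noise (T = 250 and the seed are arbitrary choices).
rng = random.Random(42)
noise = [rng.gauss(0.0, 1.0) for _ in range(250)]
for m in (5, 10, 15):
    q, df, p = ljung_box(noise, m)
    print(f"Q({m}) = {q:.2f}, df = {df}, p-value = {p:.3f}")
```

For ARMA(p,q) residuals, call `ljung_box(residuals, m, fitted_params=p + q)` so that the p-value comes from χ²(m − p − q) rather than χ²(m).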
Estimating ARMA models
Once the correlogram has suggested candidate model orders, we need to estimate the parameters.
AR models can be estimated by OLS: an AR(p) is just a linear regression of Yₜ on a constant and its own p lagged values. Under stationarity, OLS is consistent and asymptotically normal, which is why AR models are computationally straightforward to estimate and to interpret.
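For an AR(1) with a constant, the OLS estimates have the familiar closed form of a simple regression of yₜ on yₜ₋₁. The sketch below assumes simulated data with c = 0.5, φ = 0.3 and unit-variance Gaussian errors (arbitrary illustrative values, not real data); `simulate_ar1` and `ols_ar1` are our own helper names.

```python
import random

def simulate_ar1(c, phi, T, seed=0, burn=200):
    """y_t = c + phi*y_{t-1} + e_t with e_t ~ N(0, 1)."""
    rng = random.Random(seed)
    y, prev = [], c / (1 - phi)   # start at the unconditional mean
    for t in range(T + burn):
        prev = c + phi * prev + rng.gauss(0.0, 1.0)
        if t >= burn:
            y.append(prev)
    return y

def ols_ar1(y):
    """OLS regression of y_t on a constant and y_{t-1}, t = 2..T."""
    x, z = y[:-1], y[1:]          # regressor y_{t-1}, dependent variable y_t
    n = len(z)
    xbar, zbar = sum(x) / n, sum(z) / n
    sxz = sum((a - xbar) * (b - zbar) for a, b in zip(x, z))
    sxx = sum((a - xbar) ** 2 for a in x)
    phi_hat = sxz / sxx
    c_hat = zbar - phi_hat * xbar
    return c_hat, phi_hat

c_hat, phi_hat = ols_ar1(simulate_ar1(0.5, 0.3, T=5000))
print(round(c_hat, 2), round(phi_hat, 2))  # close to 0.5 and 0.3
```

An AR(p) works the same way with p lagged regressors; only the normal equations get larger.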
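MA and ARMA models cannot be estimated by OLS, because the lagged error terms εₜ₋₁, εₜ₋₂, … are not directly observed; they are estimated by maximum likelihood (MLE) instead. EViews, Stata and R all implement this. The conditional log-likelihood for the Gaussian case, treating pre-sample values as fixed, is:
log L = −(T/2)·log(2π) − (T/2)·log σ² − (1/(2σ²))·Σₜ₌₁ᵀ εₜ²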
where the εₜ are the model residuals, computed recursively from the data and the estimated parameters. Software handles this for you, but ARMA estimation is iterative and can converge to a local maximum, so starting values matter.
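To make the recursion concrete, here is a hedged sketch for a zero-mean MA(1), yₜ = εₜ + θεₜ₋₁. It assumes ε₀ = 0, replaces σ² by its ML estimate Σεₜ²/T (so log L becomes −(T/2)·(log(2π) + log σ̂² + 1)), and demeans the data instead of estimating the mean jointly. A simple grid search over θ stands in for the iterative optimisers that real packages use; in one dimension it also sidesteps the local-maximum problem. All function names and settings (θ = 0.5, T = 3000, seed) are our own.

```python
import math
import random

def simulate_ma1(theta, T, seed=0):
    """y_t = e_t + theta*e_{t-1} with e_t ~ N(0, 1)."""
    rng = random.Random(seed)
    e_prev, y = rng.gauss(0.0, 1.0), []
    for _ in range(T):
        e = rng.gauss(0.0, 1.0)
        y.append(e + theta * e_prev)
        e_prev = e
    return y

def ma1_loglik(theta, y):
    """Conditional Gaussian log-likelihood with eps_0 = 0 and sigma^2
    concentrated out (set to its ML estimate SSR / T)."""
    eps_prev, ssr = 0.0, 0.0
    for v in y:
        eps = v - theta * eps_prev   # residuals computed recursively
        ssr += eps * eps
        eps_prev = eps
    T = len(y)
    return -0.5 * T * (math.log(2 * math.pi) + math.log(ssr / T) + 1.0)

def fit_ma1_grid(y, step=0.01):
    """Maximise the log-likelihood over theta in [-0.99, 0.99]."""
    ybar = sum(y) / len(y)
    yc = [v - ybar for v in y]       # demean instead of estimating mu jointly
    n = int(round(0.99 / step))
    grid = [i * step for i in range(-n, n + 1)]
    return max(grid, key=lambda th: ma1_loglik(th, yc))

print(fit_ma1_grid(simulate_ma1(0.5, T=3000)))  # close to 0.5
```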
Model selection: AIC and BIC
Having estimated several candidate ARMA models, we need a principled way to choose between them. Adding more AR or MA terms never worsens in-sample fit, but a more complex model may simply be fitting noise rather than genuine structure. Information criteria penalise complexity.
AIC = −2·log L̂ + 2K
BIC = −2·log L̂ + K·log(T)
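where L̂ is the maximised likelihood, T is the sample size and K = p + q + 1 is the number of estimated parameters (the AR and MA coefficients plus a constant). Some software also counts σ²; this adds the same amount to every model, so the ranking is unchanged.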
Both criteria reward a better fit (smaller −2·log L̂) and penalise complexity (larger K). The BIC penalty (K·log T) grows with the sample size and exceeds the AIC penalty whenever log T > 2, that is for T ≥ 8, so BIC is more conservative and selects simpler models in realistic samples. The AIC penalty (2K) is fixed regardless of T, so AIC tends to select slightly larger models.
The rule: estimate all ARMA(p,q) models for p = 0, 1, …, pmax and q = 0, 1, …, qmax. Pick the model with the lowest AIC (or BIC). If the two criteria disagree, the choice depends on whether parsimony or predictive fit is the primary goal: BIC for parsimony, AIC for prediction.
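The rule can be sketched for the pure-AR case (q = 0), where each candidate is fitted by OLS. This is a simplified illustration, not a full ARMA search: it assumes simulated AR(1) data with φ = 0.5, fits AR(p) with a constant for p = 0..3 on a common sample (so the likelihoods are comparable), and counts K = p + 1. The helpers `solve`, `ar_loglik` and `select_order` are our own.

```python
import math
import random

def simulate_ar1(phi, T, seed=0, burn=200):
    rng = random.Random(seed)
    y, prev = [], 0.0
    for t in range(T + burn):
        prev = phi * prev + rng.gauss(0.0, 1.0)
        if t >= burn:
            y.append(prev)
    return y

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for cc in range(col, n + 1):
                M[r][cc] -= f * M[col][cc]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][cc] * x[cc] for cc in range(r + 1, n))) / M[r][r]
    return x

def ar_loglik(y, p, pmax):
    """OLS fit of AR(p) with a constant on the common sample t = pmax..T-1.
    Returns the Gaussian log-likelihood (sigma^2 = SSR/n) and n."""
    rows = [[1.0] + [y[t - j] for j in range(1, p + 1)] for t in range(pmax, len(y))]
    z = y[pmax:]
    k = p + 1
    XtX = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    Xtz = [sum(r[i] * v for r, v in zip(rows, z)) for i in range(k)]
    beta = solve(XtX, Xtz)
    ssr = sum((v - sum(b * xi for b, xi in zip(beta, r))) ** 2 for r, v in zip(rows, z))
    n = len(z)
    return -0.5 * n * (math.log(2 * math.pi) + math.log(ssr / n) + 1.0), n

def select_order(y, pmax=3):
    """Return (AIC-selected p, BIC-selected p)."""
    aic, bic = {}, {}
    for p in range(pmax + 1):
        ll, n = ar_loglik(y, p, pmax)
        K = p + 1                      # p AR coefficients + constant (q = 0)
        aic[p] = -2 * ll + 2 * K
        bic[p] = -2 * ll + K * math.log(n)
    return min(aic, key=aic.get), min(bic, key=bic.get)

print(select_order(simulate_ar1(0.5, T=2000)))  # BIC's choice is never larger than AIC's
```

Because the log-likelihoods are shared and log n > 2 here, the BIC-selected order can never exceed the AIC-selected one, which is the "BIC is more conservative" point in code.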
Worked example: UK quarterly real GDP growth
UK quarterly real GDP growth from 1955Q1 to 2013Q2 (T = 234) provides a canonical example used in the ECON30401 PC Lab. The correlogram of GDP growth shows:
- A significant bar at lag 1 (ρ̂(1) ≈ 0.3, above the band ≈ ±0.13)
- A bar at lag 2 that is marginal (close to the band)
- All bars at lags 3+ within the confidence bands
- The Ljung-Box Q(5) statistic is significant (the first few autocorrelations are jointly non-zero)
This pattern is consistent with AR(1) or AR(2). Estimating both and comparing AIC values typically selects AR(1): the lag-2 bar is not significant enough to justify the extra parameter.
The estimated AR(1) model for UK GDP growth is ŷₜ = 0.48 + 0.31·yₜ₋₁. The coefficient 0.31 is statistically significant (t-statistic around 5). Applied to the residuals, Q(10) is not significant: the AR(1) appears to have removed the serial correlation.
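The figures quoted above can be cross-checked with a few lines of arithmetic. This sketch uses only the numbers in the text (T = 234, ĉ = 0.48, φ̂ = 0.31) plus two standard AR(1) results: the unconditional mean c/(1 − φ) and the asymptotic standard error √((1 − φ²)/T) of φ̂. The latter is an approximation, not the standard error EViews reports, and the mean is in whatever units the growth series uses.

```python
import math

T, c, phi = 234, 0.48, 0.31              # figures quoted in the text
band = 1.96 / math.sqrt(T)               # 95% band for the correlogram
mean = c / (1 - phi)                     # implied unconditional mean of growth
se_phi = math.sqrt((1 - phi ** 2) / T)   # asymptotic s.e. of phi-hat in an AR(1)
print(round(band, 3), round(mean, 3), round(phi / se_phi, 1))  # 0.128 0.696 5.0
```

The band of about ±0.13 and a t-statistic near 5 agree with the values reported for the correlogram and the estimated coefficient.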
The step-by-step procedure
- Plot the data. Look for obvious non-stationarity (trending mean or variance), seasonality, or structural breaks. If non-stationary, difference or detrend before proceeding.
- Plot the correlogram. Note which lags are significant. Does the ACF cut off (MA) or decay (AR)? Also plot the PACF: does it cut off (AR) or decay (MA)?
- Run the Ljung-Box test for several values of m. If Q(m) is not significant, the series may be white noise.
- Estimate candidate ARMA(p,q) models. Try p = 0, 1, 2 and q = 0, 1, 2. Use MLE for MA or ARMA components.
- Compare AIC and BIC. Select the model with the lowest information criterion. If AIC and BIC disagree, prefer BIC's smaller model for parsimony and AIC's choice for prediction.
- Check the residuals. Apply the Ljung-Box test, with m − p − q degrees of freedom, to the residuals of the selected model. If significant serial correlation remains, increase p or q.
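The residual check in the last step can be illustrated with a deliberately misspecified model. This hedged sketch assumes data simulated from an AR(2) (φ₁ = 0.5, φ₂ = 0.3, T = 2000), fits AR(1) and AR(2) by OLS on demeaned data, and compares Q(10) of each model's residuals with the 5% χ² critical value for m − p degrees of freedom (15.507 for 8 df, 16.919 for 9 df). The helper names are our own, and `ar_residuals` only handles p = 1 or 2 to keep the example short.

```python
import random

def sample_acf(y, max_lag):
    T = len(y)
    ybar = sum(y) / T
    dev = [v - ybar for v in y]
    denom = sum(d * d for d in dev)
    return [sum(dev[t] * dev[t - k] for t in range(k, T)) / denom
            for k in range(1, max_lag + 1)]

def ljung_box_q(x, m):
    T = len(x)
    return T * (T + 2) * sum(r * r / (T - k)
                             for k, r in enumerate(sample_acf(x, m), start=1))

def simulate_ar2(phi1, phi2, T, seed=0, burn=200):
    rng = random.Random(seed)
    y1 = y2 = 0.0                        # y_{t-1}, y_{t-2}
    out = []
    for t in range(T + burn):
        y = phi1 * y1 + phi2 * y2 + rng.gauss(0.0, 1.0)
        y1, y2 = y, y1
        if t >= burn:
            out.append(y)
    return out

def ar_residuals(y, p):
    """OLS residuals of a zero-mean AR(p), p in {1, 2}, fitted to demeaned y."""
    if p not in (1, 2):
        raise ValueError("this sketch handles p = 1 or 2 only")
    ybar = sum(y) / len(y)
    d = [v - ybar for v in y]
    z = d[p:]
    lags = [d[p - j:len(d) - j] for j in range(1, p + 1)]   # y_{t-1}, ..., y_{t-p}
    if p == 1:
        x1 = lags[0]
        b = [sum(a * c for a, c in zip(x1, z)) / sum(a * a for a in x1)]
    else:
        x1, x2 = lags
        s11 = sum(a * a for a in x1)
        s22 = sum(a * a for a in x2)
        s12 = sum(a * c for a, c in zip(x1, x2))
        s1z = sum(a * c for a, c in zip(x1, z))
        s2z = sum(a * c for a, c in zip(x2, z))
        det = s11 * s22 - s12 * s12      # 2x2 normal equations, Cramer's rule
        b = [(s1z * s22 - s2z * s12) / det, (s2z * s11 - s1z * s12) / det]
    return [v - sum(bj * xj[i] for bj, xj in zip(b, lags)) for i, v in enumerate(z)]

CRIT_5PCT = {8: 15.507, 9: 16.919}       # chi-squared 5% critical values
y = simulate_ar2(0.5, 0.3, T=2000)
m = 10
for p in (1, 2):
    q = ljung_box_q(ar_residuals(y, p), m)
    df = m - p
    verdict = "reject white noise" if q > CRIT_5PCT[df] else "no evidence of remaining correlation"
    print(f"AR({p}): Q({m}) = {q:.1f}, df = {df}, {verdict}")
```

The AR(1) residuals should fail the check by a wide margin, which is the signal to increase p.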
Related articles
← AR and ARMA processes explained: the theoretical background this article builds on.
Time series forecasting with AR models →: what to do with an estimated model.