Complete topic guide
Econometrics explained
A map of the core topics in econometrics, each explained concisely in plain English, with links to full guides and free video lectures. A good place to find the one concept you need, or to see how the subject fits together.
This guide explains the scope of econometrics and links core topics such as OLS, inference, time series, GMM, panel data and financial econometrics to the relevant tutoring pages and free resources.
Econometrics is the application of statistical methods to economic data: estimating relationships, testing theories, and forecasting. It rests on a sequence of ideas that build on each other. Below, each core topic is defined briefly, with links to fuller explanations where they exist.
Foundations of regression
Ordinary Least Squares (OLS)
OLS estimates a linear relationship between a dependent variable and one or more explanatory variables by choosing coefficients that minimise the sum of squared residuals, the squared vertical gaps between the observed data and the fitted line. It is the foundational method of econometrics. The estimated slope tells you how the dependent variable changes on average for a one-unit change in an explanatory variable, holding the others fixed.
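Here is a minimal sketch of the idea in Python, using only the standard library and made-up data. With one regressor, the slope that minimises the sum of squared residuals is the sample covariance of x and y divided by the sample variance of x. Any other line produces a larger sum of squared residuals.

```python
# Simple-regression OLS: the SSR-minimising slope is the sample covariance
# of x and y divided by the sample variance of x.
def ols_simple(x, y):
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar  # fitted line passes through (x_bar, y_bar)
    return intercept, slope


def ssr(x, y, intercept, slope):
    """Sum of squared vertical gaps between the data and a candidate line."""
    return sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))


# Made-up illustrative data, e.g. hours of study (x) and test score (y).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [52.0, 55.0, 61.0, 64.0, 68.0]

b0, b1 = ols_simple(x, y)
print(f"intercept={b0:.2f}, slope={b1:.2f}, SSR={ssr(x, y, b0, b1):.2f}")
# prints: intercept=47.70, slope=4.10, SSR=1.90

# Any other line does worse, e.g. slope 4.2 with the line still through the means:
print(f"{ssr(x, y, 60.0 - 4.2 * 3.0, 4.2):.2f}")
# prints: 2.00
```

With these made-up numbers, the slope of 4.1 means that one more unit of x goes with 4.1 more units of y on average.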
The Gauss-Markov Theorem
Under the Gauss-Markov assumptions (linearity in the parameters, random sampling, no perfect collinearity, errors with zero mean conditional on the regressors, and constant error variance), the OLS estimator is the Best Linear Unbiased Estimator, or BLUE. This means that among all estimators that are both linear in the outcome y and unbiased, OLS has the smallest variance. It is the central justification for using OLS, and identifying which assumption fails in a given situation tells you what can go wrong and how to fix it.
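A small Monte Carlo shows the theorem at work. The sketch below assumes a data-generating process that satisfies the assumptions. It compares OLS with another estimator of the slope that is also linear and unbiased: the line through the first and last observations. That second estimator is an illustrative choice, not a standard method. Both should centre on the true slope, but OLS should have the smaller variance.

```python
import random
import statistics

random.seed(42)

# Assumed data-generating process satisfying the Gauss-Markov assumptions:
# fixed x, y = 1 + 2x + e, with e i.i.d. N(0, 1) (homoskedastic, mean zero).
x = [float(i) for i in range(1, 21)]
beta0, beta1, sigma = 1.0, 2.0, 1.0


def ols_slope(x, y):
    x_bar, y_bar = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    return sxy / sxx


def endpoint_slope(x, y):
    # Also linear in y and unbiased for beta1, but it uses only two observations.
    return (y[-1] - y[0]) / (x[-1] - x[0])


ols_draws, end_draws = [], []
for _ in range(5000):
    y = [beta0 + beta1 * xi + random.gauss(0.0, sigma) for xi in x]
    ols_draws.append(ols_slope(x, y))
    end_draws.append(endpoint_slope(x, y))

# Both estimators centre on beta1 = 2 (unbiasedness) ...
print("means:    ", round(statistics.mean(ols_draws), 3), round(statistics.mean(end_draws), 3))
# ... but OLS has the smaller sampling variance. Theory gives
# sigma^2 / sum((x - x_bar)^2) = 1/665 for OLS and 2 * sigma^2 / 19^2 = 2/361 for the endpoint rule.
print("variances:", round(statistics.variance(ols_draws), 5), round(statistics.variance(end_draws), 5))
```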
The regression model in matrix form
Writing the regression as y = Xβ + ε, where y is the vector of outcomes, X the matrix of regressors, β the coefficient vector and ε the vector of errors, makes the OLS estimator compact: β̂ = (X'X)⁻¹X'y. This form is essential for deriving the estimator's properties and its variance, and for the theory of multiple regression. It is the language used throughout university econometrics.
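Below is a sketch of β̂ = (X'X)⁻¹X'y using plain Python lists. The data are small and made up, generated exactly by y = 1 + 2x₁ + 3x₂, so the estimator should recover (1, 2, 3). The code solves the normal equations X'Xβ = X'y instead of forming the inverse explicitly. This gives the same answer and is the numerically safer habit. A singular X'X signals perfect collinearity.

```python
def transpose(A):
    return [list(row) for row in zip(*A)]


def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def solve(A, B):
    """Solve A Z = B by Gauss-Jordan elimination with partial pivoting.
    A is k x k and B is k x m, both as lists of lists."""
    k = len(A)
    M = [list(map(float, A[i])) + list(map(float, B[i])) for i in range(k)]
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(M[r][c]))
        if abs(M[p][c]) < 1e-12:
            raise ValueError("X'X is singular: perfect collinearity")
        M[c], M[p] = M[p], M[c]
        pivot = M[c][c]
        M[c] = [v / pivot for v in M[c]]
        for r in range(k):
            if r != c:
                f = M[r][c]
                M[r] = [vr - f * vc for vr, vc in zip(M[r], M[c])]
    return [row[k:] for row in M]


# Made-up data satisfying y = 1 + 2*x1 + 3*x2 exactly.
x1 = [1, 2, 3, 4, 5, 6]
x2 = [2, 1, 4, 3, 6, 5]
y = [[9], [8], [19], [18], [29], [28]]       # n x 1 column vector
X = [[1.0, a, b] for a, b in zip(x1, x2)]    # first column is the intercept

Xt = transpose(X)
beta_hat = solve(matmul(Xt, X), matmul(Xt, y))   # solves (X'X) beta = X'y
print([round(b[0], 6) for b in beta_hat])        # prints: [1.0, 2.0, 3.0]
```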
Consistency and asymptotic normality
An estimator is consistent if it converges to the true parameter value as the sample size grows, justified by the law of large numbers. It is asymptotically normal if, suitably scaled, its sampling distribution approaches a normal distribution in large samples, justified by the central limit theorem. Together these properties justify approximate, large-sample confidence intervals and hypothesis tests for OLS coefficients, even without assuming the errors are normally distributed.
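The simulation below is a rough sketch of both properties. The parameters are assumed, and the errors are deliberately skewed rather than normal. Single estimates should approach the true slope as n grows. At a moderate sample size, the standardised estimate should fall within ±1.96 in roughly 95% of replications.

```python
import random

random.seed(1)
beta0, beta1 = 0.5, 2.0  # assumed true parameters


def draw_sample(n):
    x = [random.uniform(0.0, 10.0) for _ in range(n)]
    # Skewed, non-normal errors: Exp(1) - 1 has mean 0 and variance 1.
    y = [beta0 + beta1 * xi + random.expovariate(1.0) - 1.0 for xi in x]
    return x, y


def ols_slope_and_sxx(x, y):
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    return sxy / sxx, sxx


# Consistency: one estimate per sample size, drifting towards beta1 = 2.
estimates = {}
for n in (10, 100, 1_000, 10_000):
    estimates[n], _ = ols_slope_and_sxx(*draw_sample(n))
    print(n, round(estimates[n], 4))

# Asymptotic normality: at n = 200, the standardised slope (b1 - beta1) / se
# should land inside +/-1.96 in roughly 95% of replications despite the skewed errors.
# se uses the known error variance of 1, so Var(b1 | x) = 1 / sxx.
reps, inside = 2000, 0
for _ in range(reps):
    b1, sxx = ols_slope_and_sxx(*draw_sample(200))
    if abs(b1 - beta1) / (1.0 / sxx) ** 0.5 < 1.96:
        inside += 1
coverage = inside / reps
print("share inside +/-1.96:", coverage)
```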
When assumptions fail
Heteroskedasticity
Heteroskedasticity is when the variance of the regression errors is not constant across observations, for instance when the spread of outcomes grows with the size of a variable. It does not bias the OLS coefficient estimates, but it makes the usual standard errors incorrect, so t-tests and confidence intervals become unreliable. The standard remedy is heteroskedasticity-robust (White) standard errors, or generalised least squares if the form of the variance is known.
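Here is a minimal sketch of the conventional and White (HC0) standard errors for a simple regression. It uses simulated data where the spread of the errors grows with x, which is an assumed data-generating process. Statistical packages usually also offer small-sample variants such as HC1, which rescales HC0 by n/(n − k). This sketch uses plain HC0.

```python
import random

random.seed(7)

# Assumed data-generating process: y = 1 + 0.5x + e, where the error's
# standard deviation is proportional to x (heteroskedasticity).
n = 1000
x = [random.uniform(1.0, 10.0) for _ in range(n)]
y = [1.0 + 0.5 * xi + random.gauss(0.0, 0.3 * xi) for xi in x]

x_bar, y_bar = sum(x) / n, sum(y) / n
sxx = sum((xi - x_bar) ** 2 for xi in x)
b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
b0 = y_bar - b1 * x_bar
e = [yi - b0 - b1 * xi for xi, yi in zip(x, y)]

# Conventional SE assumes one common error variance, estimated by s^2.
s2 = sum(ei ** 2 for ei in e) / (n - 2)
se_conventional = (s2 / sxx) ** 0.5

# White (HC0) SE: each squared residual is weighted by its own (x_i - x_bar)^2,
# so no constant-variance assumption is needed.
se_white = (sum((xi - x_bar) ** 2 * ei ** 2 for xi, ei in zip(x, e)) / sxx ** 2) ** 0.5

print(f"slope={b1:.3f}  conventional SE={se_conventional:.4f}  White SE={se_white:.4f}")
```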
Autocorrelation
Autocorrelation, or serial correlation, is when regression errors are correlated across observations, which is common in time series data where shocks persist. Like heteroskedasticity, it does not bias the coefficients when the regressors are strictly exogenous, but it invalidates the ordinary standard errors. When a lagged dependent variable is among the regressors, however, serially correlated errors do make OLS inconsistent. The usual fix is heteroskedasticity and autocorrelation consistent (HAC) standard errors, such as Newey-West standard errors, which remain valid in the presence of both problems.
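The sketch below runs the Newey-West calculation in the simplest case: a regression on a constant alone, so the coefficient is the sample mean. The errors follow an assumed AR(1). The lag truncation rule is one commonly used choice, not the only one. Positive serial correlation makes observations partly redundant, so the naive standard error should come out too small.

```python
import math
import random

random.seed(3)

# Assumed data: AR(1) errors u_t = 0.7 u_{t-1} + eps_t around a mean of 2.
# Regressing y on a constant only, the OLS coefficient is the sample mean.
n, phi, mu = 2000, 0.7, 2.0
u, y = 0.0, []
for _ in range(n):
    u = phi * u + random.gauss(0.0, 1.0)
    y.append(mu + u)

y_bar = sum(y) / n
e = [yt - y_bar for yt in y]  # OLS residuals


def autocov(e, lag):
    return sum(e[t] * e[t - lag] for t in range(lag, len(e))) / len(e)


# Conventional SE ignores the serial correlation.
se_naive = math.sqrt(autocov(e, 0) / n)

# Newey-West: Bartlett-weighted sum of autocovariances up to L lags.
# L = floor(4 * (n/100)^(2/9)) is one commonly used rule of thumb (an assumption here).
L = int(4 * (n / 100) ** (2 / 9))
long_run_var = autocov(e, 0) + 2 * sum(
    (1 - lag / (L + 1)) * autocov(e, lag) for lag in range(1, L + 1)
)
se_nw = math.sqrt(long_run_var / n)

print(f"mean={y_bar:.3f}  naive SE={se_naive:.4f}  Newey-West SE={se_nw:.4f}  (L={L})")
```

The declining Bartlett weights, 1 − l/(L + 1), keep the estimated long-run variance non-negative.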
Multicollinearity
Multicollinearity occurs when explanatory variables are highly correlated with each other. Short of perfect collinearity, it does not violate the OLS assumptions or bias the estimates. It does inflate the variance of the affected coefficients, making them imprecise and their individual significance hard to establish even when the variables jointly matter. It is a data limitation rather than a modelling error, and is commonly diagnosed using the variance inflation factor (VIF).
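Here is a sketch of the variance inflation factor, VIF_j = 1/(1 − R_j²), where R_j² comes from regressing x_j on the other explanatory variables. The data are simulated, by assumption, so that two regressors are nearly collinear. With only two regressors, R_j² is simply their squared correlation. A VIF above 10 is a commonly quoted warning sign, though any cut-off is only a rule of thumb.

```python
import random

random.seed(11)

# Assumed data: x2 is x1 plus a little noise, so the two regressors are highly correlated.
n = 500
x1 = [random.gauss(0.0, 1.0) for _ in range(n)]
x2 = [a + random.gauss(0.0, 0.2) for a in x1]


def r_squared_simple(a, b):
    """R^2 from regressing a on b with an intercept (the squared correlation)."""
    m = len(a)
    a_bar, b_bar = sum(a) / m, sum(b) / m
    sab = sum((ai - a_bar) * (bi - b_bar) for ai, bi in zip(a, b))
    saa = sum((ai - a_bar) ** 2 for ai in a)
    sbb = sum((bi - b_bar) ** 2 for bi in b)
    return sab ** 2 / (saa * sbb)


# VIF_j = 1 / (1 - R_j^2). With only two regressors the auxiliary R^2 is the same for both.
r2 = r_squared_simple(x1, x2)
vif = 1.0 / (1.0 - r2)
print(f"auxiliary R^2={r2:.3f}  VIF={vif:.1f}")
# Each slope's sampling variance is VIF times what it would be if x1 and x2 were uncorrelated.
```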
Measurement error and endogeneity
Endogeneity arises when an explanatory variable is correlated with the error term, whether through omitted variables, simultaneity, or measurement error in the regressor. It makes OLS biased and inconsistent, and it is the central threat to causal interpretation. Measurement error in an explanatory variable, in the classical case, biases its coefficient toward zero (attenuation bias). The standard solution to endogeneity is instrumental variables.
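The attenuation result can be checked by simulation. In the classical case, the measurement error u is uncorrelated with the true regressor x* and with the error term. The OLS slope then converges to β·Var(x*)/(Var(x*) + Var(u)). The sketch below assumes x* and u have equal variances, so the estimate should shrink to about half the true value of 2.

```python
import random

random.seed(5)

# Assumed set-up: true regressor x_star ~ N(0, 1), y = 2 * x_star + e,
# but we only observe x = x_star + u with classical error u ~ N(0, 1).
n, beta = 20_000, 2.0
x_star = [random.gauss(0.0, 1.0) for _ in range(n)]
y = [beta * xs + random.gauss(0.0, 1.0) for xs in x_star]
x_obs = [xs + random.gauss(0.0, 1.0) for xs in x_star]


def ols_slope(x, y):
    m = len(x)
    x_bar, y_bar = sum(x) / m, sum(y) / m
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    return sxy / sxx


b_true = ols_slope(x_star, y)
b_noisy = ols_slope(x_obs, y)
# Classical theory: plim b_noisy = beta * var(x_star) / (var(x_star) + var(u)) = 2 * 1/2 = 1.
print(f"with true x: {b_true:.3f}   with mismeasured x: {b_noisy:.3f}")
```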
Time series econometrics
The complete time series guide (read in sequence)
- Part 1: Introduction, process vs realisation, white noise, stationarity, AR(1) and MA(1)
- Part 2: ARMA and the Wold decomposition, MA(q) and AR(p) properties, the lag operator, stationarity conditions
- Part 3: Estimation and model selection, estimating ARMA models, the sample ACF, diagnostics
- Reading the correlogram, using the sample ACF and PACF to identify models
- GARCH and volatility, modelling time-varying variance in financial data
Stationarity
A time series is weakly stationary if its mean and variance are constant over time and the covariance between two observations depends only on the gap between them. Stationarity is required for standard time series estimation and inference; applying ordinary regression to non-stationary series risks spurious results. Many economic series are non-stationary in levels but stationary in first differences or growth rates.
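A small Monte Carlo sketch with assumed parameters makes the definition concrete. Across many simulated paths, the variance of a random walk grows with time. The variance of a stationary AR(1) with |φ| < 1 stays near σ²/(1 − φ²). Differencing the random walk recovers a stationary series.

```python
import random
import statistics

random.seed(2)

# Many independent paths of two assumed processes, both starting at 0:
#   random walk:  y_t = y_{t-1} + e_t          (non-stationary)
#   AR(1):        y_t = 0.5 * y_{t-1} + e_t    (stationary, since |0.5| < 1)
paths, T, phi = 2000, 200, 0.5
rw_50, rw_200, ar_50, ar_200, drw = [], [], [], [], []
for _ in range(paths):
    rw = ar = 0.0
    for t in range(1, T + 1):
        e = random.gauss(0.0, 1.0)
        rw += e
        ar = phi * ar + e
        if t == 50:
            rw_50.append(rw)
            ar_50.append(ar)
    rw_200.append(rw)
    ar_200.append(ar)
    drw.append(e)  # first difference of the random walk at t = T is just e_T

# The random walk's variance grows like t * sigma^2; the AR(1)'s settles near
# sigma^2 / (1 - phi^2) = 1 / 0.75, about 1.33, whatever the date.
print("random walk var at t=50, 200:", round(statistics.variance(rw_50), 1), round(statistics.variance(rw_200), 1))
print("AR(1) var at t=50, 200:      ", round(statistics.variance(ar_50), 2), round(statistics.variance(ar_200), 2))
print("var of differenced walk:     ", round(statistics.variance(drw), 2))
```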
Spurious regression
When two unrelated non-stationary series both trend over time, regressing one on the other can produce a high R-squared and a highly significant coefficient despite there being no genuine relationship. This spurious regression problem is why testing for stationarity (and for unit roots) before regressing time series variables is essential, and why cointegration analysis exists for cases where a genuine long-run relationship does hold between non-stationary series.
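The classic way to see the problem is a simulation in the spirit of Granger and Newbold's experiments. Regress one independent random walk on another, and count how often the conventional t-test rejects a zero slope at the 5% level. The sketch below uses simulated data and assumed settings. It compares that rejection rate with the same exercise on white noise, where the rate should be close to 5%.

```python
import random

random.seed(9)


def slope_t_stat(x, y):
    """Conventional OLS t-statistic for the slope in a regression of y on x."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    b0 = y_bar - b1 * x_bar
    s2 = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y)) / (n - 2)
    return b1 / (s2 / sxx) ** 0.5


def random_walk(T):
    level, path = 0.0, []
    for _ in range(T):
        level += random.gauss(0.0, 1.0)
        path.append(level)
    return path


def white_noise(T):
    return [random.gauss(0.0, 1.0) for _ in range(T)]


reps, T = 500, 100
# Share of regressions of one independent series on another that "find" a
# significant slope at the nominal 5% level (|t| > 1.96).
reject_rw = sum(abs(slope_t_stat(random_walk(T), random_walk(T))) > 1.96 for _ in range(reps)) / reps
reject_wn = sum(abs(slope_t_stat(white_noise(T), white_noise(T))) > 1.96 for _ in range(reps)) / reps
print(f"independent random walks: {reject_rw:.2f}   independent white noise: {reject_wn:.2f}")
```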
ARIMA and forecasting
ARIMA (Autoregressive Integrated Moving Average) models combine autoregressive and moving average components with differencing to handle non-stationary series, and are a workhorse for univariate forecasting. Model selection balances fit against parsimony using information criteria such as AIC and BIC, and residual diagnostics such as the Ljung-Box test check that no predictable structure remains.
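Here is a bare-bones sketch of an ARIMA(1,1,0) workflow on simulated data; the process and parameters are assumptions. The steps are:

- difference the series once;
- estimate an AR(1) on the differences by least squares (conditional least squares, not full maximum likelihood);
- forecast one step ahead;
- compute the Ljung-Box statistic on the residuals.

In practice, a package routine would handle estimation, information criteria and the chi-squared p-value.

```python
import random

random.seed(4)

# Assumed process: the first difference follows an AR(1),
# dy_t = 0.6 * dy_{t-1} + e_t, so the level y_t is ARIMA(1,1,0).
n, phi = 500, 0.6
dy, level, y = 0.0, 100.0, []
for _ in range(n):
    dy = phi * dy + random.gauss(0.0, 1.0)
    level += dy
    y.append(level)

# "I" step: difference once to get a stationary series.
d = [y[t] - y[t - 1] for t in range(1, n)]

# "AR" step: estimate d_t = c + phi * d_{t-1} + e_t by least squares.
x_lag, x_now = d[:-1], d[1:]
m = len(x_now)
xb, yb = sum(x_lag) / m, sum(x_now) / m
phi_hat = sum((a - xb) * (b - yb) for a, b in zip(x_lag, x_now)) / sum((a - xb) ** 2 for a in x_lag)
c_hat = yb - phi_hat * xb
resid = [b - c_hat - phi_hat * a for a, b in zip(x_lag, x_now)]

# One-step-ahead forecast of the level: last level plus the forecast change.
forecast = y[-1] + c_hat + phi_hat * d[-1]


def ljung_box(e, h):
    """Q = N(N+2) * sum_{k=1..h} r_k^2 / (N-k), using sample autocorrelations r_k."""
    N = len(e)
    mean = sum(e) / N
    c0 = sum((v - mean) ** 2 for v in e)
    q = 0.0
    for k in range(1, h + 1):
        rk = sum((e[t] - mean) * (e[t - k] - mean) for t in range(k, N)) / c0
        q += rk ** 2 / (N - k)
    return N * (N + 2) * q


Q = ljung_box(resid, 10)
# With 10 lags and 1 estimated AR coefficient, compare Q with a chi-squared(9)
# distribution, whose 5% critical value is about 16.92.
print(f"phi_hat={phi_hat:.3f}  forecast={forecast:.2f}  Ljung-Box Q(10)={Q:.2f}")
```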
Advanced methods
Instrumental Variables (IV) and Two-Stage Least Squares
When an explanatory variable is endogenous, OLS is biased. Instrumental variables estimation uses an instrument, a variable correlated with the endogenous regressor but uncorrelated with the error, to recover the causal effect. Two-Stage Least Squares implements this by first predicting the endogenous variable from the instruments, then using that prediction in the main regression. The quality of an IV estimate depends critically on instrument strength and validity.
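The sketch below simulates an assumed endogenous set-up where OLS should overstate the true effect of 2; its probability limit here is 2.4. It then computes the simple IV estimator Cov(z, y)/Cov(z, x) and two-stage least squares by hand. With one instrument and one endogenous regressor, the two coincide. Standard errors from running the second stage as an ordinary regression are not valid, because its residuals are built from the fitted x̂ rather than x. IV software corrects this.

```python
import random

random.seed(8)

# Assumed data-generating process with an endogenous regressor:
#   z ~ N(0,1) is the instrument; v drives both x and the error e.
#   x = z + v,   e = 0.8 v + w,   y = 1 + 2 x + e   (true effect is 2).
n = 5000
z = [random.gauss(0.0, 1.0) for _ in range(n)]
v = [random.gauss(0.0, 1.0) for _ in range(n)]
x = [zi + vi for zi, vi in zip(z, v)]
y = [1.0 + 2.0 * xi + 0.8 * vi + random.gauss(0.0, 0.6) for xi, vi in zip(x, v)]


def cov(a, b):
    m = len(a)
    a_bar, b_bar = sum(a) / m, sum(b) / m
    return sum((ai - a_bar) * (bi - b_bar) for ai, bi in zip(a, b)) / m


b_ols = cov(x, y) / cov(x, x)   # biased: x is correlated with e
b_iv = cov(z, y) / cov(z, x)    # IV estimator with one instrument

# Two-stage least squares, done by hand:
pi1 = cov(z, x) / cov(z, z)                  # first stage: x on z
pi0 = sum(x) / n - pi1 * sum(z) / n
x_hat = [pi0 + pi1 * zi for zi in z]         # fitted (exogenous) part of x
b_2sls = cov(x_hat, y) / cov(x_hat, x_hat)   # second stage: y on x_hat

print(f"OLS={b_ols:.3f}  IV={b_iv:.3f}  2SLS={b_2sls:.3f}")
```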
Generalised Method of Moments (GMM)
GMM is a general estimation framework that chooses parameters to make sample moment conditions as close to zero as possible, as measured by a weighted quadratic form. It nests OLS, IV and two-stage least squares as special cases. It is central to modern econometrics, especially when there are more moment conditions than parameters, and in dynamic panel models. Its efficiency depends on how the moment conditions are weighted, and subtle issues arise when the moment variance is singular, the subject of some of my own research.
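Here is a minimal sketch of linear GMM with one parameter and two instruments, so there are more moment conditions than parameters. Everything about the data is an assumption made for illustration. This includes heteroskedastic errors, so that the weighting matters. With the weighting matrix (Z'Z/n)⁻¹, GMM reproduces two-stage least squares. Re-weighting by the inverse of the estimated moment covariance gives two-step efficient GMM, and Hansen's J statistic then tests the extra moment condition.

```python
import random

random.seed(12)

# Assumed linear model y = beta * x + e with beta = 1.5, x endogenous, two
# valid instruments z1, z2, and heteroskedastic errors. All variables have
# mean zero, so there is no intercept.
n, beta = 5000, 1.5
data = []
for _ in range(n):
    z1, z2, v = random.gauss(0, 1), random.gauss(0, 1), random.gauss(0, 1)
    x = 0.8 * z1 + 0.5 * z2 + v
    e = (0.7 * v + random.gauss(0, 0.7)) * (1.0 + 0.5 * abs(z1))
    data.append((x, beta * x + e, (z1, z2)))


def inv2(m):
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


def gmm_beta(W):
    """Minimise gbar(b)' W gbar(b), where gbar(b) = mean of z_i * (y_i - x_i * b).
    For this one-parameter linear model the minimiser is (a'Wc) / (a'Wa)
    with a = mean(z * x) and c = mean(z * y)."""
    a = [sum(z[j] * x for x, _, z in data) / n for j in range(2)]
    c = [sum(z[j] * y for _, y, z in data) / n for j in range(2)]
    Wa = [W[0][0] * a[0] + W[0][1] * a[1], W[1][0] * a[0] + W[1][1] * a[1]]
    return (c[0] * Wa[0] + c[1] * Wa[1]) / (a[0] * Wa[0] + a[1] * Wa[1])


def moment_cov(b):
    """S = mean of u_i^2 * z_i z_i', with u_i = y_i - x_i * b."""
    S = [[0.0, 0.0], [0.0, 0.0]]
    for x, y, z in data:
        u2 = (y - x * b) ** 2
        for j in range(2):
            for k in range(2):
                S[j][k] += u2 * z[j] * z[k] / n
    return S


# Step 1: W = (Z'Z/n)^-1 gives two-stage least squares.
ZZ = [[sum(z[j] * z[k] for _, _, z in data) / n for k in range(2)] for j in range(2)]
b_step1 = gmm_beta(inv2(ZZ))

# Step 2: reweight by the inverse of the estimated moment covariance (efficient GMM).
W2 = inv2(moment_cov(b_step1))
b_step2 = gmm_beta(W2)

# Hansen's J = n * gbar' S^-1 gbar tests the overidentifying restriction;
# compare with chi-squared(1), whose 5% critical value is about 3.84.
gbar = [sum(z[j] * (y - x * b_step2) for x, y, z in data) / n for j in range(2)]
J = n * sum(gbar[j] * W2[j][k] * gbar[k] for j in range(2) for k in range(2))
print(f"2SLS={b_step1:.3f}  two-step GMM={b_step2:.3f}  J={J:.2f}")
```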
Hypothesis testing and inference
Econometric inference asks whether estimated relationships are statistically distinguishable from zero or from some hypothesised value. The t-test assesses single coefficients and the F-test assesses joint restrictions on several coefficients. The underlying logic (p-values, significance levels, and the difference between statistical and economic significance) applies throughout.
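Here is a sketch of the arithmetic behind both tests, using hypothetical regression output rather than real results. The t-test p-value uses the standard normal as a large-sample approximation. The F statistic uses the R² form, which is valid when the restricted and unrestricted models share the same dependent variable.

```python
import math

# Hypothetical regression output (illustrative numbers, not real results).
b_hat, se, b_null = 0.42, 0.15, 0.0
t_stat = (b_hat - b_null) / se
# Two-sided p-value from the standard normal, a large-sample approximation;
# with few degrees of freedom, use the t distribution instead.
p_value = math.erfc(abs(t_stat) / math.sqrt(2.0))
print(f"t = {t_stat:.2f}, p = {p_value:.4f}")  # t = 2.80, p = 0.0051

# F-test of q joint restrictions, using R^2 from the unrestricted model
# (k regressors plus an intercept) and from the restricted model.
n, k, q = 100, 5, 2
r2_unrestricted, r2_restricted = 0.30, 0.25
F = ((r2_unrestricted - r2_restricted) / q) / ((1 - r2_unrestricted) / (n - k - 1))
print(f"F({q}, {n - k - 1}) = {F:.2f}")  # F(2, 94) = 3.36
```

A p-value of about 0.005 says the coefficient is statistically distinguishable from zero. Whether an effect of 0.42 matters economically depends on the units and the context.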
Maximum Likelihood Estimation
Maximum likelihood chooses parameter values that make the observed data most probable under an assumed distribution. It is the basis for estimating many models beyond linear regression, including limited dependent variable models (probit, logit), GARCH, and ARMA models with moving average components. Under regularity conditions the maximum likelihood estimator is consistent, asymptotically normal and asymptotically efficient.
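As an illustration, the sketch below fits a logit by maximum likelihood on simulated data, using Newton-Raphson. The true coefficients −0.5 and 1.0 are assumptions. Standard errors come from the inverse of minus the Hessian of the log-likelihood, which is the usual large-sample approximation.

```python
import math
import random

random.seed(6)

# Assumed logit model: P(y = 1 | x) = 1 / (1 + exp(-(b0 + b1 * x))), b0 = -0.5, b1 = 1.0.
n = 5000
xs = [random.gauss(0.0, 1.0) for _ in range(n)]
ys = [1 if random.random() < 1 / (1 + math.exp(-(-0.5 + 1.0 * x))) else 0 for x in xs]


def log_likelihood(b0, b1):
    # Sum of y * eta - log(1 + exp(eta)), with eta = b0 + b1 * x.
    return sum(y * (b0 + b1 * x) - math.log1p(math.exp(b0 + b1 * x)) for x, y in zip(xs, ys))


# Newton-Raphson on the log-likelihood: b_new = b - H^-1 g = b + (-H)^-1 g.
b0, b1 = 0.0, 0.0
for _ in range(25):
    g0 = g1 = h00 = h01 = h11 = 0.0
    for x, y in zip(xs, ys):
        p = 1 / (1 + math.exp(-(b0 + b1 * x)))
        g0 += y - p              # score (gradient) components
        g1 += (y - p) * x
        w = p * (1 - p)          # contributions to minus the Hessian
        h00 += w
        h01 += w * x
        h11 += w * x * x
    det = h00 * h11 - h01 * h01
    step0 = (h11 * g0 - h01 * g1) / det
    step1 = (-h01 * g0 + h00 * g1) / det
    b0, b1 = b0 + step0, b1 + step1
    if abs(step0) + abs(step1) < 1e-10:
        break

# Asymptotic standard errors from the inverse of minus the Hessian (the information matrix).
se_b0, se_b1 = math.sqrt(h11 / det), math.sqrt(h00 / det)
print(f"b0={b0:.3f} (se {se_b0:.3f})  b1={b1:.3f} (se {se_b1:.3f})  logL={log_likelihood(b0, b1):.1f}")
```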
Free video lectures
Many of these topics are covered in worked video lectures on the @economaths YouTube channel, including the Gauss-Markov theorem, the regression t-test, confidence intervals, heteroskedasticity detection and correction, autocorrelation and Newey-West standard errors, instrumental variables, and ARIMA estimation in R. See the research and videos page for the full list and embedded playlist.