
Generalized method of moments (GMM): an introduction

GMM is the framework that sits underneath OLS, instrumental variables and much of modern econometrics. These notes build it up from moment conditions, the criterion function and identification, through to the efficient two-step estimator and the optimal weight matrix.

Dr Nicky Grant · econometrics specialist · Econometrics Study Notes · Intermediate → Advanced

GMM estimates parameters by choosing them so that a set of sample moment conditions is as close to zero as possible. It nests OLS, IV and 2SLS as special cases, and the efficient two-step estimator weights the moments by the inverse of their variance to achieve the smallest asymptotic variance.


How to read these notes

The generalized method of moments (GMM), introduced by Hansen (1982), is one of the unifying ideas of modern econometrics. It is best approached after OLS and instrumental variables, because both turn out to be special cases. These notes draw on the setup used in the author's research on GMM with singular moment variance, but keep the treatment at an intermediate-to-advanced level rather than the full research generality.

1. The starting point: moment conditions

Almost every estimator you have met can be written as the solution to a moment condition — a statement that the expectation of some function of the data and the parameter is zero at the true value. Formally, suppose we have data \(\{x_t\}_{t=1}^T\) and a known \(m\times 1\) moment function \(g(\cdot,\cdot)\) satisfying

\[ E[\,g(x_t,\beta_0)\,] = 0 \]

at the true parameter \(\beta_0\). This single equation is the engine of GMM. The moment function packages our economic and statistical assumptions into a set of expectations that should hold exactly at the truth.

Core idea

A moment condition says that, on average in the population, the moment function evaluated at the true parameter is zero. GMM estimates \(\beta_0\) by finding the value that makes the corresponding sample averages as close to zero as the data allow.

The sample counterpart of the population moment is the sample average

\[ \hat g_T(\beta) = \frac{1}{T}\sum_{t=1}^{T} g(x_t,\beta) \]

If we could set this exactly to zero we would. When there are more moments than parameters, however, there is generally no value of \(\beta\) that zeroes all of them simultaneously. GMM resolves this by minimising a weighted measure of how far the sample moments are from zero.
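As a minimal sketch of what the sample moment vector looks like in practice, the snippet below evaluates it for a linear model with moment function \(g(x_t,\beta)=z_t(y_t-x_t'\beta)\). The simulated design, the function name `sample_moments` and all numerical values are illustrative assumptions, not part of the notes.

```python
import numpy as np

def sample_moments(beta, y, X, Z):
    """Sample average of g_t(beta) = z_t * (y_t - x_t' beta); returns an m-vector."""
    u = y - X @ beta
    return Z.T @ u / len(y)

# Simulated data (hypothetical design): one endogenous regressor, three instruments.
rng = np.random.default_rng(0)
T = 5000
Z = rng.normal(size=(T, 3))                    # m = 3 instruments
v = rng.normal(size=T)
x = Z @ np.array([1.0, 0.5, -0.5]) + v
u = 0.8 * v + rng.normal(size=T)               # error shares v with x, so x is endogenous
beta0 = np.array([2.0])
X = x[:, None]                                 # p = 1 parameter
y = X @ beta0 + u

g_true = sample_moments(beta0, y, X, Z)        # close to zero, but not exactly zero
g_wrong = sample_moments(np.array([1.0]), y, X, Z)   # clearly away from zero
print(g_true, g_wrong)
```

With three moments and one parameter, no single value of `beta` will generally drive all three entries to zero at once, which is the situation the criterion function is designed for.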

2. OLS, IV and 2SLS as moment conditions

The power of the framework comes from how many familiar estimators fit into it. Each corresponds to a particular choice of moment function.

| Estimator | Moment condition | Interpretation |
| --- | --- | --- |
| OLS | \(E[x_i u_i] = 0\) | Regressors uncorrelated with the error (exogeneity). |
| IV / 2SLS | \(E[z_i u_i] = 0\) | Instruments uncorrelated with the error. |
| Maximum likelihood | \(E[\partial \log f / \partial\beta] = 0\) | The score has mean zero at the truth. |

In the regression model the error is \(u_i=y_i-x_i'\beta\), so the OLS moment becomes \(E[x_i(y_i-x_i'\beta_0)]=0\) and the IV moment becomes \(E[z_i(y_i-x_i'\beta_0)]=0\). GMM treats all of these in one stroke: choose \(\beta\) to bring the sample version of the chosen moment as close to zero as possible.
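To see the OLS case concretely, the sketch below solves the sample moment condition \((1/T)\sum_i x_i(y_i-x_i'b)=0\) directly and compares the result with a least-squares routine. The simulated data and variable names are assumptions made for illustration.

```python
import numpy as np

# Simulated regression data (hypothetical numbers).
rng = np.random.default_rng(1)
T = 500
X = np.column_stack([np.ones(T), rng.normal(size=T)])
y = X @ np.array([1.0, -2.0]) + rng.normal(size=T)

# Setting the sample moment to zero is the normal equations: (X'X) b = X'y.
b_mm = np.linalg.solve(X.T @ X, X.T @ y)

# Least squares computed independently, for comparison.
b_ols = np.linalg.lstsq(X, y, rcond=None)[0]

# The sample moment evaluated at the solution is zero up to rounding error.
g_at_b = X.T @ (y - X @ b_mm) / T
print(b_mm, b_ols, g_at_b)
```

Because the number of moments equals the number of parameters here, the sample moments can be set to zero exactly, so no weighting is involved.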

3. The GMM criterion function

When there are more moments than parameters we cannot zero them all, so we measure their distance from zero with a quadratic form. The GMM estimator solves

\[ \hat\beta_{GMM} = \arg\min_{\beta}\; \hat g_T(\beta)'\, W_T\, \hat g_T(\beta) \]

where \(W_T\) is an \(m\times m\) positive definite weight matrix. The criterion is a quadratic form in the sample moments; when \(W_T\) is diagonal it reduces to a weighted sum of their squares. Different weight matrices give different estimators, and a central question of GMM is which weight matrix to use.
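For moments that are linear in \(\beta\), such as \(z_t(y_t-x_t'\beta)\), the criterion can be minimised in closed form. The sketch below codes both the criterion and that closed-form minimiser, then checks that nearby parameter values give a larger criterion. The helper names `gmm_criterion` and `linear_gmm` and the simulated design are hypothetical, chosen only for illustration.

```python
import numpy as np

def gmm_criterion(beta, y, X, Z, W):
    """Quadratic form g_T(beta)' W g_T(beta) for the moments z_t (y_t - x_t' beta)."""
    g = Z.T @ (y - X @ beta) / len(y)
    return float(g @ W @ g)

def linear_gmm(y, X, Z, W):
    """Closed-form minimiser of the criterion when the moments are linear in beta."""
    XZ = X.T @ Z                               # p x m
    Zy = Z.T @ y                               # m
    return np.linalg.solve(XZ @ W @ XZ.T, XZ @ W @ Zy)

# Simulated over-identified design (hypothetical numbers): p = 1, m = 3.
rng = np.random.default_rng(2)
T = 2000
Z = rng.normal(size=(T, 3))
v = rng.normal(size=T)
X = (Z @ np.array([1.0, 0.5, 0.5]) + v)[:, None]
y = 1.5 * X[:, 0] + 0.5 * v + rng.normal(size=T)

W = np.eye(3)                                  # identity weight matrix
beta_hat = linear_gmm(y, X, Z, W)
q_min = gmm_criterion(beta_hat, y, X, Z, W)
q_left = gmm_criterion(beta_hat - 0.05, y, X, Z, W)
q_right = gmm_criterion(beta_hat + 0.05, y, X, Z, W)
print(beta_hat, q_min, q_left, q_right)
```

For nonlinear moment functions there is usually no closed form, and the same criterion would be passed to a numerical optimiser instead.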

Identification

The model is just-identified when the number of moments equals the number of parameters (\(m=p\)); then GMM can set the sample moments exactly to zero and the weight matrix is irrelevant. It is over-identified when there are more moments than parameters (\(m>p\)); then the choice of weight matrix matters, and the leftover moments provide a specification check.

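This is exactly why IV and 2SLS appear as special cases. With one instrument per endogenous regressor the model is just-identified and GMM reproduces the IV estimator. With more instruments than regressors it is over-identified. In that case 2SLS is the GMM estimator whose weight matrix is proportional to \((\sum_i z_i z_i')^{-1}\), which is the efficient choice under homoskedastic errors, and efficient GMM generalises it to errors of unknown form.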
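Both claims can be checked numerically. The sketch below, on simulated data with hypothetical names and values, shows that in a just-identified model an arbitrary positive definite weight matrix returns the simple IV estimator, and that in an over-identified model GMM with weight \((Z'Z/T)^{-1}\) coincides with 2SLS.

```python
import numpy as np

def linear_gmm(y, X, Z, W):
    """Minimiser of g_T(b)' W g_T(b) for the linear moments z_t (y_t - x_t' b)."""
    XZ = X.T @ Z
    return np.linalg.solve(XZ @ W @ XZ.T, XZ @ W @ (Z.T @ y))

# Simulated data (hypothetical design): intercept plus one endogenous regressor.
rng = np.random.default_rng(3)
T = 1000
z = rng.normal(size=(T, 2))
v = rng.normal(size=T)
x = z @ np.array([1.0, 0.7]) + v
y = 0.5 + 2.0 * x + 0.6 * v + rng.normal(size=T)
X = np.column_stack([np.ones(T), x])                 # p = 2

# Just-identified: constant plus one instrument (m = p = 2).
Z1 = np.column_stack([np.ones(T), z[:, 0]])
b_iv = np.linalg.solve(Z1.T @ X, Z1.T @ y)           # simple IV estimator
A = rng.normal(size=(2, 2))
W_any = A @ A.T + np.eye(2)                          # arbitrary positive definite weight
b_gmm_just = linear_gmm(y, X, Z1, W_any)

# Over-identified: constant plus two instruments (m = 3 > p = 2).
Z2 = np.column_stack([np.ones(T), z])
X_hat = Z2 @ np.linalg.solve(Z2.T @ Z2, Z2.T @ X)    # first-stage fitted values
b_2sls = np.linalg.solve(X_hat.T @ X, X_hat.T @ y)
b_gmm_2sls = linear_gmm(y, X, Z2, np.linalg.inv(Z2.T @ Z2 / T))

print(b_iv, b_gmm_just)
print(b_2sls, b_gmm_2sls)
```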

4. The efficient two-step estimator and the optimal weight matrix

The weight matrix is not innocuous: it determines the efficiency of the estimator. The key result (Hansen, 1982) is that the asymptotically efficient choice is the inverse of the variance of the moments,

\[ W_T = \Omega^{-1}, \qquad \Omega = \lim_{T\to\infty} \operatorname{Var}\!\left( T^{-1/2} \sum_{t=1}^{T} g(x_t,\beta_0) \right) \]

Intuitively, moments that are noisy (high variance) should be trusted less, and moments that are precise should be weighted more heavily. Weighting by \(\Omega^{-1}\) does exactly this and delivers the smallest asymptotic variance in the class of GMM estimators.
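The efficiency claim can be made concrete with the standard asymptotic variance formula for GMM. In the sketch below, \(G = E[\partial g(x_t,\beta_0)/\partial\beta']\) denotes the expected derivative of the moments and \(W\) the probability limit of \(W_T\); both symbols are introduced here for illustration, and the usual regularity conditions are assumed.

```latex
\[
\sqrt{T}\,(\hat\beta_{GMM}-\beta_0) \xrightarrow{d} N(0,\,V_W),
\qquad
V_W = (G'WG)^{-1}\, G'W\,\Omega\,W G\,(G'WG)^{-1}.
\]
\[
W=\Omega^{-1} \;\Longrightarrow\; V_{\Omega^{-1}} = (G'\Omega^{-1}G)^{-1},
\qquad
V_W - (G'\Omega^{-1}G)^{-1} \ \text{is positive semi-definite for every admissible } W.
\]
```

The sandwich form collapses only at \(W=\Omega^{-1}\) (or a positive multiple of it), which is why that choice is called optimal.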

The catch is that \(\Omega\) depends on the unknown \(\beta_0\). The standard solution is the two-step estimator, used throughout applied work:

Two-step GMM

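Step 1: estimate \(\beta\) with a simple (sub-optimal but still consistent) weight matrix such as the identity, giving a first-step estimate \(\tilde\beta\), and use it to form residuals.
Step 2: use those residuals to estimate the moment variance, \(\hat\Omega_T(\tilde\beta)=(1/T)\sum_t g_t(\tilde\beta)g_t(\tilde\beta)'\) with \(g_t(\beta)=g(x_t,\beta)\) (a form appropriate when the moments are serially uncorrelated), and re-estimate \(\beta\) using \(\hat\Omega_T^{-1}\) as the weight matrix.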
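The two steps can be sketched for the linear model with moments \(z_t(y_t-x_t'\beta)\) as follows. This is a minimal illustration under stated assumptions, not a general implementation: the function names, the simulated heteroskedastic design and the use of the uncentred outer-product variance estimate (which presumes serially uncorrelated moments) are all choices made here.

```python
import numpy as np

def linear_gmm(y, X, Z, W):
    """Minimiser of g_T(b)' W g_T(b) for the linear moments z_t (y_t - x_t' b)."""
    XZ = X.T @ Z
    return np.linalg.solve(XZ @ W @ XZ.T, XZ @ W @ (Z.T @ y))

def two_step_gmm(y, X, Z):
    T, m = Z.shape
    # Step 1: the identity weight matrix gives a consistent first-step estimate.
    b1 = linear_gmm(y, X, Z, np.eye(m))
    # Step 2: moment contributions g_t = z_t * residual_t, stacked as a T x m array,
    # give the variance estimate (1/T) sum_t g_t g_t'.
    g_t = Z * (y - X @ b1)[:, None]
    Omega_hat = g_t.T @ g_t / T
    b2 = linear_gmm(y, X, Z, np.linalg.inv(Omega_hat))
    return b1, b2, Omega_hat

# Simulated data (hypothetical design): heteroskedastic errors, m = 3, p = 1.
rng = np.random.default_rng(4)
T = 2000
Z = rng.normal(size=(T, 3))
v = rng.normal(size=T)
X = (Z @ np.array([1.0, 0.5, 0.5]) + v)[:, None]
u = (0.5 * v + rng.normal(size=T)) * (1.0 + np.abs(Z[:, 0]))
y = 1.5 * X[:, 0] + u

b1, b2, Omega_hat = two_step_gmm(y, X, Z)
print(b1, b2)
```

Both estimates are consistent for the same parameter; the second step only changes how the three moments are weighted against each other.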

Under the identification condition and standard regularity assumptions, two-step GMM is consistent and \(\sqrt{T}(\hat\beta-\beta_0)\) is asymptotically normal, so \(\hat\beta\) converges at the usual root-\(T\) rate, with the smallest asymptotic variance among GMM estimators based on these moments. The estimated moment variance \(\hat\Omega_T(\beta)\) and the estimated expected first derivative of the moments, \(\hat G_T\), are exactly the objects that appear in the author's GMM research; in standard problems they behave well, but the research studies what happens when this variance matrix becomes (almost) singular.

5. The over-identification (J) test

When the model is over-identified, the leftover moment conditions give a free specification test. If the model and its moment conditions are correct, the minimised value of the efficient criterion function, scaled by the sample size, has a known limiting distribution:

\[ J = T \cdot \hat g_T(\hat\beta)'\, \hat\Omega_T^{-1}\, \hat g_T(\hat\beta) \xrightarrow{d} \chi^2(m-p) \]

The degrees of freedom equal the number of over-identifying restrictions, \(m-p\). A large J-statistic is evidence that at least one moment condition is invalid — in an IV context, that some instrument is not exogenous. This is the GMM generalisation of the idea that surplus moments should still hold at the truth.
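As a rough illustration of how the statistic behaves, the sketch below computes J for a linear model once with valid instruments and once after contaminating one instrument with the structural error. The function names and simulated design are hypothetical, and the weight matrix is the first-step variance estimate used in the second step, which is one common convention.

```python
import numpy as np

def linear_gmm(y, X, Z, W):
    """Minimiser of g_T(b)' W g_T(b) for the linear moments z_t (y_t - x_t' b)."""
    XZ = X.T @ Z
    return np.linalg.solve(XZ @ W @ XZ.T, XZ @ W @ (Z.T @ y))

def j_statistic(y, X, Z):
    """Two-step GMM followed by J = T * g' Omega^{-1} g; returns (J, m - p)."""
    T, m = Z.shape
    b1 = linear_gmm(y, X, Z, np.eye(m))
    g_t = Z * (y - X @ b1)[:, None]
    W = np.linalg.inv(g_t.T @ g_t / T)         # inverse of the estimated moment variance
    b2 = linear_gmm(y, X, Z, W)
    g_bar = Z.T @ (y - X @ b2) / T             # sample moments at the two-step estimate
    return float(T * g_bar @ W @ g_bar), m - X.shape[1]

# Simulated data (hypothetical design): m = 3 instruments, p = 1 parameter.
rng = np.random.default_rng(5)
T = 2000
Z = rng.normal(size=(T, 3))
v = rng.normal(size=T)
e = rng.normal(size=T)
X = (Z @ np.array([1.0, 0.5, 0.5]) + v)[:, None]
y = 1.5 * X[:, 0] + 0.5 * v + e

J_valid, df = j_statistic(y, X, Z)

Z_bad = Z.copy()
Z_bad[:, 2] += 0.5 * e                         # third instrument now correlated with the error
J_bad, _ = j_statistic(y, X, Z_bad)

# Compare each J with a chi-squared(df) critical value; for df = 2 the 5% value is about 5.99.
print(df, J_valid, J_bad)
```

A rejection says that the moments are mutually inconsistent, not which one is at fault, so the test cannot by itself identify the invalid instrument.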

6. Why GMM matters

GMM is valuable precisely because it asks so little. It does not require the full distribution of the data, unlike maximum likelihood; it only requires a credible set of moment conditions. That makes it the natural tool for models that are easy to specify in terms of moments but hard to specify in terms of a likelihood — dynamic panel models, rational-expectations and Euler-equation models in macro and finance, and over-identified IV systems.

It also exposes a deep unity: OLS, IV, 2SLS and many other estimators are not separate tricks but instances of one principle. Understanding GMM is therefore a turning point in a graduate econometrics course, and the entry point to research-level topics such as weak identification and — the subject of the author's own work — inference when the moment variance matrix is singular.

