
Goodness of fit: R² and adjusted R²

R-squared is the number students reach for first when judging a regression, and the one most often misunderstood. These notes explain what it actually measures, why it never falls when you add variables, how the adjusted version corrects for that, and why a high R-squared is not the goal of empirical work.

Dr Nicky Grant · econometrics specialist · Econometrics Study Notes · Undergraduate

R-squared (the coefficient of determination) is the share of the variation in the dependent variable that the regression explains: explained sum of squares over total sum of squares, between 0 and 1. Because it never falls when you add a regressor, the adjusted R-squared applies a penalty for the number of parameters. A high R-squared means good in-sample fit — not that the model is correctly specified.

How to read these notes

These notes are for a student who has met OLS and wants to interpret the goodness-of-fit numbers that every regression package reports. We build R-squared from the sum-of-squares decomposition, explain the adjusted version, and — most importantly — explain what these numbers do and do not tell you.

1. Splitting the variation: TSS, ESS and RSS

Goodness of fit asks a simple question: how much of the variation in the dependent variable \(y\) has the regression explained? To answer it we split the total variation in \(y\) about its mean into two pieces. Writing \(\hat y_i\) for the fitted value and \(\bar y\) for the sample mean:

$$ \underbrace{\sum_i (y_i - \bar y)^2}_{\text{TSS}} = \underbrace{\sum_i (\hat y_i - \bar y)^2}_{\text{ESS}} + \underbrace{\sum_i (y_i - \hat y_i)^2}_{\text{RSS}} $$

The three sums of squares

TSS (total sum of squares) is the total variation in \(y\). ESS (explained sum of squares) is the part the model captures. RSS (residual sum of squares) is the leftover — the part the model fails to explain, the sum of squared residuals that OLS minimises.

This decomposition holds exactly for an OLS regression that includes an intercept. It says the variation we want to explain is neatly partitioned into "explained" and "unexplained" components.
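As a quick numerical check of the decomposition, the sketch below fits OLS with NumPy on simulated data and compares TSS with ESS + RSS, once with an intercept and once without. The data-generating process, the seed and the helper name `sums_of_squares` are illustrative assumptions, not part of these notes.

```python
import numpy as np

def sums_of_squares(X, y):
    """Return (TSS, ESS, RSS) for an OLS fit of y on the columns of X."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)  # OLS coefficients
    fitted = X @ beta
    tss = np.sum((y - y.mean()) ** 2)       # total variation about the mean
    ess = np.sum((fitted - y.mean()) ** 2)  # variation of the fitted values
    rss = np.sum((y - fitted) ** 2)         # sum of squared residuals
    return tss, ess, rss

rng = np.random.default_rng(0)
n = 200
x = rng.normal(size=n)
y = 3.0 + 2.0 * x + rng.normal(size=n)  # made-up data-generating process

X_with = np.column_stack([np.ones(n), x])  # intercept column plus x
X_without = x.reshape(-1, 1)               # x only, no intercept

tss, ess, rss = sums_of_squares(X_with, y)
gap_with = tss - (ess + rss)

tss0, ess0, rss0 = sums_of_squares(X_without, y)
gap_without = tss0 - (ess0 + rss0)

print("gap with intercept:   ", gap_with)
print("gap without intercept:", gap_without)
```

With the intercept the gap is zero up to floating-point error. Without it, the cross term between fitted values and residuals no longer vanishes, so the partition into "explained" and "unexplained" fails.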

2. R-squared: the coefficient of determination

R-squared is the explained share of the total:

$$ R^2 = \frac{\text{ESS}}{\text{TSS}} = 1 - \frac{\text{RSS}}{\text{TSS}} $$

For an OLS regression with an intercept, \(R^2\) lies between 0 and 1. An \(R^2\) of 0 means the regressors explain none of the variation in \(y\) (the model does no better than the sample mean); an \(R^2\) of 1 means the model fits the data perfectly, with every residual zero. An \(R^2\) of 0.6 means the model accounts for 60% of the sample variation in the outcome.

Interpretation

R-squared is a measure of in-sample fit: how closely the fitted line tracks the data points you used to estimate it. It is not a measure of how correct the model is, nor of how well it will predict new data.
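A small worked example may help fix ideas. The five data points below are made up for illustration; the sketch fits a line by OLS and computes R-squared from both forms of the formula above.

```python
import numpy as np

# Hypothetical data, chosen so the arithmetic is easy to follow by hand.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 4.0, 5.0, 4.0, 5.0])

X = np.column_stack([np.ones_like(x), x])     # intercept plus x
beta, *_ = np.linalg.lstsq(X, y, rcond=None)  # OLS: intercept 2.2, slope 0.6
fitted = X @ beta

tss = np.sum((y - y.mean()) ** 2)
ess = np.sum((fitted - y.mean()) ** 2)
rss = np.sum((y - fitted) ** 2)

r2_from_ess = ess / tss
r2_from_rss = 1.0 - rss / tss

print(round(tss, 6), round(ess, 6), round(rss, 6))  # 6.0 3.6 2.4
print(round(r2_from_ess, 6), round(r2_from_rss, 6))  # 0.6 0.6
```

Here the fitted line accounts for 3.6 of the 6 units of total variation, so both forms give 0.6.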

3. The problem: R-squared never falls

Here is the catch that makes raw R-squared dangerous for comparing models. Adding any regressor to a model — even a completely irrelevant one — can only reduce the residual sum of squares (or leave it unchanged), because OLS can always set the new coefficient to zero if it does not help. Since \(R^2 = 1 - \text{RSS}/\text{TSS}\) and TSS is fixed, this means:

$$ \text{adding a regressor} \ \Rightarrow\ R^2 \text{ cannot decrease} $$

So you can always inflate R-squared by throwing in more variables, whether or not they belong in the model. This makes raw R-squared useless for choosing between models with different numbers of regressors: the bigger model will always look at least as good, even if its extra variables are pure noise.
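The claim is easy to see in a simulation. The sketch below starts from a one-regressor model and adds, one at a time, ten regressors that are pure noise by construction; the sample size, seed and variable names are arbitrary choices for illustration.

```python
import numpy as np

def r_squared(X, y):
    """R-squared of an OLS fit of y on the columns of X (X includes the intercept)."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)

rng = np.random.default_rng(1)
n = 50
x = rng.normal(size=n)
y = 1.0 + 0.5 * x + rng.normal(size=n)  # only x matters in this made-up process
noise = rng.normal(size=(n, 10))        # ten regressors unrelated to y

r2_path = []
for m in range(noise.shape[1] + 1):
    # Model with the intercept, x, and the first m noise regressors.
    X = np.column_stack([np.ones(n), x, noise[:, :m]])
    r2_path.append(r_squared(X, y))

for m, r2 in enumerate(r2_path):
    print(f"{m:2d} noise regressors: R^2 = {r2:.4f}")
```

Each printed value is at least as large as the one before it, even though none of the added variables has anything to do with \(y\).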

The same logic appears in time-series model selection: a richer ARMA model always fits the sample at least as well, which is precisely why information criteria penalise the number of parameters. R-squared has no such penalty.

4. Adjusted R-squared: charging for parameters

The adjusted R-squared fixes this by deflating the fit by the number of parameters used. With \(n\) observations and \(k\) regressors (excluding the intercept):

$$ \bar R^2 = 1 - \frac{\text{RSS}/(n-k-1)}{\text{TSS}/(n-1)} $$

The numerator and denominator are now each divided by their degrees of freedom. The effect is a trade-off: adding a variable lowers the RSS (which raises \(\bar R^2\)) but also spends a degree of freedom (which lowers it). The adjusted R-squared therefore rises only if the new variable improves fit by more than would be expected from a useless variable; for a single added regressor, this happens exactly when its t-statistic exceeds 1 in absolute value.
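Substituting \(\text{RSS}/\text{TSS} = 1 - R^2\) into the definition gives an equivalent form that makes the penalty explicit. This is a routine rearrangement, shown here as a worked step:

$$ \bar R^2 = 1 - \frac{\text{RSS}/(n-k-1)}{\text{TSS}/(n-1)} = 1 - \frac{n-1}{n-k-1}\cdot\frac{\text{RSS}}{\text{TSS}} = 1 - (1 - R^2)\,\frac{n-1}{n-k-1} $$

Since \((n-1)/(n-k-1) \ge 1\), with equality only when \(k = 0\), the adjusted value never exceeds \(R^2\), and for a given \(R^2\) the gap widens as \(k\) grows relative to \(n\).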

Key differences

Unlike \(R^2\), the adjusted \(\bar R^2\) can fall when you add a variable, and it can even be negative for a very poor model. Because it penalises complexity, it is the more appropriate of the two measures when comparing models with the same dependent variable but different numbers of regressors.
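Both properties can be checked with a few lines of arithmetic. The sketch below implements the formula from this section directly; the RSS, TSS, \(n\) and \(k\) values are hypothetical numbers picked to show each case, and the function name is an assumption.

```python
def adjusted_r_squared(rss, tss, n, k):
    """Adjusted R-squared with n observations and k regressors (intercept excluded)."""
    return 1.0 - (rss / (n - k - 1)) / (tss / (n - 1))

# Hypothetical comparison: model B adds one regressor to model A
# and lowers RSS only slightly (40.0 -> 39.5) on the same data.
tss, n = 100.0, 30
rss_a, k_a = 40.0, 2
rss_b, k_b = 39.5, 3

r2_a = 1.0 - rss_a / tss
r2_b = 1.0 - rss_b / tss
adj_a = adjusted_r_squared(rss_a, tss, n, k_a)
adj_b = adjusted_r_squared(rss_b, tss, n, k_b)

# Hypothetical very poor model: almost nothing explained, few observations.
adj_poor = adjusted_r_squared(rss=95.0, tss=100.0, n=20, k=3)

print(round(r2_a, 3), round(r2_b, 3))    # 0.6 0.605
print(round(adj_a, 4), round(adj_b, 4))  # 0.5704 0.5594
print(round(adj_poor, 4))                # -0.1281
```

The extra regressor raises \(R^2\) from 0.600 to 0.605 but lowers the adjusted value, because the small drop in RSS does not pay for the lost degree of freedom.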

5. Why a high R-squared is not the goal

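Students often treat maximising R-squared as the objective of empirical work. It is not. R-squared measures fit, not correctness, and the two come apart in both directions: a correctly specified model can have a low R-squared when the outcome is inherently noisy, and a misspecified model can have a high one.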

What you should care about is whether the model is correctly specified and the coefficients are credibly identified — questions addressed by the diagnostic tests and identification strategies, not by R-squared. R-squared is a useful descriptive summary; it is a poor objective.
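A simulation can make the gap between fit and correctness concrete. In the sketch below, both data-generating processes, the seed and the helper `fit_line` are illustrative assumptions: the first case fits the correct linear model to very noisy data, the second fits a straight line to a relationship that is actually quadratic.

```python
import numpy as np

def fit_line(x, y):
    """Return (intercept, slope, R-squared) for an OLS fit of y on x."""
    X = np.column_stack([np.ones_like(x), x])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    r2 = 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)
    return beta[0], beta[1], r2

rng = np.random.default_rng(3)

# Case 1: the linear model is correct (true slope 2), but the noise is large.
x1 = rng.normal(size=5000)
y1 = 1.0 + 2.0 * x1 + rng.normal(scale=10.0, size=5000)
_, slope_noisy, r2_noisy = fit_line(x1, y1)

# Case 2: the true relationship is quadratic, but we fit a straight line.
x2 = np.linspace(1.0, 3.0, 200)
y2 = x2 ** 2 + rng.normal(scale=0.1, size=200)
_, _, r2_misspecified = fit_line(x2, y2)

print(f"correct model, noisy data: slope = {slope_noisy:.2f}, R^2 = {r2_noisy:.3f}")
print(f"wrong functional form:     R^2 = {r2_misspecified:.3f}")
```

In the first case the slope estimate lands close to its true value of 2 although R-squared is low; in the second, R-squared is high although the functional form is wrong.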

6. A note on time series and forecasting

R-squared values tend to be much higher in time-series regressions than in cross-sections, because economic time series are persistent and trending — a model can track them closely without capturing anything causal. A "spurious regression" of one trending series on another can produce an R-squared near 1 with no genuine relationship at all. For model choice in time series, lean on the information criteria and residual diagnostics rather than R-squared.
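The sketch below illustrates the point with two simulated series that share nothing except an upward time trend; their noise terms are independent by construction. The trend slopes, noise scale and seed are arbitrary assumptions, and comparing levels with first differences is just one simple way to show how much of the fit comes from the trend alone.

```python
import numpy as np

def r_squared(x, y):
    """R-squared of an OLS fit of y on an intercept and x."""
    X = np.column_stack([np.ones_like(x), x])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)

rng = np.random.default_rng(2)
n = 200
t = np.arange(n, dtype=float)

# Two trending series with independent noise: no genuine link between them.
x = 2.0 * t + rng.normal(scale=5.0, size=n)
y = 1.0 * t + rng.normal(scale=5.0, size=n)

r2_levels = r_squared(x, y)                     # regression in levels
r2_changes = r_squared(np.diff(x), np.diff(y))  # regression in first differences

print(f"levels:  R^2 = {r2_levels:.3f}")
print(f"changes: R^2 = {r2_changes:.3f}")
```

The regression in levels reports an R-squared close to 1, while the same regression in period-to-period changes explains almost nothing.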

