- Learn how to evaluate model fit using R2 and ANOVA
- Learn how to compute them in R
- Learn about their limitations
Last updated in February 2021
(Almost) any model can be fitted to our data, but not all models will fit equally well
Three ways to evaluate model fit:
The proportion of variance of the dependent variable that can be predicted using the independent variables
Nothing about R2 is causal!
\[ R^{2} =1 - \frac{Sum \: of \: Squares_{residual} }{Sum \: of \: Squares_{total}} \]
Or perhaps in a more interpretable way:
\[ R^{2} =1 - \frac{Sum \: of \: Squares_{our\:model} }{Sum \: of \: Squares_{intercept\:only\:model}} \]
R2 tells us how much we reduced the prediction error by adding our predictors
if R2 = 0, then our model is as “good” as if we had no predictor at all
if R2 = 1, then we predict our data perfectly
There is no universal cut-off for when R2 is good or bad
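To see the formula in action, here is a minimal sketch using the built-in `mtcars` data (the slides use the `countries` data, which is not bundled with R):

```r
# R^2 computed by hand from the two sums of squares, then compared
# with the value reported by lm()
mod <- lm(mpg ~ wt, data = mtcars)

ss_residual <- sum(residuals(mod)^2)                    # prediction error of our model
ss_total    <- sum((mtcars$mpg - mean(mtcars$mpg))^2)   # error of the intercept-only model

r2_by_hand <- 1 - ss_residual / ss_total
r2_by_hand
summary(mod)$r.squared   # the same value
```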
We can also compare two models formally, using ANOVA/F test
Similar to the classic ANOVA
Null hypothesis: All regression coefficients (except for intercept) are 0.
mod1 = lm(life_exp ~ hdi, data = countries)
anova(mod1)
## Analysis of Variance Table
## 
## Response: life_exp
##           Df Sum Sq Mean Sq F value    Pr(>F)    
## hdi        1 184.65 184.654  63.047 2.441e-09 ***
## Residuals 35 102.51   2.929                      
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
mod1 = lm(life_exp ~ hdi, data = countries)
mod2 = lm(life_exp ~ hdi + postsoviet, data = countries)
anova(mod1, mod2)
## Analysis of Variance Table
## 
## Model 1: life_exp ~ hdi
## Model 2: life_exp ~ hdi + postsoviet
##   Res.Df     RSS Df Sum of Sq      F    Pr(>F)    
## 1     35 102.509                                  
## 2     34  59.186  1    43.322 24.887 1.777e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Nested models only
All the classic limitations of null hypothesis testing
We want to analyze the relationship between intelligence (IQ) and work diligence (diligence). We also know if our respondents have a university degree (degree).
degree is related to both IQ and diligence - only those who are among the top 20% most intelligent or the top 20% most diligent people will obtain a university degree
Should we control for degree or not? For prediction? For explanation?
(Truth: intelligence = 0.1*diligence, but let’s pretend we don’t know)
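A minimal simulation of this setup (the data-generating process below is a hypothetical one matching the description above) shows what controlling for degree does to the estimate:

```r
set.seed(42)
n <- 1e4
diligence <- rnorm(n)
iq <- 0.1 * diligence + rnorm(n)   # true coefficient of diligence is 0.1
# degree: awarded to the top 20% in IQ or the top 20% in diligence
degree <- as.numeric(iq > quantile(iq, 0.8) | diligence > quantile(diligence, 0.8))

coef(lm(iq ~ diligence))["diligence"]            # close to the true 0.1
coef(lm(iq ~ diligence + degree))["diligence"]   # biased by conditioning on degree
```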
Controlling for degree leads to a higher R2, but an incorrect coefficient estimate!
The model without degree actually provides a better estimate of the relationship (remember, the true value = 0.1)
Consider variable \(x\) and 3 variables \(y_1\), \(y_2\), \(y_3\)
The relationship between \(x\) and all \(y_i\) is the same:
Each of \(y_i\) has a different standard deviation:
Even a perfectly specified model (i.e. all relevant variables present, relationships set up correctly) can have a low R2 due to random error
Low R2 does not necessarily mean the estimates are incorrect (biased)
Low R2 can simply mean that we cannot explain a social phenomenon in its entirety, but that is almost never our goal.
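This can be checked with a quick simulation (hypothetical data; two of the outcomes above, with the true slope fixed at 2 for both):

```r
set.seed(7)
n <- 1000
x  <- rnorm(n)
y1 <- 2 * x + rnorm(n, sd = 1)    # small error variance
y3 <- 2 * x + rnorm(n, sd = 10)   # large error variance, same true slope

coef(lm(y1 ~ x))["x"]             # close to 2
coef(lm(y3 ~ x))["x"]             # also close to 2, despite a much lower R^2
summary(lm(y1 ~ x))$r.squared
summary(lm(y3 ~ x))$r.squared
```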
Conclusion: If our goal is substantive interpretation of coefficients, R2 is not a good measure of model’s quality
\[ R^{2}_{adj} = 1 - (1 - R^{2}) * \frac{no. \: of \: observations - 1}{no. \: of \: observations - no. \: of \: parameters - 1} \]
Adjusted R2 only increases when the contribution of a new predictor is bigger than what we would expect by chance
Conclusion: Use Adjusted R2 when you are comparing models with different number of predictors
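The formula can be verified by hand against the value R reports (again sketched on the built-in `mtcars` data, since the `countries` data is not bundled with R):

```r
mod <- lm(mpg ~ wt + hp, data = mtcars)

r2 <- summary(mod)$r.squared
n  <- nrow(mtcars)
p  <- 2   # number of parameters, i.e. predictors excluding the intercept

adj_by_hand <- 1 - (1 - r2) * (n - 1) / (n - p - 1)
adj_by_hand
summary(mod)$adj.r.squared   # the same value
```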
R2 will naturally get lower as we restrict the range of independent variables
This does not mean the model is any less valid, just that predictive power is lower
Conclusion: Trimming data, either by filtering out subpopulations or removing outliers, will lower R2
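A quick sketch on simulated data (hypothetical x and y) illustrates the effect of range restriction:

```r
set.seed(1)
x <- runif(500, 0, 10)
y <- x + rnorm(500)   # same true relationship everywhere

summary(lm(y ~ x))$r.squared                   # full range of x
summary(lm(y ~ x, subset = x > 8))$r.squared   # same model, restricted range: lower R^2
```

The model is identical in both fits; only the variance of the predictor shrinks, which mechanically lowers R2.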