Last updated in February 2021

Goals for today

  • Learn how to evaluate model fit using R2 and ANOVA
  • Learn how to compute them in R
  • Learn about their limitations

Model fit

  • (Almost) any model can be fitted to our data, but not all models will fit equally well

  • Three ways to evaluate model fit:

    • Checking assumptions the model makes using diagnostic plots (next lecture)
    • Fit indexes that summarize fit into a single number
    • Formal test of fit (ANOVA/F test)

Coefficient of determination (R2)

  • The proportion of variance of the dependent variable that can be predicted using the independent variables

    • e.g. if R2 = 0.32, we can say our model predicts 32% of variance of the dependent variable (in our data)
    • Alternatively, the dependent variable shares 32% of its variance with the independent variables
  • Nothing about R2 is causal!
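In R, R2 is reported by summary(). A minimal sketch with simulated data (the variables x and y here are made up for illustration; any fitted lm object works the same way):

```r
set.seed(42)
x <- rnorm(100)
y <- 2 * x + rnorm(100)    # a true relationship plus noise

mod <- lm(y ~ x)
summary(mod)$r.squared     # proportion of variance of y predicted by x
```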

Coefficient of determination (R2)

  • Formally, R2 is defined as:

\[ R^{2} =1 - \frac{Sum \: of \: Squares_{residual} }{Sum \: of \: Squares_{total}} \]

Or perhaps in a more interpretable way:

\[ R^{2} =1 - \frac{Residual \: Sum \: of \: Squares_{our\:model} }{Residual \: Sum \: of \: Squares_{intercept\:only\:model}} \]

Coefficient of determination (R2)

  • The intercept-only model is a model with zero predictors; predictions are based solely on the grand mean (the mean of the dependent variable)
  • This is the “worst” possible model
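The two formulas above can be checked by hand: fit an intercept-only model with lm(y ~ 1) and compute the sums of squares directly. A sketch with simulated data (x and y are made up for illustration):

```r
set.seed(42)
x <- rnorm(100)
y <- 2 * x + rnorm(100)

mod      <- lm(y ~ x)    # our model
mod_null <- lm(y ~ 1)    # intercept-only model: predicts the grand mean

ss_res   <- sum(residuals(mod)^2)       # residual sum of squares
ss_total <- sum(residuals(mod_null)^2)  # equals the total sum of squares

1 - ss_res / ss_total    # matches summary(mod)$r.squared
```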

Coefficient of determination

  • R2 tells us how much we reduced the prediction error by adding our predictors

  • If R2 = 0, then our model is as “good” as if we had no predictors at all

  • If R2 = 1, then we predict our data perfectly

  • There is no universal cut-off for when R2 is good or bad

    • In laboratory calibration, R2 < 0.99 is considered bad and a sign of equipment failure
    • In day-to-day stock market prediction, R2 > 0.02 is considered good and such models are used for trading

ANOVA

  • We can also compare two models formally, using ANOVA/F test

  • Works similarly to the classic ANOVA

  • Null hypothesis: All regression coefficients (except for intercept) are 0.

ANOVA

  • We can compare against the intercept-only model, i.e. is our model better than a model with no predictors?
mod1 = lm(life_exp ~ hdi, data = countries)
anova(mod1)
## Analysis of Variance Table
## 
## Response: life_exp
##           Df Sum Sq Mean Sq F value    Pr(>F)    
## hdi        1 184.65 184.654  63.047 2.441e-09 ***
## Residuals 35 102.51   2.929                      
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
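The same test can be written as an explicit two-model comparison, which makes the null hypothesis visible. A sketch using a simulated stand-in for the countries data (the real dataset is not reproduced here, so the numbers below will differ from the slide):

```r
set.seed(1)
# simulated stand-in for the `countries` data used on this slide
countries <- data.frame(hdi = runif(37, 0.7, 0.95))
countries$life_exp <- 60 + 25 * countries$hdi + rnorm(37, sd = 1.7)

mod0 <- lm(life_exp ~ 1,   data = countries)  # intercept-only model
mod1 <- lm(life_exp ~ hdi, data = countries)

anova(mod0, mod1)  # same F test as anova(mod1)
```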

ANOVA

  • We can also compare two of our models, i.e. does one predict better than the other?
mod1 = lm(life_exp ~ hdi, data = countries)
mod2 = lm(life_exp ~ hdi + postsoviet, data = countries)
anova(mod1, mod2)
## Analysis of Variance Table
## 
## Model 1: life_exp ~ hdi
## Model 2: life_exp ~ hdi + postsoviet
##   Res.Df     RSS Df Sum of Sq      F    Pr(>F)    
## 1     35 102.509                                  
## 2     34  59.186  1    43.322 24.887 1.777e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

R Intermezzo!

Limitations of R2 and ANOVA

Limitations of ANOVA for model comparison

  • Nested models only

  • All the classic limitations of null hypothesis testing

    • It is extremely unlikely for two models to “explain” the exact same amount of variance -> the null hypothesis is almost always false by definition
    • Differences do not matter if they are practically unimportant -> rejecting the null hypothesis is by itself not particularly interesting
    • Power matters, just like with any other test -> not rejecting the null hypothesis does not necessarily mean the two models predict the same amount of variance. We may simply lack the precision to detect the difference

Limitations of R2

  • R2 is fundamentally a measure of predictive strength
  • It may behave unintuitively when used for anything other than predictive modeling (and may mislead even there)
  • There are 4 “gotchas” you need to be careful about

Gotcha 1 - A model with a higher R2 does not necessarily provide a better estimate of regression coefficients

R2 and coefficient bias

  • We want to analyze the relationship between intelligence (IQ) and work diligence (diligence). We also know if our respondents have a university degree (degree).

  • degree is related to both IQ and diligence - only those in the top 20% of intelligence or the top 20% of diligence obtain a university degree

  • Should we control for degree or not? For prediction? For explanation?

  • (Truth: intelligence = 0.1*diligence, but let’s pretend we don’t know that)
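One way to see what happens is to simulate the scenario above (the exact selection rule for degree, the sample size, and the noise level are illustrative assumptions):

```r
set.seed(123)
n <- 10000
diligence    <- rnorm(n)
intelligence <- 0.1 * diligence + rnorm(n)    # true coefficient = 0.1

# degree: obtained by the top 20% on intelligence OR the top 20% on diligence
degree <- intelligence > quantile(intelligence, 0.8) |
          diligence    > quantile(diligence,    0.8)

mod_simple  <- lm(intelligence ~ diligence)
mod_control <- lm(intelligence ~ diligence + degree)

coef(mod_simple)["diligence"]   # close to the true 0.1
coef(mod_control)["diligence"]  # biased, despite the higher R^2
```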

R2 and coefficient bias

  • Controlling for degree leads to a higher R2, but an incorrect coefficient estimate!
  • Not controlling for degree actually provides a better estimate of the relationship (remember, the true value = 0.1)

R2 and coefficient bias

  • Conclusion: If the goal of an analysis is the interpretation of regression coefficients, do not select variables based on R2.

Gotcha 2 - R2 depends on the variance of the dependent variable

R2 and the variance of dependent variable

  • Consider variable \(x\) and 3 variables \(y_1\), \(y_2\), \(y_3\)

  • The relationship between \(x\) and all \(y_i\) is the same:

    • \(y_i = 0 + 10*x\)
  • Each of \(y_i\) has a different standard deviation:

    • \(sd_{y_1} = 50\), \(sd_{y_2} = 100\) and \(sd_{y_3} = 150\)
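A sketch of this setup, reading the differing standard deviations as residual noise around the same line (the sample size, the spread of x, and the seed are arbitrary assumptions):

```r
set.seed(7)
n <- 1000
x <- rnorm(n, sd = 15)
# identical true relationship (slope 10), increasing residual noise
y1 <- 10 * x + rnorm(n, sd = 50)
y2 <- 10 * x + rnorm(n, sd = 100)
y3 <- 10 * x + rnorm(n, sd = 150)

r2 <- sapply(list(y1, y2, y3),
             function(y) summary(lm(y ~ x))$r.squared)
round(r2, 2)  # R^2 falls as the variance of the outcome grows
```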

R2 and the variance of dependent variable

  • Notice that R2 varies widely, despite all models being perfectly specified and describing the relationship correctly.

R2 and the variance of dependent variable

  • Even perfectly specified model (i.e. all relevant variables present, relationships set up correctly) can have low R2 due to random error

  • Low R2 does not necessarily mean the estimates are incorrect (biased)

  • Low R2 can simply mean that we cannot explain a social phenomenon in its entirety, but that is almost never our goal.

  • Conclusion: If our goal is substantive interpretation of coefficients, R2 is not a good measure of the model’s quality

Gotcha 3 - R2 depends on the number of predictors

R2 depends on the number of predictors

  • Consider variable \(y\) and 15 variables \(x_i\) (\(x_1, \: x_2, \:... \: x_{15}\))
  • All of these variables are independent of each other
  • What happens to R2, if we start adding \(x_i\) variables as predictors?
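A sketch of this experiment: generate y and 15 unrelated predictors, then track R2 as predictors are added one by one (sample size and seed are arbitrary):

```r
set.seed(99)
n <- 50
# y and 15 x variables, all mutually independent (no true relationships)
dat <- as.data.frame(matrix(rnorm(n * 16), ncol = 16))
names(dat) <- c("y", paste0("x", 1:15))

# R^2 after fitting y ~ x1, then y ~ x1 + x2, ... up to all 15 predictors
r2 <- sapply(1:15, function(k) {
  f <- reformulate(paste0("x", 1:k), response = "y")
  summary(lm(f, data = dat))$r.squared
})
round(r2, 3)  # creeps upward even though nothing is truly related
```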

R2 depends on the number of predictors

  • Notice that the more predictors in the model, the higher the R2, even if the dependent variable \(y\) is not related to any of the independent variables \(x_i\)

R2 depends on the number of predictors

  • R2 will increase (almost) every time we add a new predictor, because even if there is no correlation between two variables in the population, sample correlation will rarely be exactly 0 (due to sampling error)
  • Consequently, to some extent we are predicting random noise (this is the problem of overfitting in predictive models)

R2 depends on the number of predictors

  • Adjusted R2 (Theil, 1961) controls for the number of predictors in the model (represented by the degrees of freedom)

\[ R^{2}_{adj} = 1 - (1 - R^{2}) * \frac{no. \: of \: observations - 1}{no. \: of \: observations - no. \: of \: predictors - 1} \]

  • Adjusted R2 only increases when the contribution of a new predictor is bigger than what we would expect by chance

  • Conclusion: Use Adjusted R2 when you are comparing models with different number of predictors
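The formula can be checked against R's own adjusted R2 (summary()$adj.r.squared). A sketch with simulated data, where "predictors" means the number of slope coefficients, excluding the intercept:

```r
set.seed(42)
n  <- 100
x1 <- rnorm(n)
x2 <- rnorm(n)
y  <- x1 + rnorm(n)               # x2 is pure noise

mod <- lm(y ~ x1 + x2)
r2  <- summary(mod)$r.squared
p   <- 2                          # number of predictors (excluding intercept)

r2_adj <- 1 - (1 - r2) * (n - 1) / (n - p - 1)
all.equal(r2_adj, summary(mod)$adj.r.squared)  # TRUE
```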

Gotcha 4 - R2 depends on the range of independent variables

R2 depends on the range of independent variables

  • Consider variables \(y\) and \(x\)
  • \(x\) is normally distributed with mean of 50 and standard deviation of 15
  • The relationship between them is \(y = 3*x\) with residual standard deviation of 50
  • What would happen if we limited the range of \(x\) to the interval [35, 65]?
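A sketch of this restriction experiment (the sample size and seed are arbitrary assumptions):

```r
set.seed(2021)
n <- 5000
x <- rnorm(n, mean = 50, sd = 15)
y <- 3 * x + rnorm(n, sd = 50)    # residual standard deviation of 50

full       <- lm(y ~ x)
restricted <- lm(y ~ x, subset = x >= 35 & x <= 65)

coef(full)["x"]; coef(restricted)["x"]  # both slopes close to the true 3
summary(full)$r.squared                 # higher
summary(restricted)$r.squared           # lower, same model
```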

R2 and the range of independent variable

  • Notice that despite the coefficients being (virtually) the same, R2 gets lower as the range of data gets narrower

R2 and the range of independent variable

  • R2 will naturally get lower as we restrict the range of independent variables

  • This does not mean the model is any less valid, just that predictive power is lower

    • e.g. model predicting attitudes based on income will have lower R2 in populations with smaller income differences (all else held constant).
  • Conclusion: Trimming data, either by filtering out subpopulations or removing outliers, will lower R2

Limitations of R2

  • To summarize:
  • If the goal is hypothesis testing or causal inference, R2 cannot be used to select which variables to include in the model. Doing so can be actively harmful
  • If the goal is to test a hypothesis or describe a relationship, R2 doesn’t indicate quality of the model
  • R2 has to be adjusted when comparing the predictive power of models with different numbers of predictors
  • The value of R2 depends on the range of the independent variables

References

Theil, H. (1961). Economic forecasts and policy (2nd edition). North-Holland Pub. Co.