Lesson 3: Multiple linear regression

Jaromír Mazák & Aleš Vomáčka

Dpt. of Sociology, Faculty of Arts, Charles University

Last updated in March 2022

Goals for today

  • Understanding the concept of multiple linear regression
  • Building and interpreting multiple linear regression in R

Multiple linear regression

  • More than one predictor term
  • Simple linear regression: Y = α + βX + ϵ
  • Multiple linear regression: Y = β₀ + β₁X₁ + β₂X₂ + … + ϵ
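The two model formulas above map directly onto R's `lm()` interface. A minimal sketch, using the built-in `mtcars` data as an illustrative example (not part of the lesson materials):

```r
# Simple linear regression: one predictor term (Y = a + b*X + e)
simple <- lm(mpg ~ wt, data = mtcars)

# Multiple linear regression: two predictor terms (Y = b0 + b1*X1 + b2*X2 + e)
multiple <- lm(mpg ~ wt + hp, data = mtcars)

coef(simple)    # intercept and one slope
coef(multiple)  # intercept and two slopes
```

Adding a predictor to the formula with `+` adds one term (and one coefficient) to the model.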

Multiple = more predictor terms

Note that more predictor terms do not necessarily mean more predictor variables (e.g., a single variable can enter the model both as X and as X²)!
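One way to see the distinction in R: a quadratic model below uses a single predictor variable but two predictor terms. A sketch using `mtcars` as an illustrative dataset:

```r
# One predictor variable (wt), but two predictor terms: wt and wt^2.
# I() protects the arithmetic inside the formula.
quad <- lm(mpg ~ wt + I(wt^2), data = mtcars)

coef(quad)  # three coefficients (intercept + two terms) from one variable
```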

Multiple regression: estimating more complex structures of relationships

  • Descriptive modeling: See effects of individual variables net of the effects of other variables in the model (AKA "when all the other variables are held constant")
  • Predictive modeling: Improve prediction; more variables = more information as input = usually better predictions (but risk of overfitting!)
  • Modeling for causal inference: adjusting for background variables based on theory, discovering potentially spurious relationships, isolating potentially causal effects

Interpreting coefficients in multivariate regression

  • “The coefficient βk is the average or expected difference in outcome y, comparing two people who differ by one unit in the predictor xk while being equal in all the other predictors. This is sometimes stated in shorthand as comparing two people (or, more generally, two observational units) that differ in xk with all the other predictors held constant” (Gelman et al., 2020, p. 131)
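This comparison interpretation can be verified numerically: for two (hypothetical) observations that differ by one unit in one predictor and are identical in the others, the difference in fitted values equals that predictor's coefficient. A sketch using `mtcars` as an illustrative dataset:

```r
m <- lm(mpg ~ wt + hp, data = mtcars)

# Two hypothetical cars: differ by 1 in wt, identical in hp
two_cars <- data.frame(wt = c(3, 4), hp = c(100, 100))

diff(predict(m, newdata = two_cars))  # equals coef(m)["wt"]
coef(m)["wt"]
```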

MLR coefficients as conditional (or additional)

  • Conditional effect: contingent on the other variables in the model
  • Additional effect: each coefficient represents the additional effect of adding the variable to the model (all the other variables in the model are already accounted for)
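The conditional nature of the coefficients is easy to demonstrate: the same variable's coefficient changes depending on what else is in the model. A sketch using `mtcars` as an illustrative dataset:

```r
# Coefficient of wt alone vs. conditional on hp
coef(lm(mpg ~ wt,      data = mtcars))["wt"]
coef(lm(mpg ~ wt + hp, data = mtcars))["wt"]  # differs: now net of hp
```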

Difference-based vs. change-based interpretations

  • Difference-based: “how the outcome variable differs, on average, when comparing two groups of items that differ by 1 in the relevant predictor while being identical in all the other predictors”
  • Change-based: “the coefficient is the expected change in y caused by adding 1 to the relevant predictor” (i.e. changes within individuals, rather than comparisons between individuals)

Difference-based interpretation is more cautious

To be on the safe side, interpret regression coefficients as comparisons between units, not as changes within units, unless you specifically claim causality.

Beta-standardized coefficients

  • Used to determine the relative weight of independent variables: the effect of an increase in X by one standard deviation on Y, also measured in standard deviations
  • Standardize all variables to z-scores
  • Sometimes better: standardizing by subtracting the mean and dividing by 2 standard deviations (not 1) - direct comparability with untransformed binary predictors (Gelman, 2008)
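Both variants are straightforward in base R. A sketch using `mtcars` as an illustrative dataset (the variable choice is mine, not from the lesson):

```r
d <- mtcars[c("mpg", "wt", "hp")]

# Beta-standardized coefficients: z-score all variables (mean 0, SD 1)
z <- as.data.frame(scale(d))
coef(lm(mpg ~ wt + hp, data = z))

# Gelman (2008) variant: center predictors and divide by 2 SDs,
# so coefficients are comparable with untransformed binary predictors
d$wt2 <- (d$wt - mean(d$wt)) / (2 * sd(d$wt))
d$hp2 <- (d$hp - mean(d$hp)) / (2 * sd(d$hp))
coef(lm(mpg ~ wt2 + hp2, data = d))
```

With z-scored variables the intercept is (numerically) zero, and each slope is in SD-of-Y per SD-of-X units.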

References