Lesson 3: Multiple linear regression

Jaromír Mazák & Aleš Vomáčka

Dpt. of Sociology, Faculty of Arts, Charles University

Last updated in March 2022

Goals for today

  • Understanding the concept of multiple linear regression
  • Building and interpreting multiple linear regression in R

Multiple linear regression

  • More than one predictor term
  • Simple linear regression: Y = α + βX + ϵ
  • Multiple linear regression: Y = β₀ + β₁X₁ + β₂X₂ + … + ϵ
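The two model formulas above map directly onto R's `lm()` interface. A minimal sketch, using the built-in `mtcars` data as an illustrative example (not part of the lesson materials):

```r
# Simple linear regression: one predictor term (Y = a + b*X + e)
simple <- lm(mpg ~ wt, data = mtcars)

# Multiple linear regression: two predictor terms (Y = b0 + b1*X1 + b2*X2 + e)
multiple <- lm(mpg ~ wt + hp, data = mtcars)

coef(simple)    # intercept and one slope
coef(multiple)  # intercept and two slopes
```

Adding a predictor to the formula with `+` adds one term (and one coefficient) to the model.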

Multiple = more predictor terms

Note that more predictor terms do not necessarily mean more predictor variables (e.g., a single variable can enter the model both as X and as X²)!
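One way to see the distinction in R: a quadratic model below uses a single predictor variable but two predictor terms. A sketch using `mtcars` as an illustrative dataset:

```r
# One predictor variable (wt), but two predictor terms: wt and wt^2.
# I() protects the arithmetic inside the formula.
quad <- lm(mpg ~ wt + I(wt^2), data = mtcars)

coef(quad)  # three coefficients (intercept + two terms) from one variable
```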

Multiple regression: estimating more complex structures of relationships

  • Descriptive modeling: See effects of individual variables net of the effects of other variables in the model (AKA "when all the other variables are held constant")
  • Predictive modeling: Improve prediction; more variables = more information as input = usually better predictions (but risk of overfitting!)
  • Modeling for causal inference: adjusting for background variables based on theory, discovering potentially spurious relationships, isolating potentially causal effects

Interpreting coefficients in multivariate regression

  • “The coefficient βk is the average or expected difference in outcome y, comparing two people who differ by one unit in the predictor xk while being equal in all the other predictors. This is sometimes stated in shorthand as comparing two people (or, more generally, two observational units) that differ in xk with all the other predictors held constant” (Gelman et al., 2020, p. 131)
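This comparison interpretation can be verified numerically: for two (hypothetical) observations that differ by one unit in one predictor and are identical in the others, the difference in fitted values equals that predictor's coefficient. A sketch using `mtcars` as an illustrative dataset:

```r
m <- lm(mpg ~ wt + hp, data = mtcars)

# Two hypothetical cars: differ by 1 in wt, identical in hp
two_cars <- data.frame(wt = c(3, 4), hp = c(100, 100))

diff(predict(m, newdata = two_cars))  # equals coef(m)["wt"]
coef(m)["wt"]
```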

MLR coefficients as conditional (or additional)

  • Conditional effect: contingent on the other variables in the model
  • Additional effect: each coefficient represents the additional effect of adding the variable to the model (all the other variables in the model are already accounted for)
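The conditional nature of the coefficients is easy to demonstrate: the same variable's coefficient changes depending on what else is in the model. A sketch using `mtcars` as an illustrative dataset:

```r
# Coefficient of wt alone vs. conditional on hp
coef(lm(mpg ~ wt,      data = mtcars))["wt"]
coef(lm(mpg ~ wt + hp, data = mtcars))["wt"]  # differs: now net of hp
```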

Difference-based vs. change-based interpretations

  • Difference-based: “how the outcome variable differs, on average, when comparing two groups of items that differ by 1 in the relevant predictor while being identical in all the other predictors”
  • Change-based: “the coefficient is the expected change in y caused by adding 1 to the relevant predictor” (i.e. changes within individuals, rather than comparisons between individuals)

Difference-based interpretation is more cautious

To be on the safe side, interpret regression coefficients as comparisons between units, not as changes within units, unless you specifically claim causality.

Beta-standardized coefficients

  • Used to determine the relative weight of independent variables: the effect of an increase in X by one standard deviation on Y, also measured in standard deviations
  • Standardize all variables to z-scores
  • Sometimes better: standardizing by subtracting the mean and dividing by 2 standard deviations (not 1) - direct comparability with untransformed binary predictors (Gelman, 2008)
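Both variants are straightforward in base R. A sketch using `mtcars` as an illustrative dataset (the variable choice is mine, not from the lesson):

```r
d <- mtcars[c("mpg", "wt", "hp")]

# Beta-standardized coefficients: z-score all variables (mean 0, SD 1)
z <- as.data.frame(scale(d))
coef(lm(mpg ~ wt + hp, data = z))

# Gelman (2008) variant: center predictors and divide by 2 SDs,
# so coefficients are comparable with untransformed binary predictors
d$wt2 <- (d$wt - mean(d$wt)) / (2 * sd(d$wt))
d$hp2 <- (d$hp - mean(d$hp)) / (2 * sd(d$hp))
coef(lm(mpg ~ wt2 + hp2, data = d))
```

With z-scored variables the intercept is (numerically) zero, and each slope is in SD-of-Y per SD-of-X units.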

References