Last updated in March 2022

Goals for today

  • Understanding the concept of multiple linear regression
  • Building and interpreting multiple linear regression in R

Multiple linear regression

  • More than one predictor term
  • Simple linear regression: \(Y = \alpha + \beta*X + \epsilon\)
  • Multiple linear regression: \(Y = \beta_0 + \beta_1*X_1 + \beta_2*X_2 + ... + \epsilon\)
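The slides fit these models in R; as a language-neutral sketch, the same multiple regression can be reproduced with ordinary least squares in Python/numpy. All data and coefficient values below are made up for illustration.

```python
import numpy as np

# Simulate data from a known model: y = 1 + 2*x1 - 0.5*x2 + noise
rng = np.random.default_rng(42)
n = 200
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
y = 1.0 + 2.0 * x1 - 0.5 * x2 + rng.normal(scale=0.3, size=n)

# Design matrix with a column of ones for the intercept (beta_0).
X = np.column_stack([np.ones(n), x1, x2])

# Ordinary least squares fit; beta_hat holds [beta_0, beta_1, beta_2].
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
print(beta_hat)  # roughly [1.0, 2.0, -0.5], up to sampling noise
```

The estimated coefficients recover the true values because the data were generated from exactly this linear model.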

Multiple = more predictor terms

Note that more predictor terms do not necessarily mean more predictor variables! For example, a quadratic model with \(X\) and \(X^2\) has two predictor terms but only one predictor variable.
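A minimal sketch of this point, with hypothetical simulated data: the model below has two predictor terms (\(X\) and \(X^2\)) but only one predictor variable.

```python
import numpy as np

# One predictor variable x, generating a curved relationship with y.
rng = np.random.default_rng(0)
x = rng.uniform(-2, 2, size=300)
y = 1.0 + 0.5 * x + 2.0 * x**2 + rng.normal(scale=0.2, size=300)

# Two predictor *terms* (x and x^2) built from one predictor *variable* (x).
X = np.column_stack([np.ones_like(x), x, x**2])
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
```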

Multiple regression: estimating more complex structures of relationships

  • Descriptive modeling: See effects of individual variables net of the effects of other variables in the model (AKA “when all the other variables are held constant”)
  • Predictive modeling: Improve prediction, more variables = more information about each observation = usually better predictions (but risk of overfitting!)
  • Modeling for causal inference: adjusting for background variables based on theory, discovering potentially spurious relationships, isolating potentially causal effects

Interpreting coefficients in multiple regression

  • “The coefficient \(\beta_k\) is the average or expected difference in outcome \(y_k\), comparing two people who differ by one unit in the predictor \(x_k\) while being equal in all the other predictors. This is sometimes stated in shorthand as comparing two people (or, more generally, two observational units) that differ in \(x_k\) with all the other predictors held constant” (Gelman et al., 2020, p. 131)
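The comparison in this quote can be made concrete with a small sketch (coefficient values hypothetical): in a linear model, two units that differ by one in one predictor while being equal in all the others differ in predicted outcome by exactly that predictor's coefficient.

```python
import numpy as np

# Hypothetical fitted coefficients: [intercept, beta_1, beta_2].
beta = np.array([1.0, 2.0, -0.5])

def predict(x1, x2):
    """Predicted outcome under the linear model."""
    return beta[0] + beta[1] * x1 + beta[2] * x2

# Two units that differ by 1 in x1 and are identical in x2:
diff = predict(x1=3.0, x2=7.0) - predict(x1=2.0, x2=7.0)
# diff == 2.0, i.e. exactly beta_1, whatever value x2 is held at.
```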

MLR coefficients as conditional (or additional)

  • Conditional effect: contingent on the other variables in the model
  • Additional effect: each coefficient represents the additional effect of adding the variable to the model (all the other variables in the model are already accounted for)

Difference-based vs. change-based interpretations

  • Difference-based: “how the outcome variable differs, on average, when comparing two groups of items that differ by 1 in the relevant predictor while being identical in all the other predictors”
  • Change-based: “the coefficient is the expected change in y caused by adding 1 to the relevant predictor” (i.e. changes within individuals, rather than comparisons between individuals)

Difference-based interpretation is more cautious

To be on the safe side, interpret regression coefficients as comparisons between units, not as changes within units, unless you specifically claim causality.

Beta-standardized coefficients

  • Used to determine the relative weight of independent variables: the expected change in Y, measured in standard deviations of Y, for a one-standard-deviation increase in X.
  • Standardize all variables to z-scores
  • Sometimes better: standardizing by subtracting the mean and dividing by 2 standard deviations (not 1) - makes coefficients directly comparable with those of untransformed binary predictors (Gelman, 2008)
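A sketch of both standardizations, using simulated data (all names and values illustrative): z-scoring predictor and outcome gives the classic beta coefficient, while dividing by two standard deviations puts a continuous predictor on a scale comparable to an untransformed 0/1 predictor.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 500
x1 = rng.normal(loc=10, scale=5, size=n)   # continuous predictor
x2 = rng.binomial(1, 0.5, size=n)          # binary predictor, left untransformed
y = 2.0 + 0.8 * x1 + 1.5 * x2 + rng.normal(size=n)

def standardize(v, sds=1):
    """Subtract the mean, divide by `sds` standard deviations."""
    return (v - v.mean()) / (sds * v.std())

# Classic beta coefficient: z-score x1 and y (1 SD each).
X1sd = np.column_stack([np.ones(n), standardize(x1), x2])
beta_1sd, *_ = np.linalg.lstsq(X1sd, standardize(y), rcond=None)

# 2-SD scaling of the continuous predictor: its coefficient is now the
# effect of a "2 SD" move, comparable to the full 0-to-1 range of x2.
X2sd = np.column_stack([np.ones(n), standardize(x1, sds=2), x2])
beta_2sd, *_ = np.linalg.lstsq(X2sd, y, rcond=None)
```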

References

Gelman, A. (2008). Scaling regression inputs by dividing by two standard deviations. Statistics in Medicine, 27(15), 2865–2873.

Gelman, A., Hill, J., & Vehtari, A. (2020). Regression and Other Stories. Cambridge University Press.