- Understand what regression can be used for
- Learn how to select variables for explanative models
If you wish to make an apple pie from scratch, you must first invent the universe.
- Carl Sagan
Predictive models
Explanative (causal) models
Other differences include: choosing variables, evaluating model fit, choosing sample sizes and more…
For more details, see: Shmueli, G. (2010). To Explain or To Predict? (SSRN Scholarly Paper ID 1351252). Social Science Research Network. https://doi.org/10.2139/ssrn.1351252
Descriptive Models
Inferential models
Which model are you aiming for?
We are going to be mainly interested in explanative models.
A researcher is interested in the relationship between intelligence and work self-discipline among adults, but is short on funding.
Their colleague suggests using university students as their sample.
Predictive models
Goal: Estimate unseen observations as best as possible.
Training vs. testing set, cross-validation
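The two techniques named above can be sketched in base R. This is only an illustration on simulated data: the data-generating process, the 80/20 split and the choice of 5 folds are assumptions made for the example, not part of the slides.

```r
set.seed(42)  # reproducible simulated data

# Simulated data (assumption for illustration): y depends linearly on x
n <- 200
dat <- data.frame(x = rnorm(n))
dat$y <- 1 + 2 * dat$x + rnorm(n)

# Training vs. testing set: fit on 80 %, evaluate on the held-out 20 %
train_idx <- sample(n, size = 0.8 * n)
train <- dat[train_idx, ]
test  <- dat[-train_idx, ]

fit <- lm(y ~ x, data = train)
rmse_test <- sqrt(mean((test$y - predict(fit, newdata = test))^2))

# 5-fold cross-validation: every observation is held out exactly once
k <- 5
fold <- sample(rep(1:k, length.out = n))
rmse_cv <- sapply(1:k, function(i) {
  fit_i    <- lm(y ~ x, data = dat[fold != i, ])
  held_out <- dat[fold == i, ]
  sqrt(mean((held_out$y - predict(fit_i, newdata = held_out))^2))
})

rmse_test      # prediction error on the single test set
mean(rmse_cv)  # prediction error averaged over the 5 folds
```

In both cases the model is judged on observations it did not see during fitting, which is what the predictive goal asks for.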
Explanative models
Goal: Estimate model parameters as best as possible.
Adjusting for interfering variables, randomization, DAGs
Some fields can rely on randomization of treatment (e.g. drug testing). Social sciences generally can’t.
Strong focus on theory, with help of Directed Acyclic Graphs.
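Why adjustment and randomization matter for parameter estimates can be shown with a small simulation. The sketch below assumes a single confounder z and a true effect of x on y of 0.5; all coefficients are invented for the illustration.

```r
set.seed(1)
n <- 10000

# Observational setting: z influences both the exposure x and the outcome y
z <- rnorm(n)
x <- 0.8 * z + rnorm(n)
y <- 0.5 * x + 1.0 * z + rnorm(n)   # true effect of x on y is 0.5

naive    <- coef(lm(y ~ x))["x"]      # ignores z, so the estimate is biased
adjusted <- coef(lm(y ~ x + z))["x"]  # adjusting for z recovers roughly 0.5

# Randomized setting: x is assigned independently of z
x_rand <- rnorm(n)
y_rand <- 0.5 * x_rand + 1.0 * z + rnorm(n)
randomized <- coef(lm(y_rand ~ x_rand))["x_rand"]  # unbiased without adjusting

round(c(naive = unname(naive),
        adjusted = unname(adjusted),
        randomized = unname(randomized)), 2)
```

When randomization is not available, the adjusted model is only as good as the theory that says which variables to adjust for, which is where DAGs come in.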
Does increasing knowledge about Covid raise the probability a person gets vaccinated?
A researcher is interested in the relationship between intelligence and work self-discipline among adults, but is short on funding.
Their colleague suggests using university students as their sample.
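One way to think about the colleague's suggestion is to simulate it. The sketch below rests on strong, purely hypothetical assumptions: intelligence and self-discipline are unrelated in the adult population, and both raise the chance of becoming a university student through an invented admission rule.

```r
set.seed(2)
n <- 100000

# Assumption: the two traits are independent in the full adult population
intelligence <- rnorm(n)
discipline   <- rnorm(n)

# Hypothetical admission rule: both traits (plus luck) help a person get in
student <- (intelligence + discipline + rnorm(n)) > 1.5

cor_all      <- cor(intelligence, discipline)                    # close to zero
cor_students <- cor(intelligence[student], discipline[student])  # negative

round(c(all_adults = cor_all, students_only = cor_students), 2)
```

Under these assumptions, sampling only students produces a negative association that does not exist among adults in general, because student status is a common effect (a collider) of both traits.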
install.packages("dagitty") # for drawing DAGs in R
install.packages("ggdag")   # for making them in ggplot2
library(ggdag)   # dagify(), ggdag(), geom_dag_*()
library(ggplot2) # theme_void(), ggplot(), aes()

dagify(y ~ x + z + q,
       x ~ z,
       q ~ x,
       w ~ y + x) %>%
  ggdag() +
  theme_void() # to get rid of the background
dagify(y ~ x + z + q,
       x ~ z,
       q ~ x,
       w ~ y + x) %>%
  ggplot(aes(x = x, xend = xend, y = y, yend = yend)) +
  geom_dag_edges() +
  geom_dag_point() +
  geom_text(aes(label = name), color = "white") +
  theme_void()
dagify(y ~ x + z + q,
       x ~ z,
       q ~ x,
       w ~ y + x,
       labels = c(y = "Vaccination\nProbability",
                  x = "Covid\nKnowledge",
                  z = "Soc-econ. status",
                  q = "Perceived threat",
                  w = "Hospitalization"),
       coords = list(x = c(y = 2, x = 1, z = 1.5, w = 1.5, q = 1.5),
                     y = c(y = 1, x = 1, z = 1.15, w = 0.85, q = 0.94))) %>%
  ggplot(aes(x = x, xend = xend, y = y, yend = yend)) +
  geom_dag_edges() +
  geom_dag_point() +
  geom_text(aes(label = label), color = "red") +
  theme_void()
our_dag <- dagify(y ~ x + z + q,
                  x ~ z,
                  q ~ x,
                  w ~ y + x,
                  exposure = "x",
                  outcome = "y")

ggdag_adjustment_set(our_dag, shadow = TRUE) +
  theme_void() # red ones are confounders
our_dag <- dagify(y ~ x + z + q,
                  x ~ z,
                  q ~ x,
                  w ~ y + x,
                  exposure = "x",
                  outcome = "y")

ggdag_collider(our_dag) +
  theme_void()
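The adjustment set that the plots display can also be obtained as plain output. The sketch below rewrites the same DAG in dagitty's own syntax and asks for the minimal adjustment set; it assumes the dagitty package installed earlier is available.

```r
library(dagitty)

# Same structure as the DAG above, written in dagitty syntax
g <- dagitty("dag {
  z -> x
  z -> y
  x -> q
  q -> y
  x -> y
  x -> w
  y -> w
}")

# Minimal set(s) of variables to adjust for when estimating
# the total effect of x on y
sets <- adjustmentSets(g, exposure = "x", outcome = "y")
print(sets)
```

For this structure only z should be returned: q lies on the causal path from x to y, and w is a common effect of both, so neither belongs in the model for the total effect.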