Learn how to model nonlinearity using:
- Categorization
- Simple polynomials
- Linear and natural splines
Last updated in April 2021
mod1 = lm(vote ~ agea, data = vote)
How can we change our model to capture the nonlinear relationship?
Three popular options:
- Categorization
- Simple polynomials
- Linear and natural splines
Categorization is the most basic way of dealing with nonlinearity
There are many different ways a numerical variable can be cut into categories:
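A few common ways of cutting a numerical variable can be sketched with base R's `cut()`. This is a minimal illustration on simulated ages, not the data behind `mod1`; the variable names and the cutpoints 25, 50, and 75 are assumptions chosen for the example.

```r
# Simulated ages, purely for illustration (hypothetical data)
set.seed(1)
age <- round(runif(200, min = 18, max = 90))

# 1. Equal-width intervals: four bins spanning the observed range
age_equal <- cut(age, breaks = 4)

# 2. Quantile-based intervals: four bins with roughly equal numbers of cases
age_quart <- cut(age,
                 breaks = quantile(age, probs = seq(0, 1, by = 0.25)),
                 include.lowest = TRUE)

# 3. Hand-picked cutpoints (an arbitrary choice made for this example)
age_custom <- cut(age,
                  breaks = c(-Inf, 25, 50, 75, Inf),
                  labels = c("<25", "25-50", "50-75", ">75"),
                  right = FALSE)

# Compare how the same variable is distributed under each scheme
table(age_equal)
table(age_quart)
table(age_custom)
```

The three schemes place the same respondents in different groups, which is one reason results can depend on how the cut is made.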
The main advantage of categorizations is that the output is easily interpretable
However, there are many technical drawbacks (Harrell, 2001):
- Estimated values have reduced precision, and associated tests have reduced power
- Categorization assumes that the relationship between the predictor and the response is flat within intervals
- Categorization assumes a discontinuity in the response as interval boundaries are crossed
- Cutpoints are arbitrary and manipulable: different cutpoints can produce either a positive or a negative association
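The "flat within intervals" and "discontinuity at the boundaries" assumptions can be seen directly in a model's predictions. The sketch below uses simulated data with a smooth curved relationship; the data, the outcome `y`, and the cutpoints are all hypothetical.

```r
# Simulated data with a smooth, curved relationship (hypothetical)
set.seed(2)
n   <- 500
age <- runif(n, 18, 90)
y   <- 0.2 + 0.02 * age - 0.00015 * age^2 + rnorm(n, sd = 0.1)
d   <- data.frame(age = age, y = y)

# Categorize age with arbitrary cutpoints and fit a linear model
brks      <- c(18, 25, 50, 75, 90)
d$age_cat <- cut(d$age, breaks = brks, include.lowest = TRUE)
mod_cat   <- lm(y ~ age_cat, data = d)

# Predict for two ages inside (25,50] and two ages inside (50,75]
new         <- data.frame(age = c(30, 49.9, 50.1, 70))
new$age_cat <- cut(new$age, breaks = brks, include.lowest = TRUE)
pred        <- predict(mod_cat, newdata = new)

# Ages 30 and 49.9 get the same prediction (flat within the interval),
# while 49.9 and 50.1 differ (a jump at the cutpoint)
data.frame(age = new$age, prediction = pred)
```

A 20-year age difference within an interval changes nothing, while a 0.2-year difference across the cutpoint shifts the prediction.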
\[ vote = \beta_0 + \beta_1*age + \beta_2*age^2 \]
There are two types of polynomials: raw and orthogonal ones
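The difference between the two types can be illustrated with `poly()`. This is a minimal sketch on simulated data (the variables `age` and `y` are hypothetical): both parameterizations give the same fitted curve, but the raw terms are strongly correlated with each other while the orthogonal terms are not.

```r
# Simulated data (hypothetical)
set.seed(3)
n   <- 300
age <- runif(n, 18, 90)
y   <- 0.2 + 0.02 * age - 0.00015 * age^2 + rnorm(n, sd = 0.1)

# Raw polynomial: columns are age and age^2
mod_raw  <- lm(y ~ poly(age, 2, raw = TRUE))

# Orthogonal polynomial (the default of poly()): uncorrelated columns
mod_orth <- lm(y ~ poly(age, 2))

# Both models describe the same curve, so the fitted values agree
all.equal(unname(fitted(mod_raw)), unname(fitted(mod_orth)))

# Correlation between the linear and quadratic terms
cor_raw  <- cor(age, age^2)        # close to 1 for raw terms
P        <- poly(age, 2)
cor_orth <- cor(P[, 1], P[, 2])    # essentially 0 for orthogonal terms
c(raw = cor_raw, orthogonal = cor_orth)
```

Raw coefficients map directly onto the equation above, whereas orthogonal coefficients are harder to read but avoid collinearity between the terms.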
Simple polynomials alleviate some of the problems of categorization (arbitrary cutpoints, assumptions of flat intervals)
However, two problems:
\[
vote = \beta_0 + \beta_1*age_{<25} + \beta_2*age_{25-50} + \beta_3*age_{50-75} + \beta_4*age_{>75}
\]
We can think of linear splines as a more flexible version of categorization
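One way to fit such a model in base R is with truncated-line terms built from `pmax()`. This is a sketch under stated assumptions: the data are simulated, the knots at 25, 50, and 75 are taken from the age groups in the equation, and this particular basis is only one of several equivalent ways to code a linear spline.

```r
# Simulated data (hypothetical)
set.seed(4)
n   <- 500
age <- runif(n, 18, 90)
y   <- 0.2 + 0.02 * age - 0.00015 * age^2 + rnorm(n, sd = 0.1)
d   <- data.frame(age = age, y = y)

# Each pmax() term is 0 below its knot and increases linearly above it,
# so its coefficient is the change in slope at that knot
mod_ls <- lm(y ~ age + pmax(age - 25, 0) + pmax(age - 50, 0) + pmax(age - 75, 0),
             data = d)

# The slope within each segment is the running sum of the coefficients
slopes <- cumsum(coef(mod_ls)[-1])
names(slopes) <- c("<25", "25-50", "50-75", ">75")
slopes

# Unlike categorization, predictions are continuous across a knot
pred_knot <- predict(mod_ls, newdata = data.frame(age = c(50 - 1e-6, 50 + 1e-6)))
diff(pred_knot)
```

The four segment slopes play the role of the four age coefficients in the equation: the line is allowed to bend at each cutpoint, but it does not jump.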
How to choose the number and position of cutpoints?
Linear splines
Natural splines
Knots | Quantiles | | | | | | |
---|---|---|---|---|---|---|---|
3 | 0.1 | 0.5 | 0.9 | | | | |
4 | 0.05 | 0.35 | 0.65 | 0.95 | | | |
5 | 0.05 | 0.275 | 0.5 | 0.725 | 0.95 | | |
6 | 0.05 | 0.23 | 0.41 | 0.59 | 0.77 | 0.95 | |
7 | 0.025 | 0.1833 | 0.3417 | 0.5 | 0.6583 | 0.8167 | 0.975 |
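
The quantiles in the table can be turned into knot positions with `quantile()`. The sketch below uses the five-knot row and fits a natural spline with `ns()` from the `splines` package that ships with R. The data are simulated, and treating the two outermost quantiles as `Boundary.knots` and the remaining three as interior `knots` is an assumption about how the table maps onto `ns()`.

```r
library(splines)

# Simulated data (hypothetical)
set.seed(5)
n   <- 500
age <- runif(n, 18, 90)
y   <- 0.2 + 0.02 * age - 0.00015 * age^2 + rnorm(n, sd = 0.1)
d   <- data.frame(age = age, y = y)

# Quantiles for five knots, taken from the table
probs5 <- c(0.05, 0.275, 0.5, 0.725, 0.95)
knots5 <- unname(quantile(d$age, probs = probs5))
knots5

# ns() takes interior and boundary knots as separate arguments
mod_ns <- lm(y ~ ns(age,
                    knots          = knots5[2:4],
                    Boundary.knots = knots5[c(1, 5)]),
             data = d)

# Five knots give four spline terms plus the intercept
length(coef(mod_ns))

# Predicted values over a grid of ages, e.g. for plotting the fitted curve
grid      <- data.frame(age = seq(20, 88, by = 4))
grid$pred <- predict(mod_ns, newdata = grid)
head(grid)
```

Placing knots at quantiles puts them where the data are, so no segment is estimated from only a handful of observations.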
Linear splines:
Natural splines
Harrell, F. (2001). Regression Modeling Strategies: With Applications to Linear Models, Logistic Regression, and Survival Analysis. Springer-Verlag. https://doi.org/10.1007/978-1-4757-3462-1