- Learn how to
  - Model nonlinear relationships
  - Perform nonparametric regression with basis functions
  - Use nonlinear models for prediction
- Readings
  - ISLR ch. 7.1-7.3
Two ways to estimate \(f(\cdot)\)
    gap07 %>%
      mutate( log_GDP = log(gdpPercap) ) %>%
      lm( lifeExp ~ log_GDP, data = .) %>%
      tidy()
    ## # A tibble: 2 x 5
    ##   term        estimate std.error statistic  p.value
    ##   <chr>          <dbl>     <dbl>     <dbl>    <dbl>
    ## 1 (Intercept)     4.95     3.86       1.28 2.02e- 1
    ## 2 log_GDP         7.20     0.442     16.3  4.12e-34
    gap07 %>%
      ggplot( aes(y = lifeExp, x = gdpPercap) ) +
      geom_point() +
      geom_smooth(method = "gam", formula = y ~ s(x))
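One way to read the two fits above is as a parametric estimate (assume \(f\) is linear in log GDP) versus a nonparametric one (let a smooth pick the shape). The sketch below compares them on simulated data, not on gap07; the data frame `sim`, the helper `rmse()` and the object names are made up for illustration.

```r
library(mgcv)  # recommended package, ships with standard R installations

set.seed(1)
# Simulated stand-in for the GDP / life expectancy data (an assumption)
n <- 200
x <- exp(runif(n, log(300), log(50000)))   # right-skewed, GDP-like predictor
y <- 5 + 7 * log(x) + rnorm(n, sd = 5)     # truth is linear in log(x)
sim <- data.frame(x = x, y = y)

fit_par <- lm(y ~ log(x), data = sim)      # parametric: we choose the form of f
fit_np  <- gam(y ~ s(x), data = sim)       # nonparametric: a smooth estimates f

# In-sample root mean squared error of each fit
rmse <- function(fit) sqrt(mean(residuals(fit)^2))
c(parametric = rmse(fit_par), smooth = rmse(fit_np))
```

When the assumed form is right, as it is here by construction, the parametric fit needs only two coefficients; the smooth makes no such assumption and has to learn the shape from the data.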
More generally, for a set of basis functions \(\{h_i(\cdot)\}\),
\[f(x) = \beta_0 + \beta_1 h_1(x) + \ldots + \beta_m h_m(x)\]
    gap07 %>%
      mutate( X1 = gdpPercap, X2 = X1^2, X3 = X1^3 ) %>%
      lm( lifeExp ~ X1 + X2 + X3, data = .) %>%
      tidy()
    ## # A tibble: 4 x 5
    ##   term         estimate std.error statistic  p.value
    ##   <chr>           <dbl>     <dbl>     <dbl>    <dbl>
    ## 1 (Intercept)  5.30e+ 1  1.31e+ 0     40.4  2.33e-78
    ## 2 X1           2.73e- 3  3.55e- 4      7.69 2.47e-12
    ## 3 X2          -9.27e- 8  1.99e- 8     -4.65 7.76e- 6
    ## 4 X3           1.01e-12  3.00e-13      3.37 9.83e- 4
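The cubic fit is the basis-function model with \(h_j(x) = x^j\). As a small sketch on simulated data (the data frame `sim` and the fit names are hypothetical), the same model can be written with hand-made powers or with base R's `poly()`, and a different basis for the same space gives the same fitted curve.

```r
set.seed(2)
# Simulated cubic relationship (an assumption, not the gapminder data)
sim <- data.frame(x = runif(100, 0, 10))
sim$y <- 2 + 0.5 * sim$x - 0.3 * sim$x^2 + 0.02 * sim$x^3 + rnorm(100)

# Three ways to supply the cubic polynomial basis
fit_manual <- lm(y ~ x + I(x^2) + I(x^3), data = sim)   # powers built by hand
fit_raw    <- lm(y ~ poly(x, 3, raw = TRUE), data = sim) # same raw powers
fit_orth   <- lm(y ~ poly(x, 3), data = sim)             # orthogonal polynomials

# Same coefficients for the two raw-power fits
all.equal(unname(coef(fit_manual)), unname(coef(fit_raw)))
# Different coefficients but identical fitted values for the orthogonal basis
all.equal(fitted(fit_manual), fitted(fit_orth))
```

Orthogonal polynomials avoid the very small, hard-to-read coefficients that raw powers of a large-valued predictor produce, at the cost of coefficients that no longer multiply \(x\), \(x^2\), \(x^3\) directly.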
A generalised additive model (GAM) describes \(Y\) as a sum of (smooth) functions of the \(X\)'s
\[Y \sim s_1(X_1) + s_2(X_2) + \ldots\]
- Fit with the gam() function in the mgcv package
- Formula interface like lm()
- Use s() for a smooth function (spline-based)

    library(mgcv)
    gam_out = gam( lifeExp ~ s(gdpPercap) + continent, data = gap07)
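To see what a fit of this form returns, here is a minimal sketch on simulated data with one smooth term and one factor, mirroring s(gdpPercap) + continent. The data frame `sim`, the group labels and the `shift` values are invented for the example.

```r
library(mgcv)

set.seed(3)
n <- 300
# Simulated data: a smooth effect of x plus a constant shift per group (assumed)
sim <- data.frame(
  x = runif(n, 0, 10),
  g = factor(sample(c("a", "b", "c"), n, replace = TRUE))
)
shift <- c(a = 0, b = 2, c = -1)
sim$y <- sin(sim$x) + unname(shift[as.character(sim$g)]) + rnorm(n, sd = 0.3)

fit <- gam(y ~ s(x) + g, data = sim)

# Parametric part: group shifts relative to the baseline level "a"
coef(fit)[c("gb", "gc")]
# Smooth part: effective degrees of freedom of s(x); a value near 1 would
# indicate an essentially straight line
summary(fit)$edf
```

The factor enters like any lm() term, so each group gets the same curve shifted up or down; the wiggliness of s(x) is chosen by gam() through its penalised fit rather than set by hand.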
    gap07 %>%
      mutate( pred = predict( gam_out, gap07 ) ) %>%
      ggplot( aes(y = lifeExp, x = gdpPercap, col = continent) ) +
      geom_point() +
      geom_line( aes(y = pred) )
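Predictions need not be made at the observed points. A sketch, again on simulated data with hypothetical names (`sim`, `grid`), of predicting from a GAM on a new grid of x values together with approximate uncertainty bands:

```r
library(mgcv)

set.seed(4)
# Simulated data with a known smooth signal (an assumption for illustration)
sim <- data.frame(x = runif(200, 0, 10))
sim$y <- sin(sim$x) + rnorm(200, sd = 0.3)

fit <- gam(y ~ s(x), data = sim)

# New x values at which to predict; column names must match the formula
grid <- data.frame(x = seq(0, 10, length.out = 50))
p <- predict(fit, newdata = grid, se.fit = TRUE)

grid$pred  <- as.numeric(p$fit)
grid$lower <- grid$pred - 2 * as.numeric(p$se.fit)  # rough 95% band
grid$upper <- grid$pred + 2 * as.numeric(p$se.fit)
head(grid)
```

The band is for the fitted mean curve, not for individual new observations, which would scatter more widely around it.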
    library(rpart)
    tree = rpart( lifeExp ~ gdpPercap + continent, data = gap07)

    library(rpart.plot)
    rpart.plot(tree)
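A regression tree can be seen as a piecewise-constant estimate of \(f\): each leaf predicts the mean response of the observations that fall in it. The sketch below checks this on simulated data with a single jump; `sim`, the jump location 4 and the other names are assumptions for the example.

```r
library(rpart)

set.seed(5)
# Simulated step-shaped relationship with a jump at x = 4 (an assumption)
sim <- data.frame(x = runif(300, 0, 10))
sim$y <- ifelse(sim$x < 4, 1, 5) + rnorm(300, sd = 0.5)

tree <- rpart(y ~ x, data = sim)

# Number of terminal nodes (leaves) in the fitted tree
n_leaves <- sum(tree$frame$var == "<leaf>")

# Fitted values take one distinct value per leaf
n_values <- length(unique(predict(tree)))
c(leaves = n_leaves, distinct_predictions = n_values)
```

Because predictions are constant within each leaf, plotting them against x gives a staircase rather than a smooth curve.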
    # cp = 0 and a small minbucket let the tree grow deep (many small leaves)
    tree = rpart( lifeExp ~ gdpPercap, data = gap07,
                  control = rpart.control(cp = 0, minbucket = 3) )
- Use predict() to get predictions from a fitted tree
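A deep tree follows the training data closely, which is not the same as predicting well. The following sketch, on simulated data with made-up helpers `make_data()` and `rmse()`, compares a default tree with a deep one on both training and held-out data using predict() with newdata.

```r
library(rpart)

set.seed(6)
# Helper that simulates a noisy smooth relationship (an assumption)
make_data <- function(n) {
  x <- runif(n, 0, 10)
  data.frame(x = x, y = sin(x) + rnorm(n, sd = 0.5))
}
train <- make_data(200)
test  <- make_data(200)

tree_default <- rpart(y ~ x, data = train)
tree_deep    <- rpart(y ~ x, data = train,
                      control = rpart.control(cp = 0, minbucket = 3))

# Root mean squared prediction error of a fitted tree on a data set
rmse <- function(fit, data) {
  sqrt(mean((data$y - predict(fit, newdata = data))^2))
}

rbind(
  default = c(train = rmse(tree_default, train), test = rmse(tree_default, test)),
  deep    = c(train = rmse(tree_deep, train),    test = rmse(tree_deep, test))
)
```

The deep tree should show the lower training error; whether it also does better on the test set depends on how much of what it fitted was noise, which is the reason to check predictions on data not used for fitting.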