- Understand the concept of causality
- Distinguish causation from association
- Beware of confounding in observational studies
- Use control variables to limit confounding
- Use experiments to assess causal relations
- Readings
  - ISRS (*Introductory Statistics with Randomization and Simulation*): ch. 1.3-1.5
The type of relation determines its practical use
A regression model describes the "relationship" between the response (\(Y\)) and explanatory (\(X\)) variables
The plug-in nature of the regression function (\(X\rightarrow f(X)=\hat{Y}\)) suggests that \(X\) causes \(Y\), but the fitted function by itself only captures association
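The plug-in reading can be made concrete with a short base-R sketch (the data frame `d`, its coefficients, and the new \(X\) values are made up for illustration): `lm()` estimates \(f\), and `predict()` plugs new \(X\) values into it. The computation is pure arithmetic on the fitted line and says nothing about what would happen to \(Y\) if \(X\) were changed.

```r
set.seed(1)
# made-up data: y is linearly associated with x
d <- data.frame(x = rnorm(50))
d$y <- 2 + 0.5 * d$x + rnorm(50)

fit <- lm(y ~ x, data = d)               # estimate f
new_x <- data.frame(x = c(-1, 0, 1))
y_hat <- predict(fit, newdata = new_x)   # plug in: x -> f(x) = y_hat

# the same values by hand: intercept + slope * x
b <- coef(fit)
y_hat_manual <- unname(b[1] + b[2] * new_x$x)
print(data.frame(new_x, y_hat = unname(y_hat), y_hat_manual))
```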
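Suppose a third variable \(Z\) (a confounder) affects both \(X\) and \(Y\) through (linear) causal relationships
\[X = a + b Z + \textrm{err}_X, \quad Y = c + d Z + \textrm{err}_Y\]
\(X\) does not appear in the equation for \(Y\), so \(X\) has no causal effect on \(Y\). The simulation uses \(a = 20\), \(b = 3\), \(c = 50\), \(d = 1\), with \(Z \sim N(30, 10^2)\) and error standard deviations 20 and 10.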
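```r
library(tidyverse)  # tibble(), mutate(), ggplot(), %>%
library(broom)      # tidy()

# Z affects both X and Y; X has no effect on Y
sim = tibble(
  Z = 30 + 10 * rnorm(100),
  X = 20 + 3 * Z + 20 * rnorm(100),
  Y = 50 + 1 * Z + 10 * rnorm(100)
)
```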
`rnorm()` generates pseudo-random draws from the standard normal distribution \(N(0, 1)\); the density of 1000 such draws:

```r
tibble(x = rnorm(1000)) %>%
  ggplot(aes(x = x)) +
  geom_density()
```
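The simulation builds its terms as `20 * rnorm(100)` and so on. Shifting and scaling standard normal draws gives normal draws with another mean and standard deviation, so `20 + 20 * rnorm(n)` matches `rnorm(n, mean = 20, sd = 20)`. A base-R sketch (the seed values are arbitrary): with the same seed both forms return the same numbers, and a large sample of `rnorm()` draws has mean near 0 and standard deviation near 1.

```r
set.seed(42)
a <- 20 + 20 * rnorm(5)             # shift and scale N(0, 1) draws
set.seed(42)
b <- rnorm(5, mean = 20, sd = 20)   # draw from N(20, 20^2) directly
print(all.equal(a, b))

set.seed(42)
z <- rnorm(1e5)
print(c(mean = mean(z), sd = sd(z)))  # close to 0 and 1
```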
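Regressing \(Y\) on \(X\) gives a clearly significant slope, even though \(X\) has no causal effect on \(Y\) (no seed is set, so the numbers vary from run to run):

```r
lm(Y ~ X, data = sim) %>% tidy()
## # A tibble: 2 x 5
##   term        estimate std.error statistic  p.value
##   <chr>          <dbl>     <dbl>     <dbl>    <dbl>
## 1 (Intercept)   60.1      3.71       16.2  1.75e-29
## 2 X              0.199    0.0322      6.19 1.44e- 8
```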
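The size of this spurious slope can be worked out from the model. Assuming \(Z\), \(\textrm{err}_X\) and \(\textrm{err}_Y\) are mutually independent, the population slope of the regression of \(Y\) on \(X\) is

\[
\beta_{Y \mid X} = \frac{\operatorname{Cov}(X, Y)}{\operatorname{Var}(X)}
= \frac{\operatorname{Cov}(a + bZ + \textrm{err}_X,\; c + dZ + \textrm{err}_Y)}{\operatorname{Var}(a + bZ + \textrm{err}_X)}
= \frac{b\,d\,\operatorname{Var}(Z)}{b^2 \operatorname{Var}(Z) + \operatorname{Var}(\textrm{err}_X)}
\]

With the simulation's values \(b = 3\), \(d = 1\), \(\operatorname{Var}(Z) = 10^2\) and \(\operatorname{Var}(\textrm{err}_X) = 20^2\), this gives \(300/1300 \approx 0.23\), roughly one standard error from the estimate 0.199. The slope is nonzero only because \(b\,d \neq 0\), that is, because \(Z\) drives both variables.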
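Intervening on \(Z\): overwrite \(Z\) with evenly spaced values and regenerate \(X\) and \(Y\) from the model. The slope on \(Z\) recovers the true causal effect \(d = 1\):

```r
sim %>%
  mutate(
    Z = seq(from = min(Z), to = max(Z), length.out = 100),  # set Z by hand
    X = 20 + 3 * Z + 20 * rnorm(100),  # mutate() is sequential: X, Y use new Z
    Y = 50 + 1 * Z + 10 * rnorm(100)
  ) %>%
  lm(Y ~ Z, data = .) %>%
  tidy()
## # A tibble: 2 x 5
##   term        estimate std.error statistic  p.value
##   <chr>          <dbl>     <dbl>     <dbl>    <dbl>
## 1 (Intercept)    50.0     2.43        20.6 2.51e-37
## 2 Z               1.03    0.0755      13.7 1.98e-24
```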
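Intervening on \(X\): overwrite \(X\) by hand while \(Z\) and \(Y\) stay as simulated. The slope on \(X\) is close to zero and not significant, because \(X\) does not cause \(Y\):

```r
sim %>%
  mutate(X = seq(min(X), max(X), length.out = 100)) %>%  # set X by hand
  lm(Y ~ X, data = .) %>%
  tidy()
## # A tibble: 2 x 5
##   term        estimate std.error statistic  p.value
##   <chr>          <dbl>     <dbl>     <dbl>    <dbl>
## 1 (Intercept)  77.3       3.39       22.8  5.84e-41
## 2 X             0.0422    0.0280      1.50 1.36e- 1
```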
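Both interventions can be collected in one base-R sketch. The helper `simulate_dgp()` is hypothetical (not from the slides): it draws from the model above and lets you overwrite \(Z\) or \(X\) by hand. The sample size, seed, and intervention ranges are arbitrary choices; a large \(n\) makes the three slopes easy to compare.

```r
# hypothetical helper: n draws from the model, optionally setting Z or X by hand
simulate_dgp <- function(n, do_z = NULL, do_x = NULL) {
  z <- if (is.null(do_z)) 30 + 10 * rnorm(n) else do_z
  x <- if (is.null(do_x)) 20 + 3 * z + 20 * rnorm(n) else do_x
  y <- 50 + 1 * z + 10 * rnorm(n)   # Y depends on Z only
  data.frame(Z = z, X = x, Y = y)
}

set.seed(2024)
n <- 10000
obs   <- simulate_dgp(n)                                    # observational data
int_z <- simulate_dgp(n, do_z = seq(0, 60, length.out = n))   # intervene on Z
int_x <- simulate_dgp(n, do_x = seq(0, 220, length.out = n))  # intervene on X

slopes <- c(
  observed_X = unname(coef(lm(Y ~ X, data = obs))["X"]),
  do_Z       = unname(coef(lm(Y ~ Z, data = int_z))["Z"]),
  do_X       = unname(coef(lm(Y ~ X, data = int_x))["X"])
)
print(round(slopes, 3))
```

The observational slope should land near \(300/1300 \approx 0.23\) (the model's \(\operatorname{Cov}(X, Y)/\operatorname{Var}(X)\)), setting \(Z\) by hand should give a slope near \(d = 1\), and setting \(X\) by hand a slope near 0.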
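Controlling for \(Z\): once the confounder enters the model as a control variable, the coefficient on \(X\) is no longer significant, while the coefficient on \(Z\) stays close to 1:

```r
sim %>%
  mutate(X = 20 + 3 * Z + 20 * rnorm(100)) %>%  # fresh draw of X from its model
  lm(Y ~ X + Z, data = .) %>%                  # Z included as a control
  tidy()
## # A tibble: 3 x 5
##   term        estimate std.error statistic  p.value
##   <chr>          <dbl>     <dbl>     <dbl>    <dbl>
## 1 (Intercept)  56.0       3.48      16.1   4.21e-29
## 2 X            -0.0374    0.0538    -0.695 4.89e- 1
## 3 Z             0.983     0.198      4.96  3.05e- 6
```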
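One way to see what "controlling for \(Z\)" does is the Frisch–Waugh–Lovell theorem (a standard regression result, not mentioned in the slides): the coefficient on \(X\) in `lm(Y ~ X + Z)` equals the slope from regressing the part of \(Y\) not linearly explained by \(Z\) on the part of \(X\) not linearly explained by \(Z\). A base-R sketch with freshly simulated data from the same model (seed and sample size are arbitrary):

```r
set.seed(7)
n <- 1000
z <- 30 + 10 * rnorm(n)
x <- 20 + 3 * z + 20 * rnorm(n)
y <- 50 + 1 * z + 10 * rnorm(n)

# multiple regression: coefficient on x with z as a control
b_multi <- unname(coef(lm(y ~ x + z))["x"])

# same coefficient in two steps: strip z's linear influence from x and y
x_res <- resid(lm(x ~ z))
y_res <- resid(lm(y ~ z))
b_fwl <- unname(coef(lm(y_res ~ x_res))["x_res"])

print(c(multiple = b_multi, partialled_out = b_fwl))
```

Once \(Z\)'s influence is removed, what is left of \(X\) is pure noise with respect to \(Y\), so both numbers are close to 0.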
The gold standard of scientific evidence is the randomized controlled trial (RCT)
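A minimal base-R sketch of why randomization helps, under assumptions not in the slides: a binary treatment `treat` plays the role of \(X\), its true effect `tau = 5` is hypothetical, and \(Z\) still affects \(Y\). When a coin flip decides treatment, \(Z\) is balanced across the groups and a plain difference in means estimates `tau` without controlling for \(Z\); when treatment instead depends on \(Z\) (self-selection), the same comparison is biased.

```r
set.seed(99)
n   <- 10000
tau <- 5                                  # hypothetical true treatment effect
z   <- 30 + 10 * rnorm(n)                 # background variable, as in the slides

# randomized assignment: a coin flip that ignores z
treat <- rbinom(n, size = 1, prob = 0.5)
y     <- 50 + tau * treat + 1 * z + 10 * rnorm(n)
balance <- tapply(z, treat, mean)         # mean of z in each group
est     <- mean(y[treat == 1]) - mean(y[treat == 0])

# for contrast: self-selected treatment that depends on z
treat_obs <- as.integer(z + 10 * rnorm(n) > 30)
y_obs     <- 50 + tau * treat_obs + 1 * z + 10 * rnorm(n)
est_obs   <- mean(y_obs[treat_obs == 1]) - mean(y_obs[treat_obs == 0])

print(balance)
print(c(randomized = est, self_selected = est_obs, truth = tau))
```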