- Perform comparisons of multiple groups
- Compare means/proportions
- Examine notion of variable (in)dependence
- Test for independence of categorical variables
- Readings
  - ISRS: ch. 3.4, 4.4
educ value | description |
---|---|
0 | 0 to 8 years |
1 | Some secondary |
2 | Gr 11 to 13 |
3 | Some post secondary |
4 | Post secondary certificate or diploma |
5 | University: bachelors degree |
6 | University: graduate degree |
Test equality of \(m\) group means: \(H_0: \mu_1 = \mu_2 = \cdots = \mu_m\)
```r
library(coin)

lfs %>%
  mutate(educ = factor(educ)) %>%
  kruskal_test(hrlyearn ~ educ, data = ., distribution = "approx")
## 
##  Approximative Kruskal-Wallis Test
## 
## data:  hrlyearn by educ (0, 1, 2, 3, 4, 5, 6)
## chi-squared = 5350.5, p-value < 2.2e-16
```
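To see what the Kruskal-Wallis statistic measures, here is a minimal base-R sketch on simulated toy data (the data frame `toy` and its columns are hypothetical, not the `lfs` survey). It computes the rank-based statistic by hand and compares it with `stats::kruskal.test()`, which uses the asymptotic chi-squared reference distribution rather than the resampling approximation requested from `coin` above.

```r
# Toy data (simulated, NOT the lfs survey): earnings for m = 3 groups
set.seed(1)
toy <- data.frame(
  earn  = c(rnorm(30, mean = 18, sd = 4),
            rnorm(30, mean = 22, sd = 4),
            rnorm(30, mean = 30, sd = 4)),
  group = factor(rep(c("a", "b", "c"), each = 30))
)

# Kruskal-Wallis statistic by hand: replace values by ranks,
# then compare the rank sums of the groups
r  <- rank(toy$earn)
N  <- length(r)
Rj <- tapply(r, toy$group, sum)     # rank sum per group
nj <- tapply(r, toy$group, length)  # group sizes
H  <- 12 / (N * (N + 1)) * sum(Rj^2 / nj) - 3 * (N + 1)

# Base-R test; with no tied values its statistic equals H
kw <- kruskal.test(earn ~ group, data = toy)
kw$statistic  # Kruskal-Wallis chi-squared
kw$parameter  # degrees of freedom, m - 1
kw$p.value
```

Because the test works on ranks, a few extreme earnings values have limited influence on the result.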
Is the unemployment rate the same across levels of education?
A contingency table contains the frequencies of each combination of two categorical variables
```r
lfs %>%
  filter(lfsstat != 4) %>%
  mutate(educ = factor(educ), empl = factor(lfsstat != 3)) %>%
  xtabs(~ empl + educ, data = .)
##        educ
## empl        0     1     2     3     4     5     6
##   FALSE    33   251   502   200   574   683   296
##   TRUE    459  1816  6122  2237 10500 11457  5905
```
```r
lfs %>%
  filter(lfsstat != 4) %>%
  mutate(educ = factor(educ), empl = (lfsstat != 3)) %>%
  ggplot(aes(x = educ, fill = empl)) +
  geom_bar()
```
- `prop.table()` for relative proportions
- `addmargins()` for table totals
- In both, `margin = 1` works by row and `margin = 2` by column

```r
lfs %>%
  filter(lfsstat != 4) %>%
  mutate(educ = factor(educ), empl = factor(lfsstat != 3)) %>%
  xtabs(~ empl + educ, data = .) %>%
  prop.table(margin = 2) %>%
  addmargins(1) %>%
  round(4)
##        educ
## empl         0      1      2      3      4      5      6
##   FALSE 0.0671 0.1214 0.0758 0.0821 0.0518 0.0563 0.0477
##   TRUE  0.9329 0.8786 0.9242 0.9179 0.9482 0.9437 0.9523
##   Sum   1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000
```
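The effect of the `margin` argument can be checked without the `lfs` data by rebuilding the contingency table from the counts printed in the `xtabs()` output. This is a small sketch; the object names `tab`, `row_props` and `col_props` are chosen here for illustration.

```r
# Counts copied from the printed contingency table (rows: empl, columns: educ)
tab <- as.table(matrix(
  c( 33,  251,  502,  200,   574,   683,  296,
    459, 1816, 6122, 2237, 10500, 11457, 5905),
  nrow = 2, byrow = TRUE,
  dimnames = list(empl = c("FALSE", "TRUE"), educ = as.character(0:6))
))

# margin = 1: proportions within each row (each row sums to 1)
row_props <- prop.table(tab, margin = 1)

# margin = 2: proportions within each column (each column sums to 1),
# i.e. the unemployment/employment rate per education level
col_props <- prop.table(tab, margin = 2)

# addmargins(, 1) appends a "Sum" row of column totals
round(addmargins(col_props, 1), 4)
```

Column proportions are the relevant ones here, since the question is how the employment rate varies across education levels.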
```r
lfs %>%
  filter(lfsstat != 4) %>%
  mutate(educ = factor(educ), empl = (lfsstat != 3)) %>%
  ggplot(aes(x = educ, fill = empl)) +
  geom_bar(position = "fill")
```

```r
library(ggmosaic)

lfs %>%
  filter(lfsstat != 4) %>%
  mutate(educ = factor(educ), empl = (lfsstat != 3)) %>%
  ggplot() +
  geom_mosaic(aes(x = product(educ), fill = empl)) +
  xlab("educ")
```
Test equality of \(m\) group proportions: \(H_0: p_1 = p_2 = \cdots = p_m\)
```r
library(coin)

lfs %>%
  filter(lfsstat != 4) %>%
  mutate(educ = factor(educ), empl = factor(lfsstat != 3)) %>%
  chisq_test(empl ~ educ, data = ., distribution = "approx")
## 
##  Approximative Pearson Chi-Squared Test
## 
## data:  empl by educ (0, 1, 2, 3, 4, 5, 6)
## chi-squared = 212.93, p-value < 2.2e-16
```
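The Pearson statistic compares observed counts with the counts expected if all \(m\) proportions were equal. The sketch below recomputes it in base R from the counts of the employment-by-education contingency table; it uses the asymptotic chi-squared distribution via `stats::chisq.test()` instead of `coin`'s resampling approximation, so only the statistic (not the way the p-value is obtained) should agree with the output above, up to rounding.

```r
# Observed counts copied from the printed contingency table
tab <- matrix(
  c( 33,  251,  502,  200,   574,   683,  296,
    459, 1816, 6122, 2237, 10500, 11457, 5905),
  nrow = 2, byrow = TRUE,
  dimnames = list(empl = c("FALSE", "TRUE"), educ = as.character(0:6))
)

# Expected counts under H0: (row total * column total) / grand total
expected <- outer(rowSums(tab), colSums(tab)) / sum(tab)

# Pearson chi-squared statistic and its degrees of freedom
x2 <- sum((tab - expected)^2 / expected)
df <- (nrow(tab) - 1) * (ncol(tab) - 1)

# Asymptotic p-value from the chi-squared distribution
p_asym <- pchisq(x2, df = df, lower.tail = FALSE)

# Same statistic from the built-in test
fit <- chisq.test(tab)
c(by_hand = x2, built_in = unname(fit$statistic), df = df)
```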
```r
lfs %>%
  mutate(educ = factor(educ), marstat = factor(marstat)) %>%
  chisq_test(marstat ~ educ, data = ., distribution = "approx")
## 
##  Approximative Pearson Chi-Squared Test
## 
## data:  marstat by educ (0, 1, 2, 3, 4, 5, 6)
## chi-squared = 10696, p-value < 2.2e-16
```
- `coin::independence_test()` provides a general framework for comparing two or more groups
  - The formula `Y ~ X` compares `Y` values across levels of factor `X`
- Use `independence_test()` for tests of equality of 2+ means/proportions
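The idea behind a resampling ("approximative") test of independence can be sketched in a few lines of base R. This is an illustration of the general principle on simulated toy data, not `coin`'s actual implementation: the statistic `stat()`, the variables `y` and `x`, and the choice of 2000 resamples are all assumptions made for the example.

```r
set.seed(42)

# Toy data (simulated): response y in three groups, the third one shifted
y <- c(rnorm(20, mean = 10), rnorm(20, mean = 10), rnorm(20, mean = 12))
x <- factor(rep(c("a", "b", "c"), each = 20))

# Statistic: size-weighted squared deviations of group means from the
# overall mean; large values indicate that y depends on x
stat <- function(y, x) {
  sum(tapply(y, x, length) * (tapply(y, x, mean) - mean(y))^2)
}

observed <- stat(y, x)

# Under independence the pairing of y and x is arbitrary, so shuffle y
# many times and recompute the statistic to approximate its null distribution
perm <- replicate(2000, stat(sample(y), x))

# Approximate p-value: share of shuffles at least as extreme as observed
p_value <- mean(perm >= observed)
p_value
```

Swapping in a different statistic (ranks, proportions) while keeping the shuffling step unchanged is what makes one framework cover many tests.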