Goal: Perform hypothesis tests using simulation and randomization

Use the Labour Fource Survey (LFS) data for Toronto.

rr

lfs = read_csv('./data/LFS_Toronto.csv') 
Parsed with column specification:
cols(
  .default = col_integer(),
  uhrsmain = col_double(),
  ahrsmain = col_double(),
  utothrs = col_double(),
  atothrs = col_double(),
  hrsaway = col_double(),
  paidot = col_double(),
  unpaidot = col_double(),
  xtrahrs = col_double(),
  hrlyearn = col_double(),
  fweighta = col_double()
)
See spec(...) for full column specifications.
  1. You want to test if the unemployment rate is different for men & women. Formulate the null and alternate hypothesis in terms of the population unemployment rate parameters \(p_F\) and \(p_M\).

  2. Use the coin::independence_test() function to conduct the relevant hypothesis test. Filter observations outside the labour force, and create factor variables representing gender and employment to pass to the function’s formula (when both variables are factors, the formula Y ~ X will test if Y proportions are equal across values of X). Report the test P-value, and state your conclusion at the 5% significance level.


Consider the following subset of the LFS data, containing hourly earnings (when available) of university-educated individuals aged 25-26.

rr

lfs_subset = lfs %>% 
  filter( educ %in% 5:6, age_6 == 5 ) %>% 
  mutate( educ = factor(educ, levels = 5:6, labels = c(\BSc\,\MSc_PhD\)) ) %>%
  select( hrlyearn, educ) %>% drop_na()
  1. Create side-by-side boxplots of the hourly earnings of BSc vs graduate (MSc_PhD) degree holders in the subset. Does there seem to be a significant difference in hourly wages.

  2. You want to test if there is a difference in average hourly earning between BSc and MSc/PhD holder, i.e. \(H_0: \mu_{BSc} - \mu_{MSc/PhD} =0\) vs \(H_A: \mu_{BSc} - \mu_{MSc/PhD} \neq 0\). Use coin::independence_test() to perform the test, and report the P-value.


The sample average difference in hourly earning (sample_pay_diff) between BSc and MSc/PhD for the 25-26yr demographic is less than $1/hr:

rr

sample_pay_diff = lfs_subset %>% 
  group_by( educ ) %>% 
  summarise( avg_pay = mean(hrlyearn, na.rm = TRUE) ) %>% 
  spread(educ, avg_pay) %>% 
  mutate( pay_diff = MSc_PhD - BSc ) %>% 
  pull( pay_diff )
sample_pay_diff
[1] 0.9420844

The code below generates 1,000 permutations of the data, i.e. 1,000 versions with randomly shuffled values of educ, and saves them in the data-frame lfs_perm.

rr

set.seed(123) # fixes random numbers (permutations)
lfs_perm = 
  data_frame( perm = 1:1000 ) %>% # create data-frame w/ permutation ID#
  group_by(perm) %>% # for each permutation #
  do( lfs_subset %>% mutate( educ = sample(educ) ) ) %>% # shuffle educ labels 
  ungroup()
glimpse(lfs_perm)
Observations: 638,000
Variables: 3
$ perm     <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1...
$ hrlyearn <dbl> 17.00, 29.74, 46.15, 35.71, 20.00, 32.05, 18.75, 15.60,...
$ educ     <fct> BSc, MSc_PhD, BSc, MSc_PhD, BSc, BSc, BSc, BSc, BSc, BS...

For the next two parts you will perform a permutation test by hand, i.e. you will run the steps that are performed under the hood of independence_test(), and compare results.

  1. Caclulate the statistic values, i.e. the differences in avg hourly earnings, for each of the 1,000 permutations. Plot the histogram of the permuted values, together with a vertical line at the original sample value. Does the sample value seem likely to have come from \(H_0\)?

  2. Calculate the P-value based on the permutations (i.e. the proportion of differences that are equally or further away from zero than the observed one). Is it consistent with the P-value you got from independence_test()?

  3. Do you think that earnings for BSc and MSc/PhD holders are the same in general? What could be misleading with the particular demographic we focused on?

LS0tDQp0aXRsZTogIlNUQUE1NyAtIFdvcmtTaGVldCAxMSINCmF1dGhvcjogJ05hbWU6ICAgICwgSUQjOiAgICcNCm91dHB1dDogaHRtbF9ub3RlYm9vaw0KZWRpdG9yX29wdGlvbnM6IA0KICBjaHVua19vdXRwdXRfdHlwZTogaW5saW5lDQotLS0NCg0KKioqDQogDQoqKkdvYWwqKjogUGVyZm9ybSBoeXBvdGhlc2lzIHRlc3RzIHVzaW5nIHNpbXVsYXRpb24gYW5kIHJhbmRvbWl6YXRpb24NCg0KDQpgYGB7ciwgaW5jbHVkZT1GQUxTRX0NCmxpYnJhcnkodGlkeXZlcnNlKQ0KYGBgDQoNClVzZSB0aGUgTGFib3VyIEZvdXJjZSBTdXJ2ZXkgKExGUykgZGF0YSBmb3IgVG9yb250by4NCmBgYHtyLCBjb2xsYXBzZT1UUlVFfQ0KbGZzID0gcmVhZF9jc3YoJy4vZGF0YS9MRlNfVG9yb250by5jc3YnKSANCmBgYA0KDQoxLiBZb3Ugd2FudCB0byB0ZXN0IGlmIHRoZSAqdW5lbXBsb3ltZW50IHJhdGUqIGlzICpkaWZmZXJlbnQqIGZvciBtZW4gJiB3b21lbi4gRm9ybXVsYXRlIHRoZSBudWxsIGFuZCBhbHRlcm5hdGUgaHlwb3RoZXNpcyBpbiB0ZXJtcyBvZiB0aGUgcG9wdWxhdGlvbiB1bmVtcGxveW1lbnQgcmF0ZSBwYXJhbWV0ZXJzICRwX0YkIGFuZCAkcF9NJC4NCg0KDQoyLiBVc2UgdGhlIGBjb2luOjppbmRlcGVuZGVuY2VfdGVzdCgpYCBmdW5jdGlvbiB0byBjb25kdWN0IHRoZSByZWxldmFudCBoeXBvdGhlc2lzIHRlc3QuIEZpbHRlciBvYnNlcnZhdGlvbnMgb3V0c2lkZSB0aGUgbGFib3VyIGZvcmNlLCBhbmQgY3JlYXRlICpmYWN0b3IqIHZhcmlhYmxlcyByZXByZXNlbnRpbmcgZ2VuZGVyIGFuZCBlbXBsb3ltZW50IHRvIHBhc3MgdG8gdGhlIGZ1bmN0aW9uJ3MgZm9ybXVsYSAod2hlbiBib3RoIHZhcmlhYmxlcyBhcmUgZmFjdG9ycywgdGhlIGZvcm11bGEgYFkgfiBYYCB3aWxsIHRlc3QgaWYgYFlgIHByb3BvcnRpb25zIGFyZSBlcXVhbCBhY3Jvc3MgdmFsdWVzIG9mIGBYYCkuIFJlcG9ydCB0aGUgdGVzdCBQLXZhbHVlLCBhbmQgc3RhdGUgeW91ciBjb25jbHVzaW9uIGF0IHRoZSA1JSBzaWduaWZpY2FuY2UgbGV2ZWwuDQoNCioqKiANCkNvbnNpZGVyIHRoZSBmb2xsb3dpbmcgc3Vic2V0IG9mIHRoZSBMRlMgZGF0YSwgY29udGFpbmluZyBob3VybHkgZWFybmluZ3MgKHdoZW4gYXZhaWxhYmxlKSBvZiB1bml2ZXJzaXR5LWVkdWNhdGVkIGluZGl2aWR1YWxzIGFnZWQgMjUtMjYuDQoNCmBgYHtyfQ0KbGZzX3N1YnNldCA9IGxmcyAlPiUgDQogIGZpbHRlciggZWR1YyAlaW4lIDU6NiwgYWdlXzYgPT0gNSApICU+JSANCiAgbXV0YXRlKCBlZHVjID0gZmFjdG9yKGVkdWMsIGxldmVscyA9IDU6NiwgbGFiZWxzID0gYygiQlNjIiwiTVNjX1BoRCIpKSApICU+JQ0KICBzZWxlY3QoIGhybHllYXJuLCBlZHVjKSAlPiUgZHJvcF9uYSgpDQpgYGANCg0KMy4gQ3JlYXRlIHNpZGUtYnktc2lkZSBib3hwbG90cyBvZiB0aGUgaG91cmx5IGVhcm5pbmdzIG9mIEJTYyB2cyBncmFkdWF0ZSAoYE1TY19QaERgKSBkZWdyZWUgaG9sZGVycyBpbiB0aGUgc3Vic2V0LiBEb2VzIHRoZXJlIHNlZW0gdG8gYmUgYSBzaWduaWZpY2FudCBkaWZmZXJlbmNlIGluIGhvdXJseSB3YWdlcy4gDQoNCjQuIFlvdSB3YW50IHRvIHRlc3QgaWYgdGhlcmUgaXMgYSBkaWZmZXJlbmNlIGluICphdmVyYWdlKiBob3VybHkgZWFybmluZyBiZXR3ZWVuIEJTYyBhbmQgTVNjL1BoRCBob2xkZXIsIGkuZS4gJEhfMDogXG11X3tCU2N9IC0gXG11X3tNU2MvUGhEfSA9MCQgdnMgJEhfQTogXG11X3tCU2N9IC0gXG11X3tNU2MvUGhEfSBcbmVxIDAkLiBVc2UgYGNvaW46OmluZGVwZW5kZW5jZV90ZXN0KClgIHRvIHBlcmZvcm0gdGhlIHRlc3QsIGFuZCByZXBvcnQgdGhlIFAtdmFsdWUuDQoNCg0KKioqDQpUaGUgc2FtcGxlIGF2ZXJhZ2UgZGlmZmVyZW5jZSBpbiBob3VybHkgZWFybmluZyAoYHNhbXBsZV9wYXlfZGlmZmApIGJldHdlZW4gYEJTY2AgYW5kIGBNU2MvUGhEYCBmb3IgdGhlIDI1LTI2eXIgZGVtb2dyYXBoaWMgaXMgbGVzcyB0aGFuIFwkMS9ocjoNCmBgYHtyfQ0Kc2FtcGxlX3BheV9kaWZmID0gbGZzX3N1YnNldCAlPiUgDQogIGdyb3VwX2J5KCBlZHVjICkgJT4lIA0KICBzdW1tYXJpc2UoIGF2Z19wYXkgPSBtZWFuKGhybHllYXJuLCBuYS5ybSA9IFRSVUUpICkgJT4lIA0KICBzcHJlYWQoZWR1YywgYXZnX3BheSkgJT4lIA0KICBtdXRhdGUoIHBheV9kaWZmID0gTVNjX1BoRCAtIEJTYyApICU+JSANCiAgcHVsbCggcGF5X2RpZmYgKQ0Kc2FtcGxlX3BheV9kaWZmDQpgYGANCg0KVGhlIGNvZGUgYmVsb3cgZ2VuZXJhdGVzIDEsMDAwICpwZXJtdXRhdGlvbnMqIG9mIHRoZSBkYXRhLCBpLmUuIDEsMDAwIHZlcnNpb25zIHdpdGggcmFuZG9tbHkgc2h1ZmZsZWQgdmFsdWVzIG9mIGBlZHVjYCwgYW5kIHNhdmVzIHRoZW0gaW4gdGhlIGRhdGEtZnJhbWUgYGxmc19wZXJtYC4gIA0KDQpgYGB7cn0NCnNldC5zZWVkKDEyMykgIyBmaXhlcyByYW5kb20gbnVtYmVycyAocGVybXV0YXRpb25zKQ0KbGZzX3Blcm0gPSANCiAgdGliYmxlKCBwZXJtID0gMToxMDAwICkgJT4lICMgY3JlYXRlIGRhdGEtZnJhbWUgdy8gcGVybXV0YXRpb24gSUQjDQogIGdyb3VwX2J5KHBlcm0pICU+JSAjIGZvciBlYWNoIHBlcm11dGF0aW9uICMNCiAgZG8oIGxmc19zdWJzZXQgJT4lIG11dGF0ZSggZWR1YyA9IHNhbXBsZShlZHVjKSApICkgJT4lICMgc2h1ZmZsZSBlZHVjIGxhYmVscyANCiAgdW5ncm91cCgpDQpnbGltcHNlKGxmc19wZXJtKQ0KYGBgDQoNCkZvciB0aGUgbmV4dCB0d28gcGFydHMgeW91IHdpbGwgcGVyZm9ybSBhIHBlcm11dGF0aW9uIHRlc3QgKmJ5IGhhbmQqLCBpLmUuIHlvdSB3aWxsIHJ1biB0aGUgc3RlcHMgdGhhdCBhcmUgcGVyZm9ybWVkIHVuZGVyIHRoZSBob29kIG9mIGBpbmRlcGVuZGVuY2VfdGVzdCgpYCwgYW5kIGNvbXBhcmUgcmVzdWx0cy4NCg0KNS4gQ2FjbHVsYXRlIHRoZSBzdGF0aXN0aWMgdmFsdWVzLCBpLmUuIHRoZSAqZGlmZmVyZW5jZXMqIGluIGF2ZyBob3VybHkgZWFybmluZ3MsIGZvciBlYWNoIG9mIHRoZSAxLDAwMCBwZXJtdXRhdGlvbnMuIFBsb3QgdGhlIGhpc3RvZ3JhbSBvZiB0aGUgcGVybXV0ZWQgdmFsdWVzLCB0b2dldGhlciB3aXRoIGEgdmVydGljYWwgbGluZSBhdCB0aGUgb3JpZ2luYWwgc2FtcGxlIHZhbHVlLiBEb2VzIHRoZSBzYW1wbGUgdmFsdWUgc2VlbSBsaWtlbHkgdG8gaGF2ZSBjb21lIGZyb20gJEhfMCQ/DQoNCjYuIENhbGN1bGF0ZSB0aGUgUC12YWx1ZSBiYXNlZCBvbiB0aGUgcGVybXV0YXRpb25zIChpLmUuIHRoZSBwcm9wb3J0aW9uIG9mIGRpZmZlcmVuY2VzIHRoYXQgYXJlIGVxdWFsbHkgb3IgZnVydGhlciBhd2F5IGZyb20gemVybyB0aGFuIHRoZSBvYnNlcnZlZCBvbmUpLiBJcyBpdCBjb25zaXN0ZW50IHdpdGggdGhlIFAtdmFsdWUgeW91IGdvdCBmcm9tIGBpbmRlcGVuZGVuY2VfdGVzdCgpYD8gDQoNCjcuIERvIHlvdSB0aGluayB0aGF0IGVhcm5pbmdzIGZvciBCU2MgYW5kIE1TYy9QaEQgaG9sZGVycyBhcmUgdGhlIHNhbWUgKmluIGdlbmVyYWwqPyBXaGF0IGNvdWxkIGJlIG1pc2xlYWRpbmcgd2l0aCB0aGUgcGFydGljdWxhciBkZW1vZ3JhcGhpYyB3ZSBmb2N1c2VkIG9uPw0KDQo=