Goal: Explore topics in hypothesis testing (power, effect size, data snooping) using simulation

  1. Using LFS data, we found that women earn less than men on average, with an effect size of \(d=0.336\). Verify the value of the effect size directly from the data using the formula \[d = \frac{ \hat{\mu}_1 - \hat{\mu}_2 }{ S_{pooled} } = \frac{ \hat{\mu}_1 - \hat{\mu}_2 }{ \sqrt{ S^2_{pooled} } }\] The pooled variance is given by \[ S^2_{pooled} = \frac{ S^2_1 (n_1-1) + S^2_2 (n_2-1)}{(n_1-1) + (n_2-1)} \] where \(n_{1/2}, S^2_{1/2}\) are the sample sizes and variances of the two groups (men/women) respectively.
    (Hint: group the relevant data by sex, apply summary functions for means/variances/counts, and combine the resulting values accordingly)

We will now explore the notion of power through a simulation experiment. The following code generates \(100,000\) flips of a fair coin, grouped into 1000 repetitions (rep) of 100 flips each.

set.seed(123)
sim_H0 = 
  tibble( rep = 1:1000 ) %>% 
  group_by( rep ) %>% 
  do( tibble( flip = sample( c("H","T"), size = 100, replace = TRUE) ) ) %>%
  ungroup()
glimpse(sim_H0)
Observations: 100,000
Variables: 2
$ rep  <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,...
$ flip <chr> "H", "T", "H", "T", "T", "H", "T", "T", "T", "H", "T", "H",...
  1. Calculate the proportion of heads for each group; you can think of the resulting values as \(1000\) samples of the test statistic (\(\hat{p}\)) under the null hypothesis \(H_0: p = .5\), for a fixed sample size of \(n=100\). Draw the density plot of the proportion values; this represents the sampling distribution under \(H_0\). Calculate the 95% quantile and overlay it as a vertical line on the plot.

  2. Suppose we want to test \(H_0: p \leq .5\) vs \(H_A: p>.5\) at significance level \(\alpha = 5\%\). For power calculations, we have to consider specific alternatives in \(H_A\). Assume we are interested in the alternative \(p=.6\) (i.e. an “effect size” of .6-.5=.1). Simulate a new set of flips under \(p=.6\), with 1000 repetitions of \(n=100\) samples, and overlay the resulting statistic density on the previous plot.

  3. Estimate the power of the test, i.e. the probability that a sample under \(H_A: p=.6\) would reject \(H_0\) at \(\alpha = 5\%\). The power is estimated by the proportion of samples under \(p=.6\) that give a test statistic greater than the 95% quantile under \(H_0: p=.5\).

  4. Repeat the previous calculation under the alternative \(p = .7\), i.e. for a larger effect size. What happens to the power?

  5. Repeat the previous calculation for a sample size of \(n=50\) (and 1000 repetitions); i.e. calculate the power for a smaller sample size, keeping the same \(p=.7\). Note that you will have to recalculate the quantile under \(H_0:p=.5\) as well. What happens to the power?

  6. Repeat the previous calculation for a significance level of \(\alpha = 1\%\), all other things being the same. What happens to the power?

  7. [EXTRA] Assume you are testing \(H_0: p \leq .5\) vs \(H_A: p > .5\) with \(n=50\) and a test statistic cut-off of .6 (i.e. you reject \(H_0\) if \(\hat{p}>.6\)). From the 1000 samples you generated from \(H_0: p=.5\), what proportion reject the null \(H_0\)? How many independent samples would you have to check, on average, before you find a significant one?

LS0tDQp0aXRsZTogIlNUQUE1NyAtIFdvcmtTaGVldCAxMiINCmF1dGhvcjogJ05hbWU6ICAgICwgSUQjOiAgICcNCm91dHB1dDogaHRtbF9ub3RlYm9vaw0KZWRpdG9yX29wdGlvbnM6IA0KICBjaHVua19vdXRwdXRfdHlwZTogaW5saW5lDQotLS0NCg0KKioqIA0KDQoqKkdvYWwqKjogRXhwbG9yZSB0b3BpY3MgaW4gaHlwb3RoZXNpcyB0ZXN0aW5nIChwb3dlciwgZWZmZWN0IHNpemUsIGRhdGEgc25vb3BpbmcpIHVzaW5nIHNpbXVsYXRpb24NCg0KYGBge3IsIGluY2x1ZGU9RkFMU0V9DQpsaWJyYXJ5KHRpZHl2ZXJzZSkNCmxmcyA9IHJlYWRfY3N2KCcuL2RhdGEvTEZTX1Rvcm9udG8uY3N2JykgDQpgYGANCg0KDQoNCjEuIFVzaW5nIExGUyBkYXRhLCB3ZSBmb3VuZCB0aGF0IHdvbWVuIGVhcm4gbGVzcyB0aGFuIG1lbiBvbiBhdmVyYWdlLCB3aXRoIGFuIGVmZmVjdCBzaXplIG9mICRkPTAuMzM2JC4gVmVyaWZ5IHRoZSB2YWx1ZSBvZiB0aGUgZWZmZWN0IHNpemUgZGlyZWN0bHkgZnJvbSB0aGUgZGF0YSB1c2luZyB0aGUgZm9ybXVsYQ0KJCRkID0gXGZyYWN7IFxoYXR7XG11fV8xIC0gXGhhdHtcbXV9XzIgfXsgU197cG9vbGVkfSB9ID0gXGZyYWN7IFxoYXR7XG11fV8xIC0gXGhhdHtcbXV9XzIgfXsgXHNxcnR7IFNeMl97cG9vbGVkfSB9IH0kJCANClRoZSBwb29sZWQgdmFyaWFuY2UgaXMgZ2l2ZW4gYnkNCiQkIFNeMl97cG9vbGVkfSA9IFxmcmFjeyBTXjJfMSAgKG5fMS0xKSArIFNeMl8yICAobl8yLTEpfXsobl8xLTEpICsgKG5fMi0xKX0gJCQgDQp3aGVyZSAkbl97MS8yfSwgU14yX3sxLzJ9JCBhcmUgdGhlIHNhbXBsZSBzaXplcyBhbmQgdmFyaWFuY2VzIG9mIHRoZSB0d28gZ3JvdXBzIChtZW4vd29tZW4pIHJlc3BlY3RpdmVseS4gICANCihIaW50OiBncm91cCB0aGUgcmVsZXZhbnQgZGF0YSBieSBgc2V4YCwgYXBwbHkgc3VtbWFyeSBmdW5jdGlvbnMgZm9yIG1lYW5zL3ZhcmlhbmNlcy9jb3VudHMsIGFuZCBjb21iaW5lIHRoZSByZXN1bHRpbmcgdmFsdWVzIGFjY29yZGluZ2x5KQ0KDQoqKioNCg0KV2Ugd2lsbCBub3cgZXhwbG9yZSB0aGUgbm90aW9uIG9mIHBvd2VyIHRocm91Z2ggYSBzaW11bGF0aW9uIGV4cGVyaW1lbnQuIFRoZSBmb2xsb3dpbmcgY29kZSBnZW5lcmF0ZXMgJDEwMCwwMDAkIGZsaXBzIG9mIGEgKmZhaXIqIGNvaW4sIGdyb3VwZWQgaW50byAxMDAwIHJlcGV0aXRpb25zIChgcmVwYCkgb2YgMTAwIGZsaXBzIGVhY2guIA0KDQpgYGB7cn0NCnNldC5zZWVkKDEyMykNCnNpbV9IMCA9IA0KICB0aWJibGUoIHJlcCA9IDE6MTAwMCApICU+JSANCiAgZ3JvdXBfYnkoIHJlcCApICU+JSANCiAgZG8oIHRpYmJsZSggZmxpcCA9IHNhbXBsZSggYygiSCIsIlQiKSwgc2l6ZSA9IDEwMCwgcmVwbGFjZSA9IFRSVUUpICkgKSAlPiUNCiAgdW5ncm91cCgpDQpnbGltcHNlKHNpbV9IMCkNCmBgYA0KDQoyLiBDYWxjdWxhdGUgdGhlICpwcm9wb3J0aW9uIG9mIGhlYWRzKiBmb3IgZWFjaCBncm91cDsgeW91IGNhbiB0aGluayBvZiB0aGUgcmVzdWx0aW5nIHZhbHVlcyBhcyAkMTAwMCQgc2FtcGxlcyBvZiB0aGUgdGVzdCBzdGF0aXN0aWMgKCRcaGF0e3B9JCkgdW5kZXIgdGhlIG51bGwgaHlwb3RoZXNpcyAkSF8wOiBwID0gLjUkLCBmb3IgYSBmaXhlZCBzYW1wbGUgc2l6ZSBvZiAkbj0xMDAkLiANCkRyYXcgdGhlIGRlbnNpdHkgcGxvdCBvZiB0aGUgcHJvcG9ydGlvbiB2YWx1ZXM7IHRoaXMgcmVwcmVzZW50cyB0aGUgc2FtcGxpbmcgZGlzdHJpYnV0aW9uIHVuZGVyICRIXzAkLiBDYWxjdWxhdGUgdGhlIDk1JSBxdWFudGlsZSBhbmQgb3ZlcmxheSBpdCBhcyBhIHZlcnRpY2FsIGxpbmUgb24gdGhlIHBsb3QuIA0KDQozLiBTdXBwb3NlIHdlIHdhbnQgdG8gdGVzdCAkSF8wOiBwIFxsZXEgLjUkIHZzICRIX0E6IHA+LjUkIGF0IHNpZ25pZmljYW5jZSBsZXZlbCAkXGFscGhhID0gNVwlJC4gRm9yIHBvd2VyIGNhbGN1bGF0aW9ucywgd2UgaGF2ZSB0byBjb25zaWRlciBzcGVjaWZpYyBhbHRlcm5hdGl2ZXMgaW4gJEhfQSQuIEFzc3VtZSB3ZSBhcmUgaW50ZXJlc3RlZCBpbiB0aGUgYWx0ZXJuYXRpdmUgJHA9LjYkIChpLmUuIGFuICJlZmZlY3Qgc2l6ZSIgb2YgLjYtLjU9LjEpLiBTaW11bGF0ZSBhICpuZXcqIHNldCBvZiBmbGlwcyB1bmRlciAkcD0uNiQsIHdpdGggMTAwMCByZXBldGl0aW9ucyBvZiAkbj0xMDAkIHNhbXBsZXMsIGFuZCBvdmVybGF5IHRoZSByZXN1bHRpbmcgc3RhdGlzdGljIGRlbnNpdHkgb24gdGhlIHByZXZpb3VzIHBsb3QuIA0KDQo0LiBFc3RpbWF0ZSB0aGUgKnBvd2VyKiBvZiB0aGUgdGVzdCwgaS5lLiB0aGUgcHJvYmFiaWxpdHkgdGhhdCBhIHNhbXBsZSB1bmRlciAkSF9BOiBwPS42JCAgd291bGQgKnJlamVjdCogJEhfMCQgYXQgJFxhbHBoYSA9IDVcJSQuIFRoZSBwb3dlciBpcyBlc3RpbWF0ZWQgYnkgdGhlIHByb3BvcnRpb24gb2Ygc2FtcGxlcyB1bmRlciAkcD0uNiQgdGhhdCBnaXZlIGEgdGVzdCBzdGF0aXN0aWMgZ3JlYXRlciB0aGFuIHRoZSA5NSUgcXVhbnRpbGUgdW5kZXIgJEhfMDogcD0uNSQuIA0KDQo1LiBSZXBlYXQgdGhlIHByZXZpb3VzIGNhbGN1bGF0aW9uIHVuZGVyIHRoZSBhbHRlcm5hdGl2ZSAkcCA9IC43JCwgaS5lLiBmb3IgYSAqbGFyZ2VyKiBlZmZlY3Qgc2l6ZS4gV2hhdCBoYXBwZW5zIHRvIHRoZSBwb3dlcj8NCg0KNi4gUmVwZWF0IHRoZSBwcmV2aW91cyBjYWxjdWxhdGlvbiBmb3IgYSBzYW1wbGUgc2l6ZSBvZiAkbj01MCQgKGFuZCAxMDAwIHJlcGV0aXRpb25zKTsgaS5lLiBjYWxjdWxhdGUgdGhlIHBvd2VyIGZvciBhICpzbWFsbGVyKiBzYW1wbGUgc2l6ZSwga2VlcGluZyB0aGUgc2FtZSAkcD0uNyQuIE5vdGUgdGhhdCB5b3Ugd2lsbCBoYXZlIHRvICpyZWNhbGN1bGF0ZSogdGhlIHF1YW50aWxlIHVuZGVyICRIXzA6cD0uNSQgYXMgd2VsbC4gV2hhdCBoYXBwZW5zIHRvIHRoZSBwb3dlcj8gIA0KDQo3LiBSZXBlYXQgdGhlIHByZXZpb3VzIGNhbGN1bGF0aW9uIGZvciBhIHNpZ25pZmljYW5jZSBsZXZlbCBvZiAkXGFscGhhID0gMVwlJCwgYWxsIG90aGVyIHRoaW5ncyBiZWluZyB0aGUgc2FtZS4gV2hhdCBoYXBwZW5zIHRvIHRoZSBwb3dlcj8gIA0KDQo4LiBbRVhUUkFdIEFzc3VtZSB5b3UgYXJlIHRlc3RpbmcgJEhfMDogcCBcbGVxIC41JCB2cyAkSF9BOiBwID4gLjUkIHdpdGggJG49NTAkIGFuZCBhIHRlc3Qgc3RhdGlzdGljIGN1dC1vZmYgb2YgLjYgKGkuZS4geW91IHJlamVjdCAkSF8wJCBpZiAkXGhhdHtwfT4uNiQpLiBGcm9tIHRoZSAxMDAwIHNhbXBsZXMgeW91IGdlbmVyYXRlZCBmcm9tICRIXzA6IHA9LjUkLCB3aGF0IHByb3BvcnRpb24gcmVqZWN0IHRoZSBudWxsICRIXzAkPyBIb3cgbWFueSBpbmRlcGVuZGVudCBzYW1wbGVzIHdvdWxkIHlvdSBoYXZlIHRvIGNoZWNrLCBvbiBhdmVyYWdlLCBiZWZvcmUgeW91IGZpbmQgYSBzaWduaWZpY2FudCBvbmU/DQoNCg==