Goal: Explore topics in hypothesis testing (power, effect size, data snooping) using simulation
- Using LFS data, we found that women earn less than men on average, with an effect size of \(d=0.336\). Verify the value of the effect size directly from the data using the formula \[d = \frac{ \hat{\mu}_1 - \hat{\mu}_2 }{ S_{pooled} } = \frac{ \hat{\mu}_1 - \hat{\mu}_2 }{ \sqrt{ S^2_{pooled} } }\] The pooled variance is given by \[ S^2_{pooled} = \frac{ S^2_1 (n_1-1) + S^2_2 (n_2-1)}{(n_1-1) + (n_2-1)} \] where \(n_{1/2}, S^2_{1/2}\) are the sample sizes and variances of the two groups (men/women) respectively.
(Hint: group the relevant data by sex
, apply summary functions for means/variances/counts, and combine the resulting values accordingly)
We will now explore the notion of power through a simulation experiment. The following code generates \(100,000\) flips of a fair coin, grouped into 1000 repetitions (rep
) of 100 flips each.
set.seed(123)
sim_H0 =
tibble( rep = 1:1000 ) %>%
group_by( rep ) %>%
do( tibble( flip = sample( c("H","T"), size = 100, replace = TRUE) ) ) %>%
ungroup()
glimpse(sim_H0)
Observations: 100,000
Variables: 2
$ rep <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,...
$ flip <chr> "H", "T", "H", "T", "T", "H", "T", "T", "T", "H", "T", "H",...
Calculate the proportion of heads for each group; you can think of the resulting values as \(1000\) samples of the test statistic (\(\hat{p}\)) under the null hypothesis \(H_0: p = .5\), for a fixed sample size of \(n=100\). Draw the density plot of the proportion values; this represents the sampling distribution under \(H_0\). Calculate the 95% quantile and overlay it as a vertical line on the plot.
Suppose we want to test \(H_0: p \leq .5\) vs \(H_A: p>.5\) at significance level \(\alpha = 5\%\). For power calculations, we have to consider specific alternatives in \(H_A\). Assume we are interested in the alternative \(p=.6\) (i.e. an “effect size” of .6-.5=.1). Simulate a new set of flips under \(p=.6\), with 1000 repetitions of \(n=100\) samples, and overlay the resulting statistic density on the previous plot.
Estimate the power of the test, i.e. the probability that a sample under \(H_A: p=.6\) would reject \(H_0\) at \(\alpha = 5\%\). The power is estimated by the proportion of samples under \(p=.6\) that give a test statistic greater than the 95% quantile under \(H_0: p=.5\).
Repeat the previous calculation under the alternative \(p = .7\), i.e. for a larger effect size. What happens to the power?
Repeat the previous calculation for a sample size of \(n=50\) (and 1000 repetitions); i.e. calculate the power for a smaller sample size, keeping the same \(p=.7\). Note that you will have to recalculate the quantile under \(H_0:p=.5\) as well. What happens to the power?
Repeat the previous calculation for a significance level of \(\alpha = 1\%\), all other things being the same. What happens to the power?
[EXTRA] Assume you are testing \(H_0: p \leq .5\) vs \(H_A: p > .5\) with \(n=50\) and a test statistic cut-off of .6 (i.e. you reject \(H_0\) if \(\hat{p}>.6\)). From the 1000 samples you generated from \(H_0: p=.5\), what proportion reject the null \(H_0\)? How many independent samples would you have to check, on average, before you find a significant one?
LS0tDQp0aXRsZTogIlNUQUE1NyAtIFdvcmtTaGVldCAxMiINCmF1dGhvcjogJ05hbWU6ICAgICwgSUQjOiAgICcNCm91dHB1dDogaHRtbF9ub3RlYm9vaw0KZWRpdG9yX29wdGlvbnM6IA0KICBjaHVua19vdXRwdXRfdHlwZTogaW5saW5lDQotLS0NCg0KKioqIA0KDQoqKkdvYWwqKjogRXhwbG9yZSB0b3BpY3MgaW4gaHlwb3RoZXNpcyB0ZXN0aW5nIChwb3dlciwgZWZmZWN0IHNpemUsIGRhdGEgc25vb3BpbmcpIHVzaW5nIHNpbXVsYXRpb24NCg0KYGBge3IsIGluY2x1ZGU9RkFMU0V9DQpsaWJyYXJ5KHRpZHl2ZXJzZSkNCmxmcyA9IHJlYWRfY3N2KCcuL2RhdGEvTEZTX1Rvcm9udG8uY3N2JykgDQpgYGANCg0KDQoNCjEuIFVzaW5nIExGUyBkYXRhLCB3ZSBmb3VuZCB0aGF0IHdvbWVuIGVhcm4gbGVzcyB0aGFuIG1lbiBvbiBhdmVyYWdlLCB3aXRoIGFuIGVmZmVjdCBzaXplIG9mICRkPTAuMzM2JC4gVmVyaWZ5IHRoZSB2YWx1ZSBvZiB0aGUgZWZmZWN0IHNpemUgZGlyZWN0bHkgZnJvbSB0aGUgZGF0YSB1c2luZyB0aGUgZm9ybXVsYQ0KJCRkID0gXGZyYWN7IFxoYXR7XG11fV8xIC0gXGhhdHtcbXV9XzIgfXsgU197cG9vbGVkfSB9ID0gXGZyYWN7IFxoYXR7XG11fV8xIC0gXGhhdHtcbXV9XzIgfXsgXHNxcnR7IFNeMl97cG9vbGVkfSB9IH0kJCANClRoZSBwb29sZWQgdmFyaWFuY2UgaXMgZ2l2ZW4gYnkNCiQkIFNeMl97cG9vbGVkfSA9IFxmcmFjeyBTXjJfMSAgKG5fMS0xKSArIFNeMl8yICAobl8yLTEpfXsobl8xLTEpICsgKG5fMi0xKX0gJCQgDQp3aGVyZSAkbl97MS8yfSwgU14yX3sxLzJ9JCBhcmUgdGhlIHNhbXBsZSBzaXplcyBhbmQgdmFyaWFuY2VzIG9mIHRoZSB0d28gZ3JvdXBzIChtZW4vd29tZW4pIHJlc3BlY3RpdmVseS4gICANCihIaW50OiBncm91cCB0aGUgcmVsZXZhbnQgZGF0YSBieSBgc2V4YCwgYXBwbHkgc3VtbWFyeSBmdW5jdGlvbnMgZm9yIG1lYW5zL3ZhcmlhbmNlcy9jb3VudHMsIGFuZCBjb21iaW5lIHRoZSByZXN1bHRpbmcgdmFsdWVzIGFjY29yZGluZ2x5KQ0KDQoqKioNCg0KV2Ugd2lsbCBub3cgZXhwbG9yZSB0aGUgbm90aW9uIG9mIHBvd2VyIHRocm91Z2ggYSBzaW11bGF0aW9uIGV4cGVyaW1lbnQuIFRoZSBmb2xsb3dpbmcgY29kZSBnZW5lcmF0ZXMgJDEwMCwwMDAkIGZsaXBzIG9mIGEgKmZhaXIqIGNvaW4sIGdyb3VwZWQgaW50byAxMDAwIHJlcGV0aXRpb25zIChgcmVwYCkgb2YgMTAwIGZsaXBzIGVhY2guIA0KDQpgYGB7cn0NCnNldC5zZWVkKDEyMykNCnNpbV9IMCA9IA0KICB0aWJibGUoIHJlcCA9IDE6MTAwMCApICU+JSANCiAgZ3JvdXBfYnkoIHJlcCApICU+JSANCiAgZG8oIHRpYmJsZSggZmxpcCA9IHNhbXBsZSggYygiSCIsIlQiKSwgc2l6ZSA9IDEwMCwgcmVwbGFjZSA9IFRSVUUpICkgKSAlPiUNCiAgdW5ncm91cCgpDQpnbGltcHNlKHNpbV9IMCkNCmBgYA0KDQoyLiBDYWxjdWxhdGUgdGhlICpwcm9wb3J0aW9uIG9mIGhlYWRzKiBmb3IgZWFjaCBncm91cDsgeW91IGNhbiB0aGluayBvZiB0aGUgcmVzdWx0aW5nIHZhbHVlcyBhcyAkMTAwMCQgc2FtcGxlcyBvZiB0aGUgdGVzdCBzdGF0aXN0aWMgKCRcaGF0e3B9JCkgdW5kZXIgdGhlIG51bGwgaHlwb3RoZXNpcyAkSF8wOiBwID0gLjUkLCBmb3IgYSBmaXhlZCBzYW1wbGUgc2l6ZSBvZiAkbj0xMDAkLiANCkRyYXcgdGhlIGRlbnNpdHkgcGxvdCBvZiB0aGUgcHJvcG9ydGlvbiB2YWx1ZXM7IHRoaXMgcmVwcmVzZW50cyB0aGUgc2FtcGxpbmcgZGlzdHJpYnV0aW9uIHVuZGVyICRIXzAkLiBDYWxjdWxhdGUgdGhlIDk1JSBxdWFudGlsZSBhbmQgb3ZlcmxheSBpdCBhcyBhIHZlcnRpY2FsIGxpbmUgb24gdGhlIHBsb3QuIA0KDQozLiBTdXBwb3NlIHdlIHdhbnQgdG8gdGVzdCAkSF8wOiBwIFxsZXEgLjUkIHZzICRIX0E6IHA+LjUkIGF0IHNpZ25pZmljYW5jZSBsZXZlbCAkXGFscGhhID0gNVwlJC4gRm9yIHBvd2VyIGNhbGN1bGF0aW9ucywgd2UgaGF2ZSB0byBjb25zaWRlciBzcGVjaWZpYyBhbHRlcm5hdGl2ZXMgaW4gJEhfQSQuIEFzc3VtZSB3ZSBhcmUgaW50ZXJlc3RlZCBpbiB0aGUgYWx0ZXJuYXRpdmUgJHA9LjYkIChpLmUuIGFuICJlZmZlY3Qgc2l6ZSIgb2YgLjYtLjU9LjEpLiBTaW11bGF0ZSBhICpuZXcqIHNldCBvZiBmbGlwcyB1bmRlciAkcD0uNiQsIHdpdGggMTAwMCByZXBldGl0aW9ucyBvZiAkbj0xMDAkIHNhbXBsZXMsIGFuZCBvdmVybGF5IHRoZSByZXN1bHRpbmcgc3RhdGlzdGljIGRlbnNpdHkgb24gdGhlIHByZXZpb3VzIHBsb3QuIA0KDQo0LiBFc3RpbWF0ZSB0aGUgKnBvd2VyKiBvZiB0aGUgdGVzdCwgaS5lLiB0aGUgcHJvYmFiaWxpdHkgdGhhdCBhIHNhbXBsZSB1bmRlciAkSF9BOiBwPS42JCAgd291bGQgKnJlamVjdCogJEhfMCQgYXQgJFxhbHBoYSA9IDVcJSQuIFRoZSBwb3dlciBpcyBlc3RpbWF0ZWQgYnkgdGhlIHByb3BvcnRpb24gb2Ygc2FtcGxlcyB1bmRlciAkcD0uNiQgdGhhdCBnaXZlIGEgdGVzdCBzdGF0aXN0aWMgZ3JlYXRlciB0aGFuIHRoZSA5NSUgcXVhbnRpbGUgdW5kZXIgJEhfMDogcD0uNSQuIA0KDQo1LiBSZXBlYXQgdGhlIHByZXZpb3VzIGNhbGN1bGF0aW9uIHVuZGVyIHRoZSBhbHRlcm5hdGl2ZSAkcCA9IC43JCwgaS5lLiBmb3IgYSAqbGFyZ2VyKiBlZmZlY3Qgc2l6ZS4gV2hhdCBoYXBwZW5zIHRvIHRoZSBwb3dlcj8NCg0KNi4gUmVwZWF0IHRoZSBwcmV2aW91cyBjYWxjdWxhdGlvbiBmb3IgYSBzYW1wbGUgc2l6ZSBvZiAkbj01MCQgKGFuZCAxMDAwIHJlcGV0aXRpb25zKTsgaS5lLiBjYWxjdWxhdGUgdGhlIHBvd2VyIGZvciBhICpzbWFsbGVyKiBzYW1wbGUgc2l6ZSwga2VlcGluZyB0aGUgc2FtZSAkcD0uNyQuIE5vdGUgdGhhdCB5b3Ugd2lsbCBoYXZlIHRvICpyZWNhbGN1bGF0ZSogdGhlIHF1YW50aWxlIHVuZGVyICRIXzA6cD0uNSQgYXMgd2VsbC4gV2hhdCBoYXBwZW5zIHRvIHRoZSBwb3dlcj8gIA0KDQo3LiBSZXBlYXQgdGhlIHByZXZpb3VzIGNhbGN1bGF0aW9uIGZvciBhIHNpZ25pZmljYW5jZSBsZXZlbCBvZiAkXGFscGhhID0gMVwlJCwgYWxsIG90aGVyIHRoaW5ncyBiZWluZyB0aGUgc2FtZS4gV2hhdCBoYXBwZW5zIHRvIHRoZSBwb3dlcj8gIA0KDQo4LiBbRVhUUkFdIEFzc3VtZSB5b3UgYXJlIHRlc3RpbmcgJEhfMDogcCBcbGVxIC41JCB2cyAkSF9BOiBwID4gLjUkIHdpdGggJG49NTAkIGFuZCBhIHRlc3Qgc3RhdGlzdGljIGN1dC1vZmYgb2YgLjYgKGkuZS4geW91IHJlamVjdCAkSF8wJCBpZiAkXGhhdHtwfT4uNiQpLiBGcm9tIHRoZSAxMDAwIHNhbXBsZXMgeW91IGdlbmVyYXRlZCBmcm9tICRIXzA6IHA9LjUkLCB3aGF0IHByb3BvcnRpb24gcmVqZWN0IHRoZSBudWxsICRIXzAkPyBIb3cgbWFueSBpbmRlcGVuZGVudCBzYW1wbGVzIHdvdWxkIHlvdSBoYXZlIHRvIGNoZWNrLCBvbiBhdmVyYWdlLCBiZWZvcmUgeW91IGZpbmQgYSBzaWduaWZpY2FudCBvbmU/DQoNCg==