Goal: Practice identifying causal and spurious relationships (confounding, Simpson’s paradox) with simulated and real data.

library(tidyverse)

Migraine & acupuncture case study (adapted from ISRS) A migraine is a particularly painful type of headache, which patients sometimes wish to treat with acupuncture. To determine whether acupuncture relieves migraine pain, researchers conducted a randomized controlled study where 89 females diagnosed with migraine headaches were randomly assigned to one of two groups: treatment or control. 43 patients in the treatment group received acupuncture that is specifically designed to treat migraines. 46 patients in the control group received placebo acupuncture (needle insertion at nonacupoint locations). 24 hours after patients received acupuncture, they were asked if they were pain free. Results are summarized in the contingency table below

(migraine = tibble(
  group = c("treatment", "control", "treatment", "control"),
  pain_free = c("yes", "yes", "no", "no"),
  n_cases= c( 10, 2, 33, 44) ))
  1. What percent of patients in the treatment group were pain-free 24hrs after receiving accupuncture? What percent in the control group?

  2. Replicate the barplot below. Does acupuncture look like an effective treatment for migraines?

  1. Perform a hypothesis test to check if acupuncture actually reduces migraines at the 5% significance level. Based on the study description, do you believe acupuncture has a causal effect on migraines? Justify your answer.
    (Hint: use coin::independence_test() with weights = ~ n_cases argument for defining the number of cases for each variable combination.)

Kidney stone data ( Brit. Med. Journal) Charig et al undertook a comparison of historical success rates in removing kidney stones. Open surgery (treatment A) had a success rate of 78% (273/350), while percutaneous nephrolithotomy (treatment B) had a success rate of 83% (289/350), an improvement over the use of open surgery. However, the success rates looked rather different when stone size (small/large; based on diameter </> 2 cm) was taken into account. The data are given below.

(kidney = tibble( 
  size = factor( c( rep("large",4), rep("small",4) ) ),
  treatment = factor( rep( c( "A", "A", "B", "B"), 2 ) ), 
  outcome = factor( rep( c("failure", "success"), 4 ) ), 
  n_cases= c(71, 192, 25, 55, 6, 81, 36, 234) ))
  1. Verify the overall success rates for treatments A and B (78% and 83%, respectively).

  2. Recreate the barplots below, showing the number of cases and success rates for different treatments and stone sizes. Which treatment has the highest success rate for both stone size?

  1. Explain (in words) why treatment A has lower overall success rate, even though it has higher individual success rates for either stone size.

  2. Based on the study desing, do you believe that treatment A should always be prescribed instead of treatment B? Justify your answer.


Consider a model with three binary (TRUE/FALSE) variables \(X, Y, Z\), where: - \(Z\)=TRUE with probability (w.p.) 50% + when \(Z\)=FALSE, then \(X\)=TRUE w.p. 25% and \(Y\)=TRUE w.p. 75% + when \(Z\)=TRUE, then \(X\)=TRUE w.p. 75% and \(Y\)=TRUE w.p. 25% The code below generates 100 observations from this model:

N = 200
sim = 
  tibble( Z = runif(N), X = runif(N), Y = runif(N)) %>% 
  mutate( Z = ifelse( Z <.5, TRUE, FALSE) ) %>% 
  mutate( X = ifelse( X < .25 + Z*.5, TRUE, FALSE) ) %>% 
  mutate( Y = ifelse( Y < .75 - Z*.5, TRUE, FALSE) ) %>% 
  mutate_all( factor )
  1. Consider the relationship between the two pairs of variables \(Z,Y\) and \(X,Y\). Based on the model, which pair is causaly related?

  2. Test the pairs for independence using coin::independence_test(). Is there evidence of a relationship between the pairs?

  3. Repeat the previous tests within each level of the non-pair variable. I.e. test the independence of pair \(X,Y\) based on \(Z\)=TRUE data only, and based on \(Z\)=FALSE data only. Do the same thing for the independence of pair \(Z,Y\) based on \(X\)=TRUE and \(X\)=FALSE, i.e. run 4 tests in total. Describe which relationship is signifinant and which is not. Based on your results, which variable would you say is a confounder? Justify your answer.
    (Hint: use the subset argument in independence_test())


  1. For each of the following relationships, try to identify a potential confounding variable.
  1. [Extra] Read the following article on smoking and cancer
LS0tDQp0aXRsZTogIlNUQUE1NyAtIFdvcmtTaGVldCAxNiINCmF1dGhvcjogJ05hbWU6ICAgICwgSUQjOiAgICcNCm91dHB1dDogaHRtbF9ub3RlYm9vaw0KZWRpdG9yX29wdGlvbnM6IA0KICBjaHVua19vdXRwdXRfdHlwZTogaW5saW5lDQotLS0NCg0KKioqDQoNCioqR29hbCoqOiBQcmFjdGljZSBpZGVudGlmeWluZyBjYXVzYWwgYW5kIHNwdXJpb3VzIHJlbGF0aW9uc2hpcHMgKGNvbmZvdW5kaW5nLCBTaW1wc29uJ3MgcGFyYWRveCkgd2l0aCBzaW11bGF0ZWQgYW5kIHJlYWwgZGF0YS4NCg0KDQpgYGB7cn0NCmxpYnJhcnkodGlkeXZlcnNlKQ0KYGBgDQoNCg0KKioqIA0KDQoqKk1pZ3JhaW5lICYgYWN1cHVuY3R1cmUgY2FzZSBzdHVkeSoqIChhZGFwdGVkIGZyb20gW0lTUlNdKGh0dHBzOi8vd3d3Lm9wZW5pbnRyby5vcmcvc3RhdC90ZXh0Ym9vay5waHA/c3RhdF9ib29rPWlzcnMpKQ0KQSBtaWdyYWluZSBpcyBhIHBhcnRpY3VsYXJseSBwYWluZnVsIHR5cGUgb2YgaGVhZGFjaGUsIHdoaWNoIHBhdGllbnRzIHNvbWV0aW1lcyB3aXNoIHRvIHRyZWF0IHdpdGggYWN1cHVuY3R1cmUuIFRvIGRldGVybWluZSB3aGV0aGVyIGFjdXB1bmN0dXJlIHJlbGlldmVzIG1pZ3JhaW5lIHBhaW4sIHJlc2VhcmNoZXJzIGNvbmR1Y3RlZCBhICpyYW5kb21pemVkIGNvbnRyb2xsZWQgc3R1ZHkqIHdoZXJlIDg5IGZlbWFsZXMgZGlhZ25vc2VkDQp3aXRoIG1pZ3JhaW5lIGhlYWRhY2hlcyB3ZXJlIHJhbmRvbWx5IGFzc2lnbmVkIHRvIG9uZSBvZiB0d28gZ3JvdXBzOiB0cmVhdG1lbnQgb3IgY29udHJvbC4gNDMgcGF0aWVudHMgaW4gdGhlIHRyZWF0bWVudCBncm91cCByZWNlaXZlZCBhY3VwdW5jdHVyZSB0aGF0IGlzIHNwZWNpZmljYWxseSBkZXNpZ25lZCB0byB0cmVhdCBtaWdyYWluZXMuIDQ2IHBhdGllbnRzIGluIHRoZSBjb250cm9sIGdyb3VwIHJlY2VpdmVkIHBsYWNlYm8gYWN1cHVuY3R1cmUgKG5lZWRsZSBpbnNlcnRpb24gYXQgbm9uYWN1cG9pbnQgbG9jYXRpb25zKS4gMjQgaG91cnMgYWZ0ZXIgcGF0aWVudHMgcmVjZWl2ZWQgYWN1cHVuY3R1cmUsIHRoZXkgd2VyZSBhc2tlZCBpZiB0aGV5IHdlcmUgcGFpbiBmcmVlLiBSZXN1bHRzIGFyZSBzdW1tYXJpemVkIGluIHRoZSBjb250aW5nZW5jeSB0YWJsZSBiZWxvdw0KDQpgYGB7cn0NCihtaWdyYWluZSA9IHRpYmJsZSgNCiAgZ3JvdXAgPSBjKCJ0cmVhdG1lbnQiLCAiY29udHJvbCIsICJ0cmVhdG1lbnQiLCAiY29udHJvbCIpLA0KICBwYWluX2ZyZWUgPSBjKCJ5ZXMiLCAieWVzIiwgIm5vIiwgIm5vIiksDQogIG5fY2FzZXM9IGMoIDEwLCAyLCAzMywgNDQpICkpDQpgYGANCg0KDQoxLiBXaGF0IHBlcmNlbnQgb2YgcGF0aWVudHMgKmluIHRoZSB0cmVhdG1lbnQqIGdyb3VwIHdlcmUgcGFpbi1mcmVlIDI0aHJzIGFmdGVyIHJlY2VpdmluZyBhY2N1cHVuY3R1cmU/IFdoYXQgcGVyY2VudCAqaW4gdGhlIGNvbnRyb2wqIGdyb3VwPyANCg0KDQoyLiBSZXBsaWNhdGUgdGhlIGJhcnBsb3QgYmVsb3cuIERvZXMgYWN1cHVuY3R1cmUgbG9vayBsaWtlIGFuIGVmZmVjdGl2ZSB0cmVhdG1lbnQgZm9yIG1pZ3JhaW5lcz8gDQoNCiFbXSguL2ltZy9hY3VwdW5jdHVyZS5QTkcpDQoNCjMuIFBlcmZvcm0gYSBoeXBvdGhlc2lzIHRlc3QgdG8gY2hlY2sgaWYgYWN1cHVuY3R1cmUgYWN0dWFsbHkgcmVkdWNlcyBtaWdyYWluZXMgYXQgdGhlIDUlIHNpZ25pZmljYW5jZSBsZXZlbC4gQmFzZWQgb24gdGhlIHN0dWR5IGRlc2NyaXB0aW9uLCBkbyB5b3UgYmVsaWV2ZSBhY3VwdW5jdHVyZSBoYXMgYSAqY2F1c2FsKiBlZmZlY3Qgb24gbWlncmFpbmVzPyBKdXN0aWZ5IHlvdXIgYW5zd2VyLiAgICAgICAgICANCihIaW50OiB1c2UgYGNvaW46OmluZGVwZW5kZW5jZV90ZXN0KClgIHdpdGggYHdlaWdodHMgPSB+IG5fY2FzZXNgIGFyZ3VtZW50IGZvciBkZWZpbmluZyB0aGUgbnVtYmVyIG9mIGNhc2VzIGZvciBlYWNoIHZhcmlhYmxlIGNvbWJpbmF0aW9uLikNCg0KDQoqKioNCg0KKipLaWRuZXkgc3RvbmUgZGF0YSoqICggW0JyaXQuIE1lZC4gSm91cm5hbF0oaHR0cHM6Ly93d3cubmNiaS5ubG0ubmloLmdvdi9wdWJtZWQvMzA4MzkyMikpDQpDaGFyaWcgZXQgYWwgdW5kZXJ0b29rIGEgY29tcGFyaXNvbiBvZiAqaGlzdG9yaWNhbCogc3VjY2VzcyByYXRlcyBpbiByZW1vdmluZyBraWRuZXkgc3RvbmVzLiBPcGVuIHN1cmdlcnkgKCp0cmVhdG1lbnQgQSopIGhhZCBhIHN1Y2Nlc3MgcmF0ZSBvZiA3OCUgKDI3My8zNTApLCB3aGlsZSBwZXJjdXRhbmVvdXMgbmVwaHJvbGl0aG90b215ICgqdHJlYXRtZW50IEIqKSBoYWQgYSBzdWNjZXNzIHJhdGUgb2YgODMlICgyODkvMzUwKSwgYW4gaW1wcm92ZW1lbnQgb3ZlciB0aGUgdXNlIG9mIG9wZW4gc3VyZ2VyeS4gSG93ZXZlciwgdGhlIHN1Y2Nlc3MgcmF0ZXMgbG9va2VkIHJhdGhlciBkaWZmZXJlbnQgd2hlbiAqc3RvbmUgc2l6ZSogKHNtYWxsL2xhcmdlOyBiYXNlZCBvbiBkaWFtZXRlciA8Lz4gMiBjbSkgd2FzIHRha2VuIGludG8gYWNjb3VudC4gVGhlIGRhdGEgYXJlIGdpdmVuIGJlbG93Lg0KDQpgYGB7cn0NCihraWRuZXkgPSB0aWJibGUoIA0KICBzaXplID0gZmFjdG9yKCBjKCByZXAoImxhcmdlIiw0KSwgcmVwKCJzbWFsbCIsNCkgKSApLA0KICB0cmVhdG1lbnQgPSBmYWN0b3IoIHJlcCggYyggIkEiLCAiQSIsICJCIiwgIkIiKSwgMiApICksIA0KICBvdXRjb21lID0gZmFjdG9yKCByZXAoIGMoImZhaWx1cmUiLCAic3VjY2VzcyIpLCA0ICkgKSwgDQogIG5fY2FzZXM9IGMoNzEsIDE5MiwgMjUsIDU1LCA2LCA4MSwgMzYsIDIzNCkgKSkNCmBgYA0KDQo0LiBWZXJpZnkgdGhlIG92ZXJhbGwgc3VjY2VzcyByYXRlcyBmb3IgdHJlYXRtZW50cyBBIGFuZCBCICg3OCUgYW5kIDgzJSwgcmVzcGVjdGl2ZWx5KS4NCg0KDQo1LiBSZWNyZWF0ZSB0aGUgYmFycGxvdHMgYmVsb3csIHNob3dpbmcgdGhlIG51bWJlciBvZiBjYXNlcyBhbmQgc3VjY2VzcyByYXRlcyBmb3IgZGlmZmVyZW50IHRyZWF0bWVudHMgYW5kIHN0b25lIHNpemVzLiBXaGljaCB0cmVhdG1lbnQgaGFzIHRoZSBoaWdoZXN0IHN1Y2Nlc3MgcmF0ZSBmb3IgKmJvdGgqIHN0b25lIHNpemU/DQoNCiFbXSguL2ltZy9raWRuZXlfY2FzZXMuUE5HKQ0KIVtdKC4vaW1nL2tpZG5leV9yYXRlcy5QTkcpDQoNCg0KNi4gRXhwbGFpbiAoaW4gd29yZHMpIHdoeSB0cmVhdG1lbnQgQSBoYXMgKmxvd2VyKiBvdmVyYWxsIHN1Y2Nlc3MgcmF0ZSwgZXZlbiB0aG91Z2ggaXQgaGFzICpoaWdoZXIqIGluZGl2aWR1YWwgc3VjY2VzcyByYXRlcyBmb3IgZWl0aGVyIHN0b25lIHNpemUuIA0KDQo3LiBCYXNlZCBvbiB0aGUgc3R1ZHkgZGVzaW5nLCBkbyB5b3UgYmVsaWV2ZSB0aGF0IHRyZWF0bWVudCBBIHNob3VsZCBhbHdheXMgYmUgcHJlc2NyaWJlZCBpbnN0ZWFkIG9mIHRyZWF0bWVudCBCPyBKdXN0aWZ5IHlvdXIgYW5zd2VyLg0KDQoqKioNCg0KQ29uc2lkZXIgYSBtb2RlbCB3aXRoIHRocmVlICpiaW5hcnkqIChUUlVFL0ZBTFNFKSB2YXJpYWJsZXMgJFgsIFksIFokLCB3aGVyZToNCi0gJFokPVRSVUUgd2l0aCBwcm9iYWJpbGl0eSAody5wLikgNTAlIA0KICAgICsgd2hlbiAkWiQ9RkFMU0UsIHRoZW4gJFgkPVRSVUUgdy5wLiAyNSUgYW5kICRZJD1UUlVFIHcucC4gNzUlIA0KICAgICsgd2hlbiAkWiQ9VFJVRSwgdGhlbiAkWCQ9VFJVRSB3LnAuIDc1JSBhbmQgJFkkPVRSVUUgdy5wLiAyNSUgDQpUaGUgY29kZSBiZWxvdyBnZW5lcmF0ZXMgMTAwIG9ic2VydmF0aW9ucyBmcm9tIHRoaXMgbW9kZWw6ICANCg0KYGBge3J9DQpOID0gMjAwDQpzaW0gPSANCiAgdGliYmxlKCBaID0gcnVuaWYoTiksIFggPSBydW5pZihOKSwgWSA9IHJ1bmlmKE4pKSAlPiUgDQogIG11dGF0ZSggWiA9IGlmZWxzZSggWiA8LjUsIFRSVUUsIEZBTFNFKSApICU+JSANCiAgbXV0YXRlKCBYID0gaWZlbHNlKCBYIDwgLjI1ICsgWiouNSwgVFJVRSwgRkFMU0UpICkgJT4lIA0KICBtdXRhdGUoIFkgPSBpZmVsc2UoIFkgPCAuNzUgLSBaKi41LCBUUlVFLCBGQUxTRSkgKSAlPiUgDQogIG11dGF0ZV9hbGwoIGZhY3RvciApDQpgYGANCg0KOC4gQ29uc2lkZXIgdGhlIHJlbGF0aW9uc2hpcCBiZXR3ZWVuIHRoZSB0d28gcGFpcnMgb2YgdmFyaWFibGVzICRaLFkkIGFuZCAkWCxZJC4gQmFzZWQgb24gdGhlIG1vZGVsLCB3aGljaCBwYWlyIGlzIGNhdXNhbHkgcmVsYXRlZD8gDQoNCjkuIFRlc3QgdGhlIHBhaXJzIGZvciBpbmRlcGVuZGVuY2UgdXNpbmcgYGNvaW46OmluZGVwZW5kZW5jZV90ZXN0KClgLiBJcyB0aGVyZSBldmlkZW5jZSBvZiBhIHJlbGF0aW9uc2hpcCBiZXR3ZWVuIHRoZSBwYWlycz8gDQoNCg0KDQo5LiBSZXBlYXQgdGhlIHByZXZpb3VzIHRlc3RzICp3aXRoaW4qIGVhY2ggbGV2ZWwgb2YgdGhlIG5vbi1wYWlyIHZhcmlhYmxlLiBJLmUuIHRlc3QgdGhlIGluZGVwZW5kZW5jZSBvZiBwYWlyICRYLFkkIGJhc2VkIG9uICRaJD1UUlVFIGRhdGEgb25seSwgYW5kIGJhc2VkIG9uICRaJD1GQUxTRSBkYXRhIG9ubHkuIERvIHRoZSBzYW1lIHRoaW5nIGZvciB0aGUgaW5kZXBlbmRlbmNlIG9mIHBhaXIgJFosWSQgYmFzZWQgb24gJFgkPVRSVUUgYW5kICRYJD1GQUxTRSwgaS5lLiBydW4gNCB0ZXN0cyBpbiB0b3RhbC4gRGVzY3JpYmUgd2hpY2ggcmVsYXRpb25zaGlwIGlzIHNpZ25pZmluYW50IGFuZCB3aGljaCBpcyBub3QuIEJhc2VkIG9uIHlvdXIgcmVzdWx0cywgd2hpY2ggdmFyaWFibGUgd291bGQgeW91IHNheSBpcyBhICpjb25mb3VuZGVyKj8gSnVzdGlmeSB5b3VyIGFuc3dlci4gICAgDQooSGludDogdXNlIHRoZSBgc3Vic2V0YCBhcmd1bWVudCBpbiBgaW5kZXBlbmRlbmNlX3Rlc3QoKWApDQoNCg0KKioqDQoNCg0KMTAuIEZvciBlYWNoIG9mIHRoZSBmb2xsb3dpbmcgcmVsYXRpb25zaGlwcywgdHJ5IHRvIGlkZW50aWZ5IGEgcG90ZW50aWFsIGNvbmZvdW5kaW5nIHZhcmlhYmxlLg0KDQorICJTdHVkaWVzIGhhdmUgc2hvd24gYSByZWxhdGlvbiBiZXR3ZWVuIGthbGUgY29uc3VtcHRpb24gYW5kIHJlZHVjZWQgY2FuY2VyIHJhdGVzIi4NCg0KDQorICJUaGVyZSBpcyBhbiBhc3NvY2lhdGlvbiBiZXR3ZWVuIHVyYmFuaXphdGlvbiAodXJiYW4gJSkgYW5kIHR5cGUtMiBkaWFiZXRlcyBpbiBkaWZmZXJlbnQgY291bnRyaWVzIi4NCg0KDQorICJTaG9lIHNpemUgaXMgYW4gaW5kaWNhdG9yIG9mIHJlYWRpbmcgcGVyZm9ybWFuY2UgZm9yIGVsZW1lbnRhcnkgc2Nob29sIGNoaWxkcmVuIg0KDQoNCg0KMTEuIFtFeHRyYV0gUmVhZCB0aGUgZm9sbG93aW5nIGFydGljbGUgb24gW3Ntb2tpbmcgYW5kIGNhbmNlcl0oaHR0cHM6Ly9wcmljZW9ub21pY3MuY29tL3doeS10aGUtZmF0aGVyLW9mLW1vZGVybi1zdGF0aXN0aWNzLWRpZG50LWJlbGlldmUvKQ==