Goal: Practice multi-feature classification with logistic regression and classification trees.

You will work with the following toy data.

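library(tidyverse)  # provides tibble(), mutate(), %>% and ggplot()

set.seed(123)
# Two independent standard-normal features; the label is TRUE inside the circle X1^2 + X2^2 < 1.4
toy = tibble( X1 = rnorm(100), X2 = rnorm(100) ) %>% 
  mutate( actual = factor( X1^2 + X2^2 < 1.4 ) )
ggplot( toy, aes(x = X1, y = X2, col = actual )) + geom_point()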

  1. Fit a logistic regression model to the toy data and report the confusion matrix.

  2. Plot the decision boundary of your linear classifier from the previous part. Use geom_abline(), with intercept and slope arguments given by the following representation of the linear decision boundary \[\beta_0 + \beta_1 x_1 + \beta_2 x_2 = 0 \Rightarrow x_2 = -\frac{\beta_0}{\beta_2} - \frac{\beta_1}{\beta_2} x_1\]

  3. The following code fits a classification tree to the data.

library(rpart)
rpart_out = rpart( actual ~ X1 + X2, data = toy, method = "class" )

Report the model's confusion matrix.
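The first three parts can be sketched as follows. This is one possible approach, not a reference solution: it rebuilds the toy data with base R's `data.frame()` (which should give the same draws as the `tibble()` version under the same seed), assumes a 0.5 probability cutoff for the logistic classifier, and uses illustrative names such as `glm_out`, `glm_cm`, `boundary`, and `tree_cm` that are not part of the worksheet.

```r
library(rpart)

# Rebuild the toy data in base R (same seed and draw order as above)
set.seed(123)
toy <- data.frame(X1 = rnorm(100), X2 = rnorm(100))
toy$actual <- factor(toy$X1^2 + toy$X2^2 < 1.4)

# Part 1: logistic regression; predict TRUE when the fitted probability exceeds 0.5
# (the 0.5 cutoff is an assumption, other cutoffs are possible)
glm_out <- glm(actual ~ X1 + X2, data = toy, family = binomial)
glm_pred <- factor(predict(glm_out, type = "response") > 0.5,
                   levels = c("FALSE", "TRUE"))
glm_cm <- table(predicted = glm_pred, actual = toy$actual)

# Part 2: intercept and slope of the boundary x2 = -b0/b2 - (b1/b2) x1
b <- coef(glm_out)
boundary <- c(intercept = unname(-b["(Intercept)"] / b["X2"]),
              slope     = unname(-b["X1"] / b["X2"]))

# Part 3: classification tree and its confusion matrix on the training data
rpart_out <- rpart(actual ~ X1 + X2, data = toy, method = "class")
tree_pred <- predict(rpart_out, type = "class")
tree_cm <- table(predicted = tree_pred, actual = toy$actual)

print(glm_cm)
print(boundary)
print(tree_cm)
```

The two values in `boundary` can be passed to `geom_abline(intercept = ..., slope = ...)` on top of the scatter plot. Both confusion matrices here are computed on the training data.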

  4. Plot the decision tree from the previous part.

  5. Find the predicted class for the point \((X_1 = 0.5, X_2 = 1)\).

  6. Which of the following plots, if any, represents your decision tree?
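To check a tree against a candidate partition plot, and to classify the point \((X_1 = 0.5, X_2 = 1)\), it can help to look at the fitted split rules directly. The sketch below assumes the same toy data and tree as above, rebuilt in base R; the names `new_point`, `pt_class`, and `pt_prob` are illustrative.

```r
library(rpart)

# Rebuild the toy data and the tree (same seed and formula as above)
set.seed(123)
toy <- data.frame(X1 = rnorm(100), X2 = rnorm(100))
toy$actual <- factor(toy$X1^2 + toy$X2^2 < 1.4)
rpart_out <- rpart(actual ~ X1 + X2, data = toy, method = "class")

# Text listing of the splits: each row is a node with its split rule,
# which can be compared against the axis-aligned rectangles in a partition plot
print(rpart_out)

# Predicted class and class probabilities for a single new point
new_point <- data.frame(X1 = 0.5, X2 = 1)
pt_class <- predict(rpart_out, newdata = new_point, type = "class")
pt_prob  <- predict(rpart_out, newdata = new_point, type = "prob")
print(pt_class)
print(pt_prob)
```

For a drawing of the tree itself, rpart's own `plot()` and `text()` methods are one option, provided the fitted tree has at least one split.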

  7. Create a new distance variable as follows: \(D = \sqrt{ X_1^2 + X_2^2}\). Create a colored histogram for the distance variable; what is the best classification accuracy you can achieve by thresholding it?
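The threshold search on the distance variable can be sketched as below. This is a minimal illustration under stated assumptions: the rule is taken to be "predict TRUE when \(D < t\)", candidate thresholds are the midpoints between consecutive sorted distances, and `threshold_accuracy`, `best_t`, and `best_acc` are hypothetical names. Because the labels are defined by \(X_1^2 + X_2^2 < 1.4\), a threshold near \(\sqrt{1.4} \approx 1.18\) should separate the two classes.

```r
# Rebuild the toy data in base R and add the distance variable
set.seed(123)
toy <- data.frame(X1 = rnorm(100), X2 = rnorm(100))
toy$actual <- factor(toy$X1^2 + toy$X2^2 < 1.4)
toy$D <- sqrt(toy$X1^2 + toy$X2^2)

# Accuracy of the rule "predict TRUE when D < t"
threshold_accuracy <- function(t) {
  mean((toy$D < t) == (toy$actual == "TRUE"))
}

# Candidate thresholds: midpoints between consecutive sorted distances
d_sorted <- sort(toy$D)
candidates <- (head(d_sorted, -1) + tail(d_sorted, -1)) / 2

# Evaluate every candidate and keep the best one
acc <- sapply(candidates, threshold_accuracy)
best_t <- candidates[which.max(acc)]
best_acc <- max(acc)

print(c(best_threshold = best_t, best_accuracy = best_acc))
```

For the colored histogram, mapping `fill = actual` in `aes()` and adding `geom_histogram()` is one way to do it in ggplot2.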

For the remaining parts, consider the same data but with completely random labels, each drawn as TRUE or FALSE with equal probability.

set.seed(1234)
# Labels are sampled independently of X1 and X2, so the features carry no signal
toy = toy %>% 
  mutate( actual.rand = factor( sample( c(TRUE, FALSE), size = 100, replace = TRUE ) ) )
ggplot(toy, aes(x = X1, y = X2, col = actual.rand) ) + geom_point()

  8. What should be the accuracy of any classifier that tries to predict these random labels?

  9. Report the accuracy of a) logistic regression, and b) classification tree, on the random labels. Which one has higher accuracy?
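The comparison on random labels can be sketched as follows, again rebuilding the data in base R and assuming a 0.5 cutoff for logistic regression; `glm_acc` and `tree_acc` are illustrative names. Both numbers are accuracies on the training data, so a value above 0.5 reflects the model fitting noise rather than real predictive ability.

```r
library(rpart)

# Rebuild the features, then draw the random labels with the worksheet's seed
set.seed(123)
toy <- data.frame(X1 = rnorm(100), X2 = rnorm(100))
set.seed(1234)
toy$actual.rand <- factor(sample(c(TRUE, FALSE), size = 100, replace = TRUE))

# a) Logistic regression, predicting TRUE when the fitted probability exceeds 0.5
glm_rand <- glm(actual.rand ~ X1 + X2, data = toy, family = binomial)
glm_pred <- predict(glm_rand, type = "response") > 0.5
glm_acc <- mean(glm_pred == (toy$actual.rand == "TRUE"))

# b) Classification tree with rpart's default settings
rpart_rand <- rpart(actual.rand ~ X1 + X2, data = toy, method = "class")
tree_pred <- predict(rpart_rand, type = "class")
tree_acc <- mean(tree_pred == toy$actual.rand)

print(c(logistic = glm_acc, tree = tree_acc))
```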
