Goal: Practice binary classification with thresholding, and measure classifier performance.

  1. Replicate the plot below, showing colored histograms of the 10 mean features (variables ending in .m).
  2. Based on the histograms, pick a variable that would make a good classifier feature (but not smoothness.m). Use the plot to find a good threshold value, and report the confusion matrix for your classifier.

  3. What are the accuracy and F1-measure of your classifier?

  4. Plot the ROC curve of your classifier.

  5. Overlay the ROC curves from thresholding the feature you selected and smoothness.m. Based on your plot, would you use smoothness.m or the feature you selected? (Hint: use ggroc( list(ROC_1, ROC_2) ), passing a list of roc() outputs)

  6. Find the best feature (out of all 30) for threshold classification, using the area under the curve criterion (auc()). Plot the colored histogram and the ROC curve of the best feature.

  7. Compare the ROC curves of the features you selected in the previous two parts. Which one would you use?

  8. Would you ever use a classifier that has TPR = 0.5 and FPR = 0.6? Justify your answer.
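
As a reminder of how the quantities named above relate to one another, here is a minimal base-R sketch. It runs on synthetic data, not the worksheet's data set. The names feature, label, threshold_metrics and the threshold 12.5 are all hypothetical and chosen only for illustration. It assumes the positive class tends to have larger feature values.

```r
# Synthetic stand-in data (NOT the worksheet's data): one numeric feature
# and a two-level label, where "M" tends to have larger feature values.
set.seed(1)
label   <- rep(c("B", "M"), times = c(60, 40))
feature <- c(rnorm(60, mean = 10, sd = 2), rnorm(40, mean = 15, sd = 2))

# Hypothetical helper: predict the positive class when the feature exceeds
# the threshold, then summarise the resulting confusion matrix.
threshold_metrics <- function(feature, label, threshold, positive = "M") {
  predicted_pos <- feature > threshold
  actual_pos    <- label == positive

  tp <- sum(predicted_pos & actual_pos)    # true positives
  fp <- sum(predicted_pos & !actual_pos)   # false positives
  fn <- sum(!predicted_pos & actual_pos)   # false negatives
  tn <- sum(!predicted_pos & !actual_pos)  # true negatives

  # Rows are predictions, columns are the true classes.
  confusion <- matrix(c(tp, fn, fp, tn), nrow = 2,
                      dimnames = list(predicted = c("pos", "neg"),
                                      actual    = c("pos", "neg")))

  precision <- tp / (tp + fp)
  recall    <- tp / (tp + fn)  # recall is the same quantity as TPR

  list(confusion = confusion,
       accuracy  = (tp + tn) / (tp + fp + fn + tn),
       f1        = 2 * precision * recall / (precision + recall),
       tpr       = recall,
       fpr       = fp / (fp + tn))
}

# Metrics at a single, arbitrarily chosen threshold.
m <- threshold_metrics(feature, label, threshold = 12.5)
print(m$confusion)

# Sweeping the threshold over the observed values gives one (FPR, TPR)
# pair per threshold; together these pairs trace out the ROC curve.
thresholds <- sort(unique(feature))
roc_points <- t(sapply(thresholds, function(th) {
  r <- threshold_metrics(feature, label, th)
  c(fpr = r$fpr, tpr = r$tpr)
}))
```

Plotting the tpr column of roc_points against the fpr column gives the ROC curve for this synthetic feature. The roc() and ggroc() functions mentioned in the hint perform the same kind of threshold sweep.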
