Goal: Practice working with string data and regular expressions.

Use the dinesafe data for all questions.

library(tidyverse)
dinesafe = read_csv("../data/dinesafe.csv")
  1. Find all distinct establishments that contain the word “kebab” or its variations “kabob”, “kabab”, “kebob” in their name. Use a single regular expression for all variations, and consider both cases (upper & lower).

  2. Find all distinct establishments that contain the exact word “pho” in their name (in either case). Ensure your regular expression excludes words that just contain the string “pho”, such as “chophouse”.
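As a hedged starting point for the two questions above, the sketch below shows a character-class pattern and a word-boundary pattern, both matched case-insensitively with `stringr`. The vector `toy_names` is invented for illustration and is not taken from the dinesafe data; applying the patterns to the real name column (with `filter()` and `distinct()`) is left to you.

```r
library(stringr)

# Toy names, invented for illustration (not from the dinesafe data)
toy_names <- c("KABOB HUT", "Kebab House", "Pho Saigon", "The Chophouse", "PHO 88")

# k, then a or e, then b, then a or o, then b: covers kebab, kabob, kabab, kebob
kebab_pattern <- regex("k[ae]b[ao]b", ignore_case = TRUE)
kebab_hits <- str_detect(toy_names, kebab_pattern)

# \\b is a word boundary, so the "pho" inside "Chophouse" does not match
pho_pattern <- regex("\\bpho\\b", ignore_case = TRUE)
pho_hits <- str_detect(toy_names, pho_pattern)

toy_names[kebab_hits]
toy_names[pho_hits]
```

Wrapping the pattern in `regex(..., ignore_case = TRUE)` avoids having to list both cases of every letter inside the pattern itself.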

Some of the INFRACTION_DETAILS entries contain a reference to the relevant regulation that was broken. It typically appears at the end of the string and looks like this: “o. reg 562/90 sec. 74(c)”. This refers to Ontario Regulation 562 (1990 revision), section 74(c), which you can view at https://www.ontario.ca/laws/regulation/900562.
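The reference format described above can be matched with a pattern that tolerates variable spacing and letter case. The following is a minimal sketch, not a definitive solution: the strings in `toy_details` are invented, and treating the "section number" as the digits that follow "sec." (ignoring the bracketed subsection) is an assumption about what the questions intend.

```r
library(stringr)

# Toy infraction details, invented for illustration
toy_details <- c(
  "Fail to protect food from contamination o. reg  562/90 sec. 74(c)",
  "OPERATOR FAIL TO WASH HANDS O. REG 562/90 SEC. 65(1)",
  "Operate food premise without a licence",
  NA
)

# "o.", optional spaces, "reg" with an optional ".", optional spaces, "562/90"
ref_pattern <- regex("o\\.\\s*reg\\.?\\s*562/90", ignore_case = TRUE)
has_ref <- str_detect(toy_details, ref_pattern)

# Proportion among non-missing entries; whether NA entries belong in the
# denominator is a choice you should state explicitly in your answer
prop_ref <- mean(has_ref, na.rm = TRUE)

# The capture group (\\d+) holds the digits after "sec."; column 2 of the
# str_match() result is that first capture group
sec_pattern <- regex("562/90\\s*sec\\.?\\s*(\\d+)", ignore_case = TRUE)
sections <- str_match(toy_details, sec_pattern)[, 2]
```

`str_detect()` returns `NA` for missing input, which is why `na.rm = TRUE` is needed when averaging.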

  3. Find what proportion of infraction details contain a reference to Ontario Regulation 562/90. You will have to formulate a regular expression that detects the presence of such a reference.

  4. Extract the relevant section number for all infractions that refer to Ontario Regulation 562/90. Use this information to create a frequency barplot of the number of infractions by section.

  5. For each distinct ESTABLISHMENT_ID, combine all non-NA INFRACTION_DETAILS into a single string. Then count the number of times the word “contamination” appears in each establishment’s combined infraction details. Create a barplot of the top 10 contamination offenders (i.e. establishment name vs. number of occurrences of the word “contamination”).
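The combine-then-count step in the last question can be sketched as follows. This is an illustration under stated assumptions, not the expected answer: the tibble `toy` and its contents are invented, and only the column names ESTABLISHMENT_ID and INFRACTION_DETAILS come from the worksheet. Plotting is omitted.

```r
library(dplyr)
library(stringr)

# Toy data, invented for illustration
toy <- tibble(
  ESTABLISHMENT_ID = c(1, 1, 1, 2, 2),
  INFRACTION_DETAILS = c(
    "Fail to protect food from contamination",
    NA,
    "Contamination of food contact surface; contamination risk",
    "Fail to provide thermometer",
    NA
  )
)

counts <- toy %>%
  filter(!is.na(INFRACTION_DETAILS)) %>%   # drop missing details first
  group_by(ESTABLISHMENT_ID) %>%
  summarise(
    # collapse = " " joins all of a group's strings into one string
    all_details = str_c(INFRACTION_DETAILS, collapse = " "),
    .groups = "drop"
  ) %>%
  mutate(
    # str_count() counts every match in the string, not just the first
    n_contamination = str_count(
      all_details,
      regex("\\bcontamination\\b", ignore_case = TRUE)
    )
  )
```

Note that filtering out NA rows before grouping also drops any establishment whose details are all missing; whether those should appear with a count of zero is a design choice.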
