Goal: Practice with simple linear regression.
Natural Resources Canada publishes fuel consumption and CO2 emissions ratings for all cars sold in Canada; you can find this information on new car stickers like this one:
The relevant data are available through the Gov’t of Canada’s Open Data portal. The purpose of this exercise is to model fuel consumption based on engine characteristics. You can load the data using:
library(tidyverse)
fcr = read_csv("../data/2018 Fuel Consumption Ratings.csv") %>%
mutate( FUEL = factor( FUEL, levels = c("X", "Z", "D", "E", "N"),
labels = c("Regular", "Premium", "Diesel", "Ethanol", "Nat Gas")) )
You will try to model fuel consumption based on engine size.
- Recreate the following plot.
- Keep only vehicles that use gasoline (either Regular or Premium).
fcr_gsln = fcr %>% filter( FUEL %in% c("Regular", "Premium"))
Fit a linear regression model for CO2 emissions (in gr/km) based on engine size (in L). Create a scatteplot of the data together with the regression line.
- Check the model parameters to answer the following questions:
- How much do CO2 emissions increase, on average, for an additional litre of engine size?
- What percent of CO2 emissions variability can be attributed to differences in engine size?
- What is the strandard deviation of the errors around the regression line?
Assume a new car is being launched with a 4.2L engine. Create a prediction and approximate 95% prediction interval for the CO2 emissions of the new car.
Consider only the three manufacturers (MAKE
): FORD, HONDA, and VOLKSWAGEN. Recreate the following CO2 emissions vs Engine Size scatterplot, overlaid with the regression lines and confidence bands for each manufacturer. Which manufacturer seems to be producing the most efficient cars? Do the differences look statistically significant? Justify your answer.
Create a prediction interval for a new HONDA car with a 4.2L engine (i.e. using HONDA data only).
Instead of using regression, we could have treated each engine size as a separate category and fitted an ANOVA model; the side-by-side boxplots below demonstrate this approach.
fcr %>% filter( FUEL %in% c("Regular", "Premium")) %>%
ggplot(aes( x = ENGINE_SIZE, y = CO2_EMISSIONS)) +
geom_boxplot( aes(group = ENGINE_SIZE)) + geom_smooth( method = "lm")
Answer the following questions: a) How many parameters (group means) would you have to estimate with the ANOVA approach? b) Which approach, ANOVA or linear regression, is more flexible for describing average CO2 emissions at different engine sizes? Justify your answer. c) Based on the plot, do you think the two approaches are very different? Justify your answer. d) Could you use the ANOVA model to predict the CO2 emissions for a 4.2L engine? Justify your answer.
- Below is a plot of the combined city/highway fuel consumption (
COMB
, in L/100km), versus CO2 emissions (in g/km). It seems there is an almost perfect relationship between the two, which implies that emissions are almost exlusively determined by how much gas is used.
fcr_gsln %>% ggplot(aes(x = COMB, y = CO2_EMISSIONS)) + geom_point()
Estimate the amount of CO2 (in g) emitted by burning 1L of gasoline, by fitting an appropriate regression model to the data.
LS0tDQp0aXRsZTogIlNUQUE1NyAtIFdvcmtTaGVldCAxNCINCmF1dGhvcjogJ05hbWU6ICAgICwgSUQjOiAgICcNCm91dHB1dDogaHRtbF9ub3RlYm9vaw0KZWRpdG9yX29wdGlvbnM6IA0KICBjaHVua19vdXRwdXRfdHlwZTogaW5saW5lDQotLS0NCg0KKioqDQoNCioqR29hbCoqOiBQcmFjdGljZSB3aXRoIHNpbXBsZSBsaW5lYXIgcmVncmVzc2lvbi4NCg0KDQpbTmF0dXJhbCBSZXNvdXJjZXMgQ2FuYWRhXShodHRwczovL3d3dy5ucmNhbi5nYy5jYS9lbmVyZ3kvZWZmaWNpZW5jeS90cmFuc3BvcnRhdGlvbi8yMTAwOCkgcHVibGlzaGVzIGZ1ZWwgY29uc3VtcHRpb24gYW5kIENPMiBlbWlzc2lvbnMgcmF0aW5ncyBmb3IgYWxsIGNhcnMgc29sZCBpbiBDYW5hZGE7IHlvdSBjYW4gZmluZCB0aGlzIGluZm9ybWF0aW9uIG9uIG5ldyBjYXIgc3RpY2tlcnMgbGlrZSB0aGlzIG9uZToNCg0KIVtdKGltZy9lbmVyR3VpZGUuUE5HKQ0KDQpUaGUgcmVsZXZhbnQgZGF0YSBhcmUgYXZhaWxhYmxlIHRocm91Z2ggdGhlIEdvdid0IG9mIENhbmFkYSdzIFtPcGVuIERhdGEgcG9ydGFsXShodHRwczovL29wZW4uY2FuYWRhLmNhL2RhdGEvZW4vZGF0YXNldC85OGYxYTEyOS1mNjI4LTRjZTQtYjI0ZC02ZjE2YmYyNGRkNjQpLiBUaGUgcHVycG9zZSBvZiB0aGlzIGV4ZXJjaXNlIGlzIHRvIG1vZGVsIGZ1ZWwgY29uc3VtcHRpb24gYmFzZWQgb24gZW5naW5lIGNoYXJhY3RlcmlzdGljcy4gWW91IGNhbiBsb2FkIHRoZSBkYXRhIHVzaW5nOg0KDQpgYGB7ciwgbWVzc2FnZT1GQUxTRX0NCmxpYnJhcnkodGlkeXZlcnNlKQ0KZmNyID0gcmVhZF9jc3YoIi4uL2RhdGEvMjAxOCBGdWVsIENvbnN1bXB0aW9uIFJhdGluZ3MuY3N2IikgJT4lIA0KICBtdXRhdGUoIEZVRUwgPSBmYWN0b3IoIEZVRUwsIGxldmVscyA9IGMoIlgiLCAiWiIsICJEIiwgIkUiLCAiTiIpLCANCiAgICAgICAgIGxhYmVscyA9IGMoIlJlZ3VsYXIiLCAiUHJlbWl1bSIsICJEaWVzZWwiLCAiRXRoYW5vbCIsICJOYXQgR2FzIikpICkNCmBgYA0KWW91IHdpbGwgdHJ5IHRvIG1vZGVsIGZ1ZWwgY29uc3VtcHRpb24gYmFzZWQgb24gZW5naW5lIHNpemUuDQoNCg0KMS4gUmVjcmVhdGUgdGhlIGZvbGxvd2luZyBwbG90Lg0KDQohW10oLi9pbWcvRU5HSU5FdnNDTzJieUZVRUwuUE5HKQ0KDQoNCjIuIEtlZXAgb25seSB2ZWhpY2xlcyB0aGF0IHVzZSBnYXNvbGluZSAoZWl0aGVyICpSZWd1bGFyKiBvciAqUHJlbWl1bSopLg0KYGBge3J9DQpmY3JfZ3NsbiA9IGZjciAlPiUgZmlsdGVyKCBGVUVMICVpbiUgYygiUmVndWxhciIsICJQcmVtaXVtIikpDQpgYGANCkZpdCBhIGxpbmVhciByZWdyZXNzaW9uIG1vZGVsIGZvciBDTzIgZW1pc3Npb25zIChpbiBnci9rbSkgYmFzZWQgb24gZW5naW5lIHNpemUgKGluIEwpLiBDcmVhdGUgYSBzY2F0dGVwbG90IG9mIHRoZSBkYXRhIHRvZ2V0aGVyIHdpdGggdGhlIHJlZ3Jlc3Npb24gbGluZS4NCg0KDQozLiBDaGVjayB0aGUgbW9kZWwgcGFyYW1ldGVycyB0byBhbnN3ZXIgdGhlIGZvbGxvd2luZyBxdWVzdGlvbnM6IA0KYSkgSG93IG11Y2ggZG8gQ08yIGVtaXNzaW9ucyBpbmNyZWFzZSwgb24gYXZlcmFnZSwgZm9yIGFuIGFkZGl0aW9uYWwgbGl0cmUgb2YgZW5naW5lIHNpemU/IA0KYikgV2hhdCBwZXJjZW50IG9mIENPMiBlbWlzc2lvbnMgdmFyaWFiaWxpdHkgY2FuIGJlIGF0dHJpYnV0ZWQgdG8gZGlmZmVyZW5jZXMgaW4gZW5naW5lIHNpemU/DQpjKSBXaGF0IGlzIHRoZSBzdHJhbmRhcmQgZGV2aWF0aW9uIG9mIHRoZSBlcnJvcnMgYXJvdW5kIHRoZSByZWdyZXNzaW9uIGxpbmU/DQoNCg0KNC4gQXNzdW1lIGEgbmV3IGNhciBpcyBiZWluZyBsYXVuY2hlZCB3aXRoIGEgNC4yTCBlbmdpbmUuIENyZWF0ZSBhIHByZWRpY3Rpb24gYW5kIGFwcHJveGltYXRlIDk1JSBwcmVkaWN0aW9uICppbnRlcnZhbCogZm9yIHRoZSBDTzIgZW1pc3Npb25zIG9mIHRoZSBuZXcgY2FyLiANCg0KDQo1LiBDb25zaWRlciBvbmx5IHRoZSB0aHJlZSBtYW51ZmFjdHVyZXJzIChgTUFLRWApOiBGT1JELCBIT05EQSwgYW5kIFZPTEtTV0FHRU4uIFJlY3JlYXRlIHRoZSBmb2xsb3dpbmcgQ08yIGVtaXNzaW9ucyB2cyBFbmdpbmUgU2l6ZSBzY2F0dGVycGxvdCwgb3ZlcmxhaWQgd2l0aCB0aGUgcmVncmVzc2lvbiBsaW5lcyBhbmQgICpjb25maWRlbmNlKiBiYW5kcyBmb3IgZWFjaCBtYW51ZmFjdHVyZXIuIFdoaWNoIG1hbnVmYWN0dXJlciBzZWVtcyB0byBiZSBwcm9kdWNpbmcgdGhlIG1vc3QgZWZmaWNpZW50IGNhcnM/IERvIHRoZSBkaWZmZXJlbmNlcyAqbG9vayogc3RhdGlzdGljYWxseSBzaWduaWZpY2FudD8gSnVzdGlmeSB5b3VyIGFuc3dlci4NCg0KIVtdKGltZy9DTzJfM01BS0UuUE5HKQ0KDQoNCjYuIENyZWF0ZSBhIHByZWRpY3Rpb24gaW50ZXJ2YWwgZm9yIGEgbmV3IEhPTkRBIGNhciB3aXRoIGEgNC4yTCBlbmdpbmUgKGkuZS4gdXNpbmcgSE9OREEgZGF0YSBvbmx5KS4NCg0KDQo3LiBJbnN0ZWFkIG9mIHVzaW5nIHJlZ3Jlc3Npb24sIHdlIGNvdWxkIGhhdmUgdHJlYXRlZCBlYWNoIGVuZ2luZSBzaXplIGFzIGEgKnNlcGFyYXRlIGNhdGVnb3J5KiBhbmQgZml0dGVkIGFuIEFOT1ZBIG1vZGVsOyB0aGUgc2lkZS1ieS1zaWRlIGJveHBsb3RzIGJlbG93IGRlbW9uc3RyYXRlIHRoaXMgYXBwcm9hY2guIA0KDQpgYGB7cn0NCmZjciAlPiUgZmlsdGVyKCBGVUVMICVpbiUgYygiUmVndWxhciIsICJQcmVtaXVtIikpICU+JSANCiAgZ2dwbG90KGFlcyggeCA9IEVOR0lORV9TSVpFLCB5ID0gQ08yX0VNSVNTSU9OUykpICsNCiAgZ2VvbV9ib3hwbG90KCBhZXMoZ3JvdXAgPSBFTkdJTkVfU0laRSkpICsgZ2VvbV9zbW9vdGgoIG1ldGhvZCA9ICJsbSIpDQpgYGANCg0KQW5zd2VyIHRoZSBmb2xsb3dpbmcgcXVlc3Rpb25zOg0KYSkgSG93IG1hbnkgcGFyYW1ldGVycyAoZ3JvdXAgbWVhbnMpIHdvdWxkIHlvdSBoYXZlIHRvIGVzdGltYXRlIHdpdGggdGhlIEFOT1ZBIGFwcHJvYWNoPw0KYikgV2hpY2ggYXBwcm9hY2gsIEFOT1ZBIG9yIGxpbmVhciByZWdyZXNzaW9uLCBpcyBtb3JlICpmbGV4aWJsZSogZm9yIGRlc2NyaWJpbmcgYXZlcmFnZSBDTzIgZW1pc3Npb25zIGF0IGRpZmZlcmVudCBlbmdpbmUgc2l6ZXM/IEp1c3RpZnkgeW91ciBhbnN3ZXIuDQpjKSBCYXNlZCBvbiB0aGUgcGxvdCwgZG8geW91IHRoaW5rIHRoZSB0d28gYXBwcm9hY2hlcyBhcmUgdmVyeSBkaWZmZXJlbnQ/IEp1c3RpZnkgeW91ciBhbnN3ZXIuDQpkKSBDb3VsZCB5b3UgdXNlIHRoZSBBTk9WQSBtb2RlbCB0byBwcmVkaWN0IHRoZSBDTzIgZW1pc3Npb25zIGZvciBhIDQuMkwgZW5naW5lPyBKdXN0aWZ5IHlvdXIgYW5zd2VyLg0KDQoNCjguIEJlbG93IGlzIGEgcGxvdCBvZiB0aGUgKmNvbWJpbmVkKiBjaXR5L2hpZ2h3YXkgZnVlbCBjb25zdW1wdGlvbiAoYENPTUJgLCBpbiBMLzEwMGttKSwgdmVyc3VzIENPMiBlbWlzc2lvbnMgKGluIGcva20pLiBJdCBzZWVtcyB0aGVyZSBpcyBhbiBhbG1vc3QgcGVyZmVjdCByZWxhdGlvbnNoaXAgYmV0d2VlbiB0aGUgdHdvLCB3aGljaCBpbXBsaWVzIHRoYXQgZW1pc3Npb25zIGFyZSBhbG1vc3QgZXhsdXNpdmVseSBkZXRlcm1pbmVkIGJ5IGhvdyBtdWNoIGdhcyBpcyB1c2VkLiANCmBgYHtyfQ0KZmNyX2dzbG4gJT4lIGdncGxvdChhZXMoeCA9IENPTUIsIHkgPSBDTzJfRU1JU1NJT05TKSkgKyBnZW9tX3BvaW50KCkNCmBgYA0KRXN0aW1hdGUgdGhlIGFtb3VudCBvZiBDTzIgKGluIGcpIGVtaXR0ZWQgYnkgYnVybmluZyAxTCBvZiBnYXNvbGluZSwgYnkgZml0dGluZyBhbiAqYXBwcm9wcmlhdGUqIHJlZ3Jlc3Npb24gbW9kZWwgdG8gdGhlIGRhdGEuDQo=