Goal: Practice nonlinear/nonparametric regression.


For this question, you will measure the acceleration of gravity using data from this simple experiment.

gravity = tibble( time = (0:10)*.033, 
                  distance = c(0, .005, .02, .04, .07, .11, 
                               .16, .23, .31, .40, .51) )

The data consist of the distance covered by a free-falling object at regular time intervals.

  1. Under Newton’s laws of motion, the distance covered by the object follows: \[ \text{distance} = \frac{g}{2} \times \text{time}^2\] where \(g\) is the acceleration of gravity. Note that this is a quadratic function of time, without a constant or linear term. Fit this model using lm() and report the estimate of \(g\). Compare your answer to Earth’s actual gravitational acceleration at sea level of \(9.8 \, m/s^2\).

  2. Create a scatterplot of the data, and overlay it with your fitted function from the previous part and the theoretical values from Newton’s law with \(g=9.8 \, m/s^2\).

  3. [EXTRA] (Non-statistical question) Why do you think there is such a discrepancy between our estimate and the actual value of \(g\)?
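
As a starting point for parts 1 and 2, here is a minimal sketch of one way to fit a quadratic with no constant or linear term and overlay it on the data. It restates the data as a base data.frame so it runs without tibble; the names fit, g_hat and p are illustrative choices, not part of the worksheet, and other formula spellings (for example `- 1` instead of `0 +`) would work equally well.

```r
library(ggplot2)

# Data restated from the worksheet (base data.frame instead of tibble)
gravity <- data.frame(
  time     = (0:10) * .033,
  distance = c(0, .005, .02, .04, .07, .11, .16, .23, .31, .40, .51)
)

# "0 +" drops the intercept; I() makes ^ mean arithmetic squaring in a formula
fit <- lm(distance ~ 0 + I(time^2), data = gravity)

# The single coefficient estimates g / 2, so double it to estimate g
g_hat <- 2 * unname(coef(fit))

# Scatterplot with the fitted curve and the theoretical curve (g = 9.8)
p <- ggplot(gravity, aes(x = time, y = distance)) +
  geom_point() +
  stat_function(fun = function(t) g_hat / 2 * t^2,
                aes(colour = "fitted")) +
  stat_function(fun = function(t) 9.8 / 2 * t^2,
                aes(colour = "theoretical (g = 9.8)"))
print(p)
```

Because the model has only a time-squared term, the fitted coefficient is an estimate of \(g/2\) rather than of \(g\) itself, which is why the sketch doubles it before comparing with \(9.8 \, m/s^2\).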


Use the fuel consumption data for the remaining questions. A nonparametric model of combined mileage vs engine size is shown below.

fcr = read_csv("../data/2018 Fuel Consumption Ratings.csv") %>%
  mutate( MAKE = factor(MAKE), CLASS = factor(CLASS) )
fcr %>% ggplot(aes(x = ENGINE_SIZE, y = COMB)) +
  geom_point() + geom_smooth( )

There seems to be a slight nonlinearity in the relationship. You will explore whether this makes a difference for prediction.

  1. Fit a GAM model with specification: COMB ~ s(ENGINE_SIZE) + CYLINDERS + MAKE + CLASS (where we use a smooth function of ENGINE_SIZE), and report the standard deviation of the model’s residuals.

  2. Fit a linear model with specification: COMB ~ ENGINE_SIZE + CYLINDERS + MAKE + CLASS, and report the standard deviation of the model’s residuals.

  3. Fit a regression tree model with specification: COMB ~ ENGINE_SIZE + CYLINDERS + MAKE + CLASS and control = list(cp = .001), and report the standard deviation of the model’s residuals.

  4. The code below loads the fuel consumption data for 2019 models.

fcr_new = read_csv("../data/2019 Fuel Consumption Ratings.csv") %>% 
   mutate( MAKE = factor(MAKE), CLASS = factor(CLASS) )

Use each of the three models you fit (gam, linear, tree) to make predictions for the 2019 data, and report the standard deviation of the prediction error for each one. Which model has the best out-of-sample performance?
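
The workflow in the four parts above can be sketched as follows. This is only an outline under stated assumptions: the CSV files are not available here, so the code uses made-up stand-in data (fcr_fake, fcr_new_fake) with the same column names, and it assumes the GAM comes from the mgcv package and the tree from rpart, which the worksheet does not specify. The numbers it produces say nothing about the real ratings.

```r
library(mgcv)   # gam() with s() smooth terms (assumed package)
library(rpart)  # rpart() regression trees (assumed package)

set.seed(1)

# Made-up stand-in for the fuel consumption data, NOT the real ratings
make_fake <- function(n) {
  d <- data.frame(
    ENGINE_SIZE = round(runif(n, 1, 6), 1),
    CYLINDERS   = sample(c(4, 6, 8), n, replace = TRUE),
    MAKE  = factor(sample(c("A", "B", "C"), n, replace = TRUE),
                   levels = c("A", "B", "C")),
    CLASS = factor(sample(c("COMPACT", "SUV"), n, replace = TRUE),
                   levels = c("COMPACT", "SUV"))
  )
  d$COMB <- 4 + 2.5 * sqrt(d$ENGINE_SIZE) + 0.3 * d$CYLINDERS +
    (d$CLASS == "SUV") + rnorm(n, sd = 0.5)
  d
}
fcr_fake     <- make_fake(300)  # plays the role of the 2018 data
fcr_new_fake <- make_fake(200)  # plays the role of the 2019 data

spec <- COMB ~ ENGINE_SIZE + CYLINDERS + MAKE + CLASS
fits <- list(
  gam    = gam(COMB ~ s(ENGINE_SIZE) + CYLINDERS + MAKE + CLASS,
               data = fcr_fake),
  linear = lm(spec, data = fcr_fake),
  tree   = rpart(spec, data = fcr_fake, control = list(cp = .001))
)

# In-sample: standard deviation of each model's residuals
in_sample_sd <- sapply(fits, function(m) sd(residuals(m)))

# Out-of-sample: standard deviation of (actual - predicted) on the new data
out_sample_sd <- sapply(fits, function(m) {
  sd(fcr_new_fake$COMB - predict(m, newdata = fcr_new_fake))
})

print(rbind(in_sample_sd, out_sample_sd))
```

With the real data, one thing to watch for is that predict() on a linear model or GAM fails when the new data contain a MAKE or CLASS level that was absent from the training data, so such rows may need to be handled before comparing models.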
