- Understand fundamentals of statistical estimation
  - Sampling distribution
  - Point & interval estimates
- Apply resampling methods for estimation
  - Bootstrap confidence intervals
- Readings
  - ISRS: ch. 4.5
  - ModernDive: ch. 9
Interested in estimating value of parameter based on sample
Specific sample gives one out of all possible statistic values
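To make the idea of "one out of all possible statistic values" concrete, here is a minimal simulation sketch in base R. The population, its 6% unemployment rate, and the sample size of 500 are all made-up assumptions for illustration, not LFS figures.

```r
set.seed(1)  # arbitrary seed, only for reproducibility

# Hypothetical population: 100,000 labour-force members, 6% unemployed (assumed value)
population = rep(c(TRUE, FALSE), times = c(6000, 94000))

# Draw 1,000 independent samples of size 500 and record each sample proportion
sample_props = replicate(1000, mean(sample(population, size = 500)))

# Each sample gives a different statistic value;
# together these values approximate the sampling distribution
range(sample_props)
mean(sample_props)  # centre of the simulated sampling distribution
sd(sample_props)    # its spread, i.e. the standard error of the proportion
```

In practice only one sample is available, which is why the spread of the sampling distribution has to be estimated, for example by resampling.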
Access Labour Force Survey (LFS) microdata (individual responses) through UofT's CHASS Data Centre
File LFS_Toronto.csv contains 2018 LFS data for Toronto's Census Metropolitan Area (CMA)
library(tidyverse)  # provides read_csv(), %>% and the dplyr verbs used below
lfs = read_csv('./data/LFS_Toronto.csv')
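Variable lfsstat records labour force status; the code treats 3 as unemployed and 4 as not in the labour force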
lfs %>% summarise( UNEMPL = sum(lfsstat == 3) / sum(lfsstat != 4) )
## # A tibble: 1 x 1
##   UNEMPL
##    <dbl>
## 1 0.0619
Statistic gives single value, a.k.a. point estimate
infer package to bootstrap data frames
- specify() selects variable(s)
- generate() resamples data
- calculate() calculates statistic

library(infer)
lfs_boot = lfs %>%
  filter( lfsstat %in% 1:3 ) %>%
  mutate( unemployed = (lfsstat == 3) ) %>%
  specify( response = unemployed, success = "TRUE" ) %>%
  generate( reps = 500, type = "bootstrap" ) %>%
  calculate( stat = "prop" ) %>%
  rename( UNEMPL = stat )
save(lfs_boot, file = "./data/lfsboot.R")
(CI = lfs_boot %>% summarise( lower = quantile(UNEMPL, .025), upper = quantile(UNEMPL, .975)))
## # A tibble: 1 x 2
##    lower  upper
##    <dbl>  <dbl>
## 1 0.0595 0.0645
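The resampling behind a bootstrap percentile interval can also be sketched by hand in base R. This is only an illustration of the idea, not how infer is implemented; the data (1,000 respondents, 62 unemployed) and the helper name boot_prop are hypothetical stand-ins for the filtered LFS sample.

```r
set.seed(2)  # arbitrary seed, only for reproducibility

# Hypothetical sample: 1,000 labour-force respondents, 62 of them unemployed
unemployed = rep(c(TRUE, FALSE), times = c(62, 938))

# One bootstrap replicate: resample n observations WITH replacement
# and recompute the sample proportion (boot_prop is an illustrative helper)
boot_prop = function(x) mean(sample(x, size = length(x), replace = TRUE))

# 500 replicates, mirroring reps = 500 in the infer pipeline
boot_props = replicate(500, boot_prop(unemployed))

# Percentile interval: the middle 95% of the bootstrap statistics
boot_ci = quantile(boot_props, c(.025, .975))
boot_ci
```

Sampling with replacement is the key design choice: each replicate has the same size as the original sample, but some observations appear several times and others not at all, which is what makes the statistic vary between replicates.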
Many sampling distributions are approximately symmetric with a single peak around the mean, so about 95% of statistic values fall within 2 standard errors of it
# margin of error
(ME = (CI$upper - CI$lower)/2)
##       97.5%
## 0.002455526
# standard error
(SE = sd( lfs_boot$UNEMPL ))
## [1] 0.001206671
2*SE
## [1] 0.002413342
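As a rough check on the numbers above, the "point estimate plus or minus 2 standard errors" rule can be worked out by hand. This sketch uses the rounded point estimate 0.0619 and the bootstrap SE reported in the output, and assumes the sampling distribution is approximately bell-shaped; the symbol $\hat{p}$ for the sample unemployment rate is notation introduced here.

```latex
\hat{p} \pm 2\,SE
  = 0.0619 \pm 2 \times 0.001207
  \approx 0.0619 \pm 0.0024
  \;\Rightarrow\; (0.0595,\ 0.0643)
```

This is close to the percentile interval (0.0595, 0.0645), consistent with the margin of error being roughly 2 standard errors.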