- Understand fundamental concepts of statistical inference
  - Population vs. sample
  - Parameter vs. statistic
  - Sampling variability & bias
- Apply basic sampling strategies
  - Simple and stratified random sampling
- Readings
  - ISRS: ch. 1.3-1.4
A sample is a subset of the population that helps us answer a question approximately.
The quality of that answer depends critically on how the sample is collected.
A parameter describes the population; a statistic is the corresponding quantity computed from a sample and used to estimate it.

| Description | Parameter | Statistic |
|---|---|---|
| Mean | \(\mu\) | \(\hat{\mu}\) or \(\bar{X}\) |
| Standard deviation | \(\sigma\) | \(S\) |
| Median | \(\mu_{1/2}\) | \(M\) |
| Proportion | \(p\) | \(\hat{p}\) |
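As a minimal illustration on simulated data (not the `dinesafe` data; the Poisson population and the names `population`, `smpl`, `param_vals` and `stat_vals` are hypothetical), each parameter in the table can be computed from the full population and the matching statistic from one simple random sample:

```r
# Simulated stand-in population (hypothetical, NOT the dinesafe data):
# 10,000 counts drawn from a Poisson distribution with mean 3
set.seed(1)
population <- rpois(10000, lambda = 3)

# One simple random sample of size 100, drawn without replacement
smpl <- sample(population, size = 100)

# Parameters: computed from every unit in the population
# (sigma uses the population divisor N rather than N - 1)
param_vals <- c(mean      = mean(population),
                sd        = sqrt(mean((population - mean(population))^2)),
                median    = median(population),
                prop_zero = mean(population == 0))

# Statistics: the same summaries computed from the sample only
# (sd() uses the usual n - 1 divisor for S)
stat_vals <- c(mean      = mean(smpl),
               sd        = sd(smpl),
               median    = median(smpl),
               prop_zero = mean(smpl == 0))

round(rbind(parameter = param_vals, statistic = stat_vals), 3)
```

The parameters are fixed numbers; rerunning only the `sample()` line with a different seed changes the statistics but not the parameters.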
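Example: the mean number of inspections per establishment in the `dinesafe` data, computed for the whole population and for a random sample of 100 establishments.

```r
library(dplyr)

# Assumes the `dinesafe` inspection data is already loaded as a data frame.
# Population: number of distinct inspections for each establishment
pop <- dinesafe %>%
  group_by(ESTABLISHMENT_ID) %>%
  distinct(INSPECTION_ID) %>%
  summarise(N_INSPECTIONS = n())

# Parameter: mean over all establishments
pop %>% summarise(mean(N_INSPECTIONS))
## # A tibble: 1 x 1
##   `mean(N_INSPECTIONS)`
##                   <dbl>
## 1                  3.41

# Statistic: mean over a simple random sample of 100 establishments
# (sample_n() is superseded by slice_sample(n = 100) in dplyr >= 1.0.0)
pop %>% sample_n(100) %>% summarise(mean(N_INSPECTIONS))
## # A tibble: 1 x 1
##   `mean(N_INSPECTIONS)`
##                   <dbl>
## 1                  3.38
```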
A statistic changes from sample to sample (sampling variability), so how should we choose our sample?
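Sampling variability can be seen directly by repeating the sampling step many times. The sketch below is a simulation under assumed conditions (a hypothetical Poisson population standing in for `N_INSPECTIONS`); it records the mean of 1,000 independent samples of size 100 and compares their spread with the familiar approximation \(\sigma / \sqrt{n}\):

```r
set.seed(2)
population <- rpois(10000, lambda = 3)   # simulated stand-in population
mu <- mean(population)                   # the parameter

# Draw 1,000 independent simple random samples of size 100 and keep each mean
xbar <- replicate(1000, mean(sample(population, size = 100)))

summary(xbar)                 # the statistic varies from sample to sample
sd(xbar)                      # spread of xbar across samples (standard error)
sd(population) / sqrt(100)    # usual approximation sigma / sqrt(n)
```

The sample means scatter around `mu`; larger samples would shrink that scatter roughly in proportion to \(1/\sqrt{n}\).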
To avoid selection bias and improve representativeness, most sampling methods involve randomness.
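A minimal sketch of why randomness helps, assuming a hypothetical data file that happens to be sorted by inspection count (the data frame `frame` and its columns are invented for illustration): taking the first 100 rows, a convenience sample, systematically misses large values, while `sample()` draws a simple random sample in which every set of 100 rows is equally likely.

```r
set.seed(3)
# Hypothetical sampling frame of 5,000 establishments, sorted by count
frame <- data.frame(id = 1:5000, n_insp = sort(rpois(5000, lambda = 3)))

# Convenience "sample": the first 100 rows, so it depends on the file order
conv <- frame[1:100, ]

# Simple random sample: 100 distinct row indices chosen uniformly at random
srs <- frame[sample(nrow(frame), size = 100), ]

c(population  = mean(frame$n_insp),
  convenience = mean(conv$n_insp),
  srs         = mean(srs$n_insp))
```

Because the file is sorted, the convenience sample contains only the smallest counts; the random sample does not depend on the row order at all.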
Two common sources of bias are:
- Participation or non-response bias: the units that actually respond are not representative of the entire population
- Coverage bias: the sampling frame (the list of units the sample is drawn from) does not align well with the population
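Non-response bias can survive even a properly randomized design. The simulation below is a sketch under an explicitly hypothetical response model (a unit with value \(x\) responds with probability \(1/(1+x)\)); the population and all names are invented for illustration:

```r
set.seed(4)
population <- rpois(10000, lambda = 3)   # simulated stand-in population

# Step 1: a proper simple random sample of 1,000 units is invited
invited <- sample(population, size = 1000)

# Step 2 (hypothetical response model): a unit with value x responds with
# probability 1 / (1 + x), so units with larger values respond less often
responds  <- runif(length(invited)) < 1 / (1 + invited)
responded <- invited[responds]

c(population  = mean(population),
  invited     = mean(invited),
  respondents = mean(responded))
```

The invited sample is close to the population mean, but the respondents' mean is pulled well below it; a larger sample would not remove this bias, only make the biased estimate more precise.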
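The population is often divided into groups, called strata. In stratified random sampling, a simple random sample is drawn separately within each stratum.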
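For example, grouping establishments by `MINIMUM_INSPECTIONS_PERYEAR` gives three strata whose mean number of inspections differs considerably:

```r
## # A tibble: 3 x 3
##   MINIMUM_INSPECTIONS_PERYEAR `mean(N_INSPECTIONS)` `n()`
##                         <int>                 <dbl> <int>
## 1                           1                  1.66  3887
## 2                           2                  3.45  8775
## 3                           3                  5.21  3629
```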
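A minimal sketch of stratified random sampling with proportional allocation, compared against simple random sampling of the same total size. It assumes a simulated population that only mimics the table above: stratum sizes come from the `n()` column and stratum means from the `mean(N_INSPECTIONS)` column, but the Poisson shape within each stratum is an assumption, and the helper names are hypothetical.

```r
set.seed(5)
# Simulated population mimicking the three strata above (Poisson is assumed)
N_h  <- c(`1` = 3887, `2` = 8775, `3` = 3629)   # stratum sizes
mu_h <- c(1.66, 3.45, 5.21)                      # stratum means
stratum <- rep(names(N_h), times = N_h)
y <- rpois(sum(N_h), lambda = rep(mu_h, times = N_h))
y_by_h <- split(y, stratum)                      # values grouped by stratum

n <- 100
# Proportional allocation: n_h proportional to N_h (here 24, 54, 22)
n_h <- round(n * N_h / sum(N_h))

stratified_mean <- function() {
  # SRS of size n_h inside each stratum, then weight stratum means by N_h / N
  ybar_h <- sapply(names(N_h), function(h) mean(sample(y_by_h[[h]], size = n_h[[h]])))
  sum(N_h / sum(N_h) * ybar_h)
}
srs_mean <- function() mean(sample(y, size = n))

# Repeat each design 2,000 times to compare the spread of the estimates
est_strat <- replicate(2000, stratified_mean())
est_srs   <- replicate(2000, srs_mean())

c(true = mean(y), sd_stratified = sd(est_strat), sd_srs = sd(est_srs))
```

Both designs are centred on the true mean, but because the strata differ so much, the stratified estimate varies less from sample to sample. Proportional allocation is only one choice; other allocations of the 100 units across strata are possible.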