Syllabus
About the course
STAA57 provides an overview of Data Science, i.e., the systematic study of extracting knowledge from data. We will examine the fundamental concepts, principles, and methods of Data Science, using computing as our primary tool. In particular, we will use the R language and the RStudio environment to analyze data and create reports. The course content is divided into three themes:
- Data Management & Visualization
- Statistical Inference
- Predictive Methods (Machine Learning)
By the end of this course, you should be able to carry out a statistical investigation using data, including formulating relevant questions, selecting appropriate methods to address them, processing and analyzing the data, and communicating the results.
Textbooks
We will use parts of the following (free) textbooks:
- R for Data Science, by Garrett Grolemund & Hadley Wickham
- Introductory Statistics with Randomization and Simulation, by David M Diez, Christopher D Barr, and Mine Çetinkaya-Rundel
- An Introduction to Statistical Learning, by Gareth James, Daniela Witten, Trevor Hastie and Robert Tibshirani
Marking Scheme
| Item | Weight |
|---|---|
| Worksheets | 15% (best 18 of 22) |
| Midterm Exam | 25% |
| Course Project | 20% |
| Final Exam | 40% |