About the course

STAA57 provides an overview of Data Science, i.e., the systematic study of extracting knowledge from data. We will examine the fundamental concepts, principles, and methods of Data Science, using computing as our primary tool. In particular, we will use the R language and the RStudio environment to analyze data and create reports. The course content is divided into three themes:

  • Data Management & Visualization
  • Statistical Inference
  • Predictive Methods (Machine Learning)

By the end of this course, you should be able to carry out a statistical investigation using data, including formulating relevant questions, selecting appropriate methods to address them, processing and analyzing the data, and communicating the results.

Textbooks

We will use parts of the following (free) textbooks:

Marking Scheme

Item            Weight
Worksheets      15% (best 18 of 22)
Midterm Exam    25%
Course Project  20%
Final Exam      40%
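
As a rough illustration of how the weights above combine, here is a minimal sketch in Python. It assumes every component is scored as a percentage and that "best 18" means averaging the 18 highest worksheet scores; the syllabus does not spell out either detail, and the function name and the sample scores are hypothetical.

```python
def course_grade(worksheets, midterm, project, final):
    """Combine component scores (each a percentage, 0-100) into a course grade."""
    # Assumption: keep the 18 highest worksheet scores and average them.
    best = sorted(worksheets, reverse=True)[:18]
    worksheet_avg = sum(best) / len(best)
    # Weights taken from the marking scheme: 15% + 25% + 20% + 40% = 100%.
    return (0.15 * worksheet_avg
            + 0.25 * midterm
            + 0.20 * project
            + 0.40 * final)


# Hypothetical student: 18 perfect worksheets and 4 missed ones.
scores = [100] * 18 + [0] * 4
grade = course_grade(scores, midterm=70, project=80, final=75)
print(round(grade, 2))
```

Under these assumptions the four missed worksheets are dropped, so the grade works out to 15 + 17.5 + 16 + 30 = 78.5.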