To successfully complete this course, students are required to do the following:

  1. Pick a dataset featured on the TidyTuesday project (any year).
  2. Formulate a research problem related to the data. This research problem can be either predictive or inferential in nature (e.g. Can we predict the popularity of a song on Spotify based on its characteristics? Does the gender wage gap in the US depend on the proportion of women in the field? Are more expensive video games rated better?).
  3. Analyze the data using a linear regression model and write a report on your findings. This report should include a clear definition of your research problem, a description of your data (including descriptive statistics), a description of your regression model (both tables and graphs where appropriate), diagnostics of your regression model, and an overall conclusion. You can transform and filter the data as necessary, but clearly describe all data transformations.
  4. Prepare two documents for submission: (1) a script, which must be fully operational: it has to run without error from start (including downloading the data from the TidyTuesday website) to finish, without any need for manual intervention, and produce all analytic outputs (models, charts) used for the assignment; (2) the final report (e.g. Word or PDF) as described above.
  5. Send both documents to .
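
As a rough illustration of what "fully operational from start to finish" in step 4 can look like, here is a minimal sketch in Python. The language is an assumption made for the example only, since the course does not prescribe one here. The function names `load_rows` and `fit_simple_ols` are hypothetical, the dataset URL is left for you to supply, and the toy numbers are made up purely to show that the code runs.

```python
import csv
import io
import urllib.request


def load_rows(url):
    """Download a CSV file from `url` and return its rows as a list of dicts.

    `url` is supplied by the caller, e.g. the raw CSV link of the
    TidyTuesday dataset you picked (no specific link is assumed here).
    """
    with urllib.request.urlopen(url) as response:
        text = response.read().decode("utf-8")
    return list(csv.DictReader(io.StringIO(text)))


def fit_simple_ols(x, y):
    """Fit y = intercept + slope * x by ordinary least squares."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Residuals are the starting point for regression diagnostics.
    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    ss_res = sum(r ** 2 for r in residuals)
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return {
        "intercept": intercept,
        "slope": slope,
        "r_squared": 1 - ss_res / ss_tot,
        "residuals": residuals,
    }


# Toy, made-up numbers used only to show that the functions run;
# a real script would build x and y from the downloaded rows instead.
toy_x = [1.0, 2.0, 3.0, 4.0, 5.0]
toy_y = [2.1, 3.9, 6.2, 7.8, 10.1]
model = fit_simple_ols(toy_x, toy_y)
print(model["slope"], model["intercept"], model["r_squared"])
```

A real submission would most likely rely on a statistics package rather than hand-written formulas, use more than one predictor where the research problem calls for it, and save the diagnostic charts to files. The point of the sketch is only the overall shape: download, fit, and produce every output in a single run.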

The due date for the report is 5th September, so that we can grade it by 20th September (the last day when SIS is open for entries for the summer semester). We recommend that you try to submit (much) earlier. If you get stuck, don’t be afraid to ask for a consultation.