To successfully complete this course, students are required to do the
following:
- Pick a dataset featured in the TidyTuesday project (any year).
- Formulate a research problem related to the data. This research
problem can be either predictive or inferential in nature (e.g. Can we
predict the popularity of a song on Spotify based on its
characteristics? Does the gender wage gap in the US depend on the
proportion of women in the field? Are more expensive video games rated
better?).
- Analyze the data using a linear regression model and write a report
on your findings. This report should include a clear definition of your
research problem, a description of your data (including descriptive
statistics), a description of your regression model (with both tables
and graphs where appropriate), diagnostics of your regression model, and
an overall conclusion. You can transform and filter data as necessary, but
clearly describe all data transformations.
- Prepare two documents for submission: (1) a script, which must be
fully operational: it has to run without error from start (including
downloading the data from the TidyTuesday website) to finish, without any
manual intervention, and produce all analytic outputs (models,
charts) used for the assignment; (2) a final report (e.g. Word or PDF) as
described above.
- Send both documents to ales@vomacka.io.
The due date for the report is 5th September so that we can grade it
by 20th September (the last day when SIS is open for entries for the
summer semester). We recommend that you try to submit (much) earlier. If
you get stuck, don’t be afraid to ask for a consultation.
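
To illustrate what a "fully operational" script might look like, here is a
minimal sketch in Python using only the standard library. It is not a
required template, and you may use any language or package accepted in the
course. The URL and the column names `predictor` and `outcome` are
hypothetical placeholders, not a real TidyTuesday dataset, and the model is
a simple one-predictor regression chosen for brevity.

```python
import csv
import io
import urllib.request
from statistics import fmean

# Hypothetical placeholder: replace with the raw CSV link of your dataset.
DATA_URL = "https://example.com/path/to/tidytuesday_dataset.csv"


def load_rows(url):
    """Download a CSV file and return its rows as a list of dicts."""
    with urllib.request.urlopen(url) as response:
        text = response.read().decode("utf-8")
    return list(csv.DictReader(io.StringIO(text)))


def fit_simple_ols(x, y):
    """Fit y = intercept + slope * x by ordinary least squares."""
    x_mean, y_mean = fmean(x), fmean(y)
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    # Residuals are the starting point for most regression diagnostics.
    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    ss_res = sum(r ** 2 for r in residuals)
    ss_tot = sum((yi - y_mean) ** 2 for yi in y)
    return {
        "intercept": intercept,
        "slope": slope,
        "r_squared": 1 - ss_res / ss_tot,
        "residuals": residuals,
    }


def main():
    """Run the whole analysis from download to printed results."""
    rows = load_rows(DATA_URL)
    # Hypothetical column names; describe any filtering or transformation.
    x = [float(row["predictor"]) for row in rows]
    y = [float(row["outcome"]) for row in rows]
    model = fit_simple_ols(x, y)
    print("intercept:", model["intercept"])
    print("slope:", model["slope"])
    print("R squared:", model["r_squared"])
```

Calling `main()` at the end of the script runs every step in order, so the
grader does not have to download or prepare anything by hand. A real
submission would also save the charts and diagnostic plots used in the
report.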