DataFest2021 Project Example
December 2020
What is this?
The materials in this repository serve as a presenter-facing example of one possible project workflow for DataFest2021. Presenters should feel free to use, modify, extend, or otherwise wrangle this material into a form that suits their preferences, or to ignore it entirely if they already have a particular direction in mind for their session.
Ultimately, each presenter should create their own R or Python script to work through during their session (source code, R Markdown, or notebook formats are all fine). The materials should not be shared with participants ahead of time, but we will upload them to the main IQSS/datafest GitHub repo at the end of each day, so that participants attending only a subset of days can orient themselves to what has been presented previously.
Project summary
The ‘research project’ focuses on investigating COVID-19 case rates over time and space. The goal is for participants to learn best practices for handling and using data throughout the data life cycle: data acquisition and cleaning, data visualization and analysis, and data archiving and dissemination. They will learn how to programmatically extract COVID-19 data from online sources using APIs, wrangle those data into a clean state, visualize the data temporally and spatially, analyze the data using a variety of statistical models, and finally archive the project replication files (code and data) in an online data repository (Harvard Dataverse).
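As a rough illustration of the wrangling step, the sketch below turns cumulative case counts into daily new cases and a rate per 100,000 people. It is only one possible approach, not the workflow used in the sessions. The inline CSV stands in for an API response, and its column names, regions, and numbers are all made up for illustration, as is the helper name `daily_rates`.

```python
import csv
import io

# Hypothetical stand-in for an API response. The regions, column names,
# and counts are invented for illustration and are NOT real COVID-19 data.
RAW_CSV = """state,date,cumulative_cases,population
A,2020-12-01,100,50000
A,2020-12-02,130,50000
A,2020-12-03,175,50000
B,2020-12-01,40,20000
B,2020-12-02,40,20000
B,2020-12-03,52,20000
"""


def daily_rates(raw_csv):
    """Convert cumulative counts into daily new cases and a rate per 100,000 people."""
    # Sort by state and then date so each row follows the previous day
    # for the same state (ISO dates sort correctly as strings).
    rows = sorted(
        csv.DictReader(io.StringIO(raw_csv)),
        key=lambda r: (r["state"], r["date"]),
    )
    previous = {}  # last cumulative count seen for each state
    result = []
    for row in rows:
        state = row["state"]
        cumulative = int(row["cumulative_cases"])
        population = int(row["population"])
        # The first date for a state has no earlier count to difference
        # against, so it produces no output row.
        if state in previous:
            new_cases = cumulative - previous[state]
            result.append(
                {
                    "state": state,
                    "date": row["date"],
                    "new_cases": new_cases,
                    "rate_per_100k": 100_000 * new_cases / population,
                }
            )
        previous[state] = cumulative
    return result


rates = daily_rates(RAW_CSV)
for record in rates:
    print(record["state"], record["date"], record["new_cases"], record["rate_per_100k"])
```

Scaling by population is what makes regions of different sizes comparable, which matters once the rates are mapped or modeled across space.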