Data Day Project 2

Instructions

One thing we’ve seen recently is “dirty data” – data that is not in a tidy form and, therefore, is not suitable for processing with many of our dplyr/tidyverse functions or for plotting with ggplot.

Your task is…

  1. Find a dirty dataset.  For example, the US federal government publishes lots of problematic data, such as this example.

  2. Explain the ways in which the data is dirty or unsuitable for processing, highlighting the issues in one or more screenshots of the problematic parts of the dataset.

  3. Discuss the steps necessary to fix the data, including the R functions that you would need to use.  (Note that you do not have to actually clean the dataset, though you are welcome to give that a try!)

  4. Upload a .ppt, .pdf, or .html file containing a small number of slides:

    1. A first slide with the title and your name

    2. One or more slides showing problems with the data

    3. A final slide explaining the R functions you would use to clean/fix the data

  5. Be prepared to talk for no more than 4 minutes about the above.
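
For illustration only, here is a minimal sketch of the kind of fix step 3 has in mind. The table, its column names, and the "n/a" marker are all made up for this example; your dataset will have its own problems and may need different functions. The sketch assumes the dplyr and tidyr packages are installed.

```r
library(dplyr)
library(tidyr)

# Hypothetical dirty table: one column per year (values hidden in the
# column names), and counts stored as text with commas and "n/a".
dirty <- tibble(
  state  = c("Ohio", "Texas"),
  `2019` = c("1,200", "3,400"),
  `2020` = c("1,350", "n/a")
)

tidy <- dirty %>%
  # Move the year columns into rows: one row per state and year.
  pivot_longer(cols = -state, names_to = "year", values_to = "count") %>%
  mutate(
    # Column names arrive as text, so convert the year to an integer.
    year = as.integer(year),
    # Turn "n/a" into a real NA, strip the commas, then convert to numeric.
    count = as.numeric(gsub(",", "", na_if(count, "n/a")))
  )

print(tidy)
```

Here pivot_longer() handles the "years as column headers" problem, while na_if() and gsub() handle the "numbers stored as text" problem. Your final slide could list functions in the same way: name the problem, then name the function that addresses it.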


How this will work in class

Your slides must be submitted by midnight the night before Data Day 2. The slides you submit will be put into one big file for presentation during class. When your name slide comes up, you will come to the front of the room and give your presentation.