ggplot activity with built-in datasets

Load packages and data

library(tidyverse)

In this exercise you will use one of the datasets that are built into R. Some of the most commonly used are:

  • airquality
  • AirPassengers
  • mtcars
  • iris

You can get a preliminary view of each of these by running any of the following commands in the Console window. Try them all to see how they differ. They are shown below for the mtcars dataset; try them for all four datasets.

  • print(mtcars)
  • glimpse(mtcars)
  • View(mtcars)
  • ?mtcars

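Because you will be using a built-in dataset, you don’t need to worry about reading in the data and setting up a data frame (phew!). One exception: AirPassengers is stored as a time series (class ts) rather than a data frame, so it needs a small conversion step before ggplot can plot it.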
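If you pick AirPassengers, a minimal conversion sketch follows. It assumes you want one row per month with a decimal-year column; the names air_df, year, and passengers are illustrative choices for this sketch, not part of the dataset.

```r
library(tidyverse)

# AirPassengers is a monthly "ts" object, not a data frame.
# Build a tibble with one row per month (air_df, year, and passengers
# are illustrative names chosen for this sketch).
air_df <- tibble(
  year = as.numeric(time(AirPassengers)),  # decimal year: 1949, 1949.083, ...
  passengers = as.numeric(AirPassengers)   # monthly totals, in thousands
)

glimpse(air_df)
```

After this, air_df can be used like any other data frame, for example ggplot(air_df, aes(x = year, y = passengers)) + geom_line().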

Plot relationships

Create a scatter plot

  1. Pick three columns of interest from your chosen dataset. You will follow the example we presented in class to create a scatter plot.
  • First, set up a simple ggplot command for the data set.
  • Second, set up the mapping, specifying the column for the x axis.
  • Third, add to the mapping, specifying the column for the y axis.
  • Next, add geom_point so that you get a scatterplot.
  • If it makes sense for your dataset, map a third column to color so that the points are colored by that variable.
  • Alternatively, set a single fixed color for all the points.
  • Finally, add a title that explains the graph.
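As one possible answer, here is a sketch of those steps using mtcars, assuming the three columns are wt (x axis), mpg (y axis), and cyl (color). It may not match the exact example from class, so substitute your own columns. cyl is wrapped in factor() so that ggplot treats it as a category and uses distinct colors rather than a color gradient.

```r
library(tidyverse)

# Data set first, then x and y mappings, then geom_point(),
# then color mapped to a third column, then a title via labs()
p_scatter <- ggplot(mtcars, aes(x = wt, y = mpg, color = factor(cyl))) +
  geom_point() +
  labs(
    title = "Heavier cars tend to get fewer miles per gallon",
    x = "Weight (1000 lbs)",
    y = "Miles per gallon",
    color = "Cylinders"
  )

p_scatter

# Alternative to mapping color: one fixed color for every point
ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point(color = "steelblue")
```

Note that the fixed color goes inside geom_point() but outside aes(). Writing aes(color = "steelblue") instead would treat "steelblue" as a data value, producing a legend and ggplot's default color rather than blue.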

No need to turn this in, but I’d like to see your final rendered document before we move on.

Plot distributions

Histogram

  1. Use geom_histogram to create a histogram of a continuous variable in your chosen dataset. Interpret the result.
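A minimal sketch with airquality, assuming Ozone as the continuous variable. binwidth = 10 is an arbitrary choice worth experimenting with, since the shape you interpret can change with the bin width. Ozone contains missing values, so na.rm = TRUE drops them quietly instead of printing a warning.

```r
library(tidyverse)

# Histogram of daily ozone readings (New York, May-September 1973)
p_hist <- ggplot(airquality, aes(x = Ozone)) +
  geom_histogram(binwidth = 10, na.rm = TRUE,
                 fill = "grey60", color = "white") +  # white outlines separate the bars
  labs(
    title = "Distribution of daily ozone readings",
    x = "Ozone (ppb)",
    y = "Number of days"
  )

p_hist
```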

Density plot

  1. Use geom_density to create a smooth density plot for another continuous variable in your chosen dataset. Interpret the result.
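A minimal sketch with iris, assuming Petal.Length as the variable. The fill and alpha values are purely cosmetic. The curve uses the default smoothing bandwidth; passing adjust (for example geom_density(adjust = 0.5)) makes it wigglier, and larger values make it smoother. If the curve shows more than one peak, that is worth mentioning in your interpretation.

```r
library(tidyverse)

# Smoothed distribution of petal length across all 150 flowers
p_density <- ggplot(iris, aes(x = Petal.Length)) +
  geom_density(fill = "skyblue", alpha = 0.5) +  # alpha makes the fill semi-transparent
  labs(
    title = "Distribution of iris petal lengths",
    x = "Petal length (cm)",
    y = "Density"
  )

p_density
```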

Box plot

  1. Use geom_boxplot to create a box plot of another continuous variable in your chosen dataset. Interpret the result.
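A minimal sketch with airquality, assuming Temp as the variable. Mapping only y draws a single box for the whole column; this relies on ggplot2 3.3.0 or later, which current tidyverse installs include. The commented line shows one optional variant that splits the data into one box per month.

```r
library(tidyverse)

# One box summarizing all daily temperatures: median, quartiles, outliers
p_box <- ggplot(airquality, aes(y = Temp)) +
  geom_boxplot() +
  labs(
    title = "Daily maximum temperature, New York, May-September 1973",
    y = "Temperature (degrees F)"
  )

p_box

# Optional variant: one box per month
# ggplot(airquality, aes(x = factor(Month), y = Temp)) + geom_boxplot()
```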

Plot categorical variables

  1. Use geom_bar to create a bar plot of a categorical variable in your chosen dataset.
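A minimal sketch with mtcars, assuming cyl (number of cylinders) as the categorical variable. cyl is stored as a number, so factor() makes ggplot treat its three values as separate categories. geom_bar() counts the rows in each category itself, so no y mapping is needed.

```r
library(tidyverse)

# Number of cars in mtcars with 4, 6, and 8 cylinders
p_bar <- ggplot(mtcars, aes(x = factor(cyl))) +
  geom_bar() +  # bar height = number of rows in each category
  labs(
    title = "Cars in mtcars by number of cylinders",
    x = "Number of cylinders",
    y = "Number of cars"
  )

p_bar
```

If your data already has one row per category with a precomputed height, geom_col() is the counterpart to use instead of geom_bar().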

If you finish early, try out another built-in dataset.