ggplot activity 1

Load packages and data

library(tidyverse)

In this exercise you will use one of the datasets built into R. Some of the most commonly used are:

  • airquality
  • AirPassengers
  • mtcars
  • iris

You can get a preliminary view of each of these by running any of the following commands in the Console window. Try them all to see how they differ. I’ve shown them below for the mtcars dataset, but you should try them for all four datasets.

  • print(mtcars)
  • glimpse(mtcars)
  • View(mtcars)
  • ?mtcars
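
As a small illustration of what these checks can tell you, here is a sketch that looks at the structure of three of the datasets. Which datasets and which functions I call here are my own choices, so treat it as one possible way to explore rather than the required one. Note that glimpse() comes from the tidyverse, so the packages must be loaded first.

```r
library(tidyverse)

# Compact overview: one row per column, with its type and first few values
glimpse(airquality)

# Not every built-in dataset is a data frame
class(AirPassengers)  # a time series ("ts") object, not a data frame

# Number of rows and columns
dim(iris)
```

Because ggplot expects a data frame, a time series such as AirPassengers would need to be converted before plotting, which makes the other three datasets easier choices for this activity.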

Because you will be using a built-in data set, you don’t need to worry about how to read in the data and set up a data frame (phew!).

Plot relationships

Create a scatter plot

Next, pick three columns of interest from your chosen data set. Follow the example just presented in class to create a scatter plot (see the handout for the sequence of code used in the example).

  • First, set up a simple ggplot command for the data set.
  • Second, set up the mapping, specifying the column for the x axis.
  • Third, add to the mapping, specifying the column for the y axis.
  • Next, add geom_point so that you get a scatterplot.
  • If it makes sense, given your data set, use a third column as a way of adding color for various graph elements.
  • Alternatively, specify a color for all the dots.
  • Finally, add a title that explains the graph.
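
The steps above can be sketched as follows. This is only an illustration using mtcars; the columns (wt, mpg, cyl), the title, and the legend label are my assumptions, and your own data set will call for different choices.

```r
library(tidyverse)

# Steps 1-3: the data set, then the x and y mappings
p_scatter <- ggplot(mtcars, aes(x = wt, y = mpg))

# Steps 4-5: draw points, using a third column for color.
# cyl is numeric, so factor() makes ggplot treat it as discrete groups.
p_scatter <- p_scatter + geom_point(aes(color = factor(cyl)))

# Alternative to step 5: one fixed color for all dots, set outside aes()
# p_scatter <- p_scatter + geom_point(color = "steelblue")

# Final step: a title (and a readable legend label)
p_scatter <- p_scatter +
  labs(title = "Fuel economy versus car weight", color = "Cylinders")

p_scatter
```

The key distinction is that a color placed inside aes() is mapped from the data, while a color placed outside aes() is a fixed setting applied to every point.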

No need to turn this in, but I’d like to see your final rendered document before the end of class! Add all of your R code to this document.

Plot distributions

Histogram

Use geom_histogram to create a histogram plot for a continuous variable in your chosen dataset. Interpret the result.
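
A minimal sketch, assuming the airquality dataset and its Temp column (daily temperature in degrees Fahrenheit). The bin width of 5 is an arbitrary choice of mine; try a few values to see how the shape changes.

```r
library(tidyverse)

# A histogram needs only an x mapping; ggplot counts the rows in each bin
p_hist <- ggplot(airquality, aes(x = Temp)) +
  geom_histogram(binwidth = 5, color = "white") +
  labs(title = "Distribution of daily temperature", x = "Temperature (F)")

p_hist
```

If you leave out binwidth, ggplot falls back to 30 bins and prints a message suggesting you pick a better value.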

Density plot

Use geom_density to create a smooth density plot for another continuous variable in your chosen dataset. Interpret the result.
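
A minimal sketch, assuming the iris dataset and its Petal.Length column. The fill color and transparency are cosmetic choices of mine, not requirements.

```r
library(tidyverse)

# Like a histogram, a density plot needs only an x mapping
p_density <- ggplot(iris, aes(x = Petal.Length)) +
  geom_density(fill = "steelblue", alpha = 0.4) +
  labs(title = "Distribution of petal length", x = "Petal length (cm)")

p_density
```

When you interpret the result, look at how many peaks the curve has and where they fall; more than one peak often hints at distinct groups in the data.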

Box plot

Use geom_boxplot to create a box plot for another continuous variable in your chosen dataset. Interpret the result.
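
A minimal sketch, assuming mtcars with mpg as the continuous variable. Splitting the boxes by cyl is an optional extra that I added for illustration; a single box works too if you drop the x mapping.

```r
library(tidyverse)

# One box per group: a discrete x and a continuous y.
# factor() turns the numeric cyl column into discrete groups.
p_box <- ggplot(mtcars, aes(x = factor(cyl), y = mpg)) +
  geom_boxplot() +
  labs(title = "Fuel economy by number of cylinders",
       x = "Cylinders", y = "Miles per gallon")

p_box
```

Each box spans the middle half of the values, the line inside it marks the median, and any points drawn beyond the whiskers are flagged as potential outliers.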

Plot categorical variables

Use geom_bar to create a bar plot of a categorical variable in your chosen dataset.
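
A minimal sketch, assuming mtcars and treating gear as the categorical variable. That choice is mine; in iris, for example, Species would be the natural candidate.

```r
library(tidyverse)

# geom_bar counts the rows in each category, so only x is mapped.
# gear is stored as a number, so factor() makes it categorical.
p_bar <- ggplot(mtcars, aes(x = factor(gear))) +
  geom_bar() +
  labs(title = "Number of cars by gear count", x = "Gears", y = "Count")

p_bar
```

If your data already contains one row per category with a precomputed count, geom_col (with both x and y mapped) is the usual alternative.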