ggplot activity with built-in datasets
Load packages and data
library(tidyverse)
In this exercise you will be using one of the datasets that are built into R. The most commonly used of these are:
- airquality
- AirPassengers
- mtcars
- iris
You can get a preliminary view of each dataset by running any of the following commands in the Console window. Try them all to see how they differ. They are shown below for the mtcars dataset; you should try them for all four datasets.
- print(mtcars)
- glimpse(mtcars)
- View(mtcars)
- ?mtcars
Because you will be using a built-in dataset, you don’t need to worry about how to read in the data and set up a data frame (phew!).
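One caveat worth checking before you pick a dataset: mtcars, iris, and airquality are ordinary data frames, but AirPassengers is stored as a time series (class "ts"), so it needs converting before its values can be mapped like columns. The sketch below shows one possible conversion. The names air_df, year, and passengers are made up for illustration.

```r
# Check how each dataset is stored.
class(mtcars)         # "data.frame"
class(AirPassengers)  # "ts"

# One way to turn the time series into a data frame.
# (air_df, year, and passengers are illustrative names, not part of R.)
air_df <- data.frame(
  year       = as.numeric(time(AirPassengers)),  # fractional year per month
  passengers = as.numeric(AirPassengers)         # monthly passenger counts
)
head(air_df)
```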
Plot relationships
Create a scatter plot
- Pick three columns of interest from your chosen dataset. You will follow the example we presented in class to create a scatterplot.
- First, set up a simple ggplot command for the dataset.
- Second, set up the mapping, specifying the column for the x axis.
- Third, add to the mapping, specifying the column for the y axis.
- Next, add geom_point so that you get a scatterplot.
- If it makes sense, given your dataset, use a third column as a way of adding color for various graph elements.
- Alternatively, specify a color for all the dots.
- Finally, add a title that explains the graph.
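The steps above might come together roughly as follows. This is only a sketch, assuming mtcars as the dataset and wt, mpg, and cyl as the three columns; the titles and the color "steelblue" are arbitrary choices, and your own columns will differ.

```r
library(ggplot2)  # attached automatically by library(tidyverse)

# Assumed choices for illustration: wt (x), mpg (y), cyl (color).
# cyl is stored as a number, so factor() turns it into discrete groups.
scatter_by_cyl <- ggplot(data = mtcars,
                         mapping = aes(x = wt, y = mpg, color = factor(cyl))) +
  geom_point() +
  labs(title = "Heavier cars tend to get fewer miles per gallon",
       x = "Weight (1000 lbs)", y = "Miles per gallon", color = "Cylinders")

# Alternative: one fixed color for every dot, set outside aes().
scatter_fixed <- ggplot(data = mtcars, mapping = aes(x = wt, y = mpg)) +
  geom_point(color = "steelblue") +
  labs(title = "Fuel economy versus weight in mtcars")

scatter_by_cyl
scatter_fixed
```

Putting color inside aes() ties it to a column and produces a legend, while putting it inside geom_point() applies a single color to every point.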
No need to turn this in, but I’d like to see your final rendered document before we move on.
Plot distributions
Histogram
- Use geom_histogram to create a histogram plot for a continuous variable in your chosen dataset. Interpret the result.
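As a rough illustration, a histogram for mtcars could look like the sketch below. The column mpg and the value bins = 10 are assumptions made for the example; try several bin counts, since the apparent shape of the distribution can change with them.

```r
library(ggplot2)  # attached automatically by library(tidyverse)

# mpg is an assumed choice; bins = 10 is arbitrary and worth experimenting with.
mpg_hist <- ggplot(mtcars, aes(x = mpg)) +
  geom_histogram(bins = 10, fill = "grey60", color = "white") +
  labs(title = "Distribution of fuel economy in mtcars",
       x = "Miles per gallon", y = "Number of cars")

mpg_hist
```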
Density plot
- Use geom_density to create a smooth density plot for another continuous variable in your chosen dataset. Interpret the result.
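A density plot follows the same pattern with a different geom. The sketch below assumes mtcars and the hp column; the fill color and transparency are cosmetic choices only.

```r
library(ggplot2)  # attached automatically by library(tidyverse)

# hp is an assumed choice; alpha makes the fill semi-transparent.
hp_density <- ggplot(mtcars, aes(x = hp)) +
  geom_density(fill = "steelblue", alpha = 0.4) +
  labs(title = "Smoothed distribution of horsepower in mtcars",
       x = "Gross horsepower", y = "Density")

hp_density
```

When interpreting the result, keep in mind that the curve is a smoothed estimate, so small bumps may not reflect real structure in only 32 observations.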
Box plot
- Use geom_boxplot to create a box plot for another continuous variable in your chosen dataset. Interpret the result.
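One possible version of a box plot is sketched below, assuming mtcars, with wt as the continuous variable. Splitting the boxes by cyl is an optional extra that the exercise does not require; dropping the x mapping gives a single box instead.

```r
library(ggplot2)  # attached automatically by library(tidyverse)

# wt is an assumed choice; factor(cyl) gives one box per cylinder count.
wt_box <- ggplot(mtcars, aes(x = factor(cyl), y = wt)) +
  geom_boxplot() +
  labs(title = "Car weight by number of cylinders in mtcars",
       x = "Cylinders", y = "Weight (1000 lbs)")

wt_box
```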
Plot categorical variables
- Use geom_bar to create a bar plot of a categorical variable in your chosen dataset.
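For a bar plot, geom_bar counts the rows in each category for you, so only an x mapping is needed. The sketch below assumes mtcars, which has no true categorical columns, so cyl is wrapped in factor() to treat it as one. In iris, the Species column could be used directly.

```r
library(ggplot2)  # attached automatically by library(tidyverse)

# cyl is an assumed choice; factor() makes the numeric column categorical.
cyl_bar <- ggplot(mtcars, aes(x = factor(cyl))) +
  geom_bar() +
  labs(title = "Number of cars by cylinder count in mtcars",
       x = "Cylinders", y = "Number of cars")

cyl_bar
```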
If you finish early, try out another built-in dataset.