Lab: Introducing Tidyverse
Lab 4
Why are we here?
Up to now we have been manipulating vectors by reordering and subsetting them through indexing. However, once we start more advanced analyses, the preferred unit for data storage is not the vector but the data frame. We will focus on a specific data format referred to as tidy and on a specific collection of packages, referred to as the tidyverse, that are particularly helpful for working with tidy data. We will begin with a review of base R.
This lab covers:
Tidyverse cheat sheet https://rstudio.github.io/cheatsheets/html/tidyr.html?_gl=1
Tidyverse packages https://www.tidyverse.org/packages/
R for Data Science https://r4ds.had.co.nz/tidy-data.html
Lab goals
The purpose of this lab is to review base R and introduce you to working with the tidyverse.
Lab instructions
Part 1
Preparation
- If you have not done so already, create a new folder within your CMSC 121 RStudio project folder called labs, or a similar name, to store all of your course lab assignments.
- Download the files test1.csv, GSS1991.sav, and GSS2016.sav from the Brightspace Assignment page. Place these files in a new folder called data within your labs folder.
- Create a new Quarto document called lab4.qmd in your labs folder.
- Load the tidyverse package.
Exercises
- (6 points) In your Quarto file, create a new code chunk and write a line of code that will assign to the object df the result of calling the read.csv() function with the argument test1.csv. Be sure to put quotes around the file name in the function call. Note: check your working directory so that you know what folders need to be included in the relative file path, e.g., data/test1.csv.
  - Run your code and make sure that the data frame df shows up in your workspace Environment pane with 6 observations and 3 variables.
  - Write a line of code to print the names of the columns of the data frame.
  - Write a line of code to print out the first column of data using the $ notation and the name of the first column.
  - Write code to print out the sum and mean of each column of data.
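As a rough illustration of the inspection steps above, here is a minimal sketch on a made-up data frame. The object scores and its values are hypothetical and are not the contents of test1.csv; colSums() and colMeans() are one base R way to summarize every column at once, not necessarily the approach expected in your answer.

```r
# Hypothetical toy data frame standing in for the lab data
scores <- data.frame(a = c(1, 2, 3), b = c(10, 20, 30))

col_names <- names(scores)    # character vector of column names
first_col <- scores$a         # first column, extracted by name with $
col_sums  <- colSums(scores)  # sum of each numeric column
col_means <- colMeans(scores) # mean of each numeric column

print(col_names)
print(first_col)
print(col_sums)
print(col_means)
```

Calling sum() and mean() on each column separately, e.g. sum(scores$a), gives the same numbers one column at a time.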
Before continuing to #2: Install and load the foreign package. You should install the package using the Console, but load the package in a code chunk within your Quarto file.
Read a description of all the columns in the GSS data
- (5 points) Write a line of code that will assign to the object gss91 the result of calling the read.spss() function with the GSS1991.sav filename as the first argument, and provide the additional arguments to.data.frame = TRUE, use.value.labels = FALSE, trim_values = TRUE.
- (5 points) Write code to print the result of calling names(gss91). These are the names of all of the columns of the data frame.
- (5 points) Write a line of code to assign to the object gss16 the result of using the read.spss() function to read the GSS2016.sav file. Except for the file name, use the same arguments as when reading the 1991 file.
- (15 points) The table function takes a vector as its argument. If you pass it one column of a data frame (extracted with $ notation), it will produce a table that shows the number of items that fall into each factor level.
  - Assign to the object tab91 the result of using the table function on the CONEDUC field of the 1991 data. The result shows the number of respondents who gave each ranking (1, 2, or 3) of their confidence in education. 1 is low confidence, 3 is high confidence.
  - Repeat this step, creating the variable tab16 using the 2016 data.
  - Create new tables (tab91perc and tab16perc, respectively) that convert the table values to percentages using the following two lines of code:
    tab91perc <- 100 * tab91 / sum(tab91)
    tab16perc <- 100 * tab16 / sum(tab16)
  - What does the result tell us? Write a very brief response in your Quarto document.
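To see what the percentage conversion does, here is a small sketch on an invented vector of rankings. The vector answers is hypothetical and is not GSS data; the arithmetic is the same as in the two lines given in the exercise.

```r
# Hypothetical rankings from eight respondents (not GSS data)
answers <- c(1, 2, 2, 3, 3, 3, 2, 1)

counts <- table(answers)            # how many times each ranking occurs
perc   <- 100 * counts / sum(counts) # convert counts to percentages

print(counts)
print(perc)
```

Because each count is divided by the total, the percentages always sum to 100, which makes tables from years with different sample sizes directly comparable.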
Part 2
Preparation
If you are starting a new R session:
- Load the tidyverse package.
- Load the foreign package.
- Read the two data files GSS1991.sav and GSS2016.sav into suitably named data frames.
Exercises
All code should be written in the .qmd file you created in Part 1. Number your responses and place them in the order they appear below. The exercises ask you to explore datasets from the packages you have loaded in R.
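The exercises in this part lean on three dplyr verbs: filter() keeps rows, select() keeps columns, and mutate() adds columns. The sketch below shows each one on a made-up tibble; the name trips, its columns, and its values are hypothetical and are not one of the lab datasets. It assumes dplyr is installed (it is loaded as part of the tidyverse).

```r
library(dplyr)

# Hypothetical toy data, not one of the lab datasets
trips <- tibble(
  id       = c("A", "B", "C"),
  distance = c(100, 250, 400),  # miles
  minutes  = c(60, 120, 150)    # travel time in minutes
)

# filter(): keep only the rows that satisfy a condition
long_trips <- trips %>% filter(distance > 200)

# select(): keep only the named columns
id_and_distance <- trips %>% select(id, distance)

# mutate(): add a computed column, then pipe to select() to show only it
speeds <- trips %>%
  mutate(mph = distance / (minutes / 60)) %>%
  select(mph)

print(long_trips)
print(id_and_distance)
print(speeds)
```

The pipe passes the result on its left as the first argument of the function on its right, so each step reads in the order it is carried out.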
- (10 points) Use the dplyr filter() function to create a new data frame called gss91Women which contains only the rows for women (using SEX == 2). Similarly, create gss91Men (using SEX == 1). Apologies, most official population data we’ll explore will be based on a gender binary.
- (4 points) Because you loaded tidyverse, you have access to a data set called presidential. In order to learn a bit about this data frame, write a command that, when executed, will show you just the name and party columns of the data frame.
- (8 points) Write a command to subset the presidential data frame so that you’ll see only those presidents who are Republicans.
- (6 points) There’s another good data set with information about flights that departed from New York City in 2013. To use this you must install and load the nycflights13 package. Note that the actual name of the data frame is flights. You can learn more about this data frame as follows:
  - Use the names function to see the variable names.
  - Use the glimpse function to see the structure of the data frame.
  - Run View(flights) in order to see the data in its own tab. This produces the same result as selecting the object in the Environment window.
  - Run a command that will cause the help window about the data frame to open up in the Output pane.
- (12 points) Mutate the data frame by adding a new column that shows the speed in miles per hour. Pipe the results of your mutate to a function that will show only the new column.
- (12 points) Mutate the data frame by adding a new variable called gain that shows how much time a delayed flight made up in the air (which columns should you use to compute this?). Pipe the results of your mutate to a function that will show only the new column. (Hint: see Ch 4 section 4.10)
- (12 points) Render your Quarto document as an HTML page using the steps below:
  - Save a copy of your lab4.qmd document in your labs folder, called lab4_final.qmd.
  - Quarto documents have their own environment. In other words, the working directory for your Quarto document starts with the folder where the Quarto document resides, regardless of the file path you see when you run getwd().
    - Edit your read.csv(), read_csv, and read.spss function calls as needed to render the document.
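One way to check a path before rendering is sketched below. It assumes the folder layout from Part 1 (the .qmd file in labs and the data files in labs/data); if your folders are named differently, adjust the path pieces accordingly.

```r
# Assumed layout: labs/lab4_final.qmd and labs/data/test1.csv
# Build the path relative to the folder that holds the .qmd file
csv_path <- file.path("data", "test1.csv")
print(csv_path)

if (file.exists(csv_path)) {
  df <- read.csv(csv_path)
} else {
  # Report where R looked, which helps when the path needs editing
  message("File not found from ", getwd(), ": ", csv_path)
}
```

If the message appears when you run the chunk interactively but the document still renders, your Console working directory differs from the folder that holds the .qmd file.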
Submit on Brightspace
- Submit your lab4.qmd and rendered lab4_final.html files.
NOTE: A previous version of this page asked you to complete Part 2 in an .R script file. Please place all of your code for Part 2 into the same Quarto .qmd file.