Lab: Introducing Tidyverse

Lab 4

Why are we here?

Up to now we have been manipulating vectors by reordering and subsetting them through indexing. However, once we start more advanced analyses, the preferred unit for data storage is not the vector but the data frame. We will focus on a specific data format referred to as tidy, and on a specific collection of packages, referred to as the tidyverse, that are particularly helpful for working with tidy data. We will begin with a review of base R.

This lab covers:

Supplemental resources for review and practice

Lab goals

The purpose of this lab is to review base R and introduce you to working with the tidyverse.

Lab instructions

Part 1

Preparation

  • If you have not done so already, create a new folder within your CMSC 121 RStudio project folder called labs, or a similar name, to store all of your course lab assignments.

  • Download the files test1.csv, GSS1991.sav, GSS2016.sav from the Brightspace Assignment page. Place these files in a new folder called data within your labs folder.

  • Create a new Quarto document called lab4.qmd in your labs folder.

  • Load the tidyverse package.

Exercises

  1. (6 points) In your Quarto file, create a new code chunk and write a line of code that will assign to the object df the result of calling the read.csv() function with the argument test1.csv. Be sure to put quotes around the file name in the function call. Note: check your working directory so that you know what folders need to be included in the relative file path, e.g., data/test1.csv.

    Run your code and make sure that the data frame df shows up in your workspace Environment pane with 6 observations and 3 variables.

    • Write a line of code to print the names of the columns of the data frame.

    • Write a line of code to print out the first column of data using the $ notation and the name of the first column.

    • Write code to print out the sum and mean of each column of data.
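The steps in this exercise follow a common base R pattern. The sketch below illustrates that pattern on a small made-up file written to a temporary location. The object name example_df, the column names a, b, and c, and all of the values are hypothetical and are not the contents of test1.csv.

```r
# Hypothetical stand-in data; test1.csv has its own columns and values.
tmp <- tempfile(fileext = ".csv")
writeLines(c("a,b,c", "1,10,100", "2,20,200", "3,30,300"), tmp)

# Read the CSV file into a data frame (the file name must be a quoted string)
example_df <- read.csv(tmp)

# Print the column names
names(example_df)

# Print the first column, extracted with $ notation
example_df$a

# Sum and mean of every column (all columns here are numeric)
col_sums  <- colSums(example_df)
col_means <- colMeans(example_df)
col_sums
col_means
```

colSums() and colMeans() are one option among several. Calling sum() and mean() on each column extracted with $ gives the same numbers.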

Before continuing to #2: Install and load the foreign package. You should install the package using the Console, but load the package in a code chunk within your Quarto file.

Read a description of all the columns in the GSS data

  2. (5 points) Write a line of code that will assign to the object gss91 the result of calling the read.spss() function with the GSS1991.sav filename as the first argument and the additional arguments to.data.frame = TRUE, use.value.labels = FALSE, trim_values = TRUE.
  3. (5 points) Write code to print the result of calling names(gss91). These are the names of all of the columns of the data frame.
  4. (5 points) Write a line of code to assign to the object gss16 the result of using the read.spss() function to read the GSS2016.sav file. Except for the file name, use the same arguments as when reading the 1991 file.
  5. (15 points) The table() function takes a vector as its argument. If you pass it one column of a data frame (extracted with $ notation), it will produce a table that shows the number of items that fall into each factor level.
    • Assign to the object tab91 the result of using the table() function on the CONEDUC field of the 1991 data. The result shows the number of survey participants who gave each rating (1, 2, or 3) of their confidence in education. 1 is low confidence, 3 is high confidence.

    • Repeat this step, creating the variable tab16 using the 2016 data.

    • Create new tables (tab91perc and tab16perc, respectively) that convert the table values to percentages using the following two lines of code:

      tab91perc <- 100 * tab91 / sum(tab91)

      tab16perc <- 100 * tab16 / sum(tab16)

    • What does the result tell us? Write a very brief response in your Quarto document.
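To see how the counting and percentage steps behave, here is a minimal sketch on a made-up vector of ratings. The vector ratings and the names tab, tab_perc, and tab_perc2 are hypothetical and do not come from the GSS files.

```r
# Hypothetical ratings on a 1-3 scale (not GSS data)
ratings <- c(1, 2, 2, 3, 3, 3, 2, 1, 3, 2)

# Count how many values fall into each level
tab <- table(ratings)

# Convert the counts to percentages of the total
tab_perc <- 100 * tab / sum(tab)

# prop.table() computes the same proportions in a single call
tab_perc2 <- 100 * prop.table(tab)

tab
tab_perc
```

By default, table() leaves out NA values, so percentages computed this way are shares of the non-missing responses only.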

Part 2

Preparation

  • If you are starting a new R session:

    • Load the tidyverse package

    • Load the foreign package

    • Read the two data files GSS1991.sav and GSS2016.sav into suitably named data frames.

Exercises

All code should be written in the .qmd file you created in Part 1. Number your responses and place them in the order they appear below. The exercises ask you to explore datasets from the packages you have loaded in R.

  1. (10 points) Use the dplyr filter() function to create a new data frame called gss91Women which contains only the rows for women (using SEX == 2). Similarly, create gss91Men (using SEX == 1). Apologies, most official population data we’ll explore will be based on a gender binary.
  2. (4 points) Because you loaded tidyverse, you have access to a data set called presidential. In order to learn a bit about this data frame, write a command that, when executed, will show you just the name and party columns of the data frame.
  3. (8 points) Write a command to subset the presidential data frame so that you’ll see only those presidents who are Republicans.
  4. (6 points) There’s another good data set with information about flights that departed from New York City in 2013. To use it, you must install and load the nycflights13 package. Note that the actual name of the data frame is flights. You can learn more about this data frame as follows:
    • Use the names function to see the variable names.

    • Use the glimpse function to see the structure of the data frame.

    • Run View(flights) in order to see the data in its own tab. This produces the same result as selecting the object in the Environment window.

    • Run a command that will cause the help window about the data frame to open up in the Output pane.

  5. (12 points) Mutate the flights data frame by adding a new column that shows the speed in miles per hour. Pipe the results of your mutate to a function that will show only the new column.
  6. (12 points) Mutate the flights data frame by adding a new variable called gain that shows how much time a delayed flight made up in the air (which columns should you use to compute this?). Pipe the results of your mutate to a function that will show only the new column. (Hint: see Ch 4 section 4.10)
  7. (12 points) Render your Quarto document as an HTML page using the steps below:
    1. Save a copy of your lab4.qmd document in your labs folder, called lab4_final.qmd.
    2. Quarto documents have their own environment. In other words, the working directory for your Quarto document starts with the folder where the Quarto document resides, regardless of the file path you see when you run getwd().
      • Edit your read.csv(), read_csv(), and read.spss() function calls as needed to render the document.
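The Part 2 exercises rely on three dplyr verbs: filter() to keep rows, select() to keep columns, and mutate() to add a column. The sketch below shows each one on mtcars, a data set that ships with R and stands in here for the lab's data frames. The object names four_cyl, mpg_hp, and power_to_weight and the column hp_per_1000lb are made up for illustration.

```r
library(dplyr)

# Keep only the rows where cyl equals 4
four_cyl <- filter(mtcars, cyl == 4)

# Keep only the mpg and hp columns
mpg_hp <- select(mtcars, mpg, hp)

# Add a derived column, then pipe the result to select()
# so that only the new column is shown.
# wt is recorded in thousands of pounds, so hp / wt is horsepower per 1000 lb.
power_to_weight <- mtcars %>%
  mutate(hp_per_1000lb = hp / wt) %>%
  select(hp_per_1000lb)

head(power_to_weight)
```

The pipe passes the data frame on its left as the first argument of the function on its right, which is why select() is written here without a data frame argument.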

Submit on Brightspace

  1. Submit your lab4.qmd and rendered lab4_final.html files.

NOTE: A previous version of this page asked you to complete Part 2 in an .R script file. Please place all of your code for Part 2 into the same Quarto .qmd file.