Practicing Tidyverse Functions

Why are we here?

The purpose of this activity is to become more familiar with Tidyverse functions and the basics of data visualization.

Supplemental resources for review and practice

Tidyverse review

Preparation

Tidyverse review

This part of the activity includes exercises based on IDS Chapter 4.

The dslabs package is used for exercises 1-3.

  1. Based on IDS Chapter 4 Question 5

Use the pipe to add another step to the code below so that it calls mutate() again to add a rate column with the murder rate per 100,000 people.

library(dplyr) 
library(dslabs) 
murders <- murders %>% 
  mutate(population_in_millions = population/10^6)
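
As a rough illustration of chaining a second mutate() onto a pipeline, here is a minimal sketch on a made-up data frame. The name toy and its columns town, incidents, and residents are hypothetical and are not part of dslabs; the sketch shows only the general pattern, not the exercise solution.

```r
library(dplyr)

# Hypothetical toy data (NOT the murders data set)
toy <- data.frame(
  town = c("Ashby", "Birch", "Cedar"),
  incidents = c(5, 20, 12),
  residents = c(50000, 400000, 100000)
)

# Each mutate() adds one column; the pipe passes the result to the next step
toy_rates <- toy %>%
  mutate(residents_in_thousands = residents / 10^3) %>%
  mutate(rate = incidents / residents * 10^5)  # incidents per 100,000 residents

toy_rates
```

Both columns could also be created in a single mutate() call; the two-step form simply mirrors the wording of the exercise.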

  2. Based on IDS Chapter 4 Question 7

Use the select function to show the state names and abbreviations in murders.
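
For reference, a minimal sketch of how select() keeps only the named columns, shown on a made-up data frame. The names toy, town, code, and residents are hypothetical.

```r
library(dplyr)

# Hypothetical toy data (NOT the murders data set)
toy <- data.frame(
  town = c("Ashby", "Birch", "Cedar"),
  code = c("AS", "BI", "CE"),
  residents = c(50000, 400000, 100000)
)

# Keep only the two listed columns, in the order given
toy_names <- toy %>%
  select(town, code)

toy_names
```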

  3. Based on IDS Chapter 4 Question 22

Use the group_by function to convert murders into a tibble that is grouped by region.
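
A minimal sketch of what group_by() does to a plain data frame, using made-up data. The names toy, town, and area are hypothetical. The rows themselves do not change; only the grouping metadata is added.

```r
library(dplyr)

# Hypothetical toy data (NOT the murders data set)
toy <- data.frame(
  town = c("Ashby", "Birch", "Cedar"),
  area = c("North", "South", "North")
)

# group_by() returns a grouped tibble; later verbs operate per group
toy_grouped <- toy %>%
  group_by(area)

class(toy_grouped)       # includes "grouped_df" and "tbl_df"
group_vars(toy_grouped)  # the grouping column(s)
```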

  4. dplyr practice with the presidential data.

Remember the presidential data frame? It ships with ggplot2, which is part of the tidyverse. The code below loads a few libraries, including lubridate for the date arithmetic, and then adds one column, term_length, to the presidential data frame.

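library(mdsr)
library(dplyr)      # mutate()
library(ggplot2)    # provides the presidential data frame
library(lubridate)  # interval() and dyears()
# Term length in years: the start-to-end interval divided by one year
my_presidents <- presidential |>
  mutate(term_length = interval(start, end) / dyears(1))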

Write code that arranges the presidents in descending order of term length (the variable term_length).
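
As a reminder of the general pattern, arrange() sorts in ascending order by default, and wrapping a column in desc() reverses that. The sketch below uses a made-up data frame; the names toy, name, and duration are hypothetical.

```r
library(dplyr)

# Hypothetical toy data (NOT the presidential data set)
toy <- data.frame(
  name = c("a", "b", "c"),
  duration = c(4, 8, 2.5)
)

# Sort rows from the largest duration to the smallest
toy_sorted <- toy %>%
  arrange(desc(duration))

toy_sorted
```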

  5. More dplyr practice with the presidential data.

We can summarize all of the presidents, including a count of the Democratic ones, with the following code:

my_presidents %>%
  summarize(
    N = n(), 
    first_year = min(year(start)), 
    last_year = max(year(end)), 
    num_dems = sum(party == "Democratic"), 
    years = sum(term_length), 
    avg_term_length = mean(term_length)
  )

Modify this code so that it will give us the summary values again, but this time grouped by the party of the presidents (note that we no longer need the num_dems summary for this).
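
A minimal sketch of the group-then-summarize pattern on made-up data: placing group_by() before summarize() yields one summary row per group instead of a single row overall. The names toy, team, and tenure are hypothetical.

```r
library(dplyr)

# Hypothetical toy data (NOT the presidential data set)
toy <- data.frame(
  team = c("red", "blue", "red", "blue", "red"),
  tenure = c(4, 8, 4, 2, 8)
)

# One output row per team, with the summaries computed within each team
toy_summary <- toy %>%
  group_by(team) %>%
  summarize(
    N = n(),
    years = sum(tenure),
    avg_tenure = mean(tenure)
  )

toy_summary
```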

  6. Grading an exam using conditionals and the case_when() function. Note: this question is related to the tidyverse activity 2 slides and IDS Intro Ch. 4 Section 4.10.

Create a data frame with 8 observations and 2 variables representing students' numeric scores on an exam out of 100 points. The data frame should include the numeric variable score, holding the eight exam scores, and the variable student, holding the first names of the eight students.

Create a new variable grades with the letter grade that should be assigned to each student. Use the Bard grading criteria to create your grade ranges (e.g., A, A-, B+, B, etc.).

  • A is a score of 93 or greater
  • A- is 90 to 92
  • B+ is 87 to 89
  • B is 83 to 86
  • B- is 80 to 82
  • C+ is 77 to 79
  • C is 73 to 76
  • C- is 70 to 72
  • D+ is 67 to 69
  • D is 63 to 66
  • D- is 60 to 62
  • F is a score of 59 or lower
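
To illustrate how case_when() works, here is a minimal sketch that uses a deliberately simpler, made-up three-band scale rather than the grading criteria above. The names toy and band, the student names, and the cutoffs are all hypothetical. Conditions are checked from top to bottom and the first one that is TRUE determines the result, so ordering the cutoffs from highest to lowest means each line only needs a lower bound.

```r
library(dplyr)

# Hypothetical toy data with a made-up three-band scale
toy <- data.frame(
  student = c("Ana", "Ben", "Cal", "Dee"),
  score = c(95, 72, 58, 90)
)

toy_banded <- toy %>%
  mutate(
    band = case_when(
      score >= 90 ~ "high",    # checked first
      score >= 70 ~ "middle",  # reached only when score < 90
      TRUE ~ "low"             # fallback for everything else
    )
  )

toy_banded
```

The same top-to-bottom structure extends to a longer scale by adding one line per cutoff.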