Tidyverse activity with…cats!

Silly? Yes. But let’s practice the tidyverse.

Tidyverse activity

  • Explore data about cats using dplyr functions: summarize, group_by, mutate.

  • Calculate course grades using case_when

Yes, there are data about… Cats!

Load packages

library(tidyverse)

Load data (link to download cats data)

cat_lovers <- read_csv("data/cat-lovers.csv")

What is in the data frame?
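
One way to answer this is to look at the column types. The sketch below uses a small made-up tibble (toy_cats is hypothetical, not the real cat_lovers file) with the same three column names; glimpse() and class() are standard dplyr and base R calls.

```r
library(dplyr)

# Hypothetical stand-in for cat_lovers: same column names, made-up rows
toy_cats <- tibble(
  name           = c("Person A", "Person B", "Person C"),
  number_of_cats = c("0", "2", "three"),  # stored as text, like the <chr> column in the real data
  handedness     = c("left", "right", "right")
)

glimpse(toy_cats)  # one line per column: name, type, first few values

col_type <- class(toy_cats$number_of_cats)
col_type           # "character"
```

Running the same two calls on cat_lovers shows which columns are stored as text rather than numbers.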

Summarize the data

dplyr summarize()

cat_lovers |>
  summarize(avg_cats = mean(number_of_cats)) #Annotate
# A tibble: 1 × 1
  avg_cats
     <dbl>
1       NA
  • Why doesn’t this work? (Hint: explore the data frame)

Summarize

cat_lovers |> #Annotate
  summarize(avg_cats = mean(as.numeric(number_of_cats))) 
  • Why does this still not work? (Hint: explore the data frame. What does as.numeric() do with entries that are not numbers?)

Summarize

cat_lovers |> #Annotate
  summarize(avg_cats = mean(as.numeric(number_of_cats), na.rm = TRUE)) 
# A tibble: 1 × 1
  avg_cats
     <dbl>
1    0.776
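
The three attempts above can be replayed on a tiny made-up column to see where each NA comes from. This is a sketch under the assumption that the column is text and at least one entry is not a number; toy_cats and its values are hypothetical.

```r
library(dplyr)

# Hypothetical data: number_of_cats is text and one entry is not a number
toy_cats <- tibble(number_of_cats = c("0", "2", "three"))

# 1. mean() of a character column returns NA (with a warning)
attempt_1 <- suppressWarnings(mean(toy_cats$number_of_cats))

# 2. as.numeric() turns "three" into NA, and a single NA makes the mean NA
attempt_2 <- suppressWarnings(mean(as.numeric(toy_cats$number_of_cats)))

# 3. na.rm = TRUE drops the NA before averaging: (0 + 2) / 2
attempt_3 <- suppressWarnings(
  mean(as.numeric(toy_cats$number_of_cats), na.rm = TRUE)
)

c(attempt_1, attempt_2, attempt_3)  # NA NA 1
```

Note that na.rm = TRUE ignores the unparseable rows rather than fixing them, so the average is computed over fewer people.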

Group by

dplyr group_by()

cat_lovers %>% 
  group_by(handedness)
# A tibble: 60 × 3
# Groups:   handedness [3]
   name           number_of_cats handedness
   <chr>          <chr>          <chr>     
 1 Bernice Warren 0              left      
 2 Woodrow Stone  0              left      
 3 Willie Bass    1              left      
 4 Tyrone Estrada 3              left      
 5 Alex Daniels   3              left      
 6 Jane Bates     2              left      
 7 Latoya Simpson 1              left      
 8 Darin Woods    1              left      
 9 Agnes Cobb     0              left      
10 Tabitha Grant  0              left      
# ℹ 50 more rows
  • What is the class of the result of running group_by()?

Group by

  • Stratifying data before computing summary statistics

What’s a tibble?

Tibbles (tbl)

  • Read about them in IDS Intro Chapter 4 section 4.6 Tibbles
  • The tbl, pronounced “tibble”, is a special kind of data frame.
  • group_by() always returns a (grouped) tibble, and summarize() returns a tibble whenever its input is a tibble or grouped data.
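
To check what group_by() and summarize() return, class() can be called on each result. A minimal sketch with a made-up tibble (toy_cats is hypothetical, not the real data):

```r
library(dplyr)

# Hypothetical toy data, not the real cat_lovers file
toy_cats <- tibble(
  handedness     = c("left", "left", "right"),
  number_of_cats = c(0, 2, 1)
)

grouped <- group_by(toy_cats, handedness)
class(grouped)      # includes "grouped_df" and "tbl_df"

summary_tbl <- summarize(grouped, mean_cats = mean(number_of_cats))
class(summary_tbl)  # includes "tbl_df"

is_tibble <- inherits(summary_tbl, "tbl_df")
is_tibble           # TRUE
```

The "grouped_df" class is what the "# Groups:" line in the printed output refers to.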

Group by

Do people with different “handedness” own more or fewer cats?

  • Group by the “handedness” of cat owners and calculate the average number of cats per owner
cat_lovers %>%  #Annotate
  mutate(number_of_cats = as.numeric(number_of_cats)) %>% 
  group_by(handedness) %>% 
  summarize(mean_cats = mean(number_of_cats, na.rm = TRUE))
# A tibble: 3 × 2
  handedness   mean_cats
  <chr>            <dbl>
1 ambidextrous     0.8  
2 left             0.923
3 right            0.725

If/then in dplyr to calculate grades

case_when(condition ~ output_value)

  • condition is a logical test that evaluates to TRUE or FALSE (the “if”)

  • output_value is the value to output if the condition is TRUE (the “then”)

df <- data.frame(
  student = c("Natascha", "Alex", "Arun", "Arturo", "Ashley", "Oscar", "James", "Elliot"), 
  score = c(92, 78, 85, 86, 93, 67, 56, 73))

df %>% 
  mutate(grade = case_when(
    score >= 90 ~ 'A',
    score >= 80 ~ 'B',
    score >= 70 ~ 'C',
    score >= 60 ~ 'D',
    TRUE ~ 'F'))
   student score grade
1 Natascha    92     A
2     Alex    78     C
3     Arun    85     B
4   Arturo    86     B
5   Ashley    93     A
6    Oscar    67     D
7    James    56     F
8   Elliot    73     C
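
The order of the conditions matters: case_when() uses the first condition that is TRUE for each value. A small sketch with made-up scores shows what happens when the loosest test comes first.

```r
library(dplyr)

scores <- c(92, 78, 56)

# Strictest test first: each score stops at the first TRUE condition
ordered <- case_when(
  scores >= 90 ~ "A",
  scores >= 70 ~ "C",
  TRUE         ~ "F"
)

# Same tests, loosest first: 92 already satisfies >= 70, so "A" is never reached
reversed <- case_when(
  scores >= 70 ~ "C",
  scores >= 90 ~ "A",
  TRUE         ~ "F"
)

ordered   # "A" "C" "F"
reversed  # "C" "C" "F"
```

This is why the grade cutoffs above are listed from highest to lowest.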

Use case_when to recode

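# Fix two entries by hand, keep every other value unchanged,
# then convert the whole column from character to numeric
cat_lovers <- cat_lovers %>%
  mutate(number_of_cats =
           case_when(
             name == "Ginger Clark" ~ "2",
             name == "Doug Bass"    ~ "3",
             TRUE                   ~ number_of_cats  # everyone else: keep as is
           ),
         number_of_cats = as.numeric(number_of_cats)
         )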
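
The replacement values above are quoted ("2", "3") because every output of case_when() must have a compatible type, and number_of_cats is still a character column at that point. The sketch below illustrates this with made-up names and values (toy_cats is hypothetical): mixing a number with text fails, while recoding as text and converting once at the end works.

```r
library(dplyr)

# Hypothetical toy data: names and values are made up
toy_cats <- tibble(
  name           = c("Person A", "Person B"),
  number_of_cats = c("three", "1")
)

# Mixing a number (3) with a character column raises an error
mixed_ok <- tryCatch({
  mutate(toy_cats, number_of_cats = case_when(
    name == "Person A" ~ 3,
    TRUE               ~ number_of_cats
  ))
  TRUE
}, error = function(e) FALSE)

# Keeping every output as text works; convert to numeric once at the end
fixed <- toy_cats %>%
  mutate(
    number_of_cats = case_when(
      name == "Person A" ~ "3",
      TRUE               ~ number_of_cats
    ),
    number_of_cats = as.numeric(number_of_cats)
  )

mixed_ok              # FALSE
fixed$number_of_cats  # 3 1
```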

case_when() function

case_when(
    logical_test_1 ~ new_value_1,
    logical_test_2 ~ new_value_2,
    # ...more tests and values...
    TRUE ~ default_value
)