# A tibble: 2 × 4
  customer_id item_1 item_2       item_3
        <dbl> <chr>  <chr>        <chr>
1           1 bread  milk         banana
2           2 milk   toilet paper <NA>
We have one row per customer, and in that row we have all the items the customer purchased in the columns.
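To follow along, the data frame printed above can be recreated by hand. This is a minimal sketch using tribble() from the tibble package. The name customers matches the code later in these slides, but the slides do not show how the data was originally loaded, so typing it in by hand is an assumption.

```r
library(tibble)

# Recreate the example data: one row per customer,
# one column per purchased item (NA where there is no third item)
customers <- tribble(
  ~customer_id, ~item_1, ~item_2,        ~item_3,
  1,            "bread", "milk",         "banana",
  2,            "milk",  "toilet paper", NA_character_
)

customers
```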
The goal of tidyr is to help you tidy your data, for example via pivoting between wide and long formats and clarifying how NAs should be treated.
We use “wider” and “longer” as relative terms.
# A tibble: 6 × 3
  customer_id item_no item
        <dbl> <chr>   <chr>
1           1 item_1  bread
2           1 item_2  milk
3           1 item_3  banana
4           2 item_1  milk
5           2 item_2  toilet paper
6           2 item_3  <NA>
We want to reshape the customer data into a longer data frame like the one above, where each row represents one item purchased by one customer.
Pivoting keeps all of the data but changes the shape of the data frame.
Wider: more columns
# A tibble: 2 × 4
  customer_id item_1 item_2       item_3
        <dbl> <chr>  <chr>        <chr>
1           1 bread  milk         banana
2           2 milk   toilet paper <NA>
Longer: more rows
# A tibble: 6 × 3
  customer_id item_no item
        <dbl> <chr>   <chr>
1           1 item_1  bread
2           1 item_2  milk
3           1 item_3  banana
4           2 item_1  milk
5           2 item_2  toilet paper
6           2 item_3  <NA>
pivot_longer()
data (as usual)
cols: columns to pivot into longer format
names_to: name of the column where the column names of pivoted variables go (character string)
values_to: name of the column where data in pivoted variables go (character string)
purchases <- customers %>%
  pivot_longer(
    cols = item_1:item_3,  # variables item_1 to item_3
    names_to = "item_no",  # column names -> new column called item_no
    values_to = "item"     # values in columns -> new column called item
  )
purchases
# A tibble: 6 × 3
  customer_id item_no item
        <dbl> <chr>   <chr>
1           1 item_1  bread
2           1 item_2  milk
3           1 item_3  banana
4           2 item_1  milk
5           2 item_2  toilet paper
6           2 item_3  <NA>
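The last row of the output above holds an NA because customer 2 bought only two items. If such rows are not wanted, pivot_longer() has a values_drop_na argument that drops them. The sketch below recreates customers by hand (an assumption, since the slides do not show how it was loaded) and uses the made-up name purchases_no_na for the result.

```r
library(tibble)
library(tidyr)

# Example data, typed in by hand for this sketch
customers <- tribble(
  ~customer_id, ~item_1, ~item_2,        ~item_3,
  1,            "bread", "milk",         "banana",
  2,            "milk",  "toilet paper", NA_character_
)

purchases_no_na <- customers %>%
  pivot_longer(
    cols = item_1:item_3,
    names_to = "item_no",
    values_to = "item",
    values_drop_na = TRUE  # drop rows where the pivoted value is NA
  )

nrow(purchases_no_na)  # 5: three rows for customer 1, two for customer 2
```

Whether to drop these rows depends on the analysis: keeping them records that a third item slot existed but was empty, while dropping them leaves only actual purchases.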
Why pivot? Most likely, because the next step of your analysis needs it. More on left_join() later…
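As one illustration of a next step that needs the long format, suppose there is a second table with one row per item and its price. The prices table, its values, and the name purchases_priced are invented for this sketch; only left_join() itself is mentioned in the slides. With one item per row, each purchase can be matched to its price:

```r
library(tibble)
library(tidyr)
library(dplyr)

# Example data, typed in by hand for this sketch
customers <- tribble(
  ~customer_id, ~item_1, ~item_2,        ~item_3,
  1,            "bread", "milk",         "banana",
  2,            "milk",  "toilet paper", NA_character_
)

# Hypothetical price list (values invented for illustration)
prices <- tribble(
  ~item,          ~price,
  "bread",        1.00,
  "milk",         0.80,
  "banana",       0.15,
  "toilet paper", 3.00
)

purchases <- customers %>%
  pivot_longer(
    cols = item_1:item_3,
    names_to = "item_no",
    values_to = "item"
  )

# Keep every purchase row and attach the matching price;
# the row with an NA item has no match, so its price is NA
purchases_priced <- purchases %>%
  left_join(prices, by = "item")

purchases_priced
```

Doing the same lookup on the wide data would mean joining once per item column, which is why the long shape is the more convenient one here.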
pivot_wider()
data (as usual)
names_from: which column in the long format contains what should be column names in the wide format
values_from: which column in the long format contains what should be values in the new columns in the wide format
Try the examples from the slides on your own.
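A short sketch of pivot_wider() with these two arguments, going from the long purchases data back to one row per customer. The long data is typed in by hand here, and the result name customers_wide is made up for this example.

```r
library(tibble)
library(tidyr)

# Long data: one row per customer and item slot
purchases <- tribble(
  ~customer_id, ~item_no, ~item,
  1,            "item_1", "bread",
  1,            "item_2", "milk",
  1,            "item_3", "banana",
  2,            "item_1", "milk",
  2,            "item_2", "toilet paper",
  2,            "item_3", NA_character_
)

customers_wide <- purchases %>%
  pivot_wider(
    names_from = item_no,  # values of item_no become the new column names
    values_from = item     # values of item fill those new columns
  )

customers_wide
```

The result has one row per customer_id and the columns item_1, item_2, and item_3, which is the shape the wide data started in.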
https://cran.r-project.org/web/packages/tidyr/vignettes/pivot.html