Lab 7 Data wrangling with pivots and joins
Why are we here?
In this lab assignment you will wrangle data about college admissions and books.
In Part 1, you will practice pivoting tables.
In Part 2, you will practice working with multiple data tables using joins.
When you are finished, submit a single .qmd file containing your responses to Part 1 and Part 2 on Brightspace. You do not need to render your Quarto document for Lab 7.
To get started, load the tidyverse and dslabs packages.
Part 1 College admissions
Step 1
The admissions data set contains admission information for men and women across six majors. In the next chunk we’ll load it and keep every column except the number of applicants.
data("admissions")
dat <- admissions |> select(-applicants)
After running the chunk above, you may want to examine dat to see what it contains. (Rather than trying to view it inside the .qmd, run the same two lines in the Console and then run View(dat).)
If we think of an observation as a major, with two variables per observation (the percentage of men admitted and the percentage of women admitted), then dat is not tidy, because each major currently occupies two rows. Instead we want one row for each major, with separate columns for men and women (6 rows, 3 columns).
Use the pivot_wider function to wrangle this into tidy shape.
# your code here
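To see how pivot_wider behaves without working the exercise for you, here is a minimal sketch on a small hypothetical tibble. The columns store, season and sold and their values are invented for illustration and are not part of the lab. names_from picks the column whose values become the new column names. values_from picks the column that fills them.

```r
library(tidyverse)

# Hypothetical toy data (not the admissions data): one row per (store, season) pair
toy <- tibble(
  store  = c("A", "A", "B", "B"),
  season = c("summer", "winter", "summer", "winter"),
  sold   = c(10, 4, 7, 9)
)

# Each distinct season value becomes its own column, filled from `sold`,
# which leaves one row per store
toy_wide <- toy |>
  pivot_wider(names_from = season, values_from = sold)

print(toy_wide)
```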
What now?
What we really want is to wrangle the admissions data so that each major has four variables: admitted_men, admitted_women, applicants_men and applicants_women. The trick we use here is quite common:
first use pivot_longer to generate an intermediate data frame.
then use unite to build the new column names.
then use pivot_wider to obtain the tidy data we want.
We will go step by step in the next three exercises.
Step 2
Note that this exercise is based on the original admissions data frame, not on dat.
Use the pivot_longer function to create a tmp data frame with a column containing the type of observation: admitted or applicants. Name the new columns name (holding admitted or applicants) and value (holding the corresponding numbers).
# your code here
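Here is a minimal sketch of pivot_longer on another hypothetical tibble. The columns sold and returned are invented stand-ins for two measurement columns. cols selects the columns to stack. names_to and values_to name the two new columns, and "name" and "value" also happen to be their defaults.

```r
library(tidyverse)

# Hypothetical toy data with two measurement columns per (store, season) row
toy <- tibble(
  store    = c("A", "A", "B", "B"),
  season   = c("summer", "winter", "summer", "winter"),
  sold     = c(10, 4, 7, 9),
  returned = c(1, 0, 2, 1)
)

# Stack sold and returned: the old column names go into `name`,
# their values into `value`, so each input row becomes two output rows
toy_long <- toy |>
  pivot_longer(cols = c(sold, returned),
               names_to = "name", values_to = "value")

print(toy_long)
```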
Step 3
Now you have an object tmp with columns major, gender, name and value. Note that if you combine name and gender (in that order, separated by an underscore), you get the column names we want: admitted_men, admitted_women, applicants_men and applicants_women. Use the function unite to create a new column called column_name, and overwrite tmp with this new version. You can find helpful usage information by running ?unite at the Console prompt.
# your code here
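A minimal sketch of unite on a hypothetical long tibble, reusing the invented store and season columns. The first argument after the data is the new column's name. The columns that follow are pasted together in the order listed, with sep = "_" by default, and are then dropped because remove = TRUE by default.

```r
library(tidyverse)

# Hypothetical long-format toy data, shaped like the output of pivot_longer
toy_long <- tibble(
  store  = c("A", "A", "A", "A"),
  season = c("summer", "summer", "winter", "winter"),
  name   = c("sold", "returned", "sold", "returned"),
  value  = c(10, 1, 4, 0)
)

# Paste name and season (in that order) into a single column_name,
# e.g. "sold_summer"; the original name and season columns are removed
toy_long <- toy_long |>
  unite(column_name, name, season)

print(toy_long)
```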
Step 4
As the final step, create admissions_wider by applying the pivot_wider function to tmp in order to generate the tidy data with four variables for each major.
# your code here
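To close the loop, here is a minimal sketch of the last pivot on hypothetical data that has already been united. As before, the store and column_name values are invented for illustration. Each distinct column_name becomes a column, so each store ends up on a single row.

```r
library(tidyverse)

# Hypothetical united toy data: one row per (store, column_name) pair
toy_long <- tibble(
  store       = rep(c("A", "B"), each = 4),
  column_name = rep(c("sold_summer", "returned_summer",
                      "sold_winter", "returned_winter"), times = 2),
  value       = c(10, 1, 4, 0, 7, 2, 9, 1)
)

# Spread the united names into four columns, one row per store
toy_wider <- toy_long |>
  pivot_wider(names_from = column_name, values_from = value)

print(toy_wider)
```

As an aside, pivot_wider also accepts several columns in names_from and joins their values with names_sep ("_" by default). Writing names_from = c(name, season) on the un-united data would therefore perform the unite step implicitly. The lab, however, asks for the explicit unite route in Step 3.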