Joining tables using two keys

Joins with the ecotourism package


First, you will work through an example using the manta_rays dataset from the ecotourism package. Then, explore the ecotourism package using the help pane in RStudio and complete the same steps with another dataset within the package.

In this activity, you will review and apply:

How to use this page:

  • In Part 1, the solution code is not automatically displayed, but you can see it by clicking the Code buttons. This gives you the opportunity to think first about how something should be done and then check to see what code was actually used. I strongly recommend you take the time to attempt to answer the question before revealing the solution.

  • Part 2 gives you a chance to apply what you’ve learned without example solutions.

Part 1: Manta Ray sightings and ecotourism activity

In Part 1 you will analyze occurrence data for manta rays in Australia, using records from the Atlas of Living Australia (ALA). Manta rays are a sensitive aquatic species whose presence may correspond with seasonal weather conditions. You will integrate sightings of manta rays with weather and seasonality (represented by annual quarter).

Data from the ecotourism package:

  • manta_rays: This dataset contains occurrence records for the reef manta ray observed in Australian waters from 2014 to 2024. Each row represents an individual sighting of a manta ray by location (weather station ws_id), at a particular date and time.

  • weather: daily weather records for each station.

Take a moment to learn more about each dataset using the help pages in RStudio.

Step 1: Prepare a data frame of daily manta ray sightings

  • Wrangle manta_rays to create a new data frame manta_daily where each row represents the number of manta rays sighted each day at each weather station location. You will need to use the ws_id and date variables. Your resulting data frame should have 319 observations.
Code
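library(tidyverse)  # dplyr, tidyr, forcats, ggplot2
library(ecotourism) # manta_rays and weather datasets

# One row per weather station-date pair; n is the number of sightings
manta_daily <- manta_rays |>
  count(ws_id, date)

# Run the object name manta_daily in the console to see the result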
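
To see what count() does with two grouping variables, here is a minimal sketch on a small made-up table. The station IDs and dates are invented for illustration and do not come from manta_rays. Each distinct ws_id-date pair becomes one row, and n records how many rows of the input fell in that pair.

```r
library(dplyr)

# Toy sightings table: values are made up for illustration only
toy_sightings <- tibble(
  ws_id = c("A", "A", "A", "B", "B"),
  date  = as.Date(c("2020-01-01", "2020-01-01", "2020-01-02",
                    "2020-01-01", "2020-01-01"))
)

# One row per station-date pair; n is the number of sightings in that pair
toy_daily <- toy_sightings |>
  count(ws_id, date)

toy_daily
```

Five sightings collapse to three rows here because two station-date pairs each contain two sightings.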

Step 2: Connect weather conditions with manta ray sightings

  • Create a new data frame called manta_weather that retains all observations from manta_daily and adds the corresponding month and average wind speed information from weather, maintaining the weather station-date observational unit (each row represents a weather station on a particular date).
  • Recall that in DATA 121 we covered joins here.
Code
# Match on both keys so each station-date row gets that day's weather
manta_weather <- manta_daily |>
  left_join(weather, by = c("ws_id", "date")) |>
  select(ws_id, date, n, month, wind_speed)

# Run the object name manta_weather in the console to see the result
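
Why both keys matter can be sketched with two small made-up tables (toy_daily and toy_weather are hypothetical names with invented values, not package data). Joining on ws_id and date keeps one row per station-date pair, while joining on ws_id alone pairs each sighting row with every date recorded at that station.

```r
library(dplyr)

# Toy tables: values are made up for illustration only
toy_daily <- tibble(
  ws_id = c("A", "A", "B"),
  date  = as.Date(c("2020-01-01", "2020-01-02", "2020-01-01")),
  n     = c(2L, 1L, 2L)
)

toy_weather <- tibble(
  ws_id      = c("A", "A", "B", "B"),
  date       = as.Date(c("2020-01-01", "2020-01-02",
                         "2020-01-01", "2020-01-02")),
  wind_speed = c(3.1, 4.5, 2.0, 6.2)
)

# Two keys: each station-date row matches exactly one weather row
two_keys <- toy_daily |>
  left_join(toy_weather, by = c("ws_id", "date"))

# One key: each row matches every weather row for that station.
# Recent dplyr versions warn about the many-to-many relationship here.
one_key <- toy_daily |>
  left_join(toy_weather, by = "ws_id")

nrow(two_keys) # 3, same as toy_daily
nrow(one_key)  # 6, rows have been duplicated
```

A quick check after any left join is to compare the row count of the result with the row count of the left table. If the result has more rows, the keys did not uniquely identify rows in the right table.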

Step 3: Create a new variable

  • Create a new data frame to_plot with a new variable quarter, which reflects the annual quarter time frame.
    • The variable should be a factor data type with levels 1, 2, 3, 4.
      • Jan–Mar → 1
      • Apr–Jun → 2
      • Jul–Sep → 3
      • Oct–Dec → 4
    • Recall that in DATA 121 we covered case_when() here and fct_relevel here.
Code
to_plot <- manta_weather |>
  mutate(
    # Map each month number to its annual quarter
    quarter = case_when(
      month %in% 1:3   ~ "1",
      month %in% 4:6   ~ "2",
      month %in% 7:9   ~ "3",
      month %in% 10:12 ~ "4",
      TRUE             ~ NA_character_ # typed NA matches the character results
    ),
    # Convert to a factor with the levels in calendar order
    quarter = fct_relevel(quarter, "1", "2", "3", "4")
  ) |>
  select(ws_id, n, quarter, wind_speed) # Keep only the variables of interest
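
The same recoding pattern can be tried on a tiny made-up column before applying it to the real data. This is a minimal sketch: toy_months is a hypothetical table, and it assumes month is stored as a number from 1 to 12. A missing month matches none of the ranges, so it falls through to the final branch and stays missing.

```r
library(dplyr)
library(forcats)

# Toy months, deliberately out of calendar order, plus one missing value
toy_months <- tibble(month = c(11, 2, 8, 5, NA))

toy_quarters <- toy_months |>
  mutate(
    quarter = case_when(
      month %in% 1:3   ~ "1",
      month %in% 4:6   ~ "2",
      month %in% 7:9   ~ "3",
      month %in% 10:12 ~ "4",
      TRUE             ~ NA_character_
    ),
    # fct_relevel() converts the character vector to a factor
    # and moves the named levels to the front in the order given
    quarter = fct_relevel(quarter, "1", "2", "3", "4")
  )

levels(toy_quarters$quarter)
```

With single-digit labels the default alphabetical level order already matches the calendar order, so fct_relevel() mainly makes the intended order explicit. With labels such as month names, the explicit ordering would change the result.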

Step 4: Plot the relationship between weather, season, and manta ray sightings

Replicate the plot below: Create a plot that shows the relationship between daily manta ray observations and average wind speed. Color the points in the plot by the annual quarter.

  • In a sentence or two, provide an interpretation of the visualization. What can we learn from this chart about the relationship between wind speed, season, and manta ray sightings?
  • Recall that in DATA 121 we covered effective visualization techniques here.
Code
to_plot |>
  drop_na(quarter) |> # Remove rows with no matching weather record
  ggplot(aes(wind_speed, n, color = quarter)) +
  geom_point(alpha = 0.8) +
  labs(
    title = "Manta Ray Count vs. Wind Speed",
    subtitle = "Colored by Annual Quarter",
    x = "Average wind speed (m/s)",
    y = "Manta ray count",
    color = "Quarter"
  ) +
  theme_minimal()

Part 2

Explore the R help page information about the ecotourism package.

  • Apply the same steps you completed above to one of the other organism-specific datasets in the ecotourism package.
    • For results that are closest to what you saw with the manta_rays dataset, I suggest using the gouldian_finch data table and the precipitation variable prcp from the weather dataset.
Code
# Your code here

Submit on Brightspace

Submit a Quarto document with your work for Part 2 of the activity.