Introduction to R

Working with data in R

In this lab you will be introduced to core R functions that we will use throughout the course. You will review and practice base R functions (functions that come pre-loaded with R) and functions from the tidyverse packages, which follow the tidy data framework.

We’ll use data from the ecotourism package to get a feel for where we are heading in the course.

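First, install and load the required packages. You only need to install a package once; if tidyverse or sf is not installed yet, install it the same way.

install.packages("ecotourism") # run once, not every session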

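Load the libraries (do this at the start of every R session)

library(tidyverse)  # dplyr, ggplot2, readr, and other tidy data packages
library(sf)         # spatial (map) data
library(ecotourism) # example datasets, including manta_rays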

Run some code to make charts and make a map!

Download one of the following Quarto documents

Eco Tourism Examples

To render one of the example Quarto files above:

  • Place the .qmd file in your RStudio project activities folder

  • Open in RStudio by selecting the file in the Files window (bottom right corner of RStudio)

  • Select Render (blue arrow) on the menu bar above your Quarto doc in RStudio.

Explore a dataset

Learn more about the manta_rays dataset (note: you can also try this with the dataset you chose)

Practice:

  • Inspecting data: str(), head(), summary() vs glimpse()

  • Filtering / sorting

  • Creating variables

  • Grouping / summarizing

  • Basic plotting (base + ggplot)

What are rows + columns?

  • In other words, what are the observations and variables?

  • Run the code below in a code chunk or in the RStudio Console, one function at a time (for example, start with str()).

str(manta_rays)
head(manta_rays)
summary(manta_rays)

Make a note: what do you learn about the data from each of these functions?
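To make "rows" and "columns" concrete, here is a minimal sketch on a tiny made-up data frame. The name toy_sightings and all of its values are hypothetical; it only borrows column names used later in this lab and is not the real manta_rays data. Each row is one observation and each column is one variable.

```r
# A tiny, made-up data frame (hypothetical values, NOT the real manta_rays data)
toy_sightings <- data.frame(
  year    = c(2016, 2018, 2019, 2018),
  month   = c(1, 7, 12, 3),
  obs_lat = c(-27.4, -23.4, -28.6, -27.1),
  obs_lon = c(153.5, 151.9, 153.6, 153.4),
  ws_id   = c("A", "B", "A", "C")   # hypothetical station IDs
)

nrow(toy_sightings)   # 4: number of rows = observations
ncol(toy_sightings)   # 5: number of columns = variables
names(toy_sightings)  # the variable names
dim(toy_sightings)    # 4 5: rows and columns together
```

Try nrow(), ncol(), and names() on manta_rays and compare the answers with what str() told you.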

Base R filtering + sorting

recent <- manta_rays[manta_rays$year >= 2018, ] # Try running this code with a different year.
recent <- recent[order(recent$year, recent$month), ] # Sort by year, then by month within each year
head(recent)
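The base R code above works in two steps: the comparison inside [ , ] builds a TRUE/FALSE vector that keeps the matching rows, and order() returns the row positions that would sort the data. A small sketch with made-up years and months shows both pieces on their own:

```r
# Made-up values, for illustration only
years  <- c(2019, 2016, 2018, 2018)
months <- c(5, 11, 9, 2)

keep <- years >= 2018
keep          # TRUE FALSE TRUE TRUE: one logical value per row

# order() gives positions that sort by year first, then by month to break ties
idx <- order(years, months)
idx           # 2 4 3 1
years[idx]    # 2016 2018 2018 2019
```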

Simple summaries

Sightings of manta rays by year

table(manta_rays$year)
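table() counts how many times each distinct value occurs, so table(manta_rays$year) gives the number of records per year. A minimal sketch on a made-up vector, including one way to turn the counts into a data frame:

```r
# Made-up years, for illustration only
years <- c(2017, 2018, 2018, 2019, 2019, 2019)

counts <- table(years)
counts   # 2017: 1, 2018: 2, 2019: 3

# as.data.frame() turns the table into a two-column data frame;
# responseName sets the name of the count column (the default is "Freq")
counts_df <- as.data.frame(counts, responseName = "n_records")
counts_df
```

Note that the year column in counts_df is a factor, not a number, because table() stores the distinct values as labels.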

Plot charts in base R

plot(table(manta_rays$year), xlab = "Year", ylab = "Number of records")

Tidyverse

Learn about the variables in your dataset

manta_rays |> glimpse()
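glimpse() lists every column with its type and its first few values. If you only want the column types, base R can apply class() to each column with sapply(). A sketch on a hypothetical mini data frame (not the real manta_rays data):

```r
# Hypothetical mini data frame; stringsAsFactors = FALSE keeps ws_id as text
toy <- data.frame(
  year    = c(2018L, 2019L),
  ws_id   = c("A", "B"),
  obs_lat = c(-27.4, -23.4),
  stringsAsFactors = FALSE
)

# Apply class() to each column; the result is a named character vector
col_types <- sapply(toy, class)
col_types   # year: "integer", ws_id: "character", obs_lat: "numeric"
```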

Filter and arrange rows

manta_rays |>
  filter(year >= 2018) |> # Try out different years
  arrange(year) |> # Arrange the table by year 
  select(year, obs_lat, obs_lon, ws_id) # Choose to show selected variables (columns)
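For comparison, the same filter, arrange, and select steps can be written in base R with subset() and order(). A sketch on a made-up data frame that reuses the column names above (the values are hypothetical):

```r
toy <- data.frame(
  year    = c(2019, 2016, 2018),
  month   = c(12, 1, 7),
  obs_lat = c(-28.6, -27.4, -23.4),
  obs_lon = c(153.6, 153.5, 151.9),
  ws_id   = c("A", "B", "C")
)

# filter(year >= 2018) and select(year, obs_lat, obs_lon, ws_id) in one call
recent <- subset(toy, year >= 2018, select = c(year, obs_lat, obs_lon, ws_id))

# arrange(year)
recent <- recent[order(recent$year), ]
recent
```

The base R result keeps the original row numbers as row names (here 3 and 1), which is a small visual difference from the tidyverse output.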

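Create a new variable from data already in the table

The code below derives a season variable from month. The seasons follow the Southern Hemisphere calendar, so December to February is summer.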

manta2 <- manta_rays |>
  mutate(
    season = case_when(
      month %in% c(12, 1, 2) ~ "summer",
      month %in% c(3, 4, 5)  ~ "autumn",
      month %in% c(6, 7, 8)  ~ "winter",
      month %in% c(9,10,11)  ~ "spring",
      TRUE ~ NA_character_ # Any month not matched above (e.g., a missing month) becomes NA
    )
  )

manta2 # display the resulting table

# Show the new variable
manta2 |>
  select(year, obs_lat, obs_lon, ws_id, season)
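The case_when() above maps each month to a season. The same mapping can also be written in base R as a lookup vector indexed by month number. A minimal sketch, with made-up months, assuming month only ever holds whole numbers from 1 to 12 or NA:

```r
# Position i holds the season for month i (Southern Hemisphere, as above)
season_by_month <- c("summer", "summer", "autumn", "autumn", "autumn",
                     "winter", "winter", "winter", "spring", "spring",
                     "spring", "summer")

months <- c(1, 4, 7, 10, 12, NA)   # made-up months, including a missing one

# Indexing with NA returns NA, like TRUE ~ NA_character_ in case_when()
season <- season_by_month[months]
season
```

The assumption matters: a month of 0 would be silently dropped rather than turned into NA, which is one reason case_when(), with its explicit conditions, is the safer choice in the lab code.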

Group and summarize

Summarize by year

by_year <- manta2 |>
  group_by(year) |>
  summarize(n_records = n(), .groups = "drop")

by_year # Display the result
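Conceptually, group_by(year) followed by summarize(n_records = n()) splits the rows into one group per year and counts the rows in each group. A base R sketch of the same idea on made-up data (by_year_base is a hypothetical name):

```r
toy <- data.frame(
  year  = c(2017, 2018, 2018, 2019, 2019, 2019),
  month = c(1, 3, 7, 2, 6, 11)
)

# split() makes a list with one data frame per year; nrow() counts each one
groups <- split(toy, toy$year)
n_records <- sapply(groups, nrow)
n_records   # named integer vector: 2017 = 1, 2018 = 2, 2019 = 3

# Put the counts back into a data frame shaped like by_year
by_year_base <- data.frame(year = as.numeric(names(n_records)),
                           n_records = unname(n_records))
by_year_base
```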

Summarize by season

by_season <- manta2 |>
  filter(!is.na(season)) |> # drop records with no season
  group_by(season) |>
  summarize(n_records = n(), .groups = "drop") |> # count records in each season
  arrange(desc(n_records)) # most records first

by_season
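The season summary can be sketched in base R too: table() counts each label and sort() orders the counts. A minimal example with made-up season labels, where NA stands for a record with a missing month:

```r
# Made-up season labels; NA stands for a record whose month was missing
season <- c("summer", "summer", "summer", "winter", "winter", "autumn", NA)

# table() leaves out NA by default, much like filter(!is.na(season))
counts <- table(season)

# sort(decreasing = TRUE) plays the role of arrange(desc(n_records))
sorted <- sort(counts, decreasing = TRUE)
sorted   # summer 3, winter 2, autumn 1
```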

Learn more about any of the functions we used by running ? followed by the function name, for example ?summarize (or help(summarize)).

Chart the results using the ggplot2 package

  • Note: the ggplot2 package comes with the tidyverse. Loading the tidyverse with library(tidyverse) attaches its core packages, which follow the tidy data approach (e.g., readr, dplyr, ggplot2).

ggplot(by_year, aes(x = year, y = n_records)) +
  geom_line() +
  geom_point() +
  labs(x = NULL, y = "Number of records", title = "Manta ray occurrence records by year")

ggplot(by_season, aes(x = season, y = n_records)) +
  geom_col() +
  labs(x = NULL, y = "Number of records", title = "Manta ray records by season")
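In the season chart, season is a character column, so ggplot2 places the bars in alphabetical order (autumn, spring, summer, winter), not in the order produced by arrange(). One common fix is to turn season into a factor whose levels are in the order you want, because ggplot2 draws a discrete axis in factor-level order. A base R sketch of the factor step, using made-up values:

```r
season <- c("winter", "summer", "spring", "autumn", "summer")

# Seasonal order for the Southern Hemisphere, starting with summer (Dec to Feb)
season_f <- factor(season, levels = c("summer", "autumn", "winter", "spring"))

levels(season_f)       # the order ggplot2 would use on the x axis
sort(unique(season))   # alphabetical order, which a plain character column gets
```

In the lab you could apply the same idea with mutate(season = factor(season, levels = c("summer", "autumn", "winter", "spring"))) before plotting, or use reorder(season, -n_records) inside aes() to order the bars from most to fewest records.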

What patterns do you observe in these charts?