Introduction to ggplot

Global Plastic Waste

Exploratory Data Analysis

Plastic pollution is a major and growing problem that harms oceans and the health of wildlife. Our World in Data provides extensive data on plastic waste globally, per country, and over time. For this exercise we’ll focus on data from 2010.

Additionally, National Geographic ran a data visualization communication contest on plastic waste as seen here.

Note

These exercises will have you use some functions that you have not yet seen. For each one, use the results to try to understand what it’s actually doing. Follow the links for information about various visualizations you are asked to try.

Packages

We’ll use the tidyverse package for this analysis.

library(tidyverse)

Data

The dataset for this activity can be found as a .csv file here (link) and on Brightspace. Read it into a dataframe.

plastic_waste <- read_csv("data/plastic-waste.csv")

This may require setting the working directory so that the relative path data/plastic-waste.csv points to the file.
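One way to check whether that relative path will resolve is sketched below. The project path in the commented-out setwd() line is a placeholder, not a real location.

```r
# Show the folder that relative paths such as "data/plastic-waste.csv" are resolved from
getwd()

# TRUE if the file is visible from the current working directory
file.exists("data/plastic-waste.csv")

# If FALSE, change the working directory to the folder that contains data/.
# The path below is a placeholder; replace it with your own project folder.
# setwd("path/to/your/project")
```

If you work inside an RStudio Project, the working directory is set to the project folder when the project opens, which usually makes setwd() unnecessary.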

The descriptions for variables in the file are as follows:

  • code: Three letter country code
  • entity: Country name
  • continent: Continent name
  • year: Year
  • gdp_per_cap: GDP per capita, in constant 2011 international dollars ($)
  • plastic_waste_per_cap: Amount of plastic waste per capita in kg/day
  • mismanaged_plastic_waste_per_cap: Amount of mismanaged plastic waste per capita in kg/day
  • mismanaged_plastic_waste: Tonnes of mismanaged plastic waste
  • coastal_pop: Number of individuals living on or near the coast
  • total_pop: Total population according to Gapminder

Instructions

Put your answer to each exercise in the Quarto file for Part 2.

Get to know the Data

Start by looking at the distribution of plastic waste per capita in 2010, using ggplot to generate a histogram and a box plot. Remember that you can learn more about a function by typing ?geom_histogram (or ?geom_boxplot) in the Console.

Exercise 1 summarizing a variable

Write code to calculate and display the mean, median, min, and max of the plastic_waste_per_cap variable.

# Your code here
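A common pattern for this kind of summary is dplyr's summarise(), with one summary function per output column. The sketch below applies it to the hwy (highway miles per gallon) column of ggplot2's built-in mpg dataset. That dataset is only a stand-in here. For the exercise you would use plastic_waste and plastic_waste_per_cap instead.

```r
library(tidyverse)

# One row of summary statistics; na.rm = TRUE drops missing values before each calculation
hwy_summary <- mpg |>
  summarise(
    mean   = mean(hwy, na.rm = TRUE),
    median = median(hwy, na.rm = TRUE),
    min    = min(hwy, na.rm = TRUE),
    max    = max(hwy, na.rm = TRUE)
  )

hwy_summary
```

If a column contains missing values and na.rm = TRUE is left out, each of these functions returns NA.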

Exercise 2 histogram

Create a histogram to show the distribution of the plastic_waste_per_cap variable. Put your code in the answer document.

# Your code here
# Use a binwidth of 0.2
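As an illustration of geom_histogram() and its binwidth argument, here is a sketch on the stand-in mpg data. Note that binwidth is measured in the units of the x variable. A width of 2 suits miles per gallon, whereas the exercise's 0.2 is in kg per person per day.

```r
library(tidyverse)

# Histogram of highway mileage; each bar spans binwidth = 2 mpg
hwy_hist <- ggplot(mpg, aes(x = hwy)) +
  geom_histogram(binwidth = 2) +
  labs(x = "Highway miles per gallon", y = "Number of cars")

hwy_hist
```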

Exercise 3 boxplot

Create a box plot to show the distribution of the plastic_waste_per_cap variable. Interpret the box plot, including what each component of the plot represents. Put your code and explanation in the answer document.

# Your code here
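Here is a sketch of a single-variable box plot on the stand-in mpg data. The comments summarize ggplot2's default definitions of the box plot components as described in ?geom_boxplot.

```r
library(tidyverse)

# Horizontal box plot of one numeric variable (mapping it to x gives a horizontal box).
# Box: 25th to 75th percentile (the interquartile range, IQR); line inside the box: median.
# Whiskers: reach the most extreme values within 1.5 * IQR of the box.
# Individual points: values beyond the whiskers, flagged as potential outliers.
hwy_box <- ggplot(mpg, aes(x = hwy)) +
  geom_boxplot() +
  labs(x = "Highway miles per gallon")

hwy_box
```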

Exercise 4 learn more about the outlier

One country stands out as an unusual observation on the right side of the distribution. One way to identify this country is to subset the data to countries where plastic waste per capita is greater than 3.5 kg per person per day.

Write code to display the country that is an outlier. Add the result as a comment in your code chunk.

# Your code here
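Subsetting with filter() uses a cutoff read off the plot. The sketch below does this on the stand-in mpg data. The cutoff of 40 is chosen only for that dataset. The exercise uses 3.5 on plastic_waste_per_cap.

```r
library(tidyverse)

# Keep only rows above a cutoff chosen from the plot,
# then keep just the columns needed to identify those rows
high_hwy <- mpg |>
  filter(hwy > 40) |>
  select(manufacturer, model, year, hwy)

high_hwy
```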

Note how the visualization drives the exploratory data analysis: the plot shows that something unusual exists, and filter() then identifies which observations are responsible.

Did you expect the result that filter returned? You might research Trinidad and Tobago to see whether plastic waste per capita really is that high there and why, or whether the value is a data error.

Exercise 5 bar plot

Create a bar plot of the continent variable for all countries below the median plastic_waste_per_cap. Briefly interpret the plot. Put your code in a code chunk and your explanation as text in the document, outside the chunk.

# Your code here
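The two-step pattern for this exercise is to filter on the median first and then count a categorical variable with geom_bar(). The sketch below uses the stand-in mpg data, with class playing the role that continent plays in the exercise.

```r
library(tidyverse)

# Step 1: keep rows strictly below the median of a numeric column
below_median <- mpg |>
  filter(hwy < median(hwy, na.rm = TRUE))

# Step 2: geom_bar() counts the rows in each category of a discrete variable
class_bar <- ggplot(below_median, aes(x = class)) +
  geom_bar() +
  labs(x = "Vehicle class", y = "Cars below median highway mpg")

class_bar
```

filter() drops rows where the condition evaluates to NA, so rows with a missing value in the filtered column do not appear in the bar counts.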

Exercise 6 scatter plot

Write code to create a scatter plot of plastic_waste_per_cap (y-axis) versus gdp_per_cap (x-axis). Interpret the relationship you see.

# Your code here
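A scatter plot maps one numeric variable to each axis and draws one point per row. By the usual "y versus x" convention, the first variable named goes on the y-axis. The sketch below uses the stand-in mpg data.

```r
library(tidyverse)

# One point per car: engine size on x, highway mileage on y
displ_hwy <- ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point() +
  labs(x = "Engine displacement (litres)", y = "Highway miles per gallon")

displ_hwy
```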