R Basics 2

Intro to Data Analytics

Jordan Ayala

Topics

Corresponds to IDS 2.1-2.5 and IMS Ch 1

  1. Review creating objects

  2. Getting help

  3. The working directory

  4. Data types

  5. Data frames

  6. The accessor $

  7. Vectors

  8. Coercion

Getting started

  • First, open up your CMSC 121 R project folder and select the .Rproj file for the course to open RStudio

  • Next, create a new Quarto Document for today’s in-class activities called Rbasics-2-activity.qmd

Storing information in objects

  • You can think of objects as boxes that store things. To create an object, we have to ask R to take the results of some code and assign those results to an object.

Create a vector using the concatenate function c() that contains your top three favorite colors:

colors <- c("fern green", "tangerine", "iris")

Interpreting error messages

Errors

Terminate a process that we are trying to run in R. They arise when it is not possible for R to continue evaluating a function.

Warnings

Don’t terminate a process but are meant to warn us that there may be an issue with our code and its output. They arise when R recognizes potential problems with the code we’ve supplied.

Messages

Also don’t terminate a process and don’t necessarily indicate a problem but simply provide us with more potentially helpful information about the code we’ve supplied.

Use the output to solve the issue
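
To make the three kinds of output concrete, here is a minimal sketch that triggers one of each and captures the text R reports. The specific calls (as.numeric("ten"), log("ten"), and the message text) are chosen for illustration and are not part of the course materials; tryCatch() is used only so the chunk keeps running after the error.

```r
# Warning: R finishes the job but flags a possible problem.
# "ten" cannot be read as a number, so the result would be NA.
warn_text <- tryCatch(
  as.numeric("ten"),
  warning = function(w) conditionMessage(w)
)

# Error: R cannot continue evaluating the function at all.
# log() needs a number, not text.
err_text <- tryCatch(
  log("ten"),
  error = function(e) conditionMessage(e)
)

# Message: extra information, not a sign of a problem.
msg_text <- tryCatch(
  message("Reading the data..."),
  message = function(m) conditionMessage(m)
)

# Print the captured text for each case
warn_text
err_text
msg_text
```

Reading the captured text is usually the fastest route to a fix: it names the function that failed and often the argument that caused the problem.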

Referencing resources

  • There are a number of resources available to help you recall how certain functions work and answer questions that come up while working in R:

    • Cheatsheets

    • Search Google, Stack Overflow (include terms like “base R <insert your question/issue>”)

    • See the syllabus for more reference texts

Getting help

  • Office and lab hours

    • Mondays in RKC 107: 12:00 - 1:00 PM

    • Wednesdays in RKC 107: 5:30 PM - 6:30 PM

  • See the syllabus for instructions about reaching out to me for help over email.

Help in RStudio

  1. Using the Help pane

    help()

    ?

Create a new code chunk in your Quarto document and try using both to learn about a function.

  2. The args() function

args(log)

Try it with the sum() function you learned this week
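
A short sketch of the help tools side by side. The interactive() check is an addition for this example, so that the help pages only open when you run the chunk yourself and not when the document is rendered.

```r
# help() and ? open the same documentation page in the Help pane.
if (interactive()) {
  help("log")
  ?log
}

# args() prints a function's arguments and their default values.
args(log)

# formals() on that result returns the arguments as a list,
# so names() gives just the argument names.
log_args <- names(formals(args(log)))
sum_args <- names(formals(args(sum)))
log_args
sum_args
```

An argument shown with a default (such as base in log()) is optional; one shown without a default must be supplied.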

The working directory

Why? R needs to know where your files are located. You make this easier by working from a single RStudio project for tutorials in this class.

getwd()
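
As a rough illustration of why the working directory matters, the sketch below lists what R can see in that folder and builds a path to a file inside it. The file name is the activity file created at the start of class; whether it exists depends on your own project folder.

```r
# Where is R currently looking for files?
wd <- getwd()
wd

# List the files R can see in that folder.
list.files(wd)

# Build a full path to a file inside the working directory
# and ask whether the file is really there.
activity_path <- file.path(wd, "Rbasics-2-activity.qmd")
file.exists(activity_path)
```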

Data types

  • In R output we’ll see numeric (num) or character (chr)

  • Types of data:

    • Numerical: measurements (e.g., ratios, percentages, intervals)

    • Categorical: counting (e.g., ranking scales, names or categories with no quantitative information)

Data can be broken down into four types:

  • Nominal data have no implied order, size, or quantitative information (e.g., species type, place names).

  • Ordinal data have an implied order (e.g., ranked scores, Likert scales).

  • Discrete data can only take on whole, countable values or intervals.

  • Continuous data can take on any numeric value.

Working with data types

  1. Create a new code chunk in your Quarto document
  2. Create several new objects of different types: numeric and character
  3. Check the type of objects you created using the class() function
  4. Check the structure of the objects using the str() function
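
One possible version of these steps, as a sketch. The object names and values (fav_number, fav_color) are made up for illustration.

```r
# A numeric object and a character object
fav_number <- 7.5
fav_color <- "iris"

# class() reports the type of each object
class(fav_number)
class(fav_color)

# str() shows the type abbreviation (num or chr) next to the value
str(fav_number)
str(fav_color)
```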

Working with data types continued…

  1. Load the package that contains data for the IDS textbook

library(dslabs)

*Note: this should already be installed based on work you did for Lab 1

  2. Use the data() function to display the data sets available to you in the dslabs package.
  3. Load the murders data
  4. Check the class of the data set
  5. Check the structure of the object

Data frames

  • Used to represent a tabular spreadsheet

  • It is considered “tidy data” when each row is a unique observation and each column is a variable

  1. Use the head() function to explore the murders object
  2. Open in a table from the Environment pane
  3. Open in a table using the View() function
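
To show what a tidy data frame looks like without depending on an installed package, the sketch below builds a small stand-in called murders_demo. It borrows the column names of murders, but the numbers are made up for illustration and are not the real data.

```r
# A tiny stand-in data frame: one row per state, one column per variable.
# population and total are invented values, NOT the real murders data.
murders_demo <- data.frame(
  state = c("New York", "Missouri", "Kansas"),
  abb = c("NY", "MO", "KS"),
  region = c("Northeast", "North Central", "North Central"),
  population = c(1000, 2000, 3000),
  total = c(10, 20, 30)
)

# The same checks used on murders work here
class(murders_demo)
str(murders_demo)

# head() shows the first rows; the second argument sets how many
head(murders_demo, 2)
```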

The accessor $

Use $ to access the different variables represented by columns in a data frame.

  1. Access the region column in murders
  2. Use the names() function to see all of the columns in murders

The accessor [[]]

Use [[]] with a quoted column name as another way to access the variables represented by columns in a data frame.

  1. Access the total column in the murders data frame. This contains the total number of murders in a state.
  2. Use the names() function to see all of the columns in murders
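
The two accessors return the same thing, which the sketch below checks on a small stand-in data frame. The name murders_demo and its values are made up for illustration; only the column names mirror murders.

```r
# Stand-in data frame with invented values (not the real murders data)
murders_demo <- data.frame(
  state = c("New York", "Missouri", "Kansas"),
  region = c("Northeast", "North Central", "North Central"),
  total = c(10, 20, 30)
)

# names() lists the columns you can access
names(murders_demo)

# $ takes the column name without quotes
by_dollar <- murders_demo$total

# [[ ]] takes the column name as a quoted string
by_brackets <- murders_demo[["total"]]

# Both give back the same vector
identical(by_dollar, by_brackets)
```

Because [[]] takes a string, it also works when the column name is stored in another object, which $ cannot do.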

Vectors

  • Objects with several entries (like a column in a spreadsheet)
  1. Access the population column in murders
  2. Add the result to an object called pop
  3. Use length() to figure out how many entries (rows) are in the column

Creating a vector with names

  1. Create a vector object state that contains the names of three US states: New York, Missouri, and Kansas
  2. Create a vector object codes with the following state id codes: 36, 29, 20
  3. Create a vector object states with the following state id codes and names:

c("New York" = 36, "Missouri" = 29, "Kansas" = 20)
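
As a sketch of how the three objects relate, the code below builds the same named vector a second way by attaching the names from state to the values in codes with names(). The object name states_alt is introduced here only for the comparison.

```r
# The two plain vectors from steps 1 and 2
state <- c("New York", "Missouri", "Kansas")
codes <- c(36, 29, 20)

# The named vector from step 3
states <- c("New York" = 36, "Missouri" = 29, "Kansas" = 20)

# Alternative: start from the codes and assign the names afterwards
states_alt <- codes
names(states_alt) <- state

# Both approaches produce the same named vector
identical(states, states_alt)
```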

Subsetting

We use square brackets to access specific elements of a vector.

  1. Subset the states vector object by position using []
  2. Use : to subset the object to only the last two entries
  3. Access the New York entry by using the name of the entry
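
A minimal sketch of the three kinds of subsetting, using the states vector defined in the previous activity. The last line, which picks non-adjacent positions with c(), goes one step beyond the listed steps.

```r
states <- c("New York" = 36, "Missouri" = 29, "Kansas" = 20)

# By position: the first entry
states[1]

# With : to take a run of positions, here the last two entries
states[2:3]

# By name: put the entry's name in quotes
states["New York"]

# Several positions that are not next to each other
states[c(1, 3)]
```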

Coercion

When data do not match the type R expects, many functions try to guess what was meant and convert the data instead of throwing an error.

  1. Create a vector x <- c(1, "Missouri", 3) and check the class of the object
  2. Create a vector of the sequence 1 through 8.
    • Check the class of the object
    • Coerce the vector from numeric to character using as.character()
    • Coerce it back to numeric using as.numeric()
  3. Coerce the x object to numeric. What happens?
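
One way these steps might play out, as a sketch. The object names y, y_chr, y_num, and x_num are introduced for illustration, and suppressWarnings() is added here only to keep the output quiet; without it, R prints a warning that NAs were introduced by coercion.

```r
# Mixing numbers and text: R quietly converts everything to character
x <- c(1, "Missouri", 3)
class(x)

# A sequence created with : is stored as whole numbers (integer)
y <- 1:8
class(y)

# Coerce to character, then back to numeric
y_chr <- as.character(y)
class(y_chr)
y_num <- as.numeric(y_chr)
class(y_num)

# Coercing x to numeric: "Missouri" has no numeric meaning,
# so R fills that position with NA (a missing value)
x_num <- suppressWarnings(as.numeric(x))
x_num
```

The NA is R's way of marking an entry it could not convert, so a warning during coercion is a cue to check which entries were lost.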