Intro to Data Analytics
Corresponds to IDS 2.1-2.5 and IMS Ch 1
Review creating objects
Getting help
The working directory
Data types
Data frames
The accessor $
Vectors
Coercion
First, open your RStudio project folder (e.g., data121) and select the .Rproj file for the course to open RStudio. Alternatively, open RStudio and use the drop-down list in the upper-right corner to select your RStudio project.
Next, create a new Quarto Document in your /activities folder for today’s in-class activity called Rbasics-2-activity.qmd
Create a vector using the concatenate function c() that contains your top three favorite colors:
colors <- c("fern green", "tangerine", "iris")
Errors: Terminate a process that we are trying to run in R. They arise when it is not possible for R to continue evaluating a function.
Warnings: Don’t terminate a process but are meant to warn us that there may be an issue with our code and its output. They arise when R recognizes potential problems with the code we’ve supplied.
Messages: Also don’t terminate a process and don’t necessarily indicate a problem but simply provide us with more potentially helpful information about the code we’ve supplied.
Use the output to solve the issue
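The three kinds of feedback described above can be triggered on purpose. The following is a minimal sketch using only base R; the object names (err_text, warn_text) are made up for illustration, and tryCatch() is used here only so the script keeps running after the error.

```r
# Error: log() cannot work on text, so evaluation stops.
# tryCatch() captures the error text instead of halting the script.
err_text <- tryCatch(
  log("a"),
  error = function(e) conditionMessage(e)
)
err_text

# Warning: R can still return something (NA) but flags a possible problem.
# Here we capture the warning text so we can read it.
warn_text <- tryCatch(
  as.numeric("ten"),
  warning = function(w) conditionMessage(w)
)
warn_text

# Message: purely informational; the code runs normally.
message("Finished checking the inputs.")
```

Reading err_text and warn_text is the same habit as reading the red text in the Console: the output usually names the function and the argument that caused the issue.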
There are a number of resources available to help you recall how certain functions work and answer questions that come up while working in R:
Search Google or Stack Overflow (include terms like “base R <insert your question/issue>”)
See the syllabus for more reference texts
Using the Help pane
help()
?
Create a new code chunk in your Quarto and try using both to learn about a function.
args() function
args(log)
Try it with the sum() function.
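As a rough sketch of how these help tools fit together, the chunk below uses log() as the example function. The help() and ? calls are commented out because they open the Help pane and are best run interactively; log_args is a made-up object name.

```r
# Opening a help page (run these interactively in the Console or a chunk):
# help(log)
# ?log

# args() prints a function's argument names and default values.
args(log)

# formals() returns the same information as a list we can inspect.
log_args <- names(formals(args(log)))
log_args

# Knowing the argument names lets us set them explicitly.
log(8, base = 2)
```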
Why? R needs to know where your files are located. You make this easier by working from a single RStudio project for tutorials in this class.
getwd()
Note: Quarto documents have their own working directory when you are rendering (by default, the folder that contains the .qmd file).
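A small sketch of checking where R is looking for files. The folder and file names come from today's setup and are only an assumption about your project layout; wd and activity_path are made-up object names.

```r
# Where is R currently looking for files?
wd <- getwd()
wd

# Build a path relative to the working directory.
# "activities" and the file name are assumptions; adjust them to your project.
activity_path <- file.path("activities", "Rbasics-2-activity.qmd")
activity_path

# file.exists() reports whether R can find the file from the current directory.
file.exists(activity_path)
```

If file.exists() returns FALSE, compare the output of getwd() with where the file actually lives before changing any code.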
paths.txt example
In R output we’ll see: Numeric num or character chr
Types of data:
Numerical: measurements (e.g., ratios, percentages, intervals)
Categorical: counting (e.g., ranking scales, names or categories with no quantitative information)
Data can be broken down into four types:
Nominal data, which have no implied order, size, or quantitative information (species type, place names, etc.)
Ordinal data, which have an implied order (e.g., ranked scores, Likert scores)
Discrete data, which can only take on whole, countable values or intervals
Continuous data, which can take on any numeric value
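One way these four types might be stored in R is sketched below. All values and object names are invented for illustration; they are not from any course data set.

```r
# Nominal: categories with no order (stored as a factor).
species <- factor(c("oak", "maple", "oak"))

# Ordinal: categories with an implied order (an ordered factor).
rating <- factor(c("low", "high", "medium"),
                 levels = c("low", "medium", "high"),
                 ordered = TRUE)

# Discrete: whole-number counts (the L suffix makes an integer).
n_students <- c(12L, 30L, 25L)

# Continuous: any numeric value (stored as a double).
height_m <- c(1.62, 1.75, 1.80)

class(species)     # "factor"
class(rating)      # "ordered" "factor"
class(n_students)  # "integer"
class(height_m)    # "numeric"

# Ordered factors can be compared; plain factors cannot.
rating[1] < rating[2]  # TRUE: "low" comes before "high"
```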
class() function
str() function
library(tidyverse)
library(dslabs)
*Note: this should already be installed based on work you did for an earlier lab.
Use the data() function to display the data sets available to you in the dslabs package.
Load the murders data using the data() function.
Data frames are used to represent a tabular spreadsheet
It is considered “tidy data” when each row is a unique observation and each column is a variable
Use the head() function to explore the murders object.
Try the view() function.
Use glimpse() to learn more about the structure of the data frame.
The accessor $
Use $ to access the different variables represented by columns in a data frame.
Access the region column in murders
Use the names() function to see all of the columns in murders
The accessor [[]]
Use [[]] to access the different variables represented by columns in a data frame.
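To see the two accessors side by side without spoiling the murders exercises, here is a sketch on a stand-in data frame. states_df is a made-up object assembled from the state.name, state.abb, state.region, and state.x77 data sets that ship with base R, so it runs even if dslabs is not loaded; it is not the murders data.

```r
# Stand-in data frame built from data sets included with base R.
states_df <- data.frame(
  state = state.name,
  abb = state.abb,
  region = state.region,
  population = unname(state.x77[, "Population"])  # historical estimates, in thousands
)

# See all of the column names.
names(states_df)

# $ returns one column as a vector.
head(states_df$region)

# [[ ]] with the column name in quotes returns the same vector.
head(states_df[["region"]])

# Both accessors give identical results.
identical(states_df$region, states_df[["region"]])

# length() counts the entries in a column, which equals the number of rows.
length(states_df$population)
```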
Access the total column in the murders data frame. This contains the total number of murders in a state.
Use the names() function to see all of the columns in murders
Assign the population column in murders to an object called pop
Use length() to figure out how many entries (“rows”) are in the column
Create a vector called state that contains the names of three US states: New York, Missouri, and Kansas
Create a vector called codes, with the following state id codes: 36, 29, 20
Create a named vector called states, with the following state id codes and names:
c("New York" = 36, "Missouri" = 29, "Kansas" = 20)
We use square brackets to access specific elements of a vector.
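A brief sketch of bracket subsetting, using the named vector of state id codes from this section. Positions and names both work inside the brackets.

```r
# Named vector of state id codes.
states <- c("New York" = 36, "Missouri" = 29, "Kansas" = 20)

# One element by position.
states[1]

# Several elements by position, using c() inside the brackets.
states[c(1, 3)]

# One element by name.
states["Missouri"]

# The names themselves are a character vector.
names(states)
```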
Access the states vector object by position using [] to subset the object to only the last two entries
When data does not match what R expects, functions try to guess what was meant without throwing an error message.
Create x <- c(1, "Missouri", 3) and check the class of the object
as.character
as.numeric
Try converting the x object to numeric. What happens?
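Coercion can be sketched with a different vector so the exercise above is left for you to try. The object names (y, y_num, z) and values are made up for illustration.

```r
# Mixing numbers and text: R silently turns everything into character.
y <- c(10, "b", 30)
class(y)  # "character"

# as.numeric() converts what it can; anything else becomes NA, with a warning.
y_num <- as.numeric(y)
y_num
is.na(y_num)

# as.character() goes the other way and always succeeds.
z <- as.character(1:3)
z
class(z)  # "character"
```

Note that the first step produces no error or warning at all, which is why checking class() after building a vector is a useful habit.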