Intro to Data Analytics
Corresponds to IDS 2.1-2.5 and IMS Ch 1
Review creating objects
Getting help
The working directory
Data types
Data frames
The accessor $
Vectors
Coercion
First, open up your CMSC 121 R project folder and select the .Rproj file for the course to open RStudio
Next, create a new Quarto Document for today’s in-class activities called Rbasics-2-activity.qmd
Create a vector using the concatenate function c() that contains your top three favorite colors:
colors <- c("fern green", "tangerine", "iris")
Errors terminate a process that we are trying to run in R. They arise when it is not possible for R to continue evaluating a function.
Warnings don't terminate a process but are meant to warn us that there may be an issue with our code and its output. They arise when R recognizes potential problems with the code we've supplied.
Messages also don't terminate a process and don't necessarily indicate a problem; they simply provide us with more potentially helpful information about the code we've supplied.
Use the output to solve the issue
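The sketch below triggers one error, one warning, and one message so you can compare them side by side. It uses tryCatch(), which these slides don't cover, only so that the error does not halt the whole chunk. The example calls (log("ten"), log(-1), message("Loading data...")) are illustrative choices. The exact condition text assumes an English-language R session.

```r
# Error: log() cannot work on text, so R stops evaluating the call.
# tryCatch() catches the error and returns its message instead of halting.
err <- tryCatch(log("ten"), error = function(e) conditionMessage(e))
print(err)

# Warning: R still returns a value (NaN) but flags a possible problem
wrn <- tryCatch(log(-1), warning = function(w) conditionMessage(w))
print(wrn)

# Message: purely informational, nothing is wrong
msg <- tryCatch(message("Loading data..."), message = function(m) conditionMessage(m))
print(msg)
```

In everyday work you would simply run the call and read the red text in the Console. Whether that text starts with "Error" or "Warning" tells you which case you are in.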
There are a number of resources available to help you recall how certain functions work and answer questions that come up while working in R:
Search Google or Stack Overflow (include terms like "base R <insert your question/issue>")
See the syllabus for more reference texts
Office and lab hours
Mondays in RKC 107: 12:00 - 1:00 PM
Wednesdays in RKC 107: 5:30 PM - 6:30 PM
See the syllabus for instructions about reaching out to me for help over email.
Using the Help pane
help()
?
Create a new code chunk in your Quarto document and try using both to learn about a function.
The args() function
args(log)
Try it with the sum() function you learned this week
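Here is a minimal sketch of looking up arguments. The help() and ? lines are left as comments because they open the Help pane rather than return a value. The NA example is an illustrative assumption showing why knowing an argument like na.rm is useful.

```r
# In the Console, either of these opens the sum() documentation in the Help pane:
# help(sum)
# ?sum

# args() shows only the arguments a function accepts
args(log)  # shows x and base = exp(1)
args(sum)  # shows ... and na.rm = FALSE

# Using an argument discovered above: na.rm = TRUE skips missing values
total_na <- sum(c(1, 2, NA), na.rm = TRUE)
total_na
```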
Why? R needs to know where your files are located. You make this easier by working from a single RStudio project for tutorials in this class.
getwd()
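This is a short sketch of checking where R is looking for files. When the course .Rproj is open, the path should be your project folder. The contents listed by list.files() will of course depend on your own machine.

```r
# Current working directory, returned as a single character string
wd <- getwd()
wd

# Files and folders R can "see" from the working directory (e.g. your .qmd files)
list.files()
```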
In R output we'll see numeric data labeled num and character data labeled chr.
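To see these labels, run str() on vectors built from values used elsewhere in these slides (the colors vector and the state ID codes):

```r
colors <- c("fern green", "tangerine", "iris")
codes <- c(36, 29, 20)

str(codes)   # num [1:3] 36 29 20
str(colors)  # chr [1:3] "fern green" "tangerine" "iris"
```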
Types of data:
Numerical: measurements (e.g., ratios, percentages, intervals)
Categorical: groups or labels that we count (e.g., ranking scales, names or categories with no quantitative information)
Data can be broken down into four types:
Nominal data have no implied order, size, or quantitative information (species type, place names, etc.).
Ordinal data have an implied order (e.g., ranked scores, Likert scores).
Discrete data can only take on whole, countable values or intervals.
Continuous data can take on any numeric value.
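This is one common, though not the only, way to store each of the four types in R. The factor(), integer, and double choices below are assumptions, and all example values are made up for illustration.

```r
# Nominal: labels with no order, stored as an unordered factor
region <- factor(c("South", "West", "South"))

# Ordinal: labels with an order, stored as an ordered factor
rating <- factor(c("low", "high", "medium"),
                 levels = c("low", "medium", "high"), ordered = TRUE)

# Discrete: whole, countable values, stored as integers (the L suffix)
n_students <- c(12L, 15L, 9L)

# Continuous: any value in a range, stored as doubles (R's "numeric")
temp_c <- c(21.4, 19.85, 23.1)

class(region)      # "factor"
class(rating)      # "ordered" "factor"
class(n_students)  # "integer"
class(temp_c)      # "numeric"

# Order matters only for ordinal data: compare each rating to "high"
rating < "high"    # TRUE FALSE TRUE
```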
The class() function
The str() function
library(dslabs)
*Note: this package should already be installed based on work you did for Lab 1
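Here is a minimal sketch of applying class() and str() to the murders data. It assumes dslabs is installed, as in Lab 1.

```r
library(dslabs)

# Load the murders data set that ships with dslabs
data(murders)

# class() reports what kind of object murders is
class(murders)

# str() lists every column with its type (num, chr, Factor) and first values
str(murders)
```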
Use the data() function to display the data sets available to you in the dslabs package.
The murders data
Data frames are used to represent tabular, spreadsheet-like data.
A data frame is considered "tidy data" when each row is a unique observation and each column is a variable.
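To see what "tidy" means concretely, here is a small hand-built data frame using the state names and ID codes from these slides. The object name states_df is hypothetical.

```r
# One row per state (an observation), one column per variable
states_df <- data.frame(
  state = c("New York", "Missouri", "Kansas"),
  code  = c(36, 29, 20)
)

states_df
class(states_df)  # "data.frame"
dim(states_df)    # 3 rows, 2 columns
```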
Use the head() function to explore the murders object.
Use the View() function to open murders in a spreadsheet-style viewer.
The accessor $
Use $ to access the different variables represented by columns in a data frame.
Access the region column in murders.
Use the names() function to see all of the columns in murders.
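Here is a short sketch of these exploration steps, assuming dslabs is installed. View() is left as a comment because it only works in an interactive RStudio session.

```r
library(dslabs)
data(murders)

# First six rows (six is head()'s default)
head(murders)

# Spreadsheet-style viewer in RStudio (interactive sessions only)
# View(murders)

# Every column name in the data frame
names(murders)

# $ pulls a single column out as a vector
murders$region
```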
The accessor [[]]
Use [[]] to access the different variables represented by columns in a data frame.
Access the total column in the murders data frame. This column contains the total number of murders in each state.
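This sketch assumes dslabs is installed and shows that $ and [[]] return the same column. The variable name col is hypothetical. It illustrates one practical difference: [[]] also accepts a column name stored in another object.

```r
library(dslabs)
data(murders)

total_dollar   <- murders$total
total_brackets <- murders[["total"]]

# Both accessors return the same numeric vector
identical(total_dollar, total_brackets)  # TRUE

# [[ ]] can take the column name from a variable; $ cannot
col <- "total"
head(murders[[col]])
```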
Access the population column in murders and save it as an object called pop.
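Here is a minimal sketch of saving the column and counting its entries with length(). It assumes dslabs is installed.

```r
library(dslabs)
data(murders)

# Save the population column as its own vector
pop <- murders$population

# One entry per row of murders (one per state, plus DC)
length(pop)
```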
Use length() to figure out how many entries ("rows") are in the column.
Create a vector called state that contains the names of three US states: New York, Missouri, and Kansas.
Create a vector called codes with the following state ID codes: 36, 29, 20.
Create a named vector called states with the following state ID codes and names:
c("New York" = 36, "Missouri" = 29, "Kansas" = 20)
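This sketch builds the three vectors. It also shows an alternative way to create the named vector: attach names() to codes. The name codes_named is hypothetical.

```r
state  <- c("New York", "Missouri", "Kansas")
codes  <- c(36, 29, 20)
states <- c("New York" = 36, "Missouri" = 29, "Kansas" = 20)

names(states)

# Same result by assigning names to an existing vector
codes_named <- codes
names(codes_named) <- state
identical(codes_named, states)  # TRUE
```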
We use square brackets to access specific elements of a vector.
Access elements of the states vector object by position using [].
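Here is a short sketch of positional subsetting on the states vector, including a range built with the colon operator. The length()-based line is an optional variant that avoids hard-coding positions.

```r
states <- c("New York" = 36, "Missouri" = 29, "Kansas" = 20)

states[2]        # second entry (Missouri)
states[c(1, 3)]  # first and third entries
states[2:3]      # last two entries; 2:3 is the same as c(2, 3)

# Last two entries without hard-coding the positions
states[(length(states) - 1):length(states)]

# Named vectors can also be subset by name
states["Kansas"]
```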
Use : to subset the object to only the last two entries.
Coercion
When data does not match what R expects, functions try to guess what was meant without throwing an error message.
Create the following object and check its class:
x <- c(1, "Missouri", 3)
The as.character() and as.numeric() functions
Convert the x object to numeric. What happens?
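This sketch walks through the coercion exercise. The variable name x_num is hypothetical. Expect a warning, not an error, on the as.numeric() line.

```r
x <- c(1, "Missouri", 3)

# c() coerced everything to character, so 1 and 3 became "1" and "3"
class(x)  # "character"

# "Missouri" cannot become a number, so R returns NA and warns:
# "NAs introduced by coercion"
x_num <- as.numeric(x)
x_num     # 1 NA 3

# Coercing the other way always works
as.character(c(36, 29, 20))  # "36" "29" "20"
```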