Lab: Problem Solving
Why are we here?
In Lab 1, you installed and configured the computing tools that we are using in this course.
In this lab, we will begin learning the fundamentals of how to write programs in the R language. Whenever you’re programming a computer, chances are your program won’t work perfectly on your first try. That’s why we’ll also be developing one of the most important skill sets a programmer can have: problem solving! Knowing basic troubleshooting techniques to apply when things don’t go according to plan is just as important as the plan itself.
The purpose of this lab is to introduce you to working with R, and with resources and techniques for problem solving while using R.
After completing this lab, you should understand:
The RStudio interface
How to use R as a calculator
How to store the results of your calculations as objects in the R environment
How to interpret error messages from R
How to find the built-in help resources in R
How to identify working directories and navigate file paths
To help you complete the exercises, this lab has an accompanying template file. You should download the template file at the start of the lab, move it into your course project, and open the template in RStudio. That way, you can answer the questions as you go through this lab.
Lab instructions
Setting up
1. Download the Lab 2 template. Each question in the template Quarto file corresponds to the examples contained in this file.
2. Before you open the template, use your computer’s file explorer to move the template file you just downloaded out of your Downloads folder and into your CMSC 121 folder.
3. Then, open RStudio, and make sure you’re working in the CMSC 121 project. If the Project menu in the top right corner of your RStudio window doesn’t show “CMSC 121”, click it, choose Open Project, and navigate to your project.
4. Finally, open the template file in RStudio. The easiest way to do this is to go to the Files pane in RStudio, find the lab_02_problem_solving.qmd file in the list, and click on it.
If you don’t see a file named lab_02_problem_solving.qmd in the list, make sure you did steps 2 and 3 correctly. If you’ve double-checked your steps and still don’t see it, ask your instructor for help.
R as a calculator
The first thing you can try to do with R is a familiar one: simply use it like a calculator. It is easy to forget because of all the other things R can do, but at the lowest level R is still just working with numbers.
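Here is a minimal sketch of what that looks like in the Console. The operators are standard base R; the specific numbers are only illustrative, chosen so that the first line produces the 7 referred to in the next section.

```r
5 + 2    # addition: prints [1] 7
10 * 3   # multiplication: prints [1] 30
10 - 4   # subtraction: prints [1] 6
9 / 2    # division: prints [1] 4.5
2^3      # exponents: prints [1] 8
```

Each line is evaluated on its own, and R shows the answer right away without saving it anywhere.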
Storing information in objects
Right now, the outputs of our code (in this case, the number 7) are just being shown to us. That can be helpful, but it isn’t really the point of a programming language. We want to be able to save our results and build on them as we go. We can do that by assigning the output of the code we run to an object.
You can think of objects as boxes that store things. To create an object, we have to ask R to take the results of some code and assign those results to an object. We do this using the assignment arrow (<-). Let’s try it out.
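Based on the values described below (result holds 7 and applesauce holds 30), the code for this step likely looks something like this sketch, run in the Console:

```r
# Take the result of 5 + 2 and store it in an object named result
result <- 5 + 2

# Take the result of 10 * 3 and store it in an object named applesauce
applesauce <- 10 * 3
```

Notice that neither line prints anything; the values go into the objects instead.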
You can see all of the objects you currently have in the upper right pane labeled Environment. At this point, you should see two objects in your environment: an object named result (which has the value 7) and an object named applesauce (which has the value 30). Objects can be named almost anything, as long as the name 1) doesn’t start with a number, and 2) doesn’t contain any spaces.
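A short sketch of names that follow and break these two rules. The object names here are hypothetical, chosen only for illustration. ls() is a base R function that lists the objects in your environment, the same ones shown in the Environment pane.

```r
# Valid names: they start with a letter and contain no spaces
total_2024 <- 100
my.result <- 7

# Invalid names, left commented out because R would stop with an error:
# 2024_total <- 100   # starts with a number
# my result <- 7      # contains a space

# List the names of the objects in the environment
ls()
```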
Note that when you created result and applesauce, R didn’t print the results of 5 + 2 or 10 * 3 in the console. This is because creating an object is separate from showing it. Showing results is often called printing. To print an object, you can just write its name. When you run that, R will show you the object’s value.
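A quick sketch of printing, assuming result was created as above. print() is a base R function that does explicitly what typing the name on its own does implicitly.

```r
result <- 5 + 2

# Typing an object's name on its own prints its value: [1] 7
result

# print() does the same thing explicitly
print(result)
```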
Objects are useful because we can use their names to represent their values in our code. For example, we can now add result and applesauce together, because result represents the value 7 and applesauce represents the value 30. Try it for yourself:
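A sketch of adding the two objects. The first two lines just recreate them, in case your environment is empty.

```r
result <- 5 + 2
applesauce <- 10 * 3

# R substitutes each object's stored value, so this is 7 + 30: prints [1] 37
result + applesauce
```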
We can even change what is stored inside those objects, then run the same code to get different results!
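A sketch of that idea, assuming applesauce still holds 30. Here result is overwritten with 4 + 2, so the exact same addition now gives a different answer.

```r
applesauce <- 10 * 3

# Overwrite the old value of result (7) with a new one (6)
result <- 4 + 2

# Same line of code as before, new answer: 6 + 30 prints [1] 36
result + applesauce
```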
R will not save information in your environment unless you ask it to, which in this course means using the <- operator. Printing information in the console is not the same as storing information for later!
R Functions and Arguments
Functions in R are like imperative sentences (e.g., “go”, “stay”, or “sleep”). They tell R to take some form of action. Telling R to use a function is described as “calling” the function. To call a function, its name must be followed by opening and closing parentheses. For example, writing Sys.Date followed by opening and closing parentheses tells R to output the current date:
Sys.Date()
Now imagine we approached someone with an imperative sentence like “Close” or “Bring”. Their next question would likely be “Close what?” or “Bring what?”. In response, we might clarify what exactly they should be closing or bringing, e.g., “Close the door” or “Bring dessert”.
We face a similar need to be specific when calling functions in R. Most of the functions we call need specific inputs in order for the instructions to make sense to R. These inputs, which specify the “what” or the “how” of a function, are called arguments. Arguments to a function are always placed inside the parentheses that follow the function’s name.
The sum() function adds up all of its arguments and outputs the grand total. In the Console, enter sum(5, 2) and press Enter. What were the arguments you passed to the sum() function?
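A sketch of passing arguments. sum() accepts any number of numbers as arguments. round() is another base R function, used here only to show an argument passed by name (digits), which controls the “how” rather than the “what”.

```r
# Two arguments, 5 and 2: prints [1] 7
sum(5, 2)

# sum() adds up as many numbers as you give it: prints [1] 10
sum(1, 2, 3, 4)

# The first argument says *what* to round; the named argument digits says *how*
# prints [1] 3.14
round(3.14159, digits = 2)
```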
R Editor Panel
Up to this point, we have been writing R code in the Console pane. You’ll notice that when you direct your attention to the console, it will look like this:
>
When we execute code in the console, we get a result immediately. The console is a great space to work when we need to run a piece of code only once. However, the console is not a great space to edit code or to save code that we’ve written. This is why most of the code that you will write in this course will be composed in the editor pane (which is typically located directly above the console). You’ll notice that when you direct your attention to the editor, the template for this lab is open. In the open file, you will see text next to a series of line numbers that look like this:
1
2
3
This is a file where you can compose, edit, and save code. In this file, you can also write text that explains the rationale for the code you’ve written, interprets the results, or offers additional contextual information regarding the data analysis.
Code Chunks
RStudio needs a way to know when we are switching from writing text about our code to actually writing code that we wish to execute. This is where code chunks come into play. A code chunk is a space in a .qmd file where we can write code that we wish to execute. There are a few ways to create code chunks in RStudio:
First, we can place our cursor on a new line in the editor pane and type in the characters ```{r} to indicate the start of a code chunk. We can press the Enter key a few times to add some space for our code to go, and then type in the characters ``` to indicate the end of the code chunk. You’ll notice that when you do this, RStudio creates a grey box with a green triangle icon in its upper right-hand corner.
There’s another, simpler way to create a code chunk. Direct your attention to the upper right-hand corner of your editor pane. You’ll notice a green icon with a plus sign (+) and the letter ‘C’.
If you place your cursor on a new line and click that button, a code chunk will be automatically created on that line in the file.
In this course, almost all of the code that we compose will be written in code chunks within .qmd files.
result <- 5 + 2
applesauce <- 10 * 3
result + applesauce
Just like when you ran this same code in the Console, here you’ve saved the result of 5 + 2 and the result of 10 * 3 to objects in your environment. You’ve also evaluated the result of adding the values stored in these two objects.
However, unlike when you ran this in the Console, this time around you have the ability to edit your code. For example, you could place your cursor on the line where you created result, change the number 5 to 4, and then re-run the code to get a new result. You can also save this file so that you can re-run this code at a later date.
Commenting
Any text that you enter into a code chunk must be executable R code. For instance, check out what happens when we try to write some instructions in “plain English” at the bottom of our code:
result <- 5 + 2
applesauce <- 10 * 3
Place cursor here.
We get an error because the words “Place cursor here” are not valid R code. If we want to add explanatory text to a code chunk, we need to preface that text with a hashtag (#) like this:
# This is a comment
This tells R to ignore the hashtag and all of the characters that follow it on that line when it comes time to run the code.
Adding comments in code chunks is helpful for explaining what a piece of code is doing. This is important when sharing your code with others, or even reminding your future self why you took a certain approach. Check out how we use a comment to explain the code below:
# Below we create a vector of colors. To create a vector, we use the function c(), which is short for combine.
colors <- c("red", "blue", "green")
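Comments don’t have to sit on their own line. In this sketch, a comment follows code on the same line, and a # placed in front of a line of code turns that line into a comment so R skips it (often called “commenting out” code).

```r
# A comment can sit on its own line...
result <- 5 + 2  # ...or follow code on the same line; R runs the code and ignores the rest

# A # in front of a line of code "comments it out", so R skips it:
# result <- 100

# Still 7, because the commented-out line never ran
result
```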
Interpreting Error Messages
Often, something won’t go as planned. R has several ways to communicate with us, the most common of which are error messages. Error messages appear when something doesn’t work as anticipated. It is tempting to see them as denoting failure, but that is not the case: error messages are R’s way of alerting us to potential problems. The real danger comes when R does not tell us that something has gone wrong.
First, it’s important to make some distinctions between the kinds of messages that R presents to us when attempting to run code:
- Errors: terminate a process that we are trying to run in R. They arise when it is not possible for R to continue evaluating a function, for example, when we try to add letters together.
- Warnings: don’t terminate a process, but warn us that there may be an issue with our code and its output. They arise when R recognizes potential problems with the code we’ve supplied.
- Messages: also don’t terminate a process, and don’t necessarily indicate a problem; they simply provide potentially helpful information about the code we’ve supplied.
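A small sketch contrasting a message and a warning with base R functions. message() prints an informational note. as.numeric() gives a warning (and returns NA, R’s marker for a missing value) when asked to turn text that isn’t a number into a number. In both cases, the code keeps running.

```r
# A message: informational only, nothing is wrong
message("Reading in the data now...")

# A warning: R still returns a value (NA), but alerts us
# that "apple" could not be turned into a number
x <- as.numeric("apple")

# Code after a message or a warning still runs
x
```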
Check out the differences between an error and a warning in R by reviewing the output in the Console when you run the following code chunks.
Error in R
sum("3", "4")
Error in sum("3", "4"): invalid 'type' (character) of argument
Warning in R
vector1 <- c(1, 2, 3, 4, 5)
vector2 <- c(2, 4, 6, 8)
vector1 + vector2
Warning in vector1 + vector2: longer object length is not a multiple of shorter
object length
[1] 3 6 9 12 7
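The warning above comes from a rule R uses called recycling: when two vectors of different lengths are added, R reuses elements of the shorter vector, starting over from its beginning, until it matches the length of the longer one. Because 5 isn’t a multiple of 4, R warns us. This sketch writes out the recycled vector by hand and gets the same answer without a warning:

```r
vector1 <- c(1, 2, 3, 4, 5)

# vector2 recycled to length 5: its first element (2) is reused at the end
vector1 + c(2, 4, 6, 8, 2)  # prints [1]  3  6  9 12  7, no warning this time
```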
So what should you do when you get an error message? How should you interpret it? Luckily, error messages have some standardized components that give clues about why R can’t execute the code. Consider the following error message that you received when running the code above:
Error in sum("3", "4") : invalid 'type' (character) of argument
There are three things we should pay attention to in this message:
The word “Error” indicates that this code did not run.
The text immediately after the word “in” tells us which specific function did not run.
The text after the colon gives us clues as to why the code did not run.
Reviewing the error above, we can guess that there was a problem with the arguments we supplied to the sum() function, and specifically that we supplied arguments of the wrong type. This is because R interprets “3” and “4” (in quotation marks) as character strings (i.e., text) and not numbers, and it doesn’t know how to add text together.
sum("3", "4")
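One way to fix the code above, sketched below: give sum() numbers instead of character strings, either by dropping the quotation marks or by converting the strings with the base R function as.numeric().

```r
# Numbers without quotation marks are numeric, so sum() can add them: prints [1] 7
sum(3, 4)

# Or convert the character strings to numbers first: also prints [1] 7
sum(as.numeric("3"), as.numeric("4"))
```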
Preparing to Get Help
When we do get errors in our code and need to ask for help in interpreting them, it’s important to provide collaborators with the information they need to help us. Sometimes when teaching R we will hear things like: “My code doesn’t work!” or “I’m stuck and don’t know what to do,” and it can be challenging to suss out the root of the issue without more information.
Notice that the left side of the file open in your editor pane has a series of numbers listed vertically next to each line. These are known as line numbers. Often, if you are having an issue with your code and ask me to review it, I will say something like: “Check out line 53.” By this I mean that you should scroll the document to the 53rd line. You can similarly tell me or your peers which line of your code you are struggling with.
Referencing Resources
There are a number of resources available to help you recall how certain functions work.
Help pages
R help pages can be a great resource when you know the function you need to use, but can’t remember how to apply it or what its arguments are. Help pages typically include a description of the function, its arguments and their names, details about the function, the values it produces, a list of related functions, and examples of its use. You can access the help page for a function by typing the name of the function with a question mark in front of it into the Console (e.g., ?log or ?sum).
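If you only need a quick reminder of a function’s argument names, the base R function args() prints them, along with any default values, without opening the full help page. A minimal sketch:

```r
# Show the arguments that sum() and round() accept, with their defaults
args(sum)
args(round)
```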
Cheatsheets
The R community has developed a series of cheatsheets that list the functions made available through some packages and their arguments. Cheatsheets are a great resource when you know what you need to do and what tools to use on a dataset in R, but can’t recall the function that enables you to do it.
Imagine that you needed to create a vector object containing the first 50 even numbers. You could do this by typing in the first 50 even numbers and combining them into a vector with the c() function, like this:
c(2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30, 32, 34,
36, 38, 40, 42, 44, 46, 48, 50, 52, 54, 56, 58, 60, 62, 64, 66,
68, 70, 72, 74, 76, 78, 80, 82, 84, 86, 88, 90, 92, 94, 96, 98, 100)
But, this would be quite tedious and error prone. A better approach would be using an R function to generate such a sequence automatically.
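One such function is base R’s seq(), sketched below; its by argument sets the step between consecutive numbers. Doubling the whole numbers 1:50 is another way to build the same vector.

```r
# Count from 2 to 100 in steps of 2: the first 50 even numbers
evens <- seq(2, 100, by = 2)

# An equivalent approach: double each whole number from 1 to 50
evens_too <- 2 * 1:50

length(evens)            # prints [1] 50
all(evens == evens_too)  # prints [1] TRUE, so both approaches agree
```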
Searching the Web
You can also search the web when you get errors in your code. Others have likely experienced that error before and gotten help from communities of data analysts and programmers. You should use these resources to take notes and learn how to improve and revise code. Any time you reference Stack Overflow or any other Web resource to help you figure out an answer to a problem, you should cite that resource in your code. Creating a citation in your files will also help you keep track of useful resources for problem solving in the future. Here is how you would cite a Stack Overflow post in APA format:
Username of asker. (Year, Month Date). Title of page. Stack Overflow. Webpage URL
File Paths and Organization
Computers allow you to save files, like Word documents or PDFs, so that you can access them whenever you need to. But your computer doesn’t force all your files to be saved together in one place; your computer’s storage is divided into folders. For example, your computer comes with a folder named Downloads (where files downloaded from the internet are saved by default), and a folder named Documents (where new Word documents are saved by default).
Files that are saved on your computer are called local files, while files stored elsewhere are called remote files. Files in folders for syncing services like Dropbox, Google Drive, or similar services exist somewhere in between: they are visible on your computer, but may only be downloaded on demand when you try to open them. This can cause problems for R, so we suggest keeping your code folders in non-synced locations.
The folders in your computer’s storage are arranged in a hierarchy, with one folder at the “top” of the hierarchy and others nested below it. Folders at lower levels of the hierarchy are stored within the folders at higher levels. For example, your Downloads and Documents folders are actually nested within your user’s home folder (a folder you may not commonly encounter when browsing around your computer).
A good way to think about how your computer’s hierarchy of folders (usually called a file system) is organized is to compare it to a kitchen. You have access to everything, but you don’t want all your forks, spoons, plates, etc. in one big pile on the floor. You organize them so they are easy to find and store. You may start with broad categories, like a drawer for all your utensils, but inside that drawer the space is broken down further: a spot for forks, one for spoons, one for knives, etc. The same principle is true of files. You create ever more specific spaces for things so they are easy to find later.
Making effective use of your file system
Unlike a kitchen, you can create new spaces for files whenever you want in the form of folders. For example, on the first day of class you made a special folder called an R Project, which we’ll learn more about in a minute. The power to create new folders is a useful organizational tool; it allows you to keep related files grouped together, but isolated from other irrelevant files.
It may be tempting to just store all your files in one place, like your Downloads folder or your Desktop. After all, it’s fewer clicks (or perhaps 0 clicks!) when you’re saving your work, and you can quickly access those folders. But with all your files from all your classes in one place, you can’t quickly find what you need, just like if all your kitchenware were in one giant pile. Also, if everything is in one place, you can’t easily name your files, since all the names have to be unique (or you end up with several similar names like “Essay.docx”, “Essay (1).docx”, “Essay (2).docx”, etc.). This makes searching for your files with the search bar tricky.
Furthermore, other programs on your computer might access and change files in this location (e.g., your web browser might clear your Downloads folder if you ask it to clean up space). For reasons like this, it’s a good idea to make folders on your computer to store work for different tasks. For example, it’s probably a good idea to make a folder for each of your classes inside your Documents folder. And it’s probably a good idea to make a folder just for homework assignments inside each of your class folders.
R Projects & The working directory
For statisticians and data scientists who write code, keeping track of where you save and store files on your computer matters for more than just good organization: you need to tell R where files are so it can work with them. Without knowing exactly where these files are and what their exact names are, you won’t be able to access them. Often, the easiest way to make sure you can access files from within R is to make sure R’s working directory is the same folder your files are in, and the easiest way to do that is to work inside an RStudio Project, like the one from Lab 1.
While it’s not obvious, R is always assigned a “working directory”: a folder on your computer where it looks for files. Opening an RStudio Project sets R’s working directory to the Project folder. You can check which working directory R is currently watching by running the command getwd(), which stands for “get working directory,” in the Console.
getwd()
[1] "/Users/jayala/Documents/00_Bard/cmsc121/intro_da_r"
The getwd() function prints out the path to your current working directory. Mine will be different from yours because we are on different computers! Think of paths as addresses that give directions to a specific folder on your computer. For example, the result from getwd() shown above says that to get to the folder R is currently watching on my computer, you have to:
Start at the very top of the folder hierarchy
Then go to the folder named Users
Then go to the folder named jayala
Then go to the folder named Documents
Then go to the folder named 00_Bard
Then go to the folder named cmsc121
Then go to the folder named intro_da_r
Since this particular path begins at the very top of the folder hierarchy (you can tell by the / at the very start), it’s called an absolute path. If you run getwd() in your own R console, expect the path to your current working directory to be different, since your computer isn’t an exact copy of mine! (On Windows, absolute paths start with a drive letter, such as C:/, instead of /.)
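To see the folder-by-folder directions hidden inside a path, you can split the path string at each /. This sketch uses the example path printed by getwd() above (yours will differ) and the base R function strsplit(); example_path is just a hypothetical name for illustration.

```r
# The absolute path from the example above; replace it with getwd() to try your own
example_path <- "/Users/jayala/Documents/00_Bard/cmsc121/intro_da_r"

# Splitting on "/" lists each folder you pass through, from the top down.
# The first element is "" because the path starts at the very top ("/").
strsplit(example_path, "/")[[1]]
```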
Why does the working directory matter?
For R to work on any files, it needs to know where they are. R Projects help R do that, because it will start looking for files inside the project folder, rather than way up at your Home or Users folder. Let’s work through an example to see how this helps.
Go to the File menu in the upper left corner of your screen, and click on New File > Text File. This will open a new text file in RStudio. Inside this text file, type the sentence: “R is helpful, but needs very specific directions.” After you enter this sentence, hit the Return key to move to the next line of the file. Finally, save this file inside your CMSC 121 project folder by going to the File menu in the upper left again, clicking Save As..., and naming it paths.txt. If you have your CMSC 121 project open, you should be able to see your new file in the Files pane in the lower right of RStudio.
Say we wanted R to read this file for us. We have two options: give R the absolute path, starting at the very top of the folder hierarchy, or the relative path, starting from our working directory. The absolute path to this file might look like:
"/Users/jayala/Documents/Classes/CMSC 121/paths.txt"
Because the file is part of the RStudio project, the relative path can cut out everything up to and including the project folder, meaning all R needs to find the file is:
"paths.txt"
Try the same thing, but use the relative path instead of the absolute path.
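A minimal sketch of using the relative path, assuming your CMSC 121 project is open and you saved paths.txt in the project folder as described above. file.exists() reports whether R can find a file at a given path, and readLines() reads a text file, one element per line.

```r
# Relative path: R looks for paths.txt inside the current working directory
file.exists("paths.txt")

# If R finds the file, read it in; each line of the file becomes one element
if (file.exists("paths.txt")) {
  readLines("paths.txt")
}
```

If file.exists() returns FALSE, check your working directory with getwd(). readLines() may also warn about an incomplete final line if the file doesn’t end with a line break, which is one reason to press Return after typing the sentence.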
Submit on Brightspace
- The rendered version of your Quarto file as an .html file: lab_02_problem_solving.html
References
SDS 100 at Smith College