Lab

1 The basics

You can either download the lab as an RMarkdown file here, or copy and paste the code as we go into a .R script. Either way, save it into the 01-week folder where you completed the exercises!

1.1 Navigating RStudio

Besides the obvious benefits of seeing your environment, console, and scripts all in the same window, RStudio offers a lot of other helpful features. Here we’ll go over some that you might find useful now, but know that there are lots more as well! And if you don’t like the default keyboard shortcuts, you can change them!

Here is a list of some helpful ones. If two keys are given, separated by /, the first is for Macs and the second for Windows. Try them out with the code below!

Run a line of code with cmd/ctrl + enter.

x <- 3

Insert the assignment arrow <- with opt/alt + -.

# write the same code above, but with the arrow shortcut

Restart the R session with cmd/ctrl + shift + F10.

Try it out!

Comment/uncomment lines with cmd/ctrl + shift + c.

# comment the next line
this should be a comment

Use autocomplete snippets: try it out by typing lib, then pressing enter once the autocomplete pops up.

# write library(tidyverse) using the snippet here:

As elsewhere, cmd/ctrl + f is find, but with cmd/ctrl + shift + f you can find across all the files in a directory.

Look for the word “basics” throughout the directory.

Drag a cursor to extend across multiple lines with opt/alt.

# edit these dates to use - instead of /
dates <- c("2020/07/16",
           "2020/07/17",
           "2020/07/18",
           "2020/07/19")

Put a new cursor elsewhere by holding down opt/alt + cmd/ctrl as you click.

# edit all these numbers to be in the 200s, not 100s
numbers <- c(100, 101, 104, 109)

Reformat code with cmd/ctrl + shift + a.

# try it out here:
mat<-matrix(c(
234,7456,12,654,183,753
),nrow=2)

Navigate between previously run lines of code in the console with the up and down arrows. Add cmd/ctrl to look at the entire list.
Compile this document with cmd/ctrl + shift + k.
Open a new R script with cmd/ctrl + shift + n.

This blog post has even more!

1.2 This week’s exercises

# create a vector of numeric values
vals <- c(1, 645, 329)
vals

# run these lines of code one at a time and compare what each does
# what happens in your environment window? what about your console?
new_vals
c(13, 7245, 23, 49.32)
new_vals <- c(13, 7245, 23, 49.32)
new_vals

# create and view different types of vectors
chars <- c("dog", "cat", "rhino")
chars
logs <- c(TRUE, FALSE, FALSE)
logs

# create a matrix
mat <- matrix(c(234, 7456, 12, 654, 183, 753), nrow = 2)
mat

# pull out rows
mat[2, ]

Extract 645 from vals using square brackets.
Extract "rhino" from chars using square brackets.
You saw how to extract the second row of mat. Figure out how to extract the second column.
Extract 183 from mat using square brackets.
Figure out how to get the following errors: “incorrect number of dimensions” and “subscript out of bounds”.
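As a reminder of how square brackets work, here is a minimal sketch on two made-up objects (letters_vec and grid are hypothetical names, not part of the lab), so it doesn't give away the exercises above. A vector takes a single index; a matrix takes a row index and a column index separated by a comma, and leaving one side blank keeps everything along that dimension.

```r
# hypothetical objects, separate from vals, chars, and mat above
letters_vec <- c("a", "b", "c", "d")
letters_vec[3]   # third element: "c"

# matrix() fills column by column, so the columns are (1, 2), (3, 4), (5, 6)
grid <- matrix(1:6, nrow = 2)
grid[1, ]        # first row: 1 3 5
grid[, 3]        # third column: 5 6
grid[2, 1]       # row 2, column 1: 2
```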

1.3 Data in R

We’re using some data from the National Longitudinal Survey of Youth 1979, a cohort of American young adults aged 14-22 at enrollment in 1979. They continue to be followed to this day, and there is a wealth of publicly available data online. I’ve downloaded the answers to a survey question about whether respondents wear glasses, a scale about their eyesight with glasses, their (NLSY-assigned 😒) race/ethnicity, their sex, their family’s income in 1979, and their age at the birth of their first child.

Reading in data

I’ve saved the dataset as a csv file. We can read this into R using the read_csv() function, which is loaded with the tidyverse. For now we’ll load it from the internet. We’ll talk about other options for reading in data later in the course!

library(tidyverse)
nlsy <- read_csv("https://intro-to-R-2020.louisahsmith.com/data/nlsy_cc.csv")

We can explore the data with a number of functions that we apply to either the whole dataset, or to a single variable in the dataset. Here are a couple of ways we can look at the whole dataset:

nlsy

#>  # A tibble: 1,205 x 14
#>     glasses eyesight sleep_wkdy sleep_wknd    id nsibs  samp race_eth   sex region income
#>       <dbl>    <dbl>      <dbl>      <dbl> <dbl> <dbl> <dbl>    <dbl> <dbl>  <dbl>  <dbl>
#>   1       0        1          5          7     3     3     5        3     2      1  22390
#>   2       1        2          6          7     6     1     1        3     1      1  35000
#>   3       0        2          7          9     8     7     6        3     2      1   7227
#>   4       1        3          6          7    16     3     5        3     2      1  48000
#>   5       0        3         10         10    18     2     1        3     1      3   4510
#>   6       1        2          7          8    20     2     5        3     2      1  50000
#>   7       0        1          8          8    27     1     5        3     2      1  20000
#>   8       1        1          8          8    49     6     5        3     2      1  23900
#>   9       1        2          7          8    57     1     5        3     2      1  23289
#>  10       0        1          8          8    67     1     1        3     1      1  35000
#>  # … with 1,195 more rows, and 3 more variables: res_1980 <dbl>, res_2002 <dbl>, age_bir <dbl>

glimpse(nlsy)

#>  Rows: 1,205
#>  Columns: 14
#>  $ glasses    <dbl> 0, 1, 0, 1, 0, 1, 0, 1, 1, 0, 0, 1, 1, 0, 0, 0, 0, 1, 1, 1, 0, 1, 0, 0, 0…
#>  $ eyesight   <dbl> 1, 2, 2, 3, 3, 2, 1, 1, 2, 1, 3, 5, 1, 1, 1, 1, 3, 2, 3, 3, 4, 2, 2, 5, 1…
#>  $ sleep_wkdy <dbl> 5, 6, 7, 6, 10, 7, 8, 8, 7, 8, 8, 7, 7, 7, 8, 7, 7, 8, 8, 8, 7, 6, 8, 7, …
#>  $ sleep_wknd <dbl> 7, 7, 9, 7, 10, 8, 8, 8, 8, 8, 8, 7, 8, 7, 8, 7, 4, 8, 8, 9, 7, 10, 8, 7,…
#>  $ id         <dbl> 3, 6, 8, 16, 18, 20, 27, 49, 57, 67, 86, 96, 97, 98, 117, 137, 172, 179, …
#>  $ nsibs      <dbl> 3, 1, 7, 3, 2, 2, 1, 6, 1, 1, 7, 2, 7, 2, 2, 4, 9, 2, 2, 2, 4, 2, 4, 4, 2…
#>  $ samp       <dbl> 5, 1, 6, 5, 1, 5, 5, 5, 5, 1, 7, 6, 5, 6, 1, 5, 6, 5, 5, 5, 8, 1, 7, 5, 5…
#>  $ race_eth   <dbl> 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 2, 3, 3, 3, 3, 3, 3, 3, 3, 3, 1, 3, 2, 3, 3…
#>  $ sex        <dbl> 2, 1, 2, 2, 1, 2, 2, 2, 2, 1, 2, 2, 2, 2, 1, 2, 2, 2, 2, 2, 2, 1, 2, 2, 2…
#>  $ region     <dbl> 1, 1, 1, 1, 3, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 3, 1…
#>  $ income     <dbl> 22390, 35000, 7227, 48000, 4510, 50000, 20000, 23900, 23289, 35000, 1688,…
#>  $ res_1980   <dbl> 11, 3, 11, 11, 11, 3, 11, 11, 11, 3, 11, 11, 11, 11, 6, 3, 11, 11, 3, 11,…
#>  $ res_2002   <dbl> 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 19, 11, 11, 11, 11, 11, 11, 11, 1…
#>  $ age_bir    <dbl> 19, 30, 17, 31, 19, 30, 27, 24, 21, 36, 17, 19, 29, 30, 26, 26, 35, 22, 3…

summary(nlsy)

#>      glasses          eyesight      sleep_wkdy       sleep_wknd           id       
#>   Min.   :0.0000   Min.   :1.00   Min.   : 0.000   Min.   : 0.000   Min.   :    3  
#>   1st Qu.:0.0000   1st Qu.:1.00   1st Qu.: 6.000   1st Qu.: 6.000   1st Qu.: 2317  
#>   Median :1.0000   Median :2.00   Median : 7.000   Median : 7.000   Median : 4744  
#>   Mean   :0.5178   Mean   :1.99   Mean   : 6.643   Mean   : 7.267   Mean   : 5229  
#>   3rd Qu.:1.0000   3rd Qu.:3.00   3rd Qu.: 8.000   3rd Qu.: 8.000   3rd Qu.: 7937  
#>   Max.   :1.0000   Max.   :5.00   Max.   :13.000   Max.   :14.000   Max.   :12667  
#>       nsibs             samp           race_eth          sex            region     
#>   Min.   : 0.000   Min.   : 1.000   Min.   :1.000   Min.   :1.000   Min.   :1.000  
#>   1st Qu.: 2.000   1st Qu.: 4.000   1st Qu.:2.000   1st Qu.:1.000   1st Qu.:2.000  
#>   Median : 3.000   Median : 5.000   Median :3.000   Median :2.000   Median :3.000  
#>   Mean   : 3.937   Mean   : 7.002   Mean   :2.395   Mean   :1.584   Mean   :2.593  
#>   3rd Qu.: 5.000   3rd Qu.:11.000   3rd Qu.:3.000   3rd Qu.:2.000   3rd Qu.:3.000  
#>   Max.   :16.000   Max.   :20.000   Max.   :3.000   Max.   :2.000   Max.   :4.000  
#>       income         res_1980        res_2002        age_bir     
#>   Min.   :    0   Min.   : 1.00   Min.   : 5.00   Min.   :13.00  
#>   1st Qu.: 6000   1st Qu.:11.00   1st Qu.:11.00   1st Qu.:19.00  
#>   Median :11155   Median :11.00   Median :11.00   Median :22.00  
#>   Mean   :15289   Mean   : 9.14   Mean   :11.05   Mean   :23.45  
#>   3rd Qu.:20000   3rd Qu.:11.00   3rd Qu.:11.00   3rd Qu.:27.00  
#>   Max.   :75001   Max.   :16.00   Max.   :19.00   Max.   :52.00

# within the RStudio browser
View(nlsy)

In R, we often refer to a specific variable in a dataset using dollar-sign notation. So to access the id variable in the nlsy dataset we’d type nlsy$id and all of the id numbers would print to the console. Don’t do this though, or 1000+ numbers will print out! Instead, we might look at the first or last few with head() or tail():

head(nlsy$id)

#>  [1]  3  6  8 16 18 20

tail(nlsy$sleep_wknd)

#>  [1] 12  8 12  5  7  5
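By default, head() and tail() return six values, but both take an n argument if you want more or fewer. The sketch below uses a small data frame called toy (a hypothetical stand-in, typed in from the first eight rows printed earlier) so that it runs without downloading anything.

```r
# toy stand-in for nlsy, built from the first eight rows printed above
toy <- data.frame(
  id         = c(3, 6, 8, 16, 18, 20, 27, 49),
  sleep_wknd = c(7, 7, 9, 7, 10, 8, 8, 8)
)

toy$id                       # the whole column as a vector (fine here, it's short!)
head(toy$id, n = 3)          # first three values: 3 6 8
tail(toy$sleep_wknd, n = 2)  # last two values: 8 8
head(toy, n = 2)             # head() also works on a whole dataset
```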

We can use the summary() function on a single variable.

summary(nlsy$sleep_wkdy)

#>     Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
#>    0.000   6.000   7.000   6.643   8.000  13.000

Many of the most basic functions in R are pretty straightforward:

table(nlsy$region)

#>  
#>    1   2   3   4 
#>  206 333 411 255

mean(nlsy$age_bir)

#>  [1] 23.44813
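One thing to watch for: mean() returned a number above, which tells us age_bir has no missing values in this file. With missing values, mean() returns NA unless you ask it to drop them. A small sketch with made-up vectors (region_codes and ages are hypothetical, not taken from the NLSY data):

```r
# made-up values for illustration only
region_codes <- c(1, 1, 3, 1, 3, 1)
table(region_codes)        # counts per value: four 1s and two 3s

ages <- c(19, 30, 17, NA, 19, 30)
mean(ages)                 # NA, because one value is missing
mean(ages, na.rm = TRUE)   # 23, the mean of the five non-missing values
```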

We can find out more information from the documentation:

help(cor)
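There are a few other ways to get at the same documentation. The sketch below uses median() purely as an example function; the calls that open a help page are wrapped in interactive() so the code also runs cleanly in a script.

```r
# args() prints a function's arguments and their defaults in the console
args(median)

# these open the documentation, so they are most useful in an interactive session
if (interactive()) {
  ?median           # shorthand for help(median)
  help("median")    # the function name can also be quoted
  example(median)   # runs the examples from the bottom of the help page
}
```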

And if you’re not sure what you’re looking for, there’s a ton of info elsewhere.

1.4 Group challenge exercises

How many people are in the NLSY? How many variables are in this dataset? What are two ways you can answer these questions using tools we’ve discussed?
Can you find any R functions we haven’t discussed that answer the first question? Feel free to Google! See how many ways you and your group can come up with!
What’s the standard deviation of the number of hours of sleep on weekends?
What’s the Spearman correlation between hours of sleep on weekends and weekdays in this data?
Try to read in the data from an Excel file (it should be possible even if you don’t have Excel on your computer!). It’s in a tab called data, but there’s a header as well. (It might help to open it up in whatever spreadsheet program you have.) You’ll have to load the readxl package (you already installed it with the tidyverse, but it doesn’t load automatically), and probably read some of the documentation: https://readxl.tidyverse.org.

# first, use this script to download the data to your current working directory
# (mode = "wb" keeps the binary .xlsx file from being corrupted on Windows)
download.file("https://intro-to-R-2020.louisahsmith.com/data/nlsy_cc.xlsx",
              destfile = file.path(getwd(), "nlsy_cc.xlsx"),
              mode = "wb")
# this will be the path argument you'll need
path <- "nlsy_cc.xlsx"
# the variables also still have the NLSY-assigned names, so you'll need these
col_names <- c("glasses", "eyesight", "sleep_wkdy", "sleep_wknd", "id", "nsibs", 
               "samp", "race_eth", "sex", "region", "income", "res_1980",
               "res_2002", "age_bir")

Last updated on July 26, 2020