Introduction to R

Week 1: The basics

Louisa Smith

July 13 - July 17

Let's start with...

The basics

About this class

Non-credit
6 weeks
Watch the videos and do the exercises on your own (or with friends/classmates), then come together for lab
Practice by yourself in between classes
Everything you need is at http://intro-to-r-2020.louisahsmith.com

You are not going to break anything!

About me

Rising 5th-year PhD candidate in Epidemiology
Started using R during my master's (so 6 years of experience); learned mostly by doing
Problem sets, manuscripts, slides, website all in R
Almost 100 R projects on my computer, over 1000 R scripts

I have to Google things literally every time I use R!

Plan

Week 1: The basics

Week 2: Figures

Week 3: Selecting, filtering, and mutating

Week 4: Grouping and tables

Week 5: Functions

Week 6: Analyze your data

An IDE for R

An integrated development environment is software that makes coding easier

see objects you've imported and created
autocomplete
syntax highlighting
run part or all of your code

Setup...

Your turn...

Install R
Install RStudio

Packages

Some functions are built into R
- mean(), lm(), table(), etc.
They actually come from built-in packages
- base, stats, graphics, etc.
Anyone (yes, anyone) can build their own package to add to the functionality of R
- ggplot2, dplyr, data.table, survival, etc.
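
One way to see which package a function comes from is to look at the environment where it was defined. This is only a small sketch using functions that ship with R; it is not part of the course materials.

```r
# Each function "lives" in the namespace of the package that defines it
environmentName(environment(mean))        # "base"
environmentName(environment(table))       # "base"
environmentName(environment(stats::lm))   # "stats"
```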

Image from Zhi Yang

Packages

You have to install a package once*

install.packages("survival")

You then have to load the package every time you want to use it

library(survival)

*Actually, with every new major R release, but we won't worry about that.
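
If you are not sure whether a package is already installed, you can check before installing. The sketch below is one possible approach; the helper names is_installed() and install_if_missing() are made up for this example and are not built into R.

```r
# Hypothetical helper: TRUE if the package can be found on this computer
is_installed <- function(pkg) {
  requireNamespace(pkg, quietly = TRUE)
}

# Hypothetical helper: install a package only when it is missing
install_if_missing <- function(pkg) {
  if (!is_installed(pkg)) {
    install.packages(pkg)
  }
  invisible(is_installed(pkg))
}

is_installed("stats")   # a built-in package, so this is TRUE
```

With these defined, install_if_missing("survival") followed by library(survival) would skip the download whenever the package is already there. You still need library() every time.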

Packages

"You only have to buy the book once, but you have to go get it out of the bookshelf every time you want to read it."

install.packages("survival")
library(survival)
survfit(...)

Several days later...

library(survival)
coxph(...)

Demonstration...

Package details

When you use install.packages, packages are downloaded from CRAN (The Comprehensive R Archive Network)
- This is also where you downloaded R
Packages can be hosted in lots of other places, such as Bioconductor (for bioinformatics) and GitHub (for personal projects or while still developing)
The folks at CRAN check that packages "work" in some sense, but don't check the statistical methods...
- But because R is open-source, you can always read the code yourself
Two functions from different packages can have the same name... if you load them both, you may have some trouble
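
When two functions share a name, writing package::function tells R exactly which one you mean. A common real-world case is filter(), which exists in both stats and dplyr. The sketch below imitates a clash without installing anything, by defining our own (purely illustrative) sd() that hides the one from stats.

```r
# Hypothetical clash: our own function reuses the name of stats::sd()
sd <- function(x) "my own sd()"

sd(c(1, 645, 329))          # our version is found first and masks the original
stats::sd(c(1, 645, 329))   # package::function always reaches the stats version

rm(sd)                      # remove our version so sd() means stats::sd() again
```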

tidyverse

The same people who make RStudio also are responsible for a set of packages called the tidyverse

tidyverse

Running install.packages("tidyverse") actually downloads more than a dozen packages*
Running library(tidyverse) loads: ggplot2, dplyr, tidyr, readr, purrr, tibble, stringr, forcats
This is by no means the only way to manage your data, but I find that a lot of the time, it's the easiest and simplest way to get things done.

*See which ones at https://tidyverse.tidyverse.org

Your turn...

Install the tidyverse "package"
Load one of the tidyverse packages

R projects

my-project/
 ├─ my-project.Rproj
 ├─ README
 ├─ data/
 │   ├── raw/
 │   └── processed/
 ├─ code/
 ├─ results/
 │   ├── tables/
 │   ├── figures/
 │   └── output/
 └─ docs/

An .Rproj file is mostly just a placeholder. It remembers various options, and makes it easy to open a new RStudio session that starts up in the correct working directory. You never need to edit it directly.
A README file can just be a text file that includes notes for yourself or future users.
I like to have a folder for raw data -- which I never touch -- and one or more folders for datasets that I create along the way.
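
A folder layout like the one above could also be created from R itself. This is just a sketch: it builds the example inside a temporary folder (an assumption made here so nothing on your computer is cluttered), and it does not create the .Rproj file, which RStudio makes for you.

```r
# Assumption: build the example project inside a temporary folder
project_dir <- file.path(tempdir(), "my-project")

folders <- c(
  "data/raw", "data/processed",
  "code",
  "results/tables", "results/figures", "results/output",
  "docs"
)

# recursive = TRUE also creates parent folders such as data/ and results/
for (folder in folders) {
  dir.create(file.path(project_dir, folder), recursive = TRUE, showWarnings = FALSE)
}

# A README can be a plain text file
writeLines("Notes about this project.", file.path(project_dir, "README"))

# See what was created at the top level
list.files(project_dir)
```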

This course

R-course/
 ├─ 01-week/
 │   ├── 01-week.Rproj
 │   ├── 01-exercises.R
 │   ├── 01-lab.Rmd
 │   ├── 01-slides.pdf
 │   └── data/
 │        └── nlsy.csv 
 ├─ 02-week/
 │   ├── 02-week.Rproj
 │   ├── 02-exercises.R
 │   ├── 02-lab.Rmd
 │   ├── 02-slides.pdf
 │   └── data/
 │        └── nhanes.xlsx 
 ├── 03-week/

Each week you'll download a zip file of some or all of the things you need for the week
- You may be adding more later!
Open the week's work by opening the .Rproj file
- This will ensure you're in the right working directory to easily access the data, etc.
  
  Demonstration...
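
To check that you are in the right working directory, you can ask R where it is and whether it can see the data. A minimal sketch, assuming the folder layout shown above for week 1:

```r
# Where is R currently looking for files?
getwd()

# Relative path from the project folder to the week 1 data
data_path <- file.path("data", "nlsy.csv")
data_path

# TRUE only if the working directory is the unzipped 01-week folder
file.exists(data_path)
```

If the last line prints FALSE, the project was probably not opened via the .Rproj file.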

Your turn...

Download the 01-week.zip file from the course website
Open up the 01-week.Rproj file

R uses `<-` for assignment

Create an object vals that contains a sequence of numbers:

# create values
vals <- c(1, 645, 329)

Put your cursor at the end of the line and hit ctrl/cmd + enter.

Now vals holds those values.

We can see them again by running just the name (put your cursor after the name and press ctrl/cmd + enter again).

vals

## [1]   1 645 329

No assignment arrow means that the object will be printed to the console.
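
The difference between printing and assigning can be seen in a short sketch. The names doubled and mean_val are made up for this example.

```r
vals <- c(1, 645, 329)

# No arrow: the result is printed to the console, not saved
vals * 2

# With an arrow: the result is saved under a new name, nothing is printed
doubled <- vals * 2

# vals itself is unchanged until we assign to it again
vals

# Functions work the same way: save the result, then print it by name
mean_val <- mean(vals)
mean_val   # 325
```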

Types of data (classes)

We could also create a character vector:

chars <- c("dog", "cat", "rhino")
chars

## [1] "dog"   "cat"   "rhino"

Or a logical vector:

logs <- c(TRUE, FALSE, FALSE)
logs

## [1]  TRUE FALSE FALSE

We'll see more options as we go along!
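
To find out what type of data an object holds, you can ask for its class. The sketch below recreates the vectors from this section; the mixed vector is an extra illustration that is not in the exercises.

```r
vals  <- c(1, 645, 329)
chars <- c("dog", "cat", "rhino")
logs  <- c(TRUE, FALSE, FALSE)

class(vals)    # "numeric"
class(chars)   # "character"
class(logs)    # "logical"

# A vector holds one class only, so mixing types converts everything
mixed <- c(1, "dog", TRUE)
class(mixed)   # "character"
```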

Types of objects

We created vectors with the c() function (c stands for concatenate)

We could also create a matrix of values with the matrix() function:

# turn the vector of numbers into a 2-row matrix
mat <- matrix(c(234, 7456, 12, 654, 183, 753), nrow = 2)
mat

##      [,1] [,2] [,3]
## [1,]  234   12  183
## [2,] 7456  654  753

The numbers in square brackets are indices, which we can use to pull out values:

# extract second row
mat[2, ]

## [1] 7456  654  753
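
A few more ways to index the same matrix, as a sketch. The pattern is always mat[row, column], and leaving one side empty means "all of them". The byrow example is an extra illustration of how matrix() fills in values.

```r
mat <- matrix(c(234, 7456, 12, 654, 183, 753), nrow = 2)

dim(mat)      # 2 rows, 3 columns
mat[1, 3]     # row 1, column 3: 183
mat[, 2]      # all of column 2: 12 654

# By default values fill down the columns; byrow = TRUE fills across rows
mat_byrow <- matrix(c(234, 7456, 12, 654, 183, 753), nrow = 2, byrow = TRUE)
mat_byrow[2, ]   # 654 183 753
```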

Dataframes

We usually do analysis in R with dataframes (or some variant).

Dataframes are basically like spreadsheets: columns are variables, and rows are observations.

gss_cat

## # A tibble: 21,483 x 9
##     year marital       age race  rincome      partyid       relig        denom        tvhours
##    <int> <fct>       <int> <fct> <fct>        <fct>         <fct>        <fct>          <int>
##  1  2000 Never marr…    26 White $8000 to 99… Ind,near rep  Protestant   Southern ba…      12
##  2  2000 Divorced       48 White $8000 to 99… Not str repu… Protestant   Baptist-dk …      NA
##  3  2000 Widowed        67 White Not applica… Independent   Protestant   No denomina…       2
##  4  2000 Never marr…    39 White Not applica… Ind,near rep  Orthodox-ch… Not applica…       4
##  5  2000 Divorced       25 White Not applica… Not str demo… None         Not applica…       1
##  6  2000 Married        25 White $20000 - 24… Strong democ… Protestant   Southern ba…      NA
##  7  2000 Never marr…    36 White $25000 or m… Not str repu… Christian    Not applica…       3
##  8  2000 Divorced       44 White $7000 to 79… Ind,near dem  Protestant   Lutheran-mo…      NA
##  9  2000 Married        44 White $25000 or m… Not str demo… Protestant   Other              0
## 10  2000 Married        47 White $25000 or m… Strong repub… Protestant   Southern ba…       3
## # … with 21,473 more rows
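
To make the "columns are variables, rows are observations" idea concrete, here is a tiny dataframe built by hand with base R. The data are made up for illustration.

```r
# Hypothetical data: three observations (rows) of three variables (columns)
pets <- data.frame(
  name = c("Rex", "Tom", "Zuri"),
  type = c("dog", "cat", "rhino"),
  age  = c(3, 7, 12)
)

nrow(pets)     # 3 observations
ncol(pets)     # 3 variables
pets$age       # one column, returned as a vector
pets[2, ]      # one row, i.e. one observation
```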

tibble???

 
tibbles are basically just pretty dataframes

as_tibble(gss_cat)[, 1:4]

## # A tibble: 21,483 x 4
##     year marital         age race 
##    <int> <fct>         <int> <fct>
##  1  2000 Never married    26 White
##  2  2000 Divorced         48 White
##  3  2000 Widowed          67 White
##  4  2000 Never married    39 White
##  5  2000 Divorced         25 White
##  6  2000 Married          25 White
##  7  2000 Never married    36 White
##  8  2000 Divorced         44 White
##  9  2000 Married          44 White
## 10  2000 Married          47 White
## # … with 21,473 more rows

as.data.frame(gss_cat)[, 1:4]

##    year       marital age  race
## 1  2000 Never married  26 White
## 2  2000      Divorced  48 White
## 3  2000       Widowed  67 White
## 4  2000 Never married  39 White
## 5  2000      Divorced  25 White
## 6  2000       Married  25 White
## 7  2000 Never married  36 White
## 8  2000      Divorced  44 White
## 9  2000       Married  44 White
## 10 2000       Married  47 White
## 11 2000       Married  53 White
## 12 2000       Married  52 White
## 13 2000       Married  52 White
## 14 2000       Married  51 White
## 15 2000      Divorced  52 White
## 16 2000       Married  40 Black
## 17 2000       Widowed  77 White
## 18 2000 Never married  44 White
## 19 2000       Married  40 White
## 20 2000       Married  45 Black

 
and tibbles are the quickest and most intuitive way to make and read a dataset

dat1 <- tibble(
  age = c(24, 76, 38),
  height_in = c(70, 64, 68),
  height_cm = height_in * 2.54
)
dat1

## # A tibble: 3 x 3
##     age height_in height_cm
##   <dbl>     <dbl>     <dbl>
## 1    24        70      178.
## 2    76        64      163.
## 3    38        68      173.

dat2 <- tribble(
  ~n, ~food, ~animal,
  39, "banana", "monkey",
  21, "milk", "cat",
  18, "bone", "dog"
)
dat2

## # A tibble: 3 x 3
##       n food   animal
##   <dbl> <chr>  <chr> 
## 1    39 banana monkey
## 2    21 milk   cat   
## 3    18 bone   dog

Your turn...

Work through the code in 01-week/01-todo.R
