
Introduction to R

Week 1: The basics

Louisa Smith

July 13 - July 17


Let's start with...

The basics


About this class

  • Non-credit

  • 6 weeks

  • Watch the videos and do the exercises on your own (or with friends/classmates), come together for lab

  • Practice by yourself in between classes

  • Everything you need is at http://intro-to-r-2020.louisahsmith.com

You are not going to break anything!


About me

  • Rising 5th-year PhD candidate in Epidemiology

  • Started using R during my master's (so 6 years of experience); learned mostly by doing

  • Problem sets, manuscripts, slides, website all in R

  • Almost 100 R projects on my computer, over 1000 R scripts

I have to Google things literally every time I use R!


Plan

Week 1: The basics

Week 2: Figures

Week 3: Selecting, filtering, and mutating

Week 4: Grouping and tables

Week 5: Functions

Week 6: Analyze your data


An IDE for R

An integrated development environment is software that makes coding easier

  • see objects you've imported and created
  • autocomplete
  • syntax highlighting
  • run part or all of your code




Setup...


1

Your turn...

  • Install R
  • Install RStudio

Packages

  • Some functions are built into R
    • mean(), lm(), table(), etc.
  • They actually come from built-in packages
    • base, stats, graphics, etc.
  • Anyone (yes, anyone) can build their own package to add to the functionality of R
    • ggplot2, dplyr, data.table, survival, etc.

Image from Zhi Yang


Packages

  • You have to install a package once*
install.packages("survival")
  • You then have to load the package every time you want to use it
library(survival)
*Actually, with every new major R release, but we won't worry about that.
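One way to combine the two steps is to install only when the package is missing, then load it. This is just a sketch of that pattern, not something the course requires; requireNamespace() is a base R function that checks whether a package is available without loading it.

```r
# Install "survival" only if it is not already available on this computer
if (!requireNamespace("survival", quietly = TRUE)) {
  install.packages("survival")
}

# Loading still has to happen in every new R session
library(survival)
```

The install step is skipped on later runs, but library() runs every time.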

Packages

"You only have to buy the book once, but you have to go get it out of the bookshelf every time you want to read it."

install.packages("survival")
library(survival)
survfit(...)

Several days later...

library(survival)
coxph(...)

Demonstration...


Package details

  • When you use install.packages, packages are downloaded from CRAN (The Comprehensive R Archive Network)
    • This is also where you downloaded R
  • Packages can be hosted in lots of other places, such as Bioconductor (for bioinformatics) and GitHub (for personal projects or packages still in development)
  • The folks at CRAN check to make sure things "work" in some sense, but don't check on the statistical methods...
    • But because R is open-source, you can always read the code yourself
  • Two functions from different packages can have the same name... if you load both packages, the one loaded last masks the other, so you may have some trouble
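A small sketch of what a name clash looks like and how the package::function() prefix gets around it. To keep it runnable without installing anything, the clash here is created by hand with a made-up function called median; the same prefix trick applies when two loaded packages share a function name.

```r
# The built-in median() comes from the stats package
stats::median(c(1, 645, 329))

# Defining our own function with the same name hides ("masks") the original
median <- function(x) "my own median"
masked_result <- median(c(1, 645, 329))

# The prefixed call still reaches the stats version
original_result <- stats::median(c(1, 645, 329))

# Clean up so median() means stats::median() again
rm(median)
```

Writing the prefix every time is more typing, but it removes any doubt about which function is being called.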

tidyverse

  • The same people who make RStudio are also responsible for a set of packages called the tidyverse


tidyverse

  • Running install.packages("tidyverse") actually downloads more than a dozen packages*
  • Running library(tidyverse) loads: ggplot2, dplyr, tidyr, readr, purrr, tibble, stringr, forcats
  • This is by no means the only way to manage your data, but I find that a lot of the time, it's the easiest and simplest way to get things done.

*See which ones at https://tidyverse.tidyverse.org


2

Your turn...

  • Install the tidyverse "package"
  • Load one of the tidyverse packages

R projects

my-project/
├─ my-project.Rproj
├─ README
├─ data/
│ ├── raw/
│ └── processed/
├─ code/
├─ results/
│ ├── tables/
│ ├── figures/
│ └── output/
└─ docs/
  • An .Rproj file is mostly just a placeholder. It remembers various options, and makes it easy to open a new RStudio session that starts up in the correct working directory. You never need to edit it directly.

  • A README file can just be a text file that includes notes for yourself or future users.

  • I like to have a folder for raw data -- which I never touch -- and one or more folders for datasets that I create along the way.
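If you want to set up this layout from R rather than by clicking, here is a minimal sketch. It builds the folders inside a temporary directory (tempdir()) so nothing on your computer is touched; for a real project you would swap in your own location. The .Rproj file is left out because RStudio creates it for you.

```r
# Build the skeleton in a temporary location (an assumption for this sketch)
project <- file.path(tempdir(), "my-project")

folders <- c(
  "data/raw", "data/processed",
  "code",
  "results/tables", "results/figures", "results/output",
  "docs"
)

# recursive = TRUE also creates parent folders such as data/ and results/
for (folder in folders) {
  dir.create(file.path(project, folder), recursive = TRUE, showWarnings = FALSE)
}

# An empty README to fill in with notes later
file.create(file.path(project, "README"))

# Check what was created
list.dirs(project, full.names = FALSE)
```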


This course

R-course/
├─ 01-week/
│  ├── 01-week.Rproj
│  ├── 01-exercises.R
│  ├── 01-lab.Rmd
│  ├── 01-slides.pdf
│  └── data/
│      └── nlsy.csv
├─ 02-week/
│  ├── 02-week.Rproj
│  ├── 02-exercises.R
│  ├── 02-lab.Rmd
│  ├── 02-slides.pdf
│  └── data/
│      └── nhanes.xlsx
├─ 03-week/
  • Each week you'll download a zip file of some or all of the things you need for the week
    • You may be adding more later!
  • Open the week's work by opening the .Rproj file
    • This will ensure you're in the right working directory to easily access the data, etc.
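To see why the working directory matters, here is a rough sketch of checking it and reading the week's data with a relative path. It assumes the project was opened via 01-week.Rproj so that data/nlsy.csv sits below the working directory; the file.exists() check is only there so the code also runs when the file is missing.

```r
# Where is R currently looking for files?
getwd()

# A relative path: "data/nlsy.csv" below the working directory
data_path <- file.path("data", "nlsy.csv")

# Read the file only if it is really there
if (file.exists(data_path)) {
  nlsy <- read.csv(data_path)
  head(nlsy)
} else {
  message("Could not find ", data_path, " from ", getwd())
}
```

If you see the message instead of data, the usual fix is to close RStudio and reopen it through the .Rproj file.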




      Demonstration...


3

Your turn...

  • Download the 01-week.zip file here
  • Open up the 01-week.Rproj file

R uses <- for assignment

Create an object vals that contains a sequence of numbers:

# create values
vals <- c(1, 645, 329)

Put your cursor at the end of the line and hit ctrl/cmd + enter.

Now vals holds those values.

We can see them again by running just the name (put your cursor after the name and press ctrl/cmd + enter again).

vals
## [1]   1 645 329

No assignment arrow means that the object will be printed to the console.
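A short sketch of the difference between printing and assigning, using the same vals as above. The name doubled is made up for this example.

```r
# create values
vals <- c(1, 645, 329)

# No arrow: the result is printed to the console and then forgotten
vals * 2

# With an arrow: the result is stored in a new object and nothing is printed
doubled <- vals * 2

# Objects can be passed to functions
n_vals <- length(vals)
avg_vals <- mean(vals)
```

Here doubled holds 2, 1290, and 658, while vals itself is unchanged.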


Types of data (classes)

We could also create a character vector:

chars <- c("dog", "cat", "rhino")
chars
## [1] "dog"   "cat"   "rhino"

Or a logical vector:

logs <- c(TRUE, FALSE, FALSE)
logs
## [1]  TRUE FALSE FALSE

We'll see more options as we go along!
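If you are ever unsure what kind of data an object holds, the class() function will tell you. A quick sketch with the three vectors from these slides:

```r
vals <- c(1, 645, 329)
chars <- c("dog", "cat", "rhino")
logs <- c(TRUE, FALSE, FALSE)

# Ask R what type of data each vector holds
class(vals)
class(chars)
class(logs)
```

These return "numeric", "character", and "logical", respectively.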


Types of objects

We created vectors with the c() function (c stands for concatenate)

We could also create a matrix of values with the matrix() function:

# turn the vector of numbers into a 2-row matrix
mat <- matrix(c(234, 7456, 12, 654, 183, 753), nrow = 2)
mat
##      [,1] [,2] [,3]
## [1,]  234   12  183
## [2,] 7456  654  753

The numbers in square brackets are indices, which we can use to pull out values:

# extract second row
mat[2, ]
## [1] 7456  654  753
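The same [row, column] pattern works for columns and single values. A brief sketch with the same matrix; the object names on the left are made up for this example.

```r
# same 2-row matrix as above (filled column by column)
mat <- matrix(c(234, 7456, 12, 654, 183, 753), nrow = 2)

# extract third column (leave the row position empty)
third_col <- mat[, 3]

# extract a single value: row 1, column 2
one_value <- mat[1, 2]

# number of rows and columns
mat_dim <- dim(mat)
```

Here third_col is 183 and 753, one_value is 12, and mat_dim is 2 and 3.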

Dataframes

We usually do analysis in R with dataframes (or some variant).

Dataframes are basically like spreadsheets: columns are variables, and rows are observations.

gss_cat
## # A tibble: 21,483 x 9
##     year marital       age race  rincome      partyid       relig        denom        tvhours
##    <int> <fct>       <int> <fct> <fct>        <fct>         <fct>        <fct>          <int>
##  1  2000 Never marr…    26 White $8000 to 99… Ind,near rep  Protestant   Southern ba…      12
##  2  2000 Divorced       48 White $8000 to 99… Not str repu… Protestant   Baptist-dk …      NA
##  3  2000 Widowed        67 White Not applica… Independent   Protestant   No denomina…       2
##  4  2000 Never marr…    39 White Not applica… Ind,near rep  Orthodox-ch… Not applica…       4
##  5  2000 Divorced       25 White Not applica… Not str demo… None         Not applica…       1
##  6  2000 Married        25 White $20000 - 24… Strong democ… Protestant   Southern ba…      NA
##  7  2000 Never marr…    36 White $25000 or m… Not str repu… Christian    Not applica…       3
##  8  2000 Divorced       44 White $7000 to 79… Ind,near dem  Protestant   Lutheran-mo…      NA
##  9  2000 Married        44 White $25000 or m… Not str demo… Protestant   Other               0
## 10  2000 Married        47 White $25000 or m… Strong repub… Protestant   Southern ba…       3
## # … with 21,473 more rows
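To make the spreadsheet comparison concrete, here is a small sketch that builds a dataframe by hand with base R's data.frame() and pulls pieces out of it. The object name people and its values are only for illustration.

```r
# Each column is a variable, each row is an observation
people <- data.frame(
  age = c(24, 76, 38),
  height_in = c(70, 64, 68)
)

# number of observations and names of the variables
n_obs <- nrow(people)
var_names <- names(people)

# pull out one column as a vector with $
ages <- people$age

# rows and columns use the same [row, column] indexing as a matrix
second_row <- people[2, ]
```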

tibble???


tibbles are basically just pretty dataframes

as_tibble(gss_cat)[, 1:4]
# A tibble: 21,483 x 4
    year marital         age race
   <int> <fct>         <int> <fct>
 1  2000 Never married    26 White
 2  2000 Divorced         48 White
 3  2000 Widowed          67 White
 4  2000 Never married    39 White
 5  2000 Divorced         25 White
 6  2000 Married          25 White
 7  2000 Never married    36 White
 8  2000 Divorced         44 White
 9  2000 Married          44 White
10  2000 Married          47 White
# … with 21,473 more rows
as.data.frame(gss_cat)[, 1:4]
   year       marital age  race
1  2000 Never married  26 White
2  2000      Divorced  48 White
3  2000       Widowed  67 White
4  2000 Never married  39 White
5  2000      Divorced  25 White
6  2000       Married  25 White
7  2000 Never married  36 White
8  2000      Divorced  44 White
9  2000       Married  44 White
10 2000       Married  47 White
11 2000       Married  53 White
12 2000       Married  52 White
13 2000       Married  52 White
14 2000       Married  51 White
15 2000      Divorced  52 White
16 2000       Married  40 Black
17 2000       Widowed  77 White
18 2000 Never married  44 White
19 2000       Married  40 White
20 2000       Married  45 Black

and tibbles are the quickest and most intuitive way to make and read a dataset

dat1 <- tibble(
  age = c(24, 76, 38),
  height_in = c(70, 64, 68),
  height_cm = height_in * 2.54
)
dat1
## # A tibble: 3 x 3
##     age height_in height_cm
##   <dbl>     <dbl>     <dbl>
## 1    24        70      178.
## 2    76        64      163.
## 3    38        68      173.
dat2 <- tribble(
  ~n, ~food,    ~animal,
  39, "banana", "monkey",
  21, "milk",   "cat",
  18, "bone",   "dog"
)
dat2
## # A tibble: 3 x 3
##       n food   animal
##   <dbl> <chr>  <chr>
## 1    39 banana monkey
## 2    21 milk   cat
## 3    18 bone   dog

4

Your turn...

  • Work through the code in 01-week/01-todo.R