class: center, middle, inverse, title-slide # Introduction to R ## Week 6: Analyze your data ### Louisa Smith ### August 17 - August 21 --- class: inverse, center, middle <style type="text/css"> .webcam-wrapper{ height:0; //width: 175px; //height: 90px; //float: right; } </style> .hand-large[ Let's put it ] .larger[ ALL ] .hand-large[ together ] --- # Agenda ### Week 1: The basics ✅ -- ### Week 2: Figures ✅ -- ### Week 3: Selecting, filtering, and mutating ✅ -- ### Week 4: Grouping and tables ✅ -- ### Week 5: Functions ✅ -- ### Week 6: Analyze your data -- .go[Put everything you've learned into action, and more!] --- # Organization .pull-left[ ``` my-project/ ├─ my-project.Rproj ├─ README ├─ data/ │ ├── raw/ │ └── processed/ ├─ code/ ├─ results/ │ ├── tables/ │ ├── figures/ │ └── output/ └─ docs/ ``` ] .pull-right[ - An `.Rproj` file is mostly just a placeholder. It remembers various options, and makes it easy to open a new RStudio session that starts up in the correct working directory. You never need to edit it directly. - A README file can just be a text file that includes notes for yourself or future users. - I like to have a folder for raw data -- which I never touch -- and a folder(s) for datasets that I create along the way. ] --- # Referring to files with the `here` package .pull-left[.midi[ ```r source(here::here("code", "functions.R")) dat <- read_csv(here::here( "data", "raw", "data.csv")) p <- ggplot(dat) + geom_point(aes(x, y)) ggsave(plot = p, filename = here::here( "results", "figures", "fig.pdf")) ``` ]] .pull-right[ - The `here` package lets you refer to files without worrying too much about relative paths. - Construct file paths with reference to the top directory holding your `.Rproj` file. - `here::here("data", "raw", "data.csv")` for me, here, becomes `"/Users/louisahsmith/Google Drive/Teaching/R course/materials/data/raw/data.csv"` - But if I send you this file, it will become whatever file path *you* need it to be. ] --- # Referring to the `here` package .midi[ ```r here::here() ``` is equivalent to ```r library(here) here() ``` I just prefer to write out the package name whenever I need it, but you can load the package for your entire session if you want. Note that you can refer to any function without loading the whole package this way: ```r tableone::CreateTableOne() ``` is the same as ```r library(tableone) CreateTableOne() ``` ] --- class: big-code # The `source()` function Will run code from another file. ```r # run the code in script.R, assuming it's in my current working directory source("script.R") # run the code in my-project/code/functions.R from wherever I am in my-project source(here::here("code", "functions.R")) ``` All the objects will be created, packages loaded, etc. as if you had run the code directly from the console. --- # The `source()` function Can even run code directly from a URL. ```r source("https://raw.githubusercontent.com/louisahsmith/intro-to-R-2020/master/static/exercises/01-week/01-todo.R") ``` ``` ## Error in eval(ei, envir): object 'new_vals' not found ``` Remember the first week when I had you generating errors on purpose? - Reading code from another file can make it a bit harder to debug. - But it's nice when you have functions, etc. that you use a lot and want to include them at the start of every script. --- # Reading in data You could also begin your scripts by reading in your data via a data-cleaning file with `source()`. Each of these have different arguments that will allow you to read in specific columns only, skip rows, give the variables names, etc. There are also better options out there if your dataset is really big (look into the `data.table` or the `vroom` package), and if you have other types of data. ```r library(tidyverse) dat <- read_csv(`"data.csv"`) dat <- read_table(`"data.dat"`) dat <- read_rds(`"data.rds"`) dat <- readxl::read_excel(`"data.xlsx"`) dat <- haven::read_sas(`"data.sas7bdat"`) dat <- haven::read_stata(`"data.dta"`) dat <- googlesheets4::read_sheet(`"sheet-id"`) ``` --- # Saving your data But once you've cleaned your data and created your dataset, you probably just want to save a copy so you don't need to perform all your data cleaning functions every time you want to use it. - You can basically do the opposite of most of the `read` functions: `write`. - The one I usually use, if I'm creating data for myself, is `write_rds()`. It creates an R object you can read in with `read_rds()`, so you can guarantee nothing will change in between writing and reading. - If I'm sharing data, I usually use `write_csv()`. Note: these are the `tidyverse` versions of the functions, which have better defaults, are more consistent, and are just more likely to do what you want. The "base R" versions are: `read.csv()`, `write.csv()`, `readRDS()` and `saveRDS()`. --- # Analysis plan So my process might look like this: #### 1. Clean the raw data in `code/clean_data.R` ```r # read in the raw dataset my collaborators gave me dat <- read_csv(here::here("data", "raw", "dataset.csv")) # do whatever cleaning/subsetting I need to newdat <- dat %>% mutate(new_var = var * 2) %>% filter(age >= 40) %>% select(age, new_var) # save as an r object for later analysis write_rds(newdat, here::here("data", "processed", "over_40.rds")) ``` --- # Analysis plan <br> #### 2. In `code/main_analysis.R`, read in the clean data ```r dat <- read_rds(here::here("data", "processed", "over_40.rds")) # run a linear regression model model <- lm(new_var ~ age, data = dat) # save the model for making tables and figures later write_rds(model, here::here("results", "output", "linear_mod.rds")) ``` --- # Analysis plan <br> #### 3. In `code/make_tables.R`, make some tables ```r dat <- read_rds(here::here("data", "processed", "over_40.rds")) mod <- read_rds(here::here("data", "output", "linear_mod.rds")) # make a table 1 tab1 <- tableone::CreateTableOne(..., data = dat) # make a table of analysis results tab2 <- broom::tidy(model) ``` Depending on my needs, I might read these tables into an RMarkdown document, save them to a `.csv` file, etc. Now on to making figures, etc.... --- class: inverse .pull-left[ .huge-number[ 1 ] ] .hand-large[ <br> Your turn... ] .exercise[ Exercises 6.1: Change the file paths so the document knits. ] --- # Missing values - R uses `NA` for missing values - Unlike some other statistical software, it will return `NA` to any logical statement - This makes it somewhat harder to deal with but also harder to make mistakes ```r 3 < NA ``` ``` ## [1] NA ``` ```r mean(c(1, 2, NA)) ``` ``` ## [1] NA ``` ```r mean(c(1, 2, NA), na.rm = TRUE) ``` ``` ## [1] 1.5 ``` --- # Special `NA` functions Certain functions deal with missing values explicitly ```r vals <- c(1, 2, NA) is.na(vals) ``` ``` ## [1] FALSE FALSE TRUE ``` ```r anyNA(vals) ``` ``` ## [1] TRUE ``` ```r na.omit(vals) ``` ``` ## [1] 1 2 ``` --- # Creating `NA`s with `na_if()` .midi[ You might read in data that has special values to indicate missingness. In NLSY, -1 = Refused, -2 = Don't know, -3 = Invalid missing, -4 = Valid missing, -5 = Non-interview ] .pull-left-wide[ ```r nlsy[1, c("id", "glasses", "age_bir")] ``` ``` ## # A tibble: 1 x 3 ## id glasses age_bir ## <dbl> <dbl> <dbl> ## 1 1 -4 -5 ``` ```r nlsy_na <- nlsy %>% `na_if`(-1) %>% `na_if`(-2) %>% `na_if`(-3) %>% `na_if`(-4) %>% `na_if`(-5) nlsy_na[1, c("id", "glasses", "age_bir")] ``` ``` ## # A tibble: 1 x 3 ## id glasses age_bir ## <dbl> <dbl> <dbl> ## 1 1 NA NA ``` ] .pull-right-narrow[.midi[ This is obviously a bit annoying if you have a lot of values that indicate missingness. In that case, you may want to look into the [naniar package](https://cran.r-project.org/web/packages/naniar/vignettes/replace-with-na.html). ]] --- # More `na_if()` The `na_if()` strategy is generally the most useful if you're determining NA's over the course of your analysis, or if you have different NA values for different variables. ```r # we decide that person 2 is a mistake... nlsy_bad <- nlsy %>% mutate(id = na_if(id, 2)) nlsy_bad[1:3, c("id", "glasses", "age_bir")] ``` ``` ## # A tibble: 3 x 3 ## id glasses age_bir ## <dbl> <dbl> <dbl> ## 1 1 -4 -5 ## 2 NA 0 34 ## 3 3 0 19 ``` --- # Better: read in NA's directly Or, if you know a priori which values indicate missingness (e.g., "."), you can specify that when reading in the data. .large[ ```r nlsy <- read_csv(here::here("data", "nlsy.csv"), col_names = colnames_nlsy, skip = 1, * na = c("-1", "-2", "-3", "-4", "-5")) ``` ] - You have to write the values as strings, even if they're numbers - Caveat: This way you use the info about the reason for missingness. If that's important, read in the data first, create a variable for missingness reason (e.g., use `fct_recode()`), then changes the values to `NA`. --- # Complete cases .midi[ Sometimes you may just want to get rid of all the rows with missing values. ```r nrow(nlsy) ``` ``` ## [1] 12686 ``` ```r nlsy_cc <- nlsy %>% filter(`complete.cases`(nlsy)) nrow(nlsy_cc) ``` ``` ## [1] 1436 ``` ```r nlsy2 <- nlsy %>% `na.omit`() nrow(nlsy2) ``` ``` ## [1] 1436 ``` ] .go[Don't do this without good reason!] --- class: inverse .pull-left[ .huge-number[ 2 ] ] .hand-large[ <br> Your turn... ] .exercise[ Exercises 6.2: Create and exclude observations based on missing values. ] --- # Sharing your results ## First: some quick analysis ```r # load packages library(tidyverse) # must install with install.packages if haven't already library(broom) # for making pretty model output library(splines) # for adding splines # read in data nlsy_clean <- read_rds(here::here("data", "nlsy_clean.rds")) ``` - We're not going into many details because this isn't actually a statistical analysis class, but the [broom package](https://broom.tidymodels.org) is very helpful for regression model results! --- # Quick regression overview Model formulas will automatically make indicator variables for factors, with the first level the reference. An intercept will be included unless suppressed with `y ~ -1 + x`. ```r # linear regression (OLS) mod_lin1 <- `lm`(log_inc ~ age_bir + sex + race_eth, data = nlsy_clean) # another way to do linear regression (GLM) mod_lin2 <- `glm`(log_inc ~ age_bir + sex + race_eth, `family = gaussian(link = "identity")`, data = nlsy_clean) # logistic regression mod_log <- `glm`(glasses ~ eyesight + sex + race_eth, `family = binomial(link = "logit")`, data = nlsy_clean) # poisson regression mod_pois <- `glm`(nsibs ~ sleep_wkdy + sleep_wknd, `family = poisson(link = "log")`, data = nlsy_clean) ``` You can use the `survival` package for time-to-event models. --- # Quick regression overview, cont. - Create interactions with `*` (will automatically include main terms too). - Create polynomial terms with, e.g., `I(x^2)`. - Create splines with the splines package and the `ns()` functions (or other packages/functions). ```r mod_big <- glm(log_inc ~ sex `*` age_bir + nsibs + `I`(nsibs^2) + `ns`(sleep_wkdy, knots = 3), family = gaussian(link = "identity"), data = nlsy_clean) ``` - Like the `tidyverse` packages, you don't need to quote the variable names or use `data$variable` notation in model formulas --- # Look at results .midi[ ```r summary(mod_log) ``` ``` ## ## Call: ## glm(formula = glasses ~ eyesight + sex + race_eth, family = binomial(link = "logit"), ## data = nlsy_clean) ## ## Deviance Residuals: ## Min 1Q Median 3Q Max ## -1.4275 -1.0825 -0.8343 1.2221 1.6261 ## ## Coefficients: ## Estimate Std. Error z value Pr(>|z|) ## (Intercept) -0.51260 0.06317 -8.114 4.89e-16 *** ## eyesightVery Good -0.07920 0.05359 -1.478 0.1394 ## eyesightGood -0.07146 0.06188 -1.155 0.2481 ## eyesightFair -0.21488 0.09105 -2.360 0.0183 * ## eyesightPoor 0.10558 0.18152 0.582 0.5608 ## sexFemale 0.69281 0.04493 15.420 < 2e-16 *** ## race_ethNon-Hispanic Black -0.28460 0.06493 -4.383 1.17e-05 *** ## race_ethNon-Black, Non-Hispanic 0.28528 0.05945 4.799 1.60e-06 *** ## --- ## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 ## ## (Dispersion parameter for binomial family taken to be 1) ## ## Null deviance: 11655 on 8443 degrees of freedom ## Residual deviance: 11281 on 8436 degrees of freedom ## (4242 observations deleted due to missingness) ## AIC: 11297 ## ## Number of Fisher Scoring iterations: 4 ``` ] --- # Look at results Or use the `tidy()` function from the `broom` package, which nicely summarizes all sorts of models. ```r # from the broom package tidy(mod_log) ``` ``` ## # A tibble: 8 x 5 ## term estimate std.error statistic p.value ## <chr> <dbl> <dbl> <dbl> <dbl> ## 1 (Intercept) -0.513 0.0632 -8.11 4.89e-16 ## 2 eyesightVery Good -0.0792 0.0536 -1.48 1.39e- 1 ## 3 eyesightGood -0.0715 0.0619 -1.15 2.48e- 1 ## 4 eyesightFair -0.215 0.0911 -2.36 1.83e- 2 ## 5 eyesightPoor 0.106 0.182 0.582 5.61e- 1 ## 6 sexFemale 0.693 0.0449 15.4 1.21e-53 ## 7 race_ethNon-Hispanic Black -0.285 0.0649 -4.38 1.17e- 5 ## 8 race_ethNon-Black, Non-Hispanic 0.285 0.0594 4.80 1.60e- 6 ``` --- # Pull off a coefficient ```r coef(mod_log) ``` ``` ## (Intercept) eyesightVery Good ## -0.51260479 -0.07920222 ## eyesightGood eyesightFair ## -0.07145996 -0.21487546 ## eyesightPoor sexFemale ## 0.10557518 0.69280825 ## race_ethNon-Hispanic Black race_ethNon-Black, Non-Hispanic ## -0.28460369 0.28527712 ``` .pull-left-narrow[ ```r coef(mod_log)[6] ``` ``` ## sexFemale ## 0.6928082 ``` ] .pull-right-wide[ ```r tidy(mod_log) %>% slice(6) %>% pull(estimate) ``` ``` ## [1] 0.6928082 ``` ] Reminder: if you have a model `\(\beta_0 + \beta_1x_1 +\)` etc., coefficient 6 is really `\(\beta_5\)`! --- # Pull off a coefficient by name ```r coef(mod_log)["sexFemale"] ``` ``` ## sexFemale ## 0.6928082 ``` ```r tidy(mod_log) %>% filter(term == "sexFemale") %>% pull(estimate) ``` ``` ## [1] 0.6928082 ``` --- # Creating new values .midi[ Since it's just a tibble/dataframe, you can create new columns! ```r # 95% confidence interval res_mod_log <- mod_log %>% tidy() %>% mutate(lci = estimate - 1.96 * std.error, uci = estimate + 1.96 * std.error) res_mod_log ``` ``` ## # A tibble: 8 x 7 ## term estimate std.error statistic p.value lci uci ## <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> ## 1 (Intercept) -0.513 0.0632 -8.11 4.89e-16 -0.636 -0.389 ## 2 eyesightVery Good -0.0792 0.0536 -1.48 1.39e- 1 -0.184 0.0258 ## 3 eyesightGood -0.0715 0.0619 -1.15 2.48e- 1 -0.193 0.0498 ## 4 eyesightFair -0.215 0.0911 -2.36 1.83e- 2 -0.393 -0.0364 ## 5 eyesightPoor 0.106 0.182 0.582 5.61e- 1 -0.250 0.461 ## 6 sexFemale 0.693 0.0449 15.4 1.21e-53 0.605 0.781 ## 7 race_ethNon-Hispanic Black -0.285 0.0649 -4.38 1.17e- 5 -0.412 -0.157 ## 8 race_ethNon-Black, Non-Hispanic 0.285 0.0594 4.80 1.60e- 6 0.169 0.402 ``` We could also clean up the `term` variable, perhaps with `fct_recode()`. ] --- # Calculating ORs Since these are results from a logistic regression, we'll probably want to exponentiate the coefficients and their CIs. ```r res_mod_log <- res_mod_log %>% select(term, estimate, lci, uci) %>% filter(term != "(Intercept)") %>% # exponentiate all three columns at once! mutate(across(c(estimate, lci, uci), exp)) res_mod_log ``` ``` ## # A tibble: 7 x 4 ## term estimate lci uci ## <chr> <dbl> <dbl> <dbl> ## 1 eyesightVery Good 0.924 0.832 1.03 ## 2 eyesightGood 0.931 0.825 1.05 ## 3 eyesightFair 0.807 0.675 0.964 ## 4 eyesightPoor 1.11 0.779 1.59 ## 5 sexFemale 2.00 1.83 2.18 ## 6 race_ethNon-Hispanic Black 0.752 0.662 0.854 ## 7 race_ethNon-Black, Non-Hispanic 1.33 1.18 1.49 ``` --- # Confidence intervals with `str_glue()` .midi[ Now we want to combine the lower and upper CI limits. ```r res_mod_log %>% select(term, estimate, lci, uci) %>% filter(term != "(Intercept)") %>% mutate(ci = `str_glue`("({lci}, {uci})")) ``` ``` ## # A tibble: 7 x 5 ## term estimate lci uci ci ## <chr> <dbl> <dbl> <dbl> <glue> ## 1 eyesightVery Good 0.924 0.832 1.03 (0.831734362089001, 1.02617441582346) ## 2 eyesightGood 0.931 0.825 1.05 (0.824698936937584, 1.05107868439189) ## 3 eyesightFair 0.807 0.675 0.964 (0.674797666860532, 0.96424628337116) ## 4 eyesightPoor 1.11 0.779 1.59 (0.778639992096712, 1.58622477434497) ## 5 sexFemale 2.00 1.83 2.18 (1.8307856398006, 2.18337382956281) ## 6 race_ethNon-Hispanic Black 0.752 0.662 0.854 (0.662416384318316, 0.854408001346314) ## 7 race_ethNon-Black, Non-Hispanic 1.33 1.18 1.49 (1.18383962963298, 1.49449918933821) ``` We can paste text and R code (within `{}`) together with `str_glue()`. Everything goes in quotation marks. - This means we can use the `round()` function within the curly braces too! ] --- # More `str_glue()` .midi[ You can paste any R expression you want evaluated in the curly braces. You can break up chunks of your string to make it easier to read in your code. ```r str_glue( "The intercept from the regression is ", "{round(coef(lm(income~sex, data = nlsy_clean))[1])} and a random ", "number that I generated is {round(rnorm(1, 0, 1), 3)}." ) ``` ``` ## The intercept from the regression is 14880 and a random number that I generated is -1.469. ``` More functions are available in the `glue` package. For example, you could write a nice list of the regions in the data like this: ```r glue::glue_collapse(levels(nlsy_clean$region), sep = ", ", last = ", and ") ``` ``` ## Northeast, North central, South, and West ``` ] --- # Better: Create a function We want to take these values and print "OR (95% CI *LCI*, *UCI*)" for each one. Let's make a function to put together everything we've done so far! ```r ci_func <- function(estimate, lci, uci) { OR <- round(exp(estimate), 2) lci <- round(exp(lci), 2) uci <- round(exp(uci), 2) to_print <- str_glue("{OR} (95% CI {lci}, {uci})") return(to_print) } ``` Let's test on some made-up values: ```r ci_func(.2523421, -.142433, .851234) ``` ``` ## 1.29 (95% CI 0.87, 2.34) ``` --- # From start to finish .midi[ ```r new_mod <- glm(glasses ~ eyesight + sex, family = binomial(link = "logit"), data = nlsy_clean) tidy(new_mod) %>% filter(term != "(Intercept)") %>% # we don't care about this term mutate(lci = estimate - 1.96 * std.error, uci = estimate + 1.96 * std.error, OR = ci_func(estimate, lci, uci), p.value = scales::pvalue(p.value)) %>% # for formatting p-values select(term, OR, p.value) ``` ``` ## # A tibble: 5 x 3 ## term OR p.value ## <chr> <glue> <chr> ## 1 eyesightVery Good 0.9 (95% CI 0.81, 1) 0.042 ## 2 eyesightGood 0.88 (95% CI 0.78, 0.99) 0.041 ## 3 eyesightFair 0.74 (95% CI 0.62, 0.88) <0.001 ## 4 eyesightPoor 1.01 (95% CI 0.71, 1.44) 0.959 ## 5 sexFemale 1.99 (95% CI 1.82, 2.17) <0.001 ``` ] .footnote[If you want 2 decimal places no matter what, use something like `format(round(x, digits = 2), nsmall = 2)`] --- # Important R lesson #### Whatever you want to write code to do, someone else has already probably done it The `tidy()` function could have actually done the exponentiating and confidence interval calculating for us. See `help(tidy.glm)`. But much more fun to do it ourselves 😉 ```r tidy(new_mod, conf.int = TRUE, exponentiate = TRUE) ``` ``` ## # A tibble: 6 x 7 ## term estimate std.error statistic p.value conf.low conf.high ## <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> ## 1 (Intercept) 0.655 0.0430 -9.85 6.59e-23 0.602 0.712 ## 2 eyesightVery Good 0.898 0.0531 -2.04 4.18e- 2 0.809 0.996 ## 3 eyesightGood 0.882 0.0612 -2.05 4.06e- 2 0.782 0.995 ## 4 eyesightFair 0.738 0.0902 -3.37 7.58e- 4 0.618 0.880 ## 5 eyesightPoor 1.01 0.181 0.0512 9.59e- 1 0.708 1.44 ## 6 sexFemale 1.99 0.0446 15.4 8.85e-54 1.82 2.17 ``` .footnote[If you're wondering why these confidence intervals differ slightly from the last slide, these are likelihood-based and ours were Wald confidence intervals. Don't worry if you don't know what this means!] --- # Final challenge ## Data analysis from start to finish 1. Prepare and organize your project 2. Load and clean the data 3. Do some exploratory analysis (table 1, figure) 4. Do some regression analysis (results table, figure) .large[.go[You can do it!]] --- # Prepare your project #### Do something totally outside the folders you've used for the course material! - File -> New Project -> New Directory -> New Project - Name it something like NLSY and put it in an appropriate folder on your computer - Within that folder, make new folders as follows: ``` NLSY/ ├─ NLSY.Rproj ├─ data/ │ ├── raw/ │ └── processed/ ├─ code/ └── results/ ├── tables/ └── figures/ ``` --- # Prepare the data - Download the linked dataset and save into `data/raw`. - Create a new file and save it as `clean_data.R`. - In that file, read in the NLSY data and load any packages you need. Make sure you replace any missing values with NA. Hint: there are extra missing values in the some variables. - Add factor labels as necessary. Select the factor variables plus `income`, `id`, and 2 others of your choosing. Then restrict to complete cases and people with incomes < $30,000. Make a variable for the log of income (replace with NA if income <= 0). - Also in that file, save your new dataset as a `.rds` file to the `data/processed` folder. --- # Do some exploratory analysis - Create a file called `create_figure.R`. In this file, read in the cleaned dataset. Load any packages you need. Then make a `ggplot` figure of your choosing to show something about the distribution of the data. Save it to the `results/figures` folder as a `.png` file using the `ggsave()` function. - Create a file called `table_1.R`. In this file, read in the cleaned dataset and use the `tableone` package to create a table 1 with the variables of your choosing. Modify the following code to save it as a `.csv` file. Open it in Excel/Numbers/Google Sheets/etc. to make sure it worked. ```r tab1 <- CreateTableOne(...) %>% print() %>% as_tibble(rownames = "id") write_csv(tab1, ...) ``` --- # Do some regression analysis - In another file called `lin_reg.R`, read in the data and run the following linear regression: `lm(log_inc ~ sex + race_eth +{other variables of your choosing}, data = nlsy)`. Modify the CI function to produce a table of results for a *linear* regression. Add an argument `digits = `, with a default of 2, to allow you to choose the number of digits you'd like. Save it in a separate file called `functions.R`. Use `source()` to read in the function at the beginning of your script. - Save a table of your results as a `.csv` file. Make the names of the coefficients nice! - Using the results, use `ggplot` to make a figure. Use `geom_point()` for the point estimates and `geom_errorbar()` for the confidence intervals. It will look something like this: ```r ggplot(data) + geom_point(aes(x = , y = )) + geom_errorbar(aes(x = , ymin = , ymax = )) ``` - Save that figure as a `.pdf` using `ggsave()`. You may want to play around with the `height = ` and `width = ` arguments to make it look like you want. --- # Appendix: some other packages I like but haven't mentioned - `lubridate`: Work with dates and times really easily. (https://lubridate.tidyverse.org) - `janitor`: Helps clean variable names, etc. (http://sfirke.github.io/janitor/) - `furrr`: Speed up your code with parallel processing. (https://davisvaughan.github.io/furrr/) - `shiny`: Make interactive apps. I made http://selection-bias.louisahsmith.com in shiny. (http://shiny.rstudio.com) - `drake`: Pipeline for analysis. (https://docs.ropensci.org/drake) - `rvest`: Scrape data from websites. (https://rvest.tidyverse.org) --- class: inverse .pull-left[ .huge-number[ 3 ] ] .hand-large[ <br> Your turn... ] .exercise[ Exercises 6.3: Work through the challenge! ]