Lab

5 Functions

You can either download the lab as an RMarkdown file here, or copy and paste the code as we go into a .R script. Either way, save it into the 05-week folder where you completed the exercises!

As always, we’ll be using the tidyverse package and the NLSY data.

library(tidyverse)
nlsy <- read_csv("nlsy_cc.csv")

5.1 Functions in RStudio

As with everything else, there are some tricks to make your life easier when using functions in RStudio.

Let’s say you have been writing some code, and you realize you want to make it into a function:

y <- x * 2
z <- exp(y)
mean(c(x, y, z))

If you highlight the code and press ctrl + alt + x on Windows or cmd + option + x on a Mac, you can automatically convert it into a function:

weird_func <- function(x) {
  y <- x * 2
  z <- exp(y)
  mean(c(x, y, z))
}

This can be helpful for a couple of reasons: if you don’t remember the syntax for a function, if you don’t want to deal with indenting, etc. and especially if you aren’t sure what you need as arguments to your function. Careful, though: it’s not great at distinguishing between objects and variable names, so it might try to add arguments that you don’t actually need:

nlsy %>%
  mutate(only = case_when(
                nsibs == 0 ~ "yes",
                TRUE ~ "no"
                )
         ) %>%
  select(id, contains("sleep"), only) %>%
  filter(only == "yes")

Another trick is F2: use it to go directly to the source code of a function. If it’s in your R script, it will go there, or else it will open up another tab where you can view it.

weird_func
read_csv

It can be really helpful to see how other people write functions as you’re learning to write your own!

5.2 Writing functions

Raise to any power

Make a function that uses two arguments, x for a number, and power for the power. Call it raise().

raise <- function() {
  
}

# test with
raise(x = 2, power = 4)
# should give you
2^4

Default arguments

Change your raise() function to default to squaring x when the user doesn’t enter a value for power.

# test
raise(x = 5)
# should give you
5^2

Functions for data

Write a function to calculate the stratified mean income for grouping variable var. In other words, write a function such that mean_group_inc(var = "sex") and mean_group_inc(var = "glasses") produce the results above.

Look at the function from the slides for help:

var_q <- function(q, var) {
  quant <- nlsy %>%
    rename(new_var = var) %>% #<<
    summarize(q_var = quantile(new_var, probs = q))
  return(quant)
}
var_q(q = 0.5, var = "income")

Write your function here:

mean_group_inc <- function(var) {
  
}

# test with
mean_group_inc(var = "glasses")
mean_group_inc(var = "sex")

Rewrite your function to accept two arguments: group_var to determine what the grouping variable is, and mean_var to determine what variable you want to take the mean of (e.g., mean_group(group_var = "sex", mean_var = "income") should give you the same results as above).

mean_group <- function(group_var, mean_var) {
  
}

# test with
mean_group(group_var = "sex", mean_var = "income")

5.3 For loops

Write a for loop

We used this function:

var_q_new <- function(q, var) {
  quant <- nlsy %>%
    rename(new_var = var) %>%
    summarize(q_var = quantile(new_var, probs = q)) %>%
    pull(q_var)
  return(quant)
}
var_q_new(q = 0.5, var = "income")
#>    50% 
#>  11155

inside of a for loop in order to calculate each decile of income:

qs <- seq(0.1, 0.9, by = 0.1)
deciles <- rep(NA, length(qs))
for (i in seq_along(qs)) {
  deciles[[i]] <- var_q_new(q = qs[[i]], 
                        var = "income")
}
deciles
#>  [1]  3177.2  5025.6  6907.2  9000.0 11155.0 14000.0 18053.6 23800.0 33024.0

Change the for loop above to loop over different variables instead of different quantiles. That is, calculate the 0.25 quantile for each of c("income", "age_bir", "nsibs") in a for loop.

vars <- c("income", "age_bir", "nsibs")
q_25s <- ...

Nested loops

You can nest for loops inside each other, as long as you use different iteration variables. Write a nested for loop to iterate over variables (with i) and quantiles (with j). You’ll need to start with an empty matrix instead of a vector, with rows indexed by i and columns by j. Calculate each of the deciles for each of the above variables.

vars <- c("income", "age_bir", "nsibs")
qs <- qs <- seq(0.1, 0.9, by = 0.1)
results_mat <- matrix(NA, ncol = length(qs), nrow = length(vars))

# helpful to print to see what's going on
for (i in vars) {
  for (j in qs) {
    print(c(i, j))
  }
}

for (i in seq_along(vars)) {
  for (j in seq_along(qs)) {
    print(var_q_new(q = qs[[j]], var = vars[[i]]))
  }
}

for (i in seq_along(vars)) {
  for (j in seq_along(qs)) {
    results_mat[i, j] <- var_q_new(q = qs[[j]], var = vars[[i]])
  }
}

results_mat

rownames(results_mat) <- vars
colnames(results_mat) <- qs

results_mat

5.4 Group work

Related to “for loops” are “while loops”. The latter don’t iterate a set number of times, but rather only as long as a condition is true. This is helpful when you don’t know how many times you’ll need to do something. For example, if I want to do something as long as x divided by 2 is less than 5, I could write:

x <- 0
while ((x / 2) < 5) {
  x <- x + 1
  print(x)
}
#>  [1] 1
#>  [1] 2
#>  [1] 3
#>  [1] 4
#>  [1] 5
#>  [1] 6
#>  [1] 7
#>  [1] 8
#>  [1] 9
#>  [1] 10

Be careful you don’t get stuck in an infinite loop! For example, if I had said while ((x / 2) >= 0), and started at 0, adding 1 each time, it would never not be true, and R would crash if I didn’t stop it!

As a harder example, imagine I wanted to find the Fibonacci sequence through 2-digit numbers:

x <- c(0, 1)
i <- 2
while (x[i] < 100) {
  x <- c(x, x[i - 1] + x[i])
  i <- i + 1
}
x
#>   [1]   0   1   1   2   3   5   8  13  21  34  55  89 144

While loops are a bit confusing, but we’ll make them fun by playing with the penguins again!

(See last week’s lab for more info on the palmerpenguins dataset and the artwork by Allison Horst.)!

It’s available in the palmerpenguins package, or we can download it directly here:

penguins <- read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2020/2020-07-28/penguins.csv')

Challenge: You want to take some penguins home from Antarctica with you. Your plane can only hold 10,000 g of cargo. What is the greatest number of penguins from this dataset that you can take with you? Write a loop with while() to figure it out.

(Hint: You might want to sort the penguins by size first. There are a couple of ways to do this, one of which is with the arrange() function.)