Iteration, like functions, serves to reduce the amount of duplication in code. In R, it comes in two flavors: imperative programming, such as for and while loops, and functional programming, which avoids changing state and mutable data.1
Imperative Programming: For Loops
When creating for loops in R, it’s important to allocate sufficient space for the output before the loop starts, using the vector() function. For example, we create a new object that computes the median of each column of a data frame:
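A minimal sketch of that pattern follows. The data frame df and its columns are hypothetical and stand in for whatever data you are working with.

```r
# Hypothetical example data (not from the original text)
df <- data.frame(
  a = c(1, 2, 3, 4),
  b = c(10, 20, 30, 40),
  c = c(5, 3, 1, 7)
)

# Preallocate a double vector with one slot per column
output <- vector("double", ncol(df))

# Fill each slot with the median of the corresponding column
for (i in seq_along(df)) {
  output[[i]] <- median(df[[i]])
}

output
```

Preallocating avoids growing the vector on every iteration, which gets slow for long loops.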
There are three looping patterns worth noting:
- for (i in seq_along(xs)) loops over indices; this is the most common way to construct for loops.
- for (x in xs) loops over elements; this is useful if you only care about side effects like plots.
- for (nm in names(xs)) loops over names; this is useful if you only need names.
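The three patterns can be sketched side by side. The named vector xs below is a made-up example.

```r
# Hypothetical named vector used for illustration
xs <- c(a = 1, b = 4, c = 9)

# 1. Loop over indices: lets you write into a preallocated output
roots <- vector("double", length(xs))
for (i in seq_along(xs)) {
  roots[[i]] <- sqrt(xs[[i]])
}

# 2. Loop over elements: fine when only the side effect matters
for (x in xs) {
  print(x * 2)
}

# 3. Loop over names: the name can be used to look up the value
for (nm in names(xs)) {
  cat(nm, "=", xs[[nm]], "\n")
}
```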
Most of the time, you will know what form the function output will take. However, if the length of the output is unknown, combine the data into a single object after the loop is complete:
- When generating an unknown number of results, save vectors to a list and combine them into a single vector with unlist().
- When generating a long string, save outputs in a character vector and combine them into a single string with paste(output, collapse = "").
- When generating a big data frame, save output to a list and combine them into a single data frame with dplyr::bind_rows().
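The three cases above might look like the following sketch. All data here is simulated for illustration. The data frame case uses base R's do.call(rbind, ...) so that the example has no package dependencies; dplyr::bind_rows() is a common alternative.

```r
set.seed(1)
means <- c(0, 1, 2)

# Unknown number of results: collect vectors in a list, flatten once at the end
out <- vector("list", length(means))
for (i in seq_along(means)) {
  n <- sample(5, 1)                 # length is not known in advance
  out[[i]] <- rnorm(n, means[[i]])
}
combined <- unlist(out)

# Long string: collect pieces in a character vector, collapse once at the end
pieces <- vector("character", 3)
for (i in 1:3) {
  pieces[[i]] <- paste0("part", i, ";")
}
long_string <- paste(pieces, collapse = "")

# Big data frame: collect small data frames in a list, bind once at the end
rows <- vector("list", 3)
for (i in 1:3) {
  rows[[i]] <- data.frame(id = i, square = i^2)
}
big_df <- do.call(rbind, rows)
```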
As an example exercise to demonstrate the syntax of for loops, let’s write a function that takes a data frame, and prints the mean of each numeric column next to the name of that column.
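One possible solution is sketched below. The function name show_mean and the output format are assumptions made for this example, not a fixed answer.

```r
# Print the mean of every numeric column next to its name.
# show_mean is a hypothetical name chosen for this sketch.
show_mean <- function(df) {
  for (nm in names(df)) {
    if (is.numeric(df[[nm]])) {
      cat(nm, ": ", format(mean(df[[nm]]), digits = 3), "\n", sep = "")
    }
  }
  invisible(df)  # return the input invisibly so the call can be piped
}

# Non-numeric columns such as Species are skipped
show_mean(iris)
```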
Functional Programming: Map Functions
Map functions automate the pattern of looping over a vector, applying a function to each piece, and returning a new vector that’s the same length as the input. These functions are part of the purrr package and are implemented in C for speed. There is one map function for each type of output:
- map() makes a list
- map_lgl() makes a logical vector
- map_int() makes an integer vector
- map_dbl() makes a double vector
- map_chr() makes a character vector
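As a quick illustration of how the suffix controls the output type, the sketch below applies each variant to the built-in iris data frame. The choice of dataset and helper functions is an assumption made for this example.

```r
library(purrr)

# map() always returns a list
col_ranges <- map(iris[1:4], range)

# map_lgl() returns a logical vector: is each column numeric?
is_num <- map_lgl(iris, is.numeric)

# map_int() returns an integer vector: number of distinct values per column
n_unique <- map_int(iris, ~ length(unique(.x)))

# map_dbl() returns a double vector: mean of each numeric column
col_means <- map_dbl(iris[1:4], mean)

# map_chr() returns a character vector: storage type of each column
types <- map_chr(iris, typeof)
```

Each typed variant throws an error if the function does not return a single value of the expected type, which makes mistakes surface early.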
Example 1: The most basic case – calculate the means of all columns of the mtcars dataset.
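A sketch of this basic case:

```r
library(purrr)

# Apply mean() to every column of mtcars; the result is a named double vector
col_means <- map_dbl(mtcars, mean)
col_means
```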
Example 2: Pass a one-sided formula as shorthand for an anonymous function. For each value of the number of cylinders on a car, what is the linear relationship between miles per gallon and car weight?
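One way to write this, assuming the built-in mtcars data and base R's split() to create one data frame per cylinder count:

```r
library(purrr)

# Split mtcars into one data frame per cylinder count (4, 6, 8)
by_cyl <- split(mtcars, mtcars$cyl)

# The formula ~ lm(...) is shorthand for function(.x) lm(...)
models <- map(by_cyl, ~ lm(mpg ~ wt, data = .x))

# Inspect the fitted coefficients of each model
map(models, coef)
```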
Example 3: Extract a named component from each element by passing a string. For each linear relationship between miles per gallon and car weight by the number of cylinders, what is the correlation?
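A sketch of string-based extraction is shown below. It pulls the "r.squared" component out of each model summary; for a simple linear regression with one predictor, this is the squared correlation between the two variables. The models are refit here so the block runs on its own.

```r
library(purrr)

# Fit one model per cylinder count
models <- map(split(mtcars, mtcars$cyl), ~ lm(mpg ~ wt, data = .x))

# summary() returns a list; the string "r.squared" extracts that component
r_squared <- map_dbl(map(models, summary), "r.squared")
r_squared
```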
Example 4: Select elements by position by passing in an integer. Select the elements in the second position of a list.
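For instance, with a small made-up list of lists:

```r
library(purrr)

# Hypothetical nested list
x <- list(list(1, 2, 3), list(4, 5, 6), list(7, 8, 9))

# Passing the integer 2 extracts the second element of each inner list
second <- map_dbl(x, 2)
second  # 2 5 8
```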
Dealing with mapping failures: safely(), possibly(), and quietly()
If any one of the mapping operations fails, then an error message will be returned with no output. There are three ways to deal with failures:
- safely() returns a modified version of a function that returns a list of two elements: result and error.
- possibly() always succeeds, but gives a default value when there is an error.
- quietly() captures printed output, messages, and warnings instead of errors.
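The three wrappers can be compared on a small example. The input list below is made up; log("a") fails, which is what triggers the error handling.

```r
library(purrr)

# Hypothetical inputs: the third one makes log() fail
inputs <- list(1, 10, "a")

# safely(): every call returns list(result = ..., error = ...)
safe_log <- safely(log)
results <- map(inputs, safe_log)
failed <- map_lgl(results, ~ !is.null(.x$error))

# possibly(): replace failures with a default value
with_default <- map_dbl(inputs, possibly(log, otherwise = NA_real_))

# quietly(): capture warnings instead of emitting them
quiet_log <- quietly(log)
res <- quiet_log(-1)
res$warnings
```

With safely(), the mapping always completes, and the failed vector shows which inputs need attention.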
Map a function over multiple arguments: map2() and pmap()
map2() allows you to iterate along two related inputs in parallel. As an example, we can draw a random sample from the normal distribution with three different mean and standard deviation pairs:
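A sketch of this, with made-up means and standard deviations:

```r
library(purrr)
set.seed(123)

# Hypothetical parameters: three (mean, sd) pairs
mu <- list(5, 10, -3)
sigma <- list(1, 5, 10)

# rnorm() is called as rnorm(mean_i, sd_i, n = 5); since n is named,
# the two unnamed arguments are matched to mean and sd
samples <- map2(mu, sigma, rnorm, n = 5)
str(samples)
```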
pmap(), on a similar note, iterates along a list of related inputs in parallel. If your arguments are the same length, it is best practice to store them in a data frame. For example, we can vary the mean, standard deviation, and the number of samples from a normal distribution:
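A sketch using a data frame of made-up parameters follows. The column names match the argument names of rnorm(), so pmap() passes them by name.

```r
library(purrr)
set.seed(123)

# Hypothetical parameters: one row per call to rnorm()
params <- data.frame(
  n    = c(1, 3, 5),
  mean = c(5, 10, -3),
  sd   = c(1, 5, 10)
)

# Each row becomes one call: rnorm(n = ..., mean = ..., sd = ...)
samples <- pmap(params, rnorm)
str(samples)
```

Naming the columns after the function's arguments avoids relying on positional matching, which is easy to get wrong with three or more inputs.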
Map multiple functions with multiple sets of arguments: invoke_map()
invoke_map() can invoke different functions with multiple parameters. Use a data frame to make matching pairs easier. For example, we can generate samples from three different types of distributions with their own inputs:
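Note that invoke_map() has been deprecated since purrr 1.0.0. The sketch below therefore reproduces the same idea with map2() and base R's do.call(), which should run on both older and newer versions. The function names and parameter values are made up for illustration, and the parameters are held in a plain list rather than a data frame to keep the example free of extra dependencies.

```r
library(purrr)
set.seed(42)

# Hypothetical pairing of functions with their own arguments
f <- c("runif", "rnorm", "rpois")
param <- list(
  list(min = -1, max = 1),
  list(sd = 5),
  list(lambda = 10)
)

# In older purrr versions this was: invoke_map(f, param, n = 5)
# Here each function is called with n = 5 plus its own parameters
samples <- map2(f, param, ~ do.call(.x, c(list(n = 5), .y)))
str(samples)
```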
Walk: an alternative to map
When you want to call a function for its side effects, such as printing output to the screen or saving files to disk, you can use the walk functions: walk(), walk2(), and pwalk().
For example, if you had a list of plots and a vector of file names, you can use the pwalk() function to save each file to the corresponding location on disk.
Note: these functions all invisibly return the first argument, which makes them suitable for use in the middle of pipelines.
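A sketch of saving plots with pwalk() is shown below. To stay free of plotting packages, it draws base R scatterplots into PDF files in a temporary directory. The helper save_plot and the file names are hypothetical.

```r
library(purrr)

# One data frame per cylinder count, and one output path for each
groups <- split(mtcars, mtcars$cyl)
out_dir <- tempfile("plots_")
dir.create(out_dir)
paths <- file.path(out_dir, paste0("cyl_", names(groups), ".pdf"))

# Hypothetical helper: draw a scatterplot of one group into a PDF file
save_plot <- function(data, path) {
  pdf(path)
  on.exit(dev.off())  # always close the graphics device
  plot(data$wt, data$mpg, xlab = "Weight", ylab = "Miles per gallon")
}

# pwalk() calls save_plot(data = ..., path = ...) once per pair,
# purely for the side effect of writing the file
pwalk(list(data = groups, path = paths), save_plot)

file.exists(paths)
```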
Functional Programming: Other Patterns
There are a number of other functions in the purrr package that abstract over other types of for loops, including predicate functions, reduce, and accumulate.
Predicate functions check a condition, returning either TRUE or FALSE
- keep() keeps elements where the predicate is true
- discard() drops elements where the predicate is true
- some() and every() determine if the predicate is true for any or for all of the elements
- detect() finds the first element where the predicate is true; detect_index() returns its position
- head_while() and tail_while() take elements from the start or end of a vector while a predicate is true
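The predicate helpers can be tried out on small made-up inputs:

```r
library(purrr)

# Hypothetical mixed list and numeric vector
x <- list(1, "a", 3, "b", 5)
nums <- c(3, 7, 1, 8, 2, 9)

kept    <- keep(x, is.numeric)      # the numeric elements: 1, 3, 5
dropped <- discard(x, is.numeric)   # what remains: "a", "b"

any_chr <- some(x, is.character)    # TRUE: at least one is character
all_num <- every(x, is.numeric)     # FALSE: not all are numeric

first_big     <- detect(nums, ~ .x > 5)        # 7
first_big_pos <- detect_index(nums, ~ .x > 5)  # 2

leading  <- head_while(nums, ~ .x < 8)  # 3 7 1 (stops at 8)
trailing <- tail_while(nums, ~ .x > 1)  # 8 2 9 (stops at 1, from the end)
```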
Reduce and Accumulate use a binary function to simplify complex lists
reduce() takes a binary function – a function with two primary inputs – and applies it to a list until only one element is left.
accumulate() is the same as reduce except it keeps all of the interim results.
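A short sketch of both, using made-up inputs:

```r
library(purrr)

# reduce(): ((1 + 2) + 3) + 4, only the final value is kept
total <- reduce(c(1, 2, 3, 4), `+`)

# accumulate(): same computation, but every intermediate sum is kept
running <- accumulate(c(1, 2, 3, 4), `+`)

# reduce() also collapses a list pairwise, here with set intersection
sets <- list(c(1, 2, 3, 4), c(2, 3, 4, 5), c(3, 4, 9))
common <- reduce(sets, intersect)

total    # 10
running  # 1 3 6 10
common   # 3 4
```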