# Programming in R: Iteration

In which we explore the basics of iteration through the lenses of functional and imperative programming by examining for loops, map functions, and more.

February 25, 2020 - 12 minute read

Iteration, like functions, serves to reduce the amount of duplication in code. In R, it comes in two flavors: imperative programming, with tools such as for and while loops, and functional programming, which avoids mutable state.1

## Imperative Programming: For Loops

When creating for loops in R, it’s important to allocate sufficient space for the output before the loop starts, using the vector() function. For example, we can compute the median of each column of a data frame:
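A minimal sketch of that pattern follows. The data frame `df` and its values are made up for illustration; the point is only that `output` is allocated once, before the loop, and then filled in place.

```r
# Hypothetical example data (values chosen only for illustration)
df <- data.frame(
  a = c(1, 5, 9),
  b = c(2, 4, 6),
  c = c(10, 20, 60)
)

# Allocate the output up front: one double per column
output <- vector("double", ncol(df))

# Fill each slot with the median of the matching column
for (i in seq_along(df)) {
  output[[i]] <- median(df[[i]])
}

output
```

Growing `output` with `c()` inside the loop would also work, but it copies the vector on every iteration, which gets slow for long inputs.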

There are three looping patterns worth noting:

1. for (i in seq_along(xs)) loops over indices; this is the most common way to construct for loops.
2. for (x in xs) loops over elements; this is useful if you only care about side effects like plots.
3. for (nm in names(xs)) loops over names; this is useful if you need the name itself, for example as a plot title or file name, and you can still get the value with xs[[nm]].
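The three patterns can be sketched side by side. The vector `xs` and the names `squares` and `labels` are hypothetical, chosen only to show the shape of each loop.

```r
xs <- c(a = 1, b = 2, c = 3)

# 1. Loop over indices: the index lets us write into a pre-allocated output
squares <- vector("double", length(xs))
for (i in seq_along(xs)) {
  squares[[i]] <- xs[[i]]^2
}

# 2. Loop over elements: handy when only the side effect matters
for (x in xs) {
  print(x)
}

# 3. Loop over names: the name is available, and xs[[nm]] gives the value
labels <- setNames(vector("character", length(xs)), names(xs))
for (nm in names(xs)) {
  labels[[nm]] <- paste0(nm, " = ", xs[[nm]])
}
```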

Most of the time, you will know what form the function output will take. However, if the length of the output is unknown, combine the data into a single object after the loop is complete:

• When generating an unknown number of results, save vectors to a list and combine them into a single vector with unlist()
• When generating a long string, save outputs in a character vector and combine them into a single string with paste(output, collapse = "")
• When generating a big data frame, save each piece to a list and combine the pieces into a single data frame with dplyr's bind_rows(output)
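A rough sketch of all three cases, assuming the dplyr package is installed for `bind_rows()`. The inputs (`means`, `pieces`, `rows`) are invented for the example.

```r
set.seed(1)

# Unknown number of results: collect vectors in a list, flatten afterwards
means <- c(0, 1, 2)
out <- vector("list", length(means))
for (i in seq_along(means)) {
  n <- sample(100, 1)              # length is not known in advance
  out[[i]] <- rnorm(n, means[[i]])
}
result <- unlist(out)

# Long string: collect the pieces, paste once at the end
pieces <- vector("character", 3)
for (i in seq_along(pieces)) {
  pieces[[i]] <- paste0("part", i, ";")
}
long_string <- paste(pieces, collapse = "")

# Big data frame: collect small data frames, bind once at the end
rows <- vector("list", 3)
for (i in seq_along(rows)) {
  rows[[i]] <- data.frame(id = i, square = i^2)
}
big_df <- dplyr::bind_rows(rows)
```

In each case the expensive combining step happens once, after the loop, rather than on every iteration.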

As an example exercise to demonstrate the syntax of for loops, let’s write a function that takes a data frame, and prints the mean of each numeric column next to the name of that column.
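One possible solution is sketched below. The function name `show_means` and the three-significant-digit formatting are choices made for this example, not something the exercise prescribes.

```r
# Print "<column name>: <mean>" for every numeric column of a data frame
show_means <- function(df) {
  for (nm in names(df)) {
    if (is.numeric(df[[nm]])) {
      cat(nm, ": ", format(mean(df[[nm]]), digits = 3), "\n", sep = "")
    }
  }
  invisible(df)  # return the input invisibly so the call can sit in a pipeline
}

# Species is a factor, so it is skipped
show_means(iris)
```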

## Functional Programming: Map Functions

Map functions automate the pattern of looping over a vector, applying a function to each piece, and returning a new vector that’s the same length as the input. These functions are part of the purrr package and are implemented in C for speed. There is one map function for each type of output:

• map() makes a list
• map_lgl() makes a logical vector
• map_int() makes an integer vector
• map_dbl() makes a double vector
• map_chr() makes a character vector

Example 1: The most basic case – calculate the means of all columns of the mtcars dataset.
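A short sketch of this case, assuming purrr is installed. A data frame is a list of columns, so `map_dbl()` visits each column in turn.

```r
library(purrr)

# One mean per column, returned as a named double vector
col_means <- map_dbl(mtcars, mean)
col_means
```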

Example 2: Pass in a one-sided formula for anonymous functions. For each value of the number of cylinders on a car, what is the linear relationship between miles per gallon and car weight?
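One way to write this, splitting `mtcars` by `cyl` and fitting one model per group. The names `by_cyl` and `models` are arbitrary; inside the formula, `.x` stands for the current element.

```r
library(purrr)

# One data frame per cylinder count (4, 6, 8)
by_cyl <- split(mtcars, mtcars$cyl)

# ~ lm(...) is shorthand for function(.x) lm(...)
models <- map(by_cyl, ~ lm(mpg ~ wt, data = .x))

# Intercept and slope for each group
map(models, coef)
```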

Example 3: Extract named components from each element using a string. For each linear model of miles per gallon against car weight, fit by number of cylinders, how strong is the fit (for example, the R-squared)?
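A sketch that refits the per-cylinder models so the block runs on its own, then pulls `r.squared` out of each model summary by name. Using R-squared as the measure of fit is an assumption made for this example.

```r
library(purrr)

models <- map(split(mtcars, mtcars$cyl), ~ lm(mpg ~ wt, data = .x))

# summary() of an lm is a list; the string "r.squared" extracts that component
summaries <- map(models, summary)
r_squared <- map_dbl(summaries, "r.squared")
r_squared
```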

Example 4: Select elements by position by passing in an integer. Select the elements in the second position of a list.
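A small illustration with a made-up nested list `x`: passing the integer 2 extracts the second element of each sub-list.

```r
library(purrr)

x <- list(list(1, 2, 3), list(4, 5, 6), list(7, 8, 9))

# Equivalent to map_dbl(x, function(el) el[[2]])
second <- map_dbl(x, 2)
second
```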

#### Dealing with mapping failures: safely(), possibly(), and quietly()

If any one of the mapping operations fails, the whole call stops with an error and you get no output at all. There are three ways to deal with failures:

1. safely() returns a modified version of a function that returns a list of two elements: result and error.
2. possibly() always succeeds, but gives a default value when there is an error.
3. quietly() captures printed output, messages, and warnings instead of errors.
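The three wrappers can be compared on `log()`, which errors on a string and warns on a negative number. The input list and the `NA_real_` default are choices made for this sketch.

```r
library(purrr)

inputs <- list(1, 10, "a")  # log("a") throws an error

# safely(): every call returns list(result = ..., error = ...)
safe_results <- map(inputs, safely(log))
safe_results[[3]]$result  # NULL, because the call failed
safe_results[[3]]$error   # the captured error condition

# possibly(): failures are replaced by the default value
possible_logs <- map_dbl(inputs, possibly(log, otherwise = NA_real_))

# quietly(): warnings are captured rather than signalled
quiet_out <- quietly(log)(-1)
quiet_out$result    # NaN
quiet_out$warnings  # the captured warning text
```

`possibly()` is the most convenient when a placeholder is good enough; `safely()` keeps the error objects around if you want to inspect what went wrong.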

#### Map a function over multiple arguments: map2() and pmap()

1. map2() allows you to iterate along two related inputs in parallel. As an example, we can draw a random sample from the normal distribution with three different mean and standard deviation pairs:
2. pmap(), on a similar note, iterates along a list of related inputs in parallel. If your arguments are the same length, it is best practice to store them in a data frame. For example, we can vary the mean, standard deviation, and the number of samples from a normal distribution:
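Both cases are sketched here with made-up parameter values. In the `pmap()` call, the data frame's column names are matched to the argument names of `rnorm()`, so the column order does not matter.

```r
library(purrr)
set.seed(42)

mu    <- c(5, 10, -3)
sigma <- c(1, 5, 10)

# map2(): rnorm(mu[i], sigma[i], n = 5) for each pair; since n is named,
# the two positional values fill the mean and sd arguments
draws <- map2(mu, sigma, rnorm, n = 5)

# pmap(): one row per call, columns matched to rnorm()'s arguments by name
params <- data.frame(mean = mu, sd = sigma, n = c(1, 3, 5))
samples <- pmap(params, rnorm)
```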

#### Map multiple functions with multiple sets of arguments: invoke_map()

invoke_map() can invoke different functions with multiple parameters. Use a data frame to make matching pairs easier. For example, we can generate samples from three different types of distributions with their own inputs:
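A sketch of the idea follows. Note that `invoke_map()` is deprecated as of purrr 1.0.0, so the version below uses `map2()` with base R's `do.call()` as a stand-in and leaves the original call in a comment. The distributions and parameter values are chosen only for illustration.

```r
library(purrr)
set.seed(123)

# Each function is paired with its own list of arguments
fs <- list(runif, rnorm, rpois)
params <- list(
  list(min = -1, max = 1),
  list(sd = 5),
  list(lambda = 10)
)

# With older purrr versions this could be written as:
#   invoke_map(fs, params, n = 5)
# Equivalent that does not rely on the deprecated function:
samples <- map2(fs, params, function(f, args) {
  do.call(f, c(list(n = 5), args))
})
```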

#### Walk: an alternative to map

When you want to call a function for its side effects, such as printing output to the screen or saving files to disk, you can use the walk functions: walk(), walk2(), pwalk().

For example, if you had a list of plots and a vector of file names, you could use the pwalk() function to save each plot to the corresponding location on disk.
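To keep the sketch free of plotting dependencies, it writes small text files instead of plots; the shape of the call is the same. The file names and contents are hypothetical, and everything is written to a temporary directory.

```r
library(purrr)

out_dir  <- tempdir()
contents <- list(c("first", "file"), "second file")
paths    <- file.path(out_dir, c("one.txt", "two.txt"))

# Calls writeLines(text = contents[[i]], con = paths[[i]]) for each i;
# the list names are matched to writeLines()'s argument names
pwalk(list(text = contents, con = paths), writeLines)

file.exists(paths)
```

With plots, the same structure would pair each plot object with its file name and pass a saving function in place of `writeLines`.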

Note: these functions all invisibly return the first argument, which makes them suitable for use in the middle of pipelines.

## Functional Programming: Other Patterns

There are a number of other functions in the purrr package that abstract over other types of for loops, including predicate functions, reduce, and accumulate.

#### Predicate functions check a condition, returning either TRUE or FALSE

• keep() keeps elements where the predicate is true
• discard() drops elements where the predicate is true
• some() and every() determine if the predicate is true for any or all of the elements
• detect() finds the first element where the predicate is true; detect_index() returns its position
• head_while() and tail_while() take elements from the start or end of a vector while a predicate is true
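Each of these can be tried on a small vector. The vector `x` and the helper `is_odd` are invented for this sketch.

```r
library(purrr)

x <- c(1, 3, 5, 6, 7, 10)
is_odd <- function(v) v %% 2 == 1

kept      <- keep(x, is_odd)            # elements where is_odd is TRUE
discarded <- discard(x, is_odd)         # elements where is_odd is FALSE

any_big   <- some(x, ~ .x > 9)          # is at least one element > 9?
all_pos   <- every(x, ~ .x > 0)         # are all elements > 0?

first_big <- detect(x, ~ .x > 4)        # first element > 4
first_pos <- detect_index(x, ~ .x > 4)  # its position

head_odd  <- head_while(x, is_odd)      # leading run of odd values
tail_big  <- tail_while(x, ~ .x > 5)    # trailing run of values > 5
```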

#### Reduce and Accumulate use a binary function to simplify complex lists

reduce() takes a binary function – a function with two primary inputs – and applies it to a list until only one element is left.

accumulate() is the same as reduce() except that it keeps all of the interim results.
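A brief sketch of both, using made-up inputs: `reduce()` with `intersect` finds the values common to every vector in a list, and the addition example shows how `accumulate()` exposes the running totals that `reduce()` discards.

```r
library(purrr)

vs <- list(
  c(1, 3, 5, 6, 10),
  c(1, 2, 3, 7, 8, 10),
  c(1, 2, 3, 4, 8, 9, 10)
)

# intersect(intersect(vs[[1]], vs[[2]]), vs[[3]])
common <- reduce(vs, intersect)

total         <- reduce(1:5, `+`)      # only the final sum
running_total <- accumulate(1:5, `+`)  # every intermediate sum
```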

1. This post is meant for a person who is looking for a refresher on basic programming in R, and the content in this post is based on chapter twenty-one of R for Data Science by Hadley Wickham & Garrett Grolemund.