Brandon Walker

Data Scientist

Tutorial: The Pipe (%>%) Operator in R

October 3, 2018

If you are learning R, one habit that makes your code much more readable, both to yourself and to others, is using the pipe operator, which looks like this: %>%. To use it, install and load either the magrittr package or the tidyverse package.

The pipe takes the object on its left and passes it as the first argument to the function on its right. Line 2 and line 3 of the code below are thus equivalent.

library(tidyverse)

mean(iris$Sepal.Length) # line 2
## [1] 5.843333
iris$Sepal.Length %>% mean() # line 3
## [1] 5.843333
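
The same rule holds when the function on the right takes extra arguments: the piped object fills the first slot and the remaining arguments stay where they are. Here is a minimal sketch of that, loading magrittr directly; the vector x and the variable names are made up for illustration.

```r
library(magrittr)

x <- c(1.234, 5.678)  # illustrative data, not from the iris example

# The left-hand side becomes the first argument; the 1 stays as `digits`.
nested <- round(x, 1)
piped  <- x %>% round(1)

same <- identical(nested, piped)
same
## [1] TRUE

# Pipes chain left to right: sqrt() is applied first, then sum().
total <- c(1, 4, 9) %>% sqrt() %>% sum()
total
## [1] 6
```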

Now that you know what it is and what it does, let’s see the benefit by building a summary table from a few columns of the iris data set, first without pipes. Notice how difficult it is to work out what is going on below.

filter(
    summarise(
        group_by(
            select(iris, Petal.Length, Petal.Width, Species),
        Species), 
    "Mean Petal Length" = mean(Petal.Length), 
    "Mean Petal Width" = mean(Petal.Width)), 
Species %in% c("versicolor", "virginica"))
## # A tibble: 2 x 3
##   Species    `Mean Petal Length` `Mean Petal Width`
##   <fct>                    <dbl>              <dbl>
## 1 versicolor                4.26               1.33
## 2 virginica                 5.55               2.03

Now let’s try this with pipes.

iris %>%
  select(Petal.Length, Petal.Width, Species) %>%
  group_by(Species) %>%
  summarise("Mean Petal Length" = mean(Petal.Length),
            "Mean Petal Width" = mean(Petal.Width)) %>%
  filter(Species %in% c("versicolor", "virginica"))
## # A tibble: 2 x 3
##   Species    `Mean Petal Length` `Mean Petal Width`
##   <fct>                    <dbl>              <dbl>
## 1 versicolor                4.26               1.33
## 2 virginica                 5.55               2.03
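
If you want to convince yourself that the nested and piped versions really do the same thing, one way is to store both results and compare them. This is only a sketch: the names keep, nested, piped and same_result are introduced here for illustration, and it loads dplyr rather than the whole tidyverse.

```r
library(dplyr)

keep <- c("versicolor", "virginica")  # species to retain in the final table

# Nested version: read from the inside out.
nested <- filter(
  summarise(
    group_by(select(iris, Petal.Length, Petal.Width, Species), Species),
    "Mean Petal Length" = mean(Petal.Length),
    "Mean Petal Width" = mean(Petal.Width)
  ),
  Species %in% keep
)

# Piped version: read from top to bottom.
piped <- iris %>%
  select(Petal.Length, Petal.Width, Species) %>%
  group_by(Species) %>%
  summarise("Mean Petal Length" = mean(Petal.Length),
            "Mean Petal Width" = mean(Petal.Width)) %>%
  filter(Species %in% keep)

# TRUE when both versions produce the same table.
same_result <- isTRUE(all.equal(nested, piped))
same_result
```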

There are two primary benefits to using the pipe:

  1. We know the order in which every function is applied. First we select variables, then we group by species, then we summarise, then we filter. It’s also easier to identify what data we are working with, since iris is the data set that comes first. I always appreciate it when someone uses pipes in their code, as it makes the code easier for me to review.
  2. It is now possible to comment out part of our code without really changing the rest of it. I’ll comment out the select statement below. Notice that it still gives the same output, because the later steps only use the columns that select would have kept.

iris %>%
  # select(Petal.Length, Petal.Width, Species) %>%
  group_by(Species) %>%
  summarise("Mean Petal Length" = mean(Petal.Length),
            "Mean Petal Width" = mean(Petal.Width)) %>%
  filter(Species %in% c("versicolor", "virginica"))
## # A tibble: 2 x 3
##   Species    `Mean Petal Length` `Mean Petal Width`
##   <fct>                    <dbl>              <dbl>
## 1 versicolor                4.26               1.33
## 2 virginica                 5.55               2.03

It is worth noting that the pipe operator works well because data is usually the first argument to a function. Almost all functions (if not all) from the tidyverse packages have data as their first argument.
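
For functions where the data is not the first argument, the magrittr pipe lets you mark the spot with a dot (.) placeholder. Below is a small sketch using lm() from base R, which takes a formula first and the data second. The choice of lm() and the variable names are illustrative assumptions, not part of the example above.

```r
library(magrittr)

# lm() expects the formula first, so the dot says where iris should go.
fit_piped  <- iris %>% lm(Sepal.Length ~ Sepal.Width, data = .)

# The equivalent call without the pipe.
fit_nested <- lm(Sepal.Length ~ Sepal.Width, data = iris)

# Both fits have the same coefficients.
same_fit <- isTRUE(all.equal(coef(fit_piped), coef(fit_nested)))
same_fit
## [1] TRUE
```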