What does %>% function mean in R?


I have seen the use of %>% (percent greater than percent) function in some packages like dplyr and rvest. What does it mean? Is it a way to write closure blocks in R?


There are 7 answers

3
G. Grothendieck On BEST ANSWER

%...% operators

%>% has no built-in meaning, but the user (or a package) is free to define operators of the form %whatever% in any way they like. For example, this function returns a string consisting of its left argument followed by a comma and a space and then its right argument.

"%,%" <- function(x, y) paste0(x, ", ", y)

# test run

"Hello" %,% "World"
## [1] "Hello, World"

Base R provides %*% (matrix multiplication), %/% (integer division), %in% (is the lhs an element of the rhs?), %o% (outer product) and %x% (Kronecker product). It is not clear whether %% falls in this category or not, but it represents modulo.
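As a rough illustration of the base operators listed above, the following sketch uses only base R. The variable names are made up for the example.

```r
# Base R infix operators of the %...% form (no packages needed).
m <- matrix(1:4, nrow = 2)      # columns are (1, 2) and (3, 4)

prod_mat <- m %*% m             # matrix multiplication
int_div  <- 7 %/% 2             # integer division
modulo   <- 7 %% 2              # remainder after division
is_in    <- 3 %in% c(1, 2, 3)   # membership test
outer_p  <- 1:2 %o% 1:3         # outer product, a 2 x 3 matrix
```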

expm The R package, expm, defines a matrix power operator %^%. For an example see Matrix power in R.

operators The operators R package has defined a large number of such operators such as %!in% (for not %in%). See http://cran.r-project.org/web/packages/operators/operators.pdf

igraph This package defines %--% , %->% and %<-% to select edges.

lubridate This package defines %m+% and %m-% to add and subtract months and %--% to define an interval. igraph also defines %--% .

Pipes

magrittr In the case of %>% the magrittr R package has defined it as discussed in the magrittr vignette. See http://cran.r-project.org/web/packages/magrittr/vignettes/magrittr.html

magrittr has also defined a number of other such operators. See the Additional Pipe Operators section of the prior link, which discusses %T>%, %<>% and %$%, and http://cran.r-project.org/web/packages/magrittr/magrittr.pdf for even more details.
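A small sketch of those three additional operators, assuming the magrittr package is installed. The variable names are illustrative and mtcars is the data set that ships with R.

```r
library(magrittr)

x <- c(1, 4, 9)

# %<>% pipes x into sqrt() and assigns the result back to x.
x %<>% sqrt

# %T>% runs print() for its side effect but passes x itself onward,
# so sum() receives x rather than the return value of print().
total <- x %T>% print() %>% sum()

# %$% exposes the columns of the left-hand side to the expression on the right.
r <- mtcars %$% cor(mpg, wt)
```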

dplyr The dplyr R package used to define a %.% operator which is similar; however, it has been deprecated and dplyr now recommends that users use %>%, which dplyr imports from magrittr and makes available to the dplyr user. As David Arenburg has mentioned in the comments, this SO question discusses the differences between it and magrittr's %>%: Differences between %.% (dplyr) and %>% (magrittr)

pipeR The R package, pipeR, defines a %>>% operator that is similar to magrittr's %>% and can be used as an alternative to it. See http://renkun.me/pipeR-tutorial/

The pipeR package has also defined a number of other such operators. See: http://cran.r-project.org/web/packages/pipeR/pipeR.pdf

postlogic The postlogic package defined %if% and %unless% operators.

wrapr The R package, wrapr, defines a dot pipe %.>% that is an explicit version of %>% in that it does not do implicit insertion of arguments but only substitutes explicit uses of dot on the right hand side. This can be considered as another alternative to %>%. See https://winvector.github.io/wrapr/articles/dot_pipe.html

Bizarro pipe. This is not really a pipe but rather some clever base syntax to work in a way similar to pipes without actually using pipes. It is discussed in http://www.win-vector.com/blog/2017/01/using-the-bizarro-pipe-to-debug-magrittr-pipelines-in-r/ The idea is that instead of writing:

1:8 %>% sum %>% sqrt
## [1] 6

one writes the following. In this case we explicitly use dot rather than eliding the dot argument, and we end each component of the pipeline with an assignment to the variable whose name is dot (.), followed by a semicolon.

1:8 ->.; sum(.) ->.; sqrt(.)
## [1] 6

Update Added info on expm package and simplified example at top. Added postlogic package.

Update 2 The development version of R has defined a |> pipe. Unlike magrittr's %>% it can only substitute into the first argument of the right hand side. Although limited, it works via syntax transformation so it has no performance impact.
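The syntax transformation can be seen directly in a quick sketch like the one below, which assumes R 4.1 or later. The names expr, parsed and result are only for illustration.

```r
# quote() captures the expression as the parser produced it, so deparse()
# shows that the pipeline has already been rewritten into nested calls.
expr <- quote(1:8 |> sum() |> sqrt())
parsed <- deparse(expr)

# Evaluating the pipeline gives the same value as sqrt(sum(1:8)).
result <- 1:8 |> sum() |> sqrt()
```

Here parsed holds the text of an ordinary nested call, with no trace of |> left in it.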

Update 3 In recent versions of R one can use underscore _ on the RHS to specify a different argument than the first. For example:

"banana" |> grepl("an", x = _)

It can only be used once, it cannot be used inside a call nested within the RHS call, and the _ argument must be named.

# Specify name.
"banana" |> grepl("an", _)  # bad
"banana" |> grepl("an", x = _) # ok

# Must be an argument to grepl, not sub.  Break into two.
"banana" |> grepl("an", x = sub("n", "m", x = _)) # bad
"banana" |> sub("n", "m", x = _) |> grepl("an", x = _) # ok

# Can only be used once on RHS.
"banana" |> grepl(pattern = _, x = _) # bad
"banana" |> list(. = _) |> with(grepl(pattern = ., .)) # ok
2
qwr On

%>% is the pipe operator from magrittr, widely used in other Tidyverse and compatible packages.

The basic way to understand it: it takes the left-hand side (LHS) and makes it the first argument of the right-hand side (RHS). x %>% f(y) is special syntax that is essentially f(x, y). If the RHS function needs no other arguments, you can leave off the parentheses, e.g. x %>% f turns into f(x).

The pipes can be chained together. This lets you write code that passes data left to right, like Unix pipes, instead of nested function calls, which are read from the inside out. Compare the logical flow of

mtcars %>% subset(hp > 100) %>% print 

vs. the traditional

print(subset(mtcars, hp > 100))

or creating intermediate variables.

The piped version is more natural to read left to right, with fewer parentheses, as steps in a data transform/modeling task. It also lets you easily insert steps into your process without fiddling with nested functions.

Usefully, . is used as a placeholder for the LHS when it should not be the first argument of the RHS. For example, x %>% f(y, .) means f(y, x). Tidyverse packages are designed with the "data" as the first argument, so you usually don't need this.
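A minimal sketch of first-argument insertion and the dot placeholder, assuming the magrittr package is installed. The names a, b and d are only for illustration.

```r
library(magrittr)

# No parentheses needed when the LHS is the only argument.
a <- c(4, 9) %>% sqrt                # same as sqrt(c(4, 9))

# The LHS becomes the first argument of the RHS call.
b <- "an" %>% grepl("banana")        # same as grepl("an", "banana")

# The dot puts the LHS somewhere other than the first argument.
d <- "banana" %>% sub("n", "m", .)   # same as sub("n", "m", "banana")
```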

The funny percent-sign syntax is how R lets users define their own infix functions. An example of a built-in infix operator in R is +; the + in 1 + 2 actually does the function call `+`(1, 2) (you can see this by looking at the source code by typing in `+`).
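To make the infix idea concrete, here is a toy sketch in base R. The operator name %then% is hypothetical and is far simpler than magrittr's real %>%: it only handles a bare function on the right-hand side.

```r
# An infix operator is an ordinary function; backticks let you call it by name.
plus_result <- `+`(1, 2)

# A toy pipe: apply the function on the right to the value on the left.
`%then%` <- function(lhs, f) f(lhs)

# %...% operators associate left to right, so this is sqrt(sum(1:8)).
piped <- 1:8 %then% sum %then% sqrt
```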

For further information, see https://cran.r-project.org/web/packages/magrittr/vignettes/magrittr.html

The pipe operator is so useful that R 4.1 added the native pipe |>. It's not as featureful though and it's still quite new. See https://www.tidyverse.org/blog/2023/04/base-vs-magrittr-pipe/

0
Shivam Panchbhai On

I don't know much about it, but I saw it in a case study on the multivariate normal distribution in R at college.

Suppose you have a data frame in a variable called df_gather and you want to pipe it into ggplot; then you can use %>%.

EG:

library(ggplot2)
library(magrittr)
df_gather %>% ggplot(aes(x = Value, fill = Variable, color = Variable)) +
  geom_density(alpha = 0.3) + ggtitle('Distribution of X')
0
Hamzah On

Another operator of the %...% form is %<-%, the multi-assignment (unpacking) operator from the zeallot package. For example:

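library(zeallot)  # provides %<-%

# Return three values in a list
session <- function() {
  x <- 1
  y <- 2
  z <- y + x
  list(x, y, z)
}

# Unpack the list elements into var1, var2 and result
c(var1, var2, result) %<-% session()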
1
HKE On

The R packages dplyr and sf import the operator %>% from the R package magrittr.

Help is available by using the following command:

?'%>%'

Of course, a package that provides the operator must be loaded first, e.g. with

library(sf)

The documentation of the magrittr forward-pipe operator gives a good example: when a function requires only one argument, x %>% f is equivalent to f(x).

1
Francisco López-Sancho On

My understanding after reading the link offered by G. Grothendieck is that %>% is an operator that pipes a value through functions. This helps readability and productivity, as it's easier to follow the flow of multiple functions through these pipes than to read backwards when multiple functions are nested.

1
RAJAT BHATHEJA On

%>% is similar to a pipe in Unix. For example, in

a <- combined_data_set %>% group_by(Outlet_Identifier) %>% tally()

combined_data_set is passed into group_by, whose output goes into tally; the final result is then assigned to a.

This gives you a handy and easy way to use functions in series without creating variables to store intermediate values.
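Since combined_data_set is not shown, here is a runnable sketch of the same pattern on the mtcars data set that ships with R, assuming the dplyr package is installed. The names counts and nested are only for illustration.

```r
library(dplyr)

# Piped version: count the rows of mtcars for each number of cylinders.
counts <- mtcars %>% group_by(cyl) %>% tally()

# The equivalent nested call, read from the inside out.
nested <- tally(group_by(mtcars, cyl))
```

Both forms produce the same table with one row per value of cyl and a column n holding the counts.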