In R 4.1 (May 2021) a native pipe operator was introduced that is "more streamlined" than previous implementations. I already noticed one difference between the native |> and the magrittr pipe %>%, namely 2 %>% sqrt works but 2 |> sqrt doesn't and has to be written as 2 |> sqrt(). Are there more differences and pitfalls to be aware of when using the native pipe operator?
What are the differences between R's native pipe `|>` and the magrittr pipe `%>%`?
15.8k views. Asked by sieste. There are 5 answers.
On
The native pipe is implemented as a syntax transformation and so 2 |> sqrt() has no discernible overhead compared to sqrt(2), whereas 2 %>% sqrt() comes with a small penalty.
microbenchmark::microbenchmark(
sqrt(1),
2 |> sqrt(),
3 %>% sqrt()
)
# Unit: nanoseconds
# expr min lq mean median uq max neval
# sqrt(1) 117 126.5 141.66 132.0 139 246 100
# sqrt(2) 118 129.0 156.16 134.0 145 1792 100
# 3 %>% sqrt() 2695 2762.5 2945.26 2811.5 2855 13736 100
You see how the expression 2 |> sqrt() passed to microbenchmark is parsed as sqrt(2). This can also be seen in
quote(2 |> sqrt())
# sqrt(2)
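The syntax transformation can also be checked without any benchmarking package by comparing the parsed expressions directly. This is a minimal sketch using only base R (R >= 4.1 is assumed); the magrittr expression is only quoted, never evaluated, so the package does not have to be installed. The variable names `native` and `magrittr_expr` are made up for illustration.

```r
# The native pipe is rewritten by the parser, so no pipe call remains.
native <- quote(2 |> sqrt())
identical(native, quote(sqrt(2)))
# [1] TRUE

# A magrittr pipeline stays a call to the function `%>%`,
# which has to be looked up and evaluated at run time.
magrittr_expr <- quote(2 %>% sqrt())
as.character(magrittr_expr[[1]])
# [1] "%>%"
```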
On
The base R pipe |> added in R 4.1.0 "just" does functional composition, i.e. we can see that its use really is just the same as the plain function call:
> 1:5 |> sum() # simple use of |>
[1] 15
> deparse(substitute( 1:5 |> sum() ))
[1] "sum(1:5)"
>
That has some consequences:
- it makes it a little faster
- it makes it a little simpler and more robust
- it makes it a little more restrictive: sum() here needs the parens for a proper call
- it limits uses of the 'implicit' data argument
This leads to possible use of =>, which is currently "available but not active" (to enable it you need to set the environment variable _R_USE_PIPEBIND_, and it may change for R 4.2.0).
(This was first offered as answer to a question duplicating this over here and I just copied it over as suggested.)
Edit: As the follow-up question on 'what is =>' comes up, here is a quick follow-up. Note that this operator is subject to change.
> Sys.setenv("_R_USE_PIPEBIND_"=TRUE)
> mtcars |> subset(cyl == 4) |> d => lm(mpg ~ disp, data = d)
Call:
lm(formula = mpg ~ disp, data = subset(mtcars, cyl == 4))
Coefficients:
(Intercept) disp
40.872 -0.135
> deparse(substitute(mtcars |> subset(cyl==4) |> d => lm(mpg ~ disp, data = d)))
[1] "lm(mpg ~ disp, data = subset(mtcars, cyl == 4))"
>
The deparse(substitute(...)) is particularly nice here.
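Because the native pipe is handled by the parser, the parenthesis requirement mentioned above shows up as a parse error, before anything is evaluated. The following sketch illustrates this with base R only (R >= 4.1 assumed); `try_parse` is a hypothetical helper written for this example, not part of R.

```r
# Hypothetical helper: return the parsed expression, or the parser's
# error message as a character string if parsing fails.
try_parse <- function(code) {
  tryCatch(parse(text = code), error = function(e) conditionMessage(e))
}

bare   <- try_parse("1:5 |> sum")    # no parentheses: parsing fails
called <- try_parse("1:5 |> sum()")  # proper call: parsing succeeds

is.character(bare)    # TRUE, an error message was returned
is.expression(called) # TRUE
deparse(called[[1]])
# [1] "sum(1:5)"
```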
On
One difference is their placeholder, _ in base R, . in magrittr.
Since R 4.2.0, the base R pipe has a placeholder for piped-in values, _, similar to %>%'s ., but its use is restricted to named arguments, and can only be used once per call.
It is now possible to use a named argument with the placeholder _ in the rhs call to specify where the lhs is to be inserted. The placeholder can only appear once on the rhs.
To reiterate Ronak Shah's example, you can now use _ as a named argument on the right-hand side to refer to the left-hand side of the pipe:
c("dogs", "cats", "rats") |>
grepl("at", x = _)
#[1] FALSE TRUE TRUE
but it has to be named:
c("dogs", "cats", "rats") |>
grepl("at", _)
#Error: pipe placeholder can only be used as a named argument
and cannot appear more than once (to overcome this issue, one can still use the solutions provided by Ronak Shah):
c("dogs", "cats", "rats") |>
expand.grid(x = _, y = _)
# Error in expand.grid(x = "_", y = "_") : pipe placeholder may only appear once
While this is possible with magrittr:
library(magrittr)
c("dogs", "cats", "rats") %>%
expand.grid(x = ., y = .)
# x y
#1 dogs dogs
#2 cats dogs
#3 rats dogs
#4 dogs cats
#5 cats cats
#6 rats cats
#7 dogs rats
#8 cats rats
#9 rats rats
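One way to get around the single-use restriction with the native pipe is to wrap the call in an anonymous function, whose argument can then be used as often as needed. A minimal sketch, assuming R >= 4.1; the names `pets`, `grid` and `v` are arbitrary.

```r
pets <- c("dogs", "cats", "rats")

# The lhs is passed once to the anonymous function,
# which is free to use its argument v several times.
grid <- pets |> (\(v) expand.grid(x = v, y = v))()

nrow(grid)
# [1] 9
```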
On
| Topic | Magrittr 2.0.3 | Base 4.3.0 |
|---|---|---|
| Operator | %>% %<>% %$% %!>% %T>% | \|> (since 4.1.0) |
| Function call | 1:3 %>% sum() | 1:3 \|> sum() |
| | 1:3 %>% sum | Needs brackets / parentheses |
| | 1:3 %>% `+`(4) | Some functions are not supported |
| Insert on first empty place | mtcars %>% lm(formula = mpg ~ disp) | mtcars \|> lm(formula = mpg ~ disp) |
| Placeholder | . | _ (since 4.2.0) |
| | mtcars %>% lm(mpg ~ disp, data = . ) | mtcars \|> lm(mpg ~ disp, data = _ ) |
| | mtcars %>% lm(mpg ~ disp, . ) | Needs named argument |
| | 1:3 %>% setNames(., .) | Can only appear once |
| | 1:3 %>% {sum(sqrt(.))} | Nested calls are not allowed |
| Extraction call | mtcars %>% .$cyl, mtcars %>% {.$cyl[[3]]} or mtcars %$% cyl[[3]] | mtcars \|> _$cyl (since 4.3.0), mtcars \|> _$cyl[[3]] |
| Environment | %>% has additional function environment; use: "x" %!>% assign(1) | "x" \|> assign(1) |
| Create Function | top6 <- . %>% sort() %>% tail() | Not possible |
| Speed | Slower because of the overhead of a function call | Faster because it is a syntax transformation |
Many differences and limitations disappear when using |> in combination with an (anonymous) function:
1 |> (\(.) .)()
-3:3 |> (\(.) sum(2*abs(.) - 3*.^2))()
See also: How to pipe purely in base R ('base pipe')? and What are the differences and use cases of the five Magrittr Pipes %>%, %<>%, %$%, %!>% and %T>%?.
Needs brackets
library(magrittr)
1:3 |> sum
#Error: The pipe operator requires a function call as RHS
1:3 |> sum()
#[1] 6
1:3 |> approxfun(1:3, 4:6)()
#[1] 4 5 6
1:3 %>% sum
#[1] 6
1:3 %>% sum()
#[1] 6
1:3 %>% approxfun(1:3, 4:6) #But in this case empty parentheses are needed
#Error in if (is.na(method)) stop("invalid interpolation method") :
1:3 %>% approxfun(1:3, 4:6)()
#[1] 4 5 6
Some functions are not supported,
but some of them can still be called by placing them in brackets, calling them via ::, using the placeholder, calling them inside a function, or defining an alias for the function.
1:3 |> `+`(4)
#Error: function '+' not supported in RHS call of a pipe
1:3 |> (`+`)(4)
#[1] 5 6 7
1:3 |> base::`+`(4)
#[1] 5 6 7
1:3 |> `+`(4, e2 = _)
#[1] 5 6 7
1 |> (`+`)(2) |> (`*`)(3) #(1 + 2) * 3 or `*`(`+`(1, 2), 3) and NOT 1 + 2 * 3
#[1] 9
1:3 |> (\(.) . + 4)()
#[1] 5 6 7
fun <- `+`
1:3 |> fun(4)
#[1] 5 6 7
1:3 %>% `+`(4)
#[1] 5 6 7
Placeholder needs named argument
2 |> setdiff(1:3, _)
#Error: pipe placeholder can only be used as a named argument
2 |> setdiff(1:3, y = _)
#[1] 1 3
2 |> (\(.) setdiff(1:3, .))()
#[1] 1 3
2 %>% setdiff(1:3, .)
#[1] 1 3
2 %>% setdiff(1:3, y = .)
#[1] 1 3
Also for variadic functions with ... (dot-dot-dot) arguments, the placeholder _ needs to be used as a named argument.
"b" |> paste("a", _, "c")
#Error: pipe placeholder can only be used as a named argument
"b" |> paste("a", . = _, "c")
#[1] "a b c"
"b" |> (\(.) paste("a", ., "c"))()
#[1] "a b c"
Placeholder can only appear once
1:3 |> setNames(nm = _)
#1 2 3
#1 2 3
1:3 |> setNames(object = _, nm = _)
#Error in setNames(object = "_", nm = "_") :
# pipe placeholder may only appear once
1:3 |> (\(.) setNames(., .))()
#1 2 3
#1 2 3
1:3 |> list() |> setNames(".") |> with(setNames(., .))
#1 2 3
#1 2 3
1:3 |> list(. = _) |> with(setNames(., .))
#1 2 3
#1 2 3
1:3 %>% setNames(object = ., nm = .)
#1 2 3
#1 2 3
1:3 %>% setNames(., .)
#1 2 3
#1 2 3
Nested calls are not allowed
1:3 |> sum(sqrt(x=_))
#Error in sum(1:3, sqrt(x = "_")) : invalid use of pipe placeholder
1:3 |> (\(.) sum(sqrt(.)))()
#[1] 4.146264
1:3 %>% {sum(sqrt(.))}
#[1] 4.146264
Extraction call
Experimental feature since 4.3.0. The placeholder _ can now also be used in the rhs of a forward pipe |> expression as the first argument in an extraction call, such as _$coef. More generally, it can be used as the head of a chain of extractions, such as _$coef[[2]].
mtcars |> _$cyl
mtcars |> _[["cyl"]]
mtcars |> _[,"cyl"]
# [1] 6 6 4 6 8 6 8 4 4 6 6 8 8 8 8 8 8 4 4 4 4 8 8 8 8 4 4 4 8 6 8 4
mtcars |> _$cyl[[4]]
#[1] 6
mtcars %>% .$cyl
mtcars %>% .[["cyl"]]
mtcars %>% .[,"cyl"]
# [1] 6 6 4 6 8 6 8 4 4 6 6 8 8 8 8 8 8 4 4 4 4 8 8 8 8 4 4 4 8 6 8 4
#mtcars %>% .$cyl[4] #gives mtcars[[4]]
mtcars %>% .$cyl %>% .[4]
#[1] 6
No additional Environment
assign("x", 1)
x
#[1] 1
"x" |> assign(2)
x
#[1] 2
"x" |> (\(x) assign(x, 3))()
x
#[1] 2
1:3 |> assign("x", value=_)
x
#[1] 1 2 3
"x" %>% assign(4)
x
#[1] 1 2 3
4 %>% assign("x", .)
x
#[1] 1 2 3
"x" %!>% assign(4) #Use instead the eager pipe
x
#[1] 4
5 %!>% assign("x", .)
x
#[1] 5
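Why the native pipe adds no environment of its own can be seen inside a function: since "y" |> assign(10) is parsed as assign("y", 10), the variable is created in the frame where the pipeline is written. The sketch below uses only base R (R >= 4.1 assumed); `set_with_native_pipe` is a hypothetical function made up for this illustration.

```r
# Hypothetical function, used only to give the pipeline a local frame.
set_with_native_pipe <- function() {
  "y" |> assign(10)              # parsed as assign("y", 10)
  exists("y", inherits = FALSE)  # is y defined in this function's own frame?
}

created_locally <- set_with_native_pipe()
created_locally
# [1] TRUE
```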
Create a Function
top6 <- . %>% sort() %>% tail()
top6(c(1:10,10:1))
#[1] 8 8 9 9 10 10
Other possibilities:
A different pipe operator and a different placeholder can be obtained with the Bizarro pipe ->.; which is not really a pipe (see disadvantages) and which overwrites .
1:3 ->.; sum(.)
#[1] 6
mtcars ->.; .$cyl
# [1] 6 6 4 6 8 6 8 4 4 6 6 8 8 8 8 8 8 4 4 4 4 8 8 8 8 4 4 4 8 6 8 4
mtcars ->.; .$cyl[4]
#[1] 6
1:3 ->.; setNames(., .)
#1 2 3
#1 2 3
1:3 ->.; sum(sqrt(x=.))
#[1] 4.146264
"x" ->.; assign(., 5)
x
#[1] 5
6 ->.; assign("x", .)
x
#[1] 6
1:3 ->.; . + 4
#[1] 5 6 7
1 ->.; (`+`)(., 2) ->.; (`*`)(., 3)
#[1] 9
1 ->.; .+2 ->.; .*3
#[1] 9
and it evaluates in a different order.
x <- data.frame(a=0)
f1 <- \(x) {message("IN 1"); x$b <- 1; message("OUT 1"); x}
f2 <- \(x) {message("IN 2"); x$c <- 2; message("OUT 2"); x}
x ->.; f1(.) ->.; f2(.)
#IN 1
#OUT 1
#IN 2
#OUT 2
# a b c
#1 0 1 2
x |> f1() |> f2()
#IN 2
#IN 1
#OUT 1
#OUT 2
# a b c
#1 0 1 2
f2(f1(x))
#IN 2
#IN 1
#OUT 1
#OUT 2
# a b c
#1 0 1 2
Or define a custom pipe operator that sets . to the value of the lhs in a new environment and evaluates the rhs there. With this approach, however, values in the calling environment cannot be created or changed.
`:=` <- \(lhs, rhs) eval(substitute(rhs), list(. = lhs))
mtcars := .$cyl[4]
#[1] 6
1:3 := setNames(., .)
#1 2 3
#1 2 3
1:3 := sum(sqrt(x=.))
#[1] 4.146264
"x" := assign(., 6)
x
#Error: object 'x' not found
1 := .+2 := .*3
#[1] 9
So another attempt is to assign the lhs to the placeholder . in the calling environment and to evaluate the rhs in the calling environment. Note that here . is removed from the calling environment afterwards, even if it already existed there.
`?` <- \(lhs, rhs) {
on.exit(if(exists(".", parent.frame())) rm(., envir = parent.frame()))
assign(".", lhs, envir=parent.frame())
eval.parent(substitute(rhs))
}
mtcars ? .$cyl[4]
#[1] 6
1:3 ? setNames(., .)
#1 2 3
#1 2 3
1:3 ? sum(sqrt(x=.))
#[1] 4.146264
"x" ? assign(., 6)
x
#[1] 6
1 ? .+2 ? .*3
#[1] 9
Another possibility is to replace every . with the lhs, so that during evaluation . no longer exists as a name.
`%|>%` <- \(lhs, rhs)
eval.parent(eval(call('substitute', substitute(rhs), list(. = lhs))))
mtcars %|>% .$cyl[4]
#[1] 6
1:3 %|>% setNames(., .)
#1 2 3
#1 2 3
1:3 %|>% sum(sqrt(x=.))
#[1] 4.146264
"x" %|>% assign(., 6)
x
#[1] 6
1 %|>% .+2 %|>% .*3
#[1] 7
The name chosen for the operator determines its precedence; see: Same function but using for it the name %>% causes a different result compared when using the name :=.
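The effect of the operator name on precedence can be inspected on the parse tree without defining any of the operators. This is a sketch in base R; `top_level` is a hypothetical helper that returns the name of the outermost call of a quoted expression.

```r
# Hypothetical helper: name of the function at the top of the parse tree.
top_level <- function(expr) as.character(expr[[1]])

# %any% operators bind tighter than +, so + ends up outermost:
top_level(quote(1 %|>% . + 2))
# [1] "+"

# := is parsed like an assignment and ? has the lowest precedence,
# so the whole right-hand side stays inside the operator call:
top_level(quote(1 := . + 2))
# [1] ":="
top_level(quote(1 ? . + 2))
# [1] "?"
```

This is consistent with the results above, where 1 %|>% .+2 %|>% .*3 gives 7 while the := and ? variants give 9.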
For more advanced options see: Write own / custom pipe operator.
Speed
library(magrittr)
`:=` <- \(lhs, rhs) eval(substitute(rhs), list(. = lhs))
`?` <- \(lhs, rhs) {
on.exit(if(exists(".", parent.frame())) rm(., envir = parent.frame()))
assign(".", lhs, envir=parent.frame())
eval.parent(substitute(rhs))
}
`%|>%` <- \(lhs, rhs)
eval.parent(eval(call('substitute', substitute(rhs), list(. = lhs))))
x <- 42
bench::mark(min_time = 0.2, max_iterations = 1e8
, x
, identity(x)
, "|>" = x |> identity()
, "|> _" = x |> identity(x=_)
, "->.;" = {x ->.; identity(.)}
, "|> f()" = x |> (\(y) identity(y))()
, "%>%" = x %>% identity
, ":=" = x := identity(.)
, "list." = x |> list() |> setNames(".") |> with(identity(.))
, "%|>%" = x %|>% identity(.)
, "?" = x ? identity(.)
)
Result
expression min median `itr/sec` mem_alloc `gc/sec` n_itr n_gc
<bch:expr> <bch:tm> <bch:tm> <dbl> <bch:byt> <dbl> <int> <dbl>
1 x 31.08ns 48.2ns 19741120. 0B 7.46 2646587 1
2 identity(x) 491.04ns 553.09ns 1750116. 0B 27.0 323575 5
3 |> 497.91ns 548.08ns 1758553. 0B 27.3 322408 5
4 |> _ 506.87ns 568.92ns 1720374. 0B 26.9 320003 5
5 ->.; 725.03ns 786.04ns 1238488. 0B 21.2 233864 4
6 |> f() 972.07ns 1.03µs 929926. 0B 37.8 172288 7
7 %>% 2.76µs 3.05µs 315448. 0B 37.2 59361 7
8 := 3.02µs 3.35µs 288025. 0B 37.0 54561 7
9 list. 5.19µs 5.89µs 166721. 0B 36.8 31752 7
10 %|>% 6.01µs 6.86µs 143294. 0B 37.0 27076 7
11 ? 30.9µs 32.79µs 30074. 0B 31.3 5768 6
In R 4.1, there was no placeholder syntax for the native pipe. Thus, there was no equivalent of the . placeholder of magrittr and thus the following was impossible with |>.
As of R 4.2, the native pipe can use _ as a placeholder but only with named arguments.
The . of magrittr is still more flexible, as . can be repeated and can appear in expressions.
It is also not clear how to use |> with a function that takes in unnamed variadic arguments (i.e., ...). In this paste() example, we can make up x and y arguments to trick the placeholder into the correct place, but that feels hacky.
Here are additional ways to work around the placeholder limitations:
1. Use an anonymous function
a) Use the "old" syntax
b) Use the new anonymous function syntax
2. Specify the first parameter by name. This relies on the fact that the native pipe pipes into the first unnamed parameter, so if you provide a name for the first parameter it "overflows" into the second (and so on if you specify more than one parameter by name)
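The workarounds listed above might look as follows. This is only a sketch under stated assumptions: the data, the variable names (`pets`, `old_style`, `new_style`, `by_name`, `made_up`) and the argument name `v` are made up for illustration and are not taken from the original answer; the last line needs R >= 4.2 for the _ placeholder.

```r
pets <- c("dogs", "cats", "rats")

# 1a) Anonymous function, "old" syntax
old_style <- pets |> (function(v) grepl("at", v))()

# 1b) Anonymous function, new syntax (R >= 4.1)
new_style <- pets |> (\(v) grepl("at", v))()

# 2) Name the first parameter (pattern), so the piped value
#    "overflows" into the next unnamed parameter (x)
by_name <- pets |> grepl(pattern = "at")

by_name
# [1] FALSE  TRUE  TRUE

# Made-up argument names x and y for a variadic function such as paste()
made_up <- "cats" |> paste(x = "no", y = _)
made_up
# [1] "no cats"
```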