In this answer I used %>%, the magrittr pipe, as shown in its documentation, providing a function without empty parentheses on the RHS. I then got a comment that the recommended convention is to supply empty parentheses on the RHS.
library(magrittr)
1:3 %>% sum # The documentation calls this: Basic use
1:3 %>% sum() # It's also possible to supply empty parentheses
1:3 |> sum() # This matches how |>, the base pipe, is used
An advantage might be that the syntax then matches that of |>, the base pipe.
On the other hand, %>% can also be called like a regular function, and when functions are passed to other functions they are typically provided without parentheses:
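For comparison, the base pipe does not accept a bare name on its RHS at all; the parser rejects it. A quick check, assuming R 4.1 or later (where |> was introduced); the bare-name version is parsed from a string so that the error can be caught instead of stopping the script:

```r
# A call expression on the RHS of |> parses and evaluates normally
1:3 |> sum()
# [1] 6

# A bare function name is a parse-time error for |>, so parse it from text
# and catch the error condition
res <- tryCatch(parse(text = "1:3 |> sum"), error = function(e) e)
inherits(res, "error")
# [1] TRUE
```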
`%>%`(1:3, sum)
sapply(list(1:3), sum)
`%=>%` <- sapply
list(1:3) %=>% sum
do.call(sum, list(1:3))
`%<%` <- do.call
sum %<% list(1:3)
Seen this way, using it without parentheses looks consistent.
However, when using the placeholder, the parentheses have to be provided:
"axc" %>% sub("x", "b", .)
What are the disadvantages of providing a function without parentheses to the pipe, and are there good technical reasons to provide it with empty parentheses?
No, this is confusing things: there is no single way in which functions are “typically provided”; it depends entirely on the usage.
You use the examples of sapply and do.call. Both are higher-order functions, which means that they expect functions as arguments.¹ Since they expect functions as arguments, we can pass a name which refers to a function. But instead of a name we can also pass an arbitrary expression which evaluates to a function. … In fact, don’t get hung up on the fact that you are passing a name in your example; it’s a red herring. Here’s an example where we pass the result of an expression (which returns a function) instead:
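A minimal sketch of such an example. The definition of make_adder below is an assumption inferred from how the answer uses it later: make_adder(2) returns a function that adds 2 to its argument, so make_adder itself takes exactly one argument.

```r
# Assumed definition: a function factory; make_adder(n) returns a function adding n
make_adder <- function(n) function(x) x + n

# sapply() receives the *result* of evaluating make_adder(2), which is a function
sapply(1:3, make_adder(2))
# [1] 3 4 5

# Any other expression that evaluates to a function works the same way
sapply(1:3, function(x) x + 2)
# [1] 3 4 5
```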
But this is potentially a distraction, because %>% does not expect a function object as its second argument. Instead, it expects a function call expression.

In my example above,
sapply is a regular function, which evaluates its arguments using standard evaluation. Both its arguments, 1 : 3 as well as make_adder(2), are evaluated, and the results are passed to sapply as arguments.²

%>% is not a regular function: it suppresses standard evaluation of its second argument. Instead, it keeps the expression in its unevaluated form and manipulates it. The way it does that is fairly complex, but in the simplest case it injects its first argument into the expression and subsequently evaluates it. Schematically, with lhs standing for the left-hand side, a right-hand side f(a, b) is turned into f(lhs, a, b), which is then evaluated.

This works for any valid
rhs expression: sum(), head(3), etc. %>% transforms these into, respectively, sum(lhs), head(lhs, 3), etc., and evaluates the resulting expression.

So far, this is perfectly consistent. However, the author of
%>% chose to allow an additional, entirely distinct usage: instead of passing a function call expression as rhs, you can also pass a simple name. In that case, %>% does something completely different. Instead of constructing a new call expression that injects lhs and evaluating that, it directly calls rhs(lhs).

In other words,
%>% accepts two fundamentally different types of arguments as rhs, and does different things with them.

This isn’t in itself a problem yet. It becomes a problem if we pass a function factory as the
rhs. That’s a higher-order function which itself returns a function. make_adder from above is such a function factory.

So: what does
1 : 3 %>% make_adder(2) do? … Oh, right!
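The two rules described above can be modelled in a few lines of base R. The operator %then% below is hypothetical, written only for illustration; it is much simpler than magrittr's actual implementation (for instance, it ignores the . placeholder). make_adder is the assumed factory from before. The model reproduces the same failure, and the same fix:

```r
# Assumed definition of the factory discussed in the answer
make_adder <- function(n) function(x) x + n

# Hypothetical, simplified model of the two rules; NOT magrittr's implementation
`%then%` <- function(lhs, rhs) {
  env <- parent.frame()        # the caller's environment
  rhs_expr <- substitute(rhs)  # keep the RHS unevaluated
  if (is.name(rhs_expr)) {
    # Rule 2: a bare name is looked up and called directly, i.e. rhs(lhs)
    fun <- eval(rhs_expr, env)
    fun(lhs)
  } else {
    # Rule 1: a call f(a, b) is rebuilt as f(lhs, a, b) and then evaluated
    new_call <- as.call(c(list(rhs_expr[[1]], lhs), as.list(rhs_expr)[-1]))
    eval(new_call, env)
  }
}

1:3 %then% sum()    # evaluates sum(1:3)
# [1] 6
1:3 %then% head(2)  # evaluates head(1:3, 2)
# [1] 1 2
1:3 %then% sum      # bare name: calls sum(1:3)
# [1] 6

# Rule 1 turns make_adder(2) into make_adder(1:3, 2), which fails
err <- tryCatch(1:3 %then% make_adder(2), error = function(e) e)
conditionMessage(err)

# An extra () makes the factory's *result* the function that gets called
1:3 %then% make_adder(2)()  # evaluates make_adder(2)(1:3)
# [1] 3 4 5
```

Because rule 1 only ever prepends lhs to the outermost call, the trailing () is what moves the injected argument onto the function returned by the factory.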
make_adder(2) is a function call expression, so the first definition of %>% applies: transform the expression and evaluate it. So it attempts to evaluate make_adder(1 : 3, 2), and that fails, because make_adder only expects one argument.

Luckily for our sanity, we can use
make_adder with %>%. This doesn’t even require additional rules or documentation. With a bit of thinking, it follows directly from the first definition above: we need to add another layer of function call, because we want %>% to call the function that is returned by make_adder. The following works: 1 : 3 %>% make_adder(2)(). Here %>% interpolated the lhs such that the new call became make_adder(2)(1 : 3).

We could make this a bit more readable by assigning the return value of
make_adder(2) to a name, add_2, and piping into add_2() instead.

We directly replaced a subexpression by a newly introduced name here. This is an extremely basic computer science concept, but it is so powerful that it has its own name: referential transparency. It’s a concept which makes reasoning about programs easier, because we know that we can always assign any subexpression to a name and use that name in its place in a piece of code: the original form (1) and the named form (2) are identical.
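A quick check of this, assuming magrittr is installed and make_adder is the factory sketched earlier (redefined here so the snippet runs on its own); (1) is the inline form and (2) the named form:

```r
library(magrittr)

# Assumed definition of the factory used throughout the answer
make_adder <- function(n) function(x) x + n

# (1) the extra () makes %>% evaluate make_adder(2)(1:3)
r1 <- 1:3 %>% make_adder(2)()

# (2) the subexpression make_adder(2) is assigned to a name first
add_2 <- make_adder(2)
r2 <- 1:3 %>% add_2()

identical(r1, r2)
# [1] TRUE
```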
But, actually, referential transparency requires that we can also do the replacement in reverse, i.e. replace the name by the value that it refers to. Sure enough, this works, and we get our original expression back:
(1) and (2) are still identical.
But unfortunately it does not always work:
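The failing case, under the same assumptions (magrittr installed, make_adder as sketched before); (1) uses the bare name add_2, and (2) replaces that name by its definition:

```r
library(magrittr)

# Assumed definition of the factory used throughout the answer
make_adder <- function(n) function(x) x + n
add_2 <- make_adder(2)

# (1) bare name on the RHS: %>% ends up calling add_2(1:3)
r1 <- 1:3 %>% add_2
r1
# [1] 3 4 5

# (2) the name replaced by its definition: %>% now builds make_adder(1:3, 2),
# which fails because make_adder() takes only one argument
r2 <- tryCatch(1:3 %>% make_adder(2), error = function(e) e)
inherits(r2, "error")
# [1] TRUE
```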
(1) works, but (2) fails, even though we merely substituted
add_2 with its definition. %>% does not preserve referential transparency.³

And that is why not using parentheses on the RHS is inconsistent, and why it is widely discouraged (e.g. by the tidyverse style guide). And it is also (as far as I understand) why the R core developers decided that
|> always requires a function call expression as its RHS, and you cannot omit the parentheses.

¹ We have a special word for this concept because accepting functions as arguments used to be very uncommon in mainstream programming languages.
² This is a simplification. The truth is more complicated, but irrelevant here. If you are curious, see the R Language Definition, section “Argument evaluation”.
³ Violating referential transparency in R is quite easy because R gives us a lot of control over how expressions are evaluated, and often this is quite handy. But when not used with care it can cause confusing code and subtle bugs, so violations of referential transparency should be weighed carefully against their benefits.
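A small illustration of how easy this is, using a hypothetical function show_arg that inspects its unevaluated argument with substitute(): replacing y by the expression it was assigned from changes the result, even though both evaluate to 3.

```r
# Hypothetical helper: returns the *expression* passed as x, as a string
show_arg <- function(x) deparse(substitute(x))

y <- 1 + 2
y == 1 + 2       # TRUE: same value
show_arg(y)      # "y"
show_arg(1 + 2)  # "1 + 2"
```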