How can I use a progress bar for piped functions in R / tidyverse


I have a main function that performs a handful of complicated, long-running computations on some data, chaining the steps with the pipe from tidyverse / magrittr. I would like a progress bar that reports which stage the processing has reached, but I'm at a loss. I've looked at the cli, progress and progressr packages, and of those I could only get cli to work (in a manner of speaking).

Here's a minimal example:

library(tidyverse)
library(cli)

main_fun <- function() {
  cli_progress_step(msg = "Running main function")
  tibble(a = 1:5) %>% 
    fun1() %>% 
    fun2() %>% 
    fun3()
}

fun1 <- function(data) {
  cli_progress_step(msg = "Doing sub function 1")
  Sys.sleep(2)

  return(data)
}
fun2 <- function(data) {
  cli_progress_step(msg = "Doing sub function 2")
  Sys.sleep(1)

  return(data)
}
fun3 <- function(data) {
  cli_progress_step(msg = "Doing sub function 3")
  Sys.sleep(3)

  return(data)
}

main_fun()
#> ℹ Running main function
#> ℹ Doing sub function 3
#> ℹ Doing sub function 2
#> ℹ Doing sub function 1
#> ✔ Doing sub function 1 [2s]
#> 
#> ℹ Doing sub function 2✔ Doing sub function 2 [3s]
#> 
#> ℹ Doing sub function 3✔ Doing sub function 3 [6.1s]
#> 
#> ℹ Running main function✔ Running main function [6.1s]
#> # A tibble: 5 × 1
#>        a
#>    <int>
#>  1     1
#>  2     2
#>  3     3
#>  4     4
#>  5     5

This displays the progress steps, but they start in 'reverse' order, i.e. 3, then 2, then 1. Once everything has completed, all of them are shown, which is about the only bit I'm happy with.

There are 2 answers

GKi On

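This is because, in a pipe, functions are not evaluated from left to right. Regular R evaluation semantics apply: lazy evaluation, or call-by-need. An argument is only evaluated when the function body first uses it, so the outermost function starts running first. With the base pipe |>, your call is equivalent to: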

fun3(fun2(fun1(tibble(a = 1:5))))
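
The effect of lazy evaluation can be reproduced without any packages. A minimal sketch, assuming R 4.1 or later for |>; the names event_log, record, first and second are made up for illustration:

```r
# Hypothetical helpers: record the order in which function bodies start.
event_log <- new.env()
event_log$calls <- character()
record <- function(what) event_log$calls <- c(event_log$calls, what)

first <- function(x) {
  record("first")   # runs when the body of first() starts
  x
}
second <- function(x) {
  record("second")  # runs before `x` (the call to first()) is evaluated
  x                 # `x` is forced here, which only now runs first()
}

result <- 1 |> first() |> second()  # same as second(first(1))
event_log$calls
# "second" "first"
```

The last stage of the pipe starts first because its argument, the whole upstream pipeline, is only evaluated once the body touches it.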

You can force the evaluation, e.g. with forceAndCall (R 4.2 or later is needed for the _ placeholder).

data.frame(a = 1:5) |> forceAndCall(n=1, FUN=fun1, data=_) |>
  forceAndCall(n=1, FUN=fun2, data=_) |> forceAndCall(n=1, FUN=fun3, data=_)
#✔ Doing sub function 1 [2s]
#✔ Doing sub function 2 [1s]
#✔ Doing sub function 3 [3s]
#...
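
To see what forceAndCall changes in isolation, here is a small base-R sketch. The helpers event_log, record, make_input and stage are hypothetical and only serve to log the order of evaluation:

```r
# Hypothetical helpers: log when the argument and the body are evaluated.
event_log <- new.env()
event_log$calls <- character()
record <- function(what) event_log$calls <- c(event_log$calls, what)

make_input <- function() {
  record("input evaluated")
  1:3
}
stage <- function(data) {
  record("stage body started")
  data
}

# Plain call: the body starts before the argument is evaluated.
invisible(stage(make_input()))
lazy_order <- event_log$calls

# forceAndCall: the first n = 1 argument is evaluated before the body runs.
event_log$calls <- character()
invisible(forceAndCall(1, stage, make_input()))
eager_order <- event_log$calls

lazy_order   # "stage body started" "input evaluated"
eager_order  # "input evaluated" "stage body started"
```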

Or, with magrittr, you can use the eager pipe %!>% to evaluate from left to right (thanks @Moohan for the comment!).

data.frame(a = 1:5) %!>% fun1() %!>% fun2() %!>% fun3()
#✔ Doing sub function 1 [2s]
#✔ Doing sub function 2 [1s]
#✔ Doing sub function 3 [3s]
#...

Alternatively, force the evaluation of the argument in the first line of each function; the steps are then reported in the order you expected. This works with both pipes, |> and %>%.

library(magrittr)
library(cli)

fun1 <- function(data) {
  force(data) # or simply write `data` on a line of its own
  cli_progress_step(msg = "Doing sub function 1")
  Sys.sleep(2)
  data
}
fun2 <- function(data) {
  force(data)
  cli_progress_step(msg = "Doing sub function 2")
  Sys.sleep(1)
  data
}
fun3 <- function(data) {
  force(data)
  cli_progress_step(msg = "Doing sub function 3")
  Sys.sleep(3)
  data
}

data.frame(a = 1:5) %>% fun1() %>% fun2() %>% fun3()
#✔ Doing sub function 1 [2s]
#✔ Doing sub function 2 [1s]
#✔ Doing sub function 3 [3s]
#...

data.frame(a = 1:5) |> fun1() |> fun2() |> fun3()
#✔ Doing sub function 1 [2s]
#✔ Doing sub function 2 [1s]
#✔ Doing sub function 3 [3s]
#...
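
If you would rather not edit every function, the force() call can live in a small wrapper. This is only a sketch: eager_step is a made-up helper, and message() stands in for the progress call so that the example needs no packages (I have not tested it with cli_progress_step in its place).

```r
# Hypothetical helper: wrap a stage so its input is forced before it announces itself.
eager_step <- function(label, f) {
  force(label)
  force(f)
  function(data, ...) {
    force(data)               # run all upstream stages first
    message("Doing ", label)  # stand-in for a progress call
    f(data, ...)
  }
}

add_b <- eager_step("sub function 1", function(d) { d$b <- 1; d })
add_c <- eager_step("sub function 2", function(d) { d$c <- 2; d })

out <- data.frame(a = 0) |> add_b() |> add_c()
# Doing sub function 1
# Doing sub function 2
```

The wrapped functions can be used with |> or %>% and report in left-to-right order, because each one evaluates its input before printing anything.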

Another way might be to write a custom pipe function. Here list(. = lhs) evaluates the left-hand side before the right-hand side runs.

`:=` <- function(lhs, rhs) eval(substitute(rhs), list(. = lhs))

data.frame(a = 1:5) := fun1(.) := fun2(.) := fun3(.)
#✔ Doing sub function 1 [2s]
#✔ Doing sub function 2 [1s]
#✔ Doing sub function 3 [3s]
#...
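
If the operator syntax is not essential, a plain loop over a named list of functions is eager by construction and also knows how many stages there are in total. This is a swapped-in technique rather than a pipe; run_steps is a made-up helper and message() stands in for a real progress bar:

```r
# Hypothetical helper: run named stages one after another, announcing each.
run_steps <- function(data, steps) {
  n <- length(steps)
  for (i in seq_along(steps)) {
    message(sprintf("[%d/%d] %s", i, n, names(steps)[i]))
    data <- steps[[i]](data)  # this stage finishes before the next message
  }
  data
}

steps <- list(
  "sub function 1" = function(d) { d$b <- 1; d },
  "sub function 2" = function(d) { d$c <- 2; d }
)

out <- run_steps(data.frame(a = 0), steps)
# [1/2] sub function 1
# [2/2] sub function 2
```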

Another example, showing when each function is entered and exited.

library(magrittr)
f1 <- \(x) {message("IN 1"); x$b <- 1; message("OUT 1"); x}
f2 <- \(x) {message("IN 2"); x$c <- 2; message("OUT 2"); x}

data.frame(a=0) %>% f1 %>% f2
#IN 2
#IN 1
#OUT 1
#OUT 2
#  a b c
#1 0 1 2

data.frame(a=0) |> f1() |> f2()
#IN 2
#IN 1
#OUT 1
#OUT 2
#  a b c
#1 0 1 2

f2(f1(data.frame(a=0)))
#IN 2
#IN 1
#OUT 1
#OUT 2
#  a b c
#1 0 1 2

data.frame(a=0) %!>% f1 %!>% f2
#IN 1
#OUT 1
#IN 2
#OUT 2
#  a b c
#1 0 1 2

data.frame(a=0) := f1(.) := f2(.)
#IN 1
#OUT 1
#IN 2
#OUT 2
#  a b c
#1 0 1 2

. <- data.frame(a=0)
. <- f1(.)
#IN 1
#OUT 1
. <- f2(.)
#IN 2
#OUT 2
.
#  a b c
#1 0 1 2
Moohan On

This can be achieved using the 'eager pipe' (%!>%) from {magrittr}:

library(tidyverse)
library(cli)
library(magrittr)

main_fun <- function() {
  cli_progress_step(msg = "Running main function")
  tibble(a = 1:5) %!>% 
    fun1() %!>% 
    fun2() %!>% 
    fun3()
}

main_fun()

#> ℹ Running main function
#> ℹ Doing sub function 1
#> ✔ Doing sub function 1 [2s]
#> 
#> ℹ Running main functionℹ Doing sub function 2
#> ✔ Doing sub function 2 [1s]
#> 
#> ℹ Running main functionℹ Doing sub function 3
#> ✔ Doing sub function 3 [3s]
#> 
#> ℹ Running main function✔ Running main function [6.1s]
#> # A tibble: 5 × 1
#>        a
#>    <int>
#>  1     1
#>  2     2
#>  3     3
#>  4     4
#>  5     5