How to apply a custom recursive function with data.table and loop over each index group-wise?

531 views Asked by At

Since I can't find an answer in below questions:

Apply a recursive function over groups and rows without explicit for loop

How do I mimic the drag functionality for new rows such as in Excel but for R?

I'll try in asking a new question related to above. I.e, I want to apply a custom function recursively based on the output of previous values to the current row by group.

Example with a dataframe and a for loop:

for(i in 2:nrow(df1)) df1$z[i] <- df1$z[i-1] + df1$x[i-1] - df1$y[i-1] 

Example with a dataframe and a for loop with custom function:

for(i in 2:nrow(df1)) df1$z[i] <- ifelse(df1$z[i-1] == df1$z[i],
                                          df1$z[i-1] + df1$x[i-1] - df1$y[i-1],
                                          df1$z[i-1] - df1$x[i-1] - df1$y[i-1])

However, with >1 mill rows, data.frames and for-loops are not optimal.

Is there any way to do above with data.table or dtplyr and optimized but also group-wise?

EDIT: See visualization of question.

It should first initiate from 2nd row like in for(i in 2:nrow(df) and it should use the custom function if and only if group[i]==group[i-1] enter image description here

1

There are 1 answers

17
langtang On BEST ANSWER

Does this use of Reduce do the trick?

tmp = data.table(
  grp = c(rep(0,6), rep(1,6)),
  x=c(10,20,30,40,50,60,1,2,3,4,5,6),
  y=c(1,2,3,4,5,6, 10,20,30,40,50,60)
)
tmp[, z:=Reduce(f=function(z,i) z + x[i-1] - y[i-1],
                x=(1:.N)[-1],
                init=0,
                accumulate = T)
    ,by=grp
]

Output:

    grp  x  y    z
 1:   0 10  1    0
 2:   0 20  2    9
 3:   0 30  3   27
 4:   0 40  4   54
 5:   0 50  5   90
 6:   0 60  6  135
 7:   1  1 10    0
 8:   1  2 20   -9
 9:   1  3 30  -27
10:   1  4 40  -54
11:   1  5 50  -90
12:   1  6 60 -135

Take for example, row 4. The value in the z column is 54, which is equal to the prior row's z-value + prior row's x-value, minus prior row's y-value.

The function f within Reduce can take any complicated form, including ifelse statements. Here is an example, where I've made a function called func, which is a wrapper around Reduce. Notice that within the Reduce statement, f is a function taking prev (thanks to suggestion by @r2evans), and this function first calculates previous row's s value minus previous row's t value (this is akin to your x[-1]-y[-1]. Then there is an ifelse statement. If the difference between the prior rows s and t value (i.e. k) is >20, then the new value in this row will be the previous z value minus the product of 20-4k (i.e. prev-(20-4k)), otherwise it will the previous z value + k (i.e. which is equal to your original formulation: z[i-1]+x[i-1]-y[i-1])

func <- function(s,t) {
  Reduce(
    f=function(prev,i) {
      k=s[i-1] - t[i-1]
      ifelse(k>10, prev -(20-4*k), prev+k)
    },
    x=2:length(s),
    init=0,
    accumulate = TRUE
  )
}

You can then assign the value of the func(x,y) to z, like this:

tmp[, z:=func(x,y), by=.(grp)][]

Output:

    grp  x  y    z
 1:   0 10  1    0
 2:   0 20  2    9
 3:   0 30  3   61
 4:   0 40  4  149
 5:   0 50  5  273
 6:   0 60  6  433
 7:   1  1 10    0
 8:   1  2 20   -9
 9:   1  3 30  -27
10:   1  4 40  -54
11:   1  5 50  -90
12:   1  6 60 -135