Loop through list of dataframes in R to take set difference


I'm hoping to take the set difference between 40 dataframes (10 years of data, each with 4 quarters), comparing each quarter with the one before it: the second quarter of my first year against the first quarter, and so on through my last quarter. I have all my dataframes in a list, but can't figure out the loop.


short <- list(q4_17, q1_18, q2_18, q3_18, q4_18)

for(i in short)
{j=i+1
new<- j %>%
filter(PATID %in% setdiff(j,i)$PATID)
}

I'm getting the error: Error in FUN(left, right) : non-numeric argument to binary operator.

Thanks for any suggestions. I would try to put my dataframes up but the code is on a remote desktop and difficult to get off.

r2evans On

In addition to the comments above, two more problems:

  1. You overwrite new with each pass of the for loop, so only the result of the last pass survives; all previous passes are silently discarded.
  2. i+1 (treating i as an index) works only until i points at the last frame within short, at which point the "next frame in short" is an indexing error.
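The indexing point can be seen with a minimal sketch. The three toy frames are hypothetical stand-ins for the real quarterly data; only the name short and the PATID column come from the question. Looping over the list itself yields each data frame, whereas looping over positions makes "current" and "previous" well defined.

```r
# Hypothetical toy frames standing in for the quarterly data
short <- list(
  data.frame(PATID = c("a", "b")),
  data.frame(PATID = c("b", "c")),
  data.frame(PATID = c("c", "d"))
)

# for (i in short) hands over each data frame itself, not its position,
# so i + 1 tries to add 1 to a whole data frame
classes <- character(0)
for (i in short) classes <- c(classes, class(i))

# Looping over positions, skipping the first, gives a valid "previous" frame
positions <- seq_along(short)[-1]
for (j in positions) {
  cat("comparing frame", j, "with frame", j - 1, "\n")
}
```

Starting the index at the second element, rather than stopping before the last, avoids ever reaching past the end of the list.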

To address both issues, we can do either of these, assuming that the setdiff/filter code actually does everything you need.

## still a for loop
library(dplyr)  # for %>%, filter(), and setdiff() on data frames
out <- list()
for (j in seq_along(short)[-1]) {
  i <- j - 1
  new <- short[[j]] %>%
    filter(PATID %in% setdiff(short[[j]], short[[i]])$PATID)
  out <- c(out, list(new))
}
out <- do.call(rbind, out)

## new: lapply
out <- lapply(seq_along(short)[-1], function(j) {
  short[[j]] %>%
    filter(PATID %in% setdiff(short[[j]], short[[j-1]])$PATID)
})
out <- do.call(rbind, out)

Judging by what you may be doing, I feel there is a better way to do this setdiff/filter part.
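One possible alternative, offered as a sketch rather than as what the answer had in mind: if the goal is to find patients who appear in a quarter but not in the previous one, dplyr's anti_join() on PATID expresses that directly. That goal is an assumption; the setdiff/filter version compares whole rows, so it would also keep patients whose other columns changed between quarters. The toy data and the visit column below are hypothetical.

```r
library(dplyr)

# Hypothetical toy frames standing in for consecutive quarters
short <- list(
  data.frame(PATID = c(1, 2, 3), visit = c("x", "y", "z")),
  data.frame(PATID = c(2, 3, 4), visit = c("y", "z", "w")),
  data.frame(PATID = c(3, 4, 5, 6), visit = c("z", "w", "v", "u"))
)

# For each quarter after the first, keep the rows whose PATID
# does not occur in the previous quarter
new_by_quarter <- lapply(seq_along(short)[-1], function(j) {
  anti_join(short[[j]], short[[j - 1]], by = "PATID")
})

# Stack the per-quarter results into one data frame
out <- bind_rows(new_by_quarter)
```

Whether this matches the intended result depends on whether "difference" means new patients only or any changed row.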