I have a function that I'm applying to different sets of coordinates to create four new columns in my tibble. This function has a pretty long start-up time (it loads the genome into RAM, converts the tibble to GRanges, and retrieves sequences) but is relatively fast once it is running, so there's not much difference between 100 and 1,000,000 sequences. Is there any way to send each column in the `mutate` to a different core so they can be processed at the same time? I thought about using `pivot_longer` and then `group_by` + `partition`, but this got me thinking about whether there was a different way to accomplish this. A `multi_mutate` of sorts?
(I don't actually expect the multidplyr partition/collect to save much time in my case, given the small cost of additional coordinates, but if I could avoid the time cost of pivoting, which is still relatively small, and the mess in my code, that'd be cool.)
Send different dplyr::mutate cols to different cores with multidplyr?
161 views Asked by GenesRus At
I know you were looking for an existing package, but I couldn't find anything on that. Other similar questions (like here or here) appear not to provide a package either.
However, how about hacking it out yourself? Look at this example with `furrr`. It needs some testing, I guess, and it would need to be improved, for example by supporting the same methods available for `mutate`. But it's a start. Notice that I need to use `future_options`.
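As a rough illustration of that idea (a sketch under stated assumptions, not the answer's original code): `slow_fn` and the column names below are hypothetical stand-ins for the real sequence-retrieval function and coordinate columns. Each input column is handed to its own worker with `furrr::future_map`, and the results are bound back onto the tibble as new columns. Note that in furrr 0.2.0 and later, `future_options()` has been renamed to `furrr_options()`, which is what the sketch uses.

```r
library(dplyr)
library(furrr)   # also attaches the future package

# Hypothetical stand-in for the expensive function: a slow start-up,
# then a cheap vectorised computation on one set of coordinates.
slow_fn <- function(x) {
  Sys.sleep(0.5)  # pretend this is loading the genome
  x + 1
}

# Hypothetical coordinate columns.
df <- tibble(start1 = c(1, 10, 100), start2 = c(5, 50, 500))

# One background R session per new column.
plan(multisession, workers = 2)

# Named list: the names become the new column names,
# the values are the input columns sent to the workers.
inputs <- list(seq1 = df$start1, seq2 = df$start2)

new_cols <- future_map(
  inputs,
  slow_fn,
  # These are the defaults, spelled out; `packages` is where packages
  # needed on the workers would be listed.
  .options = furrr_options(globals = TRUE, packages = NULL)
)

# Attach the results as new columns.
result <- bind_cols(df, as_tibble(new_cols))

# Shut the workers down again.
plan(sequential)
```

With this layout every worker pays the start-up cost once, in parallel, rather than once per column in sequence. Whether that beats a plain `mutate` depends on how expensive it is to ship the data and any required packages to each worker.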