I have a large dataset with about 15 columns and more than 3 million rows.
Because the dataset is so big, I would like to use multidplyr on it.
Because of how the data is structured, I cannot simply split my data frame into 12 arbitrary parts. Let's say there are columns col1 and col2, each of which has several distinct values that repeat (within each column separately).
How can I make 12 (or n) similar-sized groups such that all rows sharing the same values in both col1 and col2 end up in the same group?
Example: Let's say one of the possible values in col1 is foo and one of the possible values in col2 is bar. Then all rows with this combination of values would be in one group.
For the question to make sense, assume there are always more than 12 unique combinations of col1 and col2.
I would try something with for and while loops if this were Python, but since this is R, there is probably a better way.
Try this:
This slices the data down at random to two rows per group.
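In case it helps, here is a minimal base R sketch of one way to build the balanced groups the question asks for. It is not the code from this answer: assign_partitions, the part column, and the example data are all made up for illustration. The idea is to count the rows per unique (col1, col2) combination, then greedily place each combination, largest first, into whichever of the n partitions currently holds the fewest rows. It assumes the values never contain the "|" character used to build the lookup key.

```r
# Hypothetical helper (not from the original answer): give every unique
# (col1, col2) combination a partition id in 1..n, balancing row counts.
assign_partitions <- function(df, n = 12) {
  # Count rows per unique combination of col1 and col2
  counts <- aggregate(
    rows ~ col1 + col2,
    data = data.frame(col1 = df$col1, col2 = df$col2, rows = 1L),
    FUN = sum
  )

  # Greedy balancing: take the largest combinations first and always
  # put the next one into the partition with the fewest rows so far
  counts <- counts[order(-counts$rows), ]
  load <- numeric(n)
  counts$part <- NA_integer_
  for (i in seq_len(nrow(counts))) {
    k <- which.min(load)
    counts$part[i] <- k
    load[k] <- load[k] + counts$rows[i]
  }

  # Map the partition id back onto every row via a combined key
  # (assumes "|" does not occur in col1 or col2)
  key_df <- paste(df$col1, df$col2, sep = "|")
  key_ct <- paste(counts$col1, counts$col2, sep = "|")
  df$part <- counts$part[match(key_df, key_ct)]
  df
}

# Made-up example data: 6 x 5 = 30 possible combinations, 1000 rows
set.seed(1)
df <- data.frame(
  col1  = sample(letters[1:6], 1000, replace = TRUE),
  col2  = sample(LETTERS[1:5], 1000, replace = TRUE),
  value = rnorm(1000)
)

out <- assign_partitions(df, n = 12)
table(out$part)  # rows per partition
```

The part column can then serve as the grouping variable. As far as I know, recent multidplyr versions keep all rows of a group on the same worker when you call group_by() before partition(), so grouping by part (or directly by col1 and col2) before partitioning should keep each combination together, but check that against the version you have installed.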