I have a large dataset with about 15 columns and more than 3 million rows.
Because the dataset is so big, I would like to use multidplyr on it.
Because of how the data is structured, I can't simply split my data frame into 12 arbitrary parts. Let's say there are columns col1 and col2, each of which has several different values that repeat (within each column separately).
How can I make 12 (or n) similar-sized groups, each of which contains only rows that share the same values in both col1 and col2?
Example: Let's say one of the possible values in col1 is foo and one in col2 is bar. Then all rows with this pair of values would end up in the same group.
So that the question makes sense, there are always more than 12 unique combinations of col1 and col2.
I would try to do something with for and while loops if this were Python, but as this is R, there is probably another way.
Try this:
This slices the data down at random to two rows per group.
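For the balanced split asked about in the question, here is one possible sketch using plain dplyr. It is not necessarily what the code above did: it counts the rows for each (col1, col2) combination, then greedily puts the largest remaining combination into whichever bin currently holds the fewest rows. The helper name assign_bins, the bin column, and the toy data are all made up for illustration. The loop runs over the unique combinations only, not over the 3 million rows, so it should stay cheap.

```r
library(dplyr)

# Hypothetical helper (not part of dplyr or multidplyr): assign every
# (col1, col2) combination to one of n bins so that the bins hold a
# roughly equal number of rows.
assign_bins <- function(df, n = 12) {
  # One row per unique combination, largest combinations first
  combos <- df %>%
    count(col1, col2, name = "rows") %>%
    arrange(desc(rows))

  load <- numeric(n)               # rows assigned to each bin so far
  bin  <- integer(nrow(combos))
  for (i in seq_len(nrow(combos))) {
    b <- which.min(load)           # bin with the fewest rows right now
    bin[i] <- b
    load[b] <- load[b] + combos$rows[i]
  }
  combos$bin <- bin

  # Attach the bin id to every original row
  df %>%
    left_join(select(combos, col1, col2, bin), by = c("col1", "col2"))
}

# Toy data, invented for the example
set.seed(1)
toy <- tibble(
  col1  = sample(letters[1:6], 1000, replace = TRUE),
  col2  = sample(LETTERS[1:5], 1000, replace = TRUE),
  value = rnorm(1000)
)

binned <- assign_bins(toy, n = 12)
count(binned, bin)                 # rows per bin
```

If I remember the multidplyr behaviour correctly, partitioning a data frame that is grouped by bin (group_by(bin) followed by partition(cluster)) keeps each group on a single worker, so every col1/col2 combination would stay together. Please check that against the package documentation for your version.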