I like the h2o.ai tool for ML. It is Java-based, but it is familiar and does a decent job.
Here is info about stratified splitting in general:
I have a variable that is strongly imbalanced, so I need R-GUI-based stratified splitting of my data on that variable in h2o.ai. Is there a way to do it?
An R command for splitting data in the h2o.ai tool is this:
splits = h2o.splitFrame(mydata, ratios=myratio, destination_frames=...)
There is no option for stratification in the h2o.splitFrame function. I know that the Flow tool (the web interface to the running Java process) allows balanced classes in the cross-validated approach, so somewhere in there it is doing stratified splitting.
I hate to do this in base R because the memory handling in R is not as effective as in h2o.ai and my data sizes are large.
As far as I understand, your problem is that you need stratified sampling because your data is heavily imbalanced.
When creating the model, you can set certain arguments to achieve this, for example:
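A minimal sketch of one such argument, assuming `balance_classes` is the one meant here. The data frame, the column names (`x1`, `x2`, `label`) and the choice of `h2o.gbm` are hypothetical and only serve to make the example runnable:

```r
library(h2o)
h2o.init()

# Hypothetical imbalanced data: roughly 5% of rows have label "yes".
set.seed(42)
n <- 2000
df <- data.frame(
  x1 = rnorm(n),
  x2 = rnorm(n),
  label = factor(ifelse(runif(n) < 0.05, "yes", "no"))
)
mydata <- as.h2o(df)

# balance_classes = TRUE asks H2O to over/under-sample the training rows
# so that the class counts are more even. It acts at model-building time
# and does not change how h2o.splitFrame splits the data.
model <- h2o.gbm(
  x = c("x1", "x2"),
  y = "label",
  training_frame = mydata,
  balance_classes = TRUE,
  ntrees = 20,
  seed = 42
)
```

Note that this rebalances the classes during training rather than producing a stratified train/test split, so it may or may not be what you need.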
Alternatively, you can try setting:
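A minimal sketch, assuming the setting meant here is `fold_assignment = "Stratified"` together with `nfolds`, which makes the cross-validation folds stratified on the response. As before, the data and column names are hypothetical:

```r
library(h2o)
h2o.init()

# Hypothetical imbalanced data: roughly 5% of rows have label "yes".
set.seed(42)
n <- 2000
df <- data.frame(
  x1 = rnorm(n),
  x2 = rnorm(n),
  label = factor(ifelse(runif(n) < 0.05, "yes", "no"))
)
mydata <- as.h2o(df)

# nfolds turns on cross-validation; fold_assignment = "Stratified"
# assigns rows to folds so that each fold keeps roughly the same
# class proportions as the full training frame.
model_cv <- h2o.gbm(
  x = c("x1", "x2"),
  y = "label",
  training_frame = mydata,
  nfolds = 5,
  fold_assignment = "Stratified",
  ntrees = 20,
  seed = 42
)

# Cross-validated AUC, computed on the held-out folds.
auc_cv <- h2o.auc(model_cv, xval = TRUE)
```

This keeps all the work inside the H2O cluster, so the large data never has to be split in base R.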
Hope this helps. For more details, please refer to https://docs.h2o.ai/h2o/latest-stable/h2o-r/h2o_package.pdf