How do I perform stratified random splitting in R h2o.ai?


I like the h2o.ai tool for ML. It is Java-based, but it is familiar and does a decent job.

Here is info about stratified splitting in general:

I have a variable that is strongly imbalanced, so I need a stratified split of my data on that variable, done from R in h2o.ai. Is there a way to do it?

An R command for splitting data in the h2o.ai tool is this:

splits = h2o.splitFrame(mydata, ratios=myratio, destination_frames=...)

There is no option for stratification in h2o.splitFrame. I know that the Flow tool (the web interface to the running Java process) allows balanced classes in the cross-validated approach, so somewhere in there it is doing stratified splitting.

I would hate to do this in base R, because memory handling in R is not as effective as in h2o.ai and my data sizes are large.


There is 1 answer

Mathanraj-Sharma On

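As far as I understand, your problem is that you need stratified sampling because your data is heavily imbalanced.

When creating a model, you can set certain arguments to achieve this. For example, the following stratifies the cross-validation folds on the response column (fold_column is a separate option for supplying your own fold assignments and cannot be combined with nfolds or fold_assignment):

h2o.gbm(....., nfolds=n, fold_assignment="Stratified")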

Or else you can try setting:

h2o.gbm(..., balance_classes=TRUE, ...)
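To make the two options concrete, here is a minimal sketch of a full call. It assumes a local H2O cluster can be started with h2o.init(), and it uses the built-in iris data purely as a stand-in for your own frame; the column names, nfolds value, and seed are illustrative choices, not anything from the question.

```r
library(h2o)
h2o.init()  # assumption: a local H2O cluster can be started

# Stand-in data; replace with your own H2OFrame.
df <- as.h2o(iris)
response <- "Species"                     # the (factor) column to stratify on
predictors <- setdiff(names(df), response)

# Cross-validation folds are stratified on the response column, and
# balance_classes resamples the training data to even out class counts.
model <- h2o.gbm(
  x = predictors,
  y = response,
  training_frame = df,
  nfolds = 5,
  fold_assignment = "Stratified",
  balance_classes = TRUE,
  seed = 42
)

# Cross-validated metrics for the fitted model.
cv_perf <- h2o.performance(model, xval = TRUE)
print(cv_perf)
```

Note that this stratifies the cross-validation folds only; it does not produce separate stratified train and test frames.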

Hope this helps. For more details, please refer to https://docs.h2o.ai/h2o/latest-stable/h2o-r/h2o_package.pdf
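If you need an actual stratified train/test split rather than stratified folds, one possible workaround is to split each class separately with h2o.splitFrame and bind the pieces back together, so the data stays in the H2O cluster. This is only a sketch: stratified_split is a hypothetical helper, not part of the h2o package, it assumes the stratification column is a factor, and because h2o.splitFrame splits probabilistically the per-class ratios are approximate (a very small class could end up with an empty piece).

```r
library(h2o)
h2o.init()  # assumption: a local H2O cluster can be started

# Hypothetical helper (not an h2o function): split every class of
# `strat_col` separately, then recombine the per-class pieces.
stratified_split <- function(frame, strat_col, ratio = 0.8, seed = 42) {
  lvls <- h2o.levels(frame[[strat_col]])  # requires a factor column
  parts <- lapply(lvls, function(lvl) {
    rows <- frame[frame[[strat_col]] == lvl, ]  # rows of one class
    h2o.splitFrame(rows, ratios = ratio, seed = seed)
  })
  list(
    train = do.call(h2o.rbind, lapply(parts, `[[`, 1)),
    test  = do.call(h2o.rbind, lapply(parts, `[[`, 2))
  )
}

# Usage with stand-in data; replace with your own H2OFrame and column.
df <- as.h2o(iris)
splits <- stratified_split(df, "Species", ratio = 0.8)

# Class counts in each piece should be roughly proportional.
print(h2o.table(splits$train$Species))
print(h2o.table(splits$test$Species))
```

Rows come back grouped by class, which does not matter for most H2O algorithms but is worth knowing if you inspect the frames.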