What parameters in an EC2 virtual machine should I use to optimize H2O's XGBoost performance?


I'm trying to run H2O XGBoost on an r4.8xlarge instance, but it's taking too long to run (15+ hours, as opposed to 4 hours for GBM with the same hyperparameter grid size).

Knowing that XGBoost uses cache optimization, is there any particular instance type that works best for H2O's XGBoost implementation?

My training data has 28K rows with 150 binary columns, and I'm running a grid search.


There is 1 answer

Neema Mashayekhi

Changing your EC2 instance won't necessarily make it faster. You need to understand where the bottleneck is. Review the logs and see what takes time on GBM vs. XGBoost. Is XGBoost creating deeper trees or more trees? It could be that your settings differ between the two algorithms. Check that all the hyperparameters are as close as possible.
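One way to do that check before touching the instance type is to diff the two grids directly. The snippet below is only a sketch: the grid values are made up for illustration, and the keys are the H2O-style names (ntrees, max_depth, learn_rate) that both estimators accept. Substitute the grids you actually passed to each search.

```python
from math import prod


def grid_size(grid):
    """Number of models a full Cartesian grid search would build."""
    return prod(len(values) for values in grid.values())


def grid_differences(grid_a, grid_b):
    """Return {param: (values_in_a, values_in_b)} for every param that differs."""
    diffs = {}
    for key in sorted(set(grid_a) | set(grid_b)):
        if grid_a.get(key) != grid_b.get(key):
            diffs[key] = (grid_a.get(key), grid_b.get(key))
    return diffs


# Hypothetical grids, for illustration only.
gbm_grid = {"ntrees": [100, 200], "max_depth": [3, 5, 7], "learn_rate": [0.05, 0.1]}
xgb_grid = {"ntrees": [100, 200], "max_depth": [3, 5, 7, 9], "learn_rate": [0.05, 0.1]}

print("GBM models:    ", grid_size(gbm_grid))
print("XGBoost models:", grid_size(xgb_grid))
print("Differences:   ", grid_differences(gbm_grid, xgb_grid))
```

If the diff is empty and the model counts match, the slowdown is more likely in how each algorithm builds its trees (or in memory pressure) than in the grid itself.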

Also, XGBoost uses memory external to H2O's JVM. As mentioned in the FAQ of H2O's XGBoost docs, try adding -extramempercent 120 and lowering your H2O memory, so that more of the machine's RAM is left for XGBoost.
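To get a feel for what that split looks like, here is a rough back-of-the-envelope calculation. It assumes the extra memory is expressed as a percentage of the JVM heap (which is how I understand the -extramempercent option of H2O's Hadoop launcher) and that an r4.8xlarge has 244 GiB of RAM. The function name is made up, and the numbers are a starting point, not a tuned recommendation; on a standalone EC2 node the equivalent move would be starting H2O with a smaller heap.

```python
def split_memory(total_gb, extra_mem_percent):
    """Split total RAM into JVM heap and memory left outside the heap.

    Assumes: total = heap * (1 + extra_mem_percent / 100).
    """
    heap_gb = total_gb / (1 + extra_mem_percent / 100)
    native_gb = total_gb - heap_gb
    return heap_gb, native_gb


# Assumed figures: 244 GiB total (r4.8xlarge), 120% extra as suggested above.
heap, native = split_memory(244, 120)
print(f"JVM heap: ~{heap:.0f} GB, left for XGBoost/native: ~{native:.0f} GB")
```

In practice you would also leave some headroom for the operating system rather than handing the full 244 GiB to H2O and XGBoost.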