Tinkering with gradient boosting, I noticed that R's gbm package produces different results from h2o on a minimal example. Why?
Data
library(gbm)
library(h2o)
h2o.init()
train <- data.frame(
X1 = factor(c("A", "A", "A", "B", "B")),
X2 = factor(c("A", "A", "B", "B", "B")),
Y = c(0, 1, 3, 4, 7)
)
X1 X2 Y
1 A A 0
2 A A 1
3 A B 3
4 B B 4
5 B B 7
gbm
# (gbm, 1 round, mae)
model.gbm <- gbm(
  Y ~ X1 + X2, data = train, distribution = "laplace",
  n.trees = 1, shrinkage = 1, n.minobsinnode = 1,
  bag.fraction = 1, interaction.depth = 1, verbose = TRUE
)
train$Pred.mae.gbm1 <- predict(model.gbm, newdata=train, n.trees=model.gbm$n.trees)
h2o
# (h2o, 1 round, mae)
model.h2o <- h2o.gbm(
  x = c("X1", "X2"), y = "Y", training_frame = as.h2o(train),
  distribution = "laplace", ntrees = 1, max_depth = 1,
  learn_rate = 1, min_rows = 1
)
train$Pred.mae.h2o1 <- as.data.frame(h2o.predict(model.h2o, as.h2o(train)))$predict
Results
train
X1 X2 Y Pred.mae.gbm1 Pred.mae.h2o1
1 A A 0 1.0 0.5
2 A A 1 1.0 0.5
3 A B 3 1.0 4.0
4 B B 4 5.5 4.0
5 B B 7 5.5 4.0
They are completely independent implementations, and I doubt either has been tuned or designed with the way you are using it in mind (i.e. a single tree, min_rows set to 1). In this case it looks like R's gbm has used its single tree on learning the "B" inputs correctly, while h2o.gbm has concentrated on the "A" inputs.

When you start using real data and real settings, there may still be differences. There are a lot of parameters you are not touching (with h2o.gbm() at least, which is the one I'm familiar with). And there is also a stochastic element: try a hundred values of seed for h2o.gbm(), and a constant set.seed() before R's gbm, and you will likely hit the same results on at least one of them.
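As a rough check on that reading (inferred only from the prediction columns above, not from inspecting either model's tree): with the laplace distribution each leaf prediction should be the per-leaf median of Y, so group medians over each candidate split variable should reproduce the two columns.

# Sketch: leaf values under laplace (absolute-error) loss are medians, so group
# medians of Y by each predictor should match the two prediction columns.
ave(train$Y, train$X1, FUN = median)  # 1.0 1.0 1.0 5.5 5.5 -> matches Pred.mae.gbm1
ave(train$Y, train$X2, FUN = median)  # 0.5 0.5 4.0 4.0 4.0 -> matches Pred.mae.h2o1

So gbm's predictions correspond to a depth-1 split on X1 and h2o's to a depth-1 split on X2; on this toy data either split is defensible, and the two libraries simply picked different ones.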
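A minimal sketch of that seed sweep (the count of 100 and the exact-match tolerance are arbitrary choices, and whether any seed actually reproduces gbm's column depends on how h2o breaks ties internally):

# Refit h2o.gbm under many seeds and check whether any run reproduces gbm's predictions.
matches <- sapply(1:100, function(s) {
  m <- h2o.gbm(x = c("X1", "X2"), y = "Y", training_frame = as.h2o(train),
               distribution = "laplace", ntrees = 1, max_depth = 1,
               learn_rate = 1, min_rows = 1, seed = s)
  p <- as.data.frame(h2o.predict(m, as.h2o(train)))$predict
  all(abs(p - train$Pred.mae.gbm1) < 1e-8)
})
any(matches)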