Why does gbm() give different results than h2o.gbm() in this minimal example?

While tinkering with gradient boosting, I noticed that R's gbm package produces different results than h2o's h2o.gbm() on a minimal example. Why?


Data

library(gbm)
library(h2o)

h2o.init()

train <- data.frame(
  X1 = factor(c("A", "A", "A", "B", "B")),
  X2 = factor(c("A", "A", "B", "B", "B")),
  Y = c(0, 1, 3, 4, 7)
)

train
  X1 X2 Y
1  A  A 0
2  A  A 1
3  A  B 3
4  B  B 4
5  B  B 7

gbm

# gbm: 1 tree, depth 1, Laplace loss (median / MAE)
model.gbm <- gbm(
  Y ~ X1 + X2, data = train, distribution = "laplace",
  n.trees = 1, shrinkage = 1, n.minobsinnode = 1,
  bag.fraction = 1, interaction.depth = 1, verbose = TRUE
)
train$Pred.mae.gbm1 <- predict(model.gbm, newdata=train, n.trees=model.gbm$n.trees)
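
To see which variable that single tree actually split on, you can print its structure; pretty.gbm.tree() ships with the gbm package (i.tree selects which tree to show):

# Inspect gbm's single tree: split variable, split value, and leaf predictions
pretty.gbm.tree(model.gbm, i.tree = 1)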

h2o

# h2o: 1 tree, depth 1, Laplace loss (median / MAE)
model.h2o <- h2o.gbm(
  x = c("X1", "X2"), y = "Y", training_frame = as.h2o(train),
  distribution = "laplace", ntrees = 1, max_depth = 1,
  learn_rate = 1, min_rows = 1
)
train$Pred.mae.h2o1 <- as.data.frame(h2o.predict(model.h2o, as.h2o(train)))$predict
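
Similarly, recent h2o releases can pull the fitted tree back into R via h2o.getModelTree() (assuming your h2o version is new enough to have it; the slot names below are from the H2OTree class):

# Pull h2o's single fitted tree into R to see its splits and leaf values
tree.h2o <- h2o.getModelTree(model = model.h2o, tree_number = 1)
tree.h2o@features      # which variable each node splits on
tree.h2o@predictions   # the value predicted in each node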

Results

train
  X1 X2 Y Pred.mae.gbm1 Pred.mae.h2o1
1  A  A 0           1.0           0.5
2  A  A 1           1.0           0.5
3  A  B 3           1.0           4.0
4  B  B 4           5.5           4.0
5  B  B 7           5.5           4.0

1 Answer

Darren Cook (accepted answer):

They are completely independent implementations, and I doubt either has been tuned or designed with the way you are using it in mind (i.e. a single tree, min_rows set to 1). In this case R's gbm has spent its single split on X1, while h2o.gbm has split on X2. Both are behaving sensibly given the Laplace loss: the prediction in each leaf is the median of Y for the rows that land there (1.0 and 5.5 for gbm's two leaves, 0.5 and 4.0 for h2o's).
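
You can check those leaf values directly: under Laplace (MAE) loss a depth-1 tree should predict the within-leaf median of Y, and the medians by each candidate split variable reproduce both sets of predictions above:

# Median of Y under each candidate split; compare with the Results table
tapply(train$Y, train$X1, median)
#   A   B
# 1.0 5.5   <- gbm's two predicted values (it split on X1)
tapply(train$Y, train$X2, median)
#   A   B
# 0.5 4.0   <- h2o's two predicted values (it split on X2)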

When you start using real data and realistic settings, there may still be differences. There are a lot of parameters you are not touching (with h2o.gbm() at least, which is the one I'm familiar with), and there is also a stochastic element: try a hundred values of seed in h2o.gbm(), with a fixed set.seed() before R's gbm(), and you will likely match gbm's results on at least one of them.
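
A minimal sketch of that seed sweep, assuming the same train frame as above (the loop bounds and the comparison values are illustrative, not from the original post):

# Refit h2o.gbm() under many seeds and collect the predictions
preds <- sapply(1:100, function(s) {
  m <- h2o.gbm(
    x = c("X1", "X2"), y = "Y", training_frame = as.h2o(train),
    distribution = "laplace", ntrees = 1, max_depth = 1,
    learn_rate = 1, min_rows = 1, seed = s
  )
  as.data.frame(h2o.predict(m, as.h2o(train)))$predict
})
# Any column equal to c(1, 1, 1, 5.5, 5.5) reproduces R's gbm result
which(apply(preds, 2, function(p) isTRUE(all.equal(p, c(1, 1, 1, 5.5, 5.5)))))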