Tinkering with gradient boosting, I noticed that R's gbm package produces different results from h2o on a minimal example. Why?
Data
library(gbm)
library(h2o)
h2o.init()
train <- data.frame(
X1 = factor(c("A", "A", "A", "B", "B")),
X2 = factor(c("A", "A", "B", "B", "B")),
Y = c(0, 1, 3, 4, 7)
)
X1 X2 Y
1 A A 0
2 A A 1
3 A B 3
4 B B 4
5 B B 7
gbm
# (gbm, 1 round, mae)
model.gbm <- gbm(
  Y ~ X1 + X2, data = train, distribution = "laplace",
  n.trees = 1, shrinkage = 1, n.minobsinnode = 1,
  bag.fraction = 1, interaction.depth = 1, verbose = TRUE
)
train$Pred.mae.gbm1 <- predict(model.gbm, newdata=train, n.trees=model.gbm$n.trees)
h2o
# (h2o, 1 round, mae)
model.h2o <- h2o.gbm(
  x = c("X1", "X2"), y = "Y", training_frame = as.h2o(train),
  distribution = "laplace", ntrees = 1, max_depth = 1,
  learn_rate = 1, min_rows = 1
)
train$Pred.mae.h2o1 <- as.data.frame(h2o.predict(model.h2o, as.h2o(train)))$predict
Results
train
X1 X2 Y Pred.mae.gbm1 Pred.mae.h2o1
1 A A 0 1.0 0.5
2 A A 1 1.0 0.5
3 A B 3 1.0 4.0
4 B B 4 5.5 4.0
5 B B 7 5.5 4.0
They are completely independent implementations, and I doubt either has been tuned or designed with the way you are using it in mind (i.e. a single tree, min_rows set to 1). In this case it looks like R's gbm has used its single tree on learning the "B" inputs correctly, while h2o.gbm has concentrated on the "A" inputs.

When you start using real data and real settings, there may still be differences. There are a lot of parameters you are not touching (with h2o.gbm() at least, which is the one I'm familiar with). And there is also a stochastic element: try a hundred values of seed for h2o.gbm(), and a constant set.seed() before R's gbm, and you will likely hit the same results on at least one of them.
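As a rough check on that reading (inferred only from the prediction columns above, not from inspecting either model's tree): with the laplace distribution each leaf prediction should be the per-leaf median of Y, so group medians over each candidate split variable should reproduce the two columns.

# Sketch: leaf values under laplace (absolute-error) loss are medians, so group
# medians of Y by each predictor should match the two prediction columns.
ave(train$Y, train$X1, FUN = median)  # 1.0 1.0 1.0 5.5 5.5 -> matches Pred.mae.gbm1
ave(train$Y, train$X2, FUN = median)  # 0.5 0.5 4.0 4.0 4.0 -> matches Pred.mae.h2o1

So gbm's predictions correspond to a depth-1 split on X1 and h2o's to a depth-1 split on X2; on this toy data either split is defensible, and the two libraries simply picked different ones.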
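A minimal sketch of that seed sweep (the count of 100 and the exact-match tolerance are arbitrary choices, and whether any seed actually reproduces gbm's column depends on how h2o breaks ties internally):

# Refit h2o.gbm under many seeds and check whether any run reproduces gbm's predictions.
matches <- sapply(1:100, function(s) {
  m <- h2o.gbm(x = c("X1", "X2"), y = "Y", training_frame = as.h2o(train),
               distribution = "laplace", ntrees = 1, max_depth = 1,
               learn_rate = 1, min_rows = 1, seed = s)
  p <- as.data.frame(h2o.predict(m, as.h2o(train)))$predict
  all(abs(p - train$Pred.mae.gbm1) < 1e-8)
})
any(matches)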