Tinkering with gradient boosting, I noticed that R's gbm package produces different results from h2o on a minimal example. Why?
Data
library(gbm)
library(h2o)
h2o.init()
train <- data.frame(
  X1 = factor(c("A", "A", "A", "B", "B")),
  X2 = factor(c("A", "A", "B", "B", "B")),
  Y = c(0, 1, 3, 4, 7)
)
X1 X2 Y
1 A A 0
2 A A 1
3 A B 3
4 B B 4
5 B B 7
gbm
# (gbm, 1 round, mae)
model.gbm <- gbm(
  Y ~ X1 + X2, data = train, distribution = "laplace",
  n.trees = 1, shrinkage = 1, n.minobsinnode = 1, bag.fraction = 1,
  interaction.depth = 1, verbose = TRUE
)
train$Pred.mae.gbm1 <- predict(model.gbm, newdata=train, n.trees=model.gbm$n.trees)
h2o
# (h2o, 1 round, mae)
model.h2o <- h2o.gbm(
  x = c("X1", "X2"), y = "Y", training_frame = as.h2o(train),
  distribution = "laplace", ntrees = 1, max_depth = 1,
  learn_rate = 1, min_rows = 1
)
train$Pred.mae.h2o1 <- as.data.frame(h2o.predict(model.h2o, as.h2o(train)))$predict
Results
train
X1 X2 Y Pred.mae.gbm1 Pred.mae.h2o1
1 A A 0 1.0 0.5
2 A A 1 1.0 0.5
3 A B 3 1.0 4.0
4 B B 4 5.5 4.0
5 B B 7 5.5 4.0
They are completely independent implementations, and I doubt either has been tuned or designed with the way you are using it in mind (i.e. a single tree, min_rows set to 1). In this case it looks like R's gbm has used its single tree to learn the "B" inputs, while h2o.gbm has concentrated on the "A" inputs.
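One quick way to see that from the output above (a sanity check on the predictions only, not how either library computes its splits internally): with distribution = "laplace" the loss-minimising constant in a leaf is the median of Y, so a single depth-1 split on X1 reproduces gbm's fitted values and a split on X2 reproduces h2o's:

# Per-leaf medians, assuming each depth-1 tree just predicts the median of Y
# in its leaf (the optimal constant under the Laplace/MAE loss).
with(train, tapply(Y, X1, median))
#   A   B
# 1.0 5.5   <- matches Pred.mae.gbm1
with(train, tapply(Y, X2, median))
#   A   B
# 0.5 4.0   <- matches Pred.mae.h2o1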
When you start using real data and real settings, there may still be differences. There are a lot of parameters you are not touching (with h2o.gbm() at least, which is the one I'm familiar with), and there is also a stochastic element: try a hundred values of seed for h2o.gbm(), with a constant set.seed() before R's gbm, and you will likely hit the same results on at least one of them.
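A rough sketch of that seed sweep, reusing the toy data from the question (on this tiny example with bag.fraction = 1 the fits are effectively deterministic, so the sweep only really matters once sampling and other stochastic settings come into play):

set.seed(42)  # arbitrary fixed seed for the gbm side
model.gbm <- gbm(
  Y ~ X1 + X2, data = train, distribution = "laplace",
  n.trees = 1, shrinkage = 1, n.minobsinnode = 1, bag.fraction = 1,
  interaction.depth = 1
)
pred.gbm <- predict(model.gbm, newdata = train, n.trees = 1)

train.h2o <- as.h2o(train)
matches <- sapply(1:100, function(s) {
  m <- h2o.gbm(
    x = c("X1", "X2"), y = "Y", training_frame = train.h2o,
    distribution = "laplace", ntrees = 1, max_depth = 1,
    learn_rate = 1, min_rows = 1, seed = s
  )
  p <- as.data.frame(h2o.predict(m, train.h2o))$predict
  isTRUE(all.equal(p, pred.gbm))
})
which(matches)  # seeds (if any) for which the two implementations agree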