How to fix "Actual column must contain binary class labels, but found cardinality" error in R when fitting the h2o model?


I get the error below when I call h2o.deeplearning in R. My response is a categorical variable with three levels, so I can't recode it as binary labels.

Error: java.lang.IllegalArgumentException: Actual column must contain binary class labels, but found cardinality 3!

This is my input:

h2o.init()
# Build the modelling data frame; Event (column 1) is the three-level response
dat_h20 = data.frame(Event = as.factor(space_data$Event),
                     TrajA = space_data$TrajA,
                     AcousticA = space_data$AcousticA,
                     HullScan = as.factor(space_data$HullScan),
                     MCStatus = as.factor(space_data$MCStatus))
set.seed(2023)

# Note: this draws every index in 1:150, so if space_data has exactly 150 rows,
# dat_h20[-set,] below contains no rows
set = sample(1:150, 150, replace = FALSE)
data_train = as.h2o(dat_h20[set,])
head(data_train)
data_val = as.h2o(dat_h20[-set,])

value = exp(seq(-10, -3, length.out = 20)) # candidate l2 penalties
value
validation_errors = numeric(20) # validation error for each regularisation parameter
?h2o.deeplearning
dat_h20[1]
for (i in seq_along(value))
{
  model = h2o.deeplearning(x = 2:5, y = 1,
                           training_frame = data_train,
                           validation_frame = data_val,
                           standardize = TRUE,
                           hidden = c(5,5),
                           activation = 'Rectifier',
                           distribution = 'multinomial',
                           loss = 'CrossEntropy',
                           l2 = value[i],
                           rate = 0.01,
                           adaptive_rate = FALSE,
                           epochs = 1000,
                           reproducible = TRUE,
                           seed = 2)
  # Request only the validation log loss so a single value is stored
  validation_errors[i] = h2o.logloss(model, valid = TRUE)
}

plot(value, validation_errors)

There are 2 answers

Luis Felipe

Ensure a multi-class setting: H2O's deep learning function should handle multi-class classification automatically when the response column is categorical with more than two levels. If you've encoded your response variable as numeric, that might be the root of the problem. It's preferable to leave it as a factor.
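The factor check described above can be done in base R before the data is sent to H2O. This is only a minimal sketch: `toy` is a hypothetical stand-in for the asker's data, and only the column name Event comes from the question.

```r
# Hypothetical stand-in for the asker's data; only the name Event is from the question.
toy <- data.frame(
  Event = c(0, 1, 2, 1, 0, 2),
  TrajA = c(0.1, 0.4, 0.3, 0.9, 0.5, 0.7)
)

# A numeric response is not a factor, so it would not be read as class labels.
was_factor <- is.factor(toy$Event)

# Converting to a factor makes the three classes explicit.
toy$Event <- as.factor(toy$Event)
is_factor_now <- is.factor(toy$Event)
n_classes <- nlevels(toy$Event)

print(c(before = was_factor, after = is_factor_now))
print(n_classes)
```

If `n_classes` is greater than 2 and the column is a factor, the response is at least in the form this answer recommends.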

Wendy

Yes, Luis Felipe is correct. Please make sure you:

  1. Set distribution="multinomial".
  2. Convert your dat_h20 to an H2O frame: data_h2o <- as.h2o(dat_h20)
  3. Make sure the response column is a factor: data_h2o[,4] <- as.factor(data_h2o[,4]), assuming that the fourth column is the response column.

Normally, once you have done steps 2 and 3, H2O deep learning will be able to figure out automatically that it is a multi-class classification problem.
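The three steps above might look like the sketch below. It has not been run against the asker's data and is not claimed to remove the error. Everything about the data is assumed for illustration: the `toy` frame, the predictors `x1` and `x2`, the 100/50 split, and the short `epochs` value. It needs the h2o package and a working Java installation.

```r
library(h2o)
h2o.init()

# Hypothetical toy data with a three-level response; replace with your own frame.
set.seed(2023)
n <- 150
toy <- data.frame(
  Event = factor(sample(c("a", "b", "c"), n, replace = TRUE)),
  x1 = rnorm(n),
  x2 = rnorm(n)
)

# Hold out 50 of the 150 rows so the validation frame is not empty (assumed split).
train_idx <- sample(seq_len(n), 100)
train <- as.h2o(toy[train_idx, ])
valid <- as.h2o(toy[-train_idx, ])

# Make sure the response is categorical on the H2O side as well.
train$Event <- as.factor(train$Event)
valid$Event <- as.factor(valid$Event)
n_levels <- h2o.nlevels(train$Event)

model <- h2o.deeplearning(
  x = c("x1", "x2"), y = "Event",
  training_frame = train,
  validation_frame = valid,
  distribution = "multinomial",
  hidden = c(5, 5),
  epochs = 10,          # kept small for the sketch
  reproducible = TRUE,
  seed = 2
)

# Ask for the validation metric only, so a single number comes back.
val_logloss <- h2o.logloss(model, valid = TRUE)
print(val_logloss)
```

Referring to the response by name (y = "Event") rather than by column position avoids having to know which column index holds it.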