Multi-label classification using H2O.ai


We are testing the capabilities of Driverless AI. One of our first datasets has this layout: X1, X2, ..., X400, Y1, Y2, ..., Y200.
We want to do multi-label classification on this dataset. However, the Driverless AI web client only lets you specify a single target.

Another alternative I tried was concatenating all the Y variables into a single list.
However, instead of predicting each Y variable separately, H2O.ai treats every distinct combination of values as its own class.
For example, with 3 Y variables, the combinations [0 0 0], [0 0 1], [0 1 0], and so on up to [1 1 1] give 8 classes.
Then, during training, it complains that some of these 8 classes don't have enough rows and drops them. In my case, I have over 200 Y variables, so it drops many of these classes.
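To see why the concatenated target breaks down, here is a minimal stdlib Python sketch of this "label powerset" transformation: it joins each row's Y values into one class string and counts how many rows each combination gets. The helper names and the `MIN_ROWS_PER_CLASS` threshold are hypothetical; the actual minimum that Driverless AI enforces is not given in the question.

```python
from collections import Counter

# Hypothetical threshold: the real minimum rows per class that Driverless AI
# enforces is not stated here; 2 is only for illustration.
MIN_ROWS_PER_CLASS = 2


def concat_labels(y_rows):
    """Join each row's Y values into one class string, e.g. (0, 0, 1) -> "0 0 1"."""
    return [" ".join(str(v) for v in row) for row in y_rows]


def summarize_powerset(y_rows, min_rows=MIN_ROWS_PER_CLASS):
    """Return (possible classes, observed classes, rare classes) for the joined target."""
    possible = 2 ** len(y_rows[0])  # each binary Y doubles the number of combinations
    counts = Counter(concat_labels(y_rows))
    rare = sorted(c for c, n in counts.items() if n < min_rows)
    return possible, len(counts), rare


if __name__ == "__main__":
    # Toy data: 6 rows, 3 binary Y variables.
    y = [(0, 0, 1), (0, 0, 1), (0, 1, 0), (1, 1, 0), (1, 0, 1), (0, 1, 0)]
    print(summarize_powerset(y))
    # → (8, 4, ['1 0 1', '1 1 0'])
```

With 3 labels there are at most 8 classes, but with 200 binary labels there are 2^200 (roughly 1.6 × 10^60) possible combinations, so most observed combinations are likely to have very few rows each.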

How can I do this in Driverless AI?

Best answer by Neema Mashayekhi

Driverless AI does not support multi-label classification at the moment. One option would be to create a separate binary model for each label (similar to the one-vs-rest strategy often used for multi-class modeling). With 200 Y variables that is a lot of models, so you may want to automate it with the Python client, although running and evaluating them all would take some time. Maybe try it out for the top 5 labels first and see how they perform. It may also help to reduce the 200 labels into a smaller number of groups to simplify the problem.
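The per-label approach (often called "binary relevance") can be sketched in plain Python. This is a minimal sketch under assumptions: "top 5" is read as the labels with the most positive rows, and `train_majority` is a hypothetical placeholder model. In practice `train_fn` would wrap whatever your Driverless AI Python client uses to launch one experiment per target column; those calls are not shown because their exact signatures are not given here. Each per-label dataset keeps only the X columns, so the other Y variables are not used as features.

```python
from collections import Counter


def top_k_labels(y_rows, label_names, k=5):
    """Return the k labels with the most positive (1) rows.

    Assumption: "top" means "most frequent"; rank by another metric if needed.
    """
    positives = Counter()
    for row in y_rows:
        for name, value in zip(label_names, row):
            positives[name] += value
    return [name for name, _ in positives.most_common(k)]


def split_binary_relevance(x_rows, y_rows, label_names, selected):
    """Build one (X, y) binary dataset per selected label.

    X holds only feature columns, so other Y variables never leak in as features.
    """
    index = {name: i for i, name in enumerate(label_names)}
    return {name: (x_rows, [row[index[name]] for row in y_rows]) for name in selected}


def train_majority(x_rows, y):
    """Hypothetical placeholder for a real binary model: predicts the majority class."""
    majority = 1 if 2 * sum(y) > len(y) else 0
    return lambda x: majority


def train_per_label(datasets, train_fn=train_majority):
    """Train one independent binary model per label."""
    return {name: train_fn(x, y) for name, (x, y) in datasets.items()}


def predict_multilabel(models, x, selected):
    """Combine the per-label predictions into one multi-label vector."""
    return [models[name](x) for name in selected]


if __name__ == "__main__":
    label_names = ["Y1", "Y2", "Y3", "Y4"]
    x = [[0.1, 1.0], [0.4, 0.2], [0.9, 0.5], [0.3, 0.8]]
    y = [(1, 0, 1, 0), (1, 0, 0, 0), (1, 1, 1, 0), (0, 0, 0, 0)]

    selected = top_k_labels(y, label_names, k=2)
    models = train_per_label(split_binary_relevance(x, y, label_names, selected))
    print(selected, predict_multilabel(models, [0.5, 0.5], selected))
    # → ['Y1', 'Y3'] [1, 0]
```

Swapping `train_majority` for a real trainer leaves the rest of the loop unchanged; raising `k` toward 200 only multiplies the number of experiments to run and evaluate.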