How to split a dataset for a multiclass classification task in computer vision?

I am mainly talking about zero-shot learning. I feel that the standard data splitting method for zero-shot multiclass classification is not very reasonable, because the validation set and the test set contain completely different classes. Parameters tuned on the validation set can therefore perform poorly on the test set, which makes it hard to select parameters that truly yield high performance.
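A minimal Python sketch of the class-disjoint split described above, assuming each sample is an `(item, label)` pair. The function name `class_disjoint_split` and the 15%/15% class fractions are illustrative choices, not a standard protocol. The key point is that whole classes, not individual samples, are assigned to train, validation, or test.

```python
import random


def class_disjoint_split(samples, val_frac=0.15, test_frac=0.15, seed=0):
    """Split (item, label) pairs so that train, val and test share no classes."""
    classes = sorted({label for _, label in samples})
    random.Random(seed).shuffle(classes)  # seeded, so the split is reproducible
    # Fractions are applied to the number of *classes*, not samples.
    n_test = max(1, round(len(classes) * test_frac))
    n_val = max(1, round(len(classes) * val_frac))
    if n_test + n_val >= len(classes):
        raise ValueError("not enough classes for a three-way class split")
    test_classes = set(classes[:n_test])
    val_classes = set(classes[n_test:n_test + n_val])
    splits = {"train": [], "val": [], "test": []}
    for item, label in samples:
        if label in test_classes:
            splits["test"].append((item, label))
        elif label in val_classes:
            splits["val"].append((item, label))
        else:
            splits["train"].append((item, label))  # remaining classes are "seen"
    return splits
```

Because the class sets of `val` and `test` never overlap, a setting that suits the validation classes is not guaranteed to suit the test classes, which is the mismatch the question is about.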

As far as I can see, the only solution is to tune the parameters on the validation set while also monitoring performance on the test set. Otherwise, it's hard to know where the model is heading. However, this approach is not standard and is almost equivalent to tuning parameters directly on the test set. Of course, my understanding is limited to video classification; I'm not sure whether other fields split their data the same way.

Answer by Paplepel93

In machine learning, tuning parameters on the test set is always a bad idea. The only way to get a reasonable estimate of your model's generalisability is to test it on unseen data. As soon as you base any decision in the modelling process on the test set, you introduce bias, which degrades that estimate of the true generalisability.

I don't really follow why you need to assess performance on the test set to know where the model is heading. This can also be done on the validation set.
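A minimal sketch of that discipline, assuming training and scoring can be wrapped in callables (`tune_then_test`, `train`, `val_score` and `test_score` are hypothetical names). Every configuration is compared on the validation set only, and the test set is evaluated once, on the single model that selection picked.

```python
from typing import Any, Callable, Iterable, Tuple


def tune_then_test(
    configs: Iterable[Any],
    train: Callable[[Any], Any],
    val_score: Callable[[Any], float],
    test_score: Callable[[Any], float],
) -> Tuple[Any, float, float]:
    """Select the config with the best validation score, then test it once."""
    best = None  # (validation score, config, model)
    for cfg in configs:
        model = train(cfg)
        score = val_score(model)  # every tuning decision uses validation data only
        if best is None or score > best[0]:
            best = (score, cfg, model)
    if best is None:
        raise ValueError("need at least one configuration")
    best_val, best_cfg, best_model = best
    # Selection is final here; the test set is consulted once, for reporting only.
    return best_cfg, best_val, test_score(best_model)
```

Because the test score is computed only after the choice is fixed, it cannot influence which configuration wins, so it stays an unbiased estimate.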

I have encountered cases where I needed a second validation set, so I divided my dataset into four chunks (train, val1, val2, test). It's not standard, but it may be a solution for your situation.
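A minimal sketch of such a four-way split, assuming the zero-shot setting from the question where each chunk gets its own classes (`class_disjoint_chunks` and the fractions below are hypothetical). One possible use: tune on val1, then check on val2, whose classes were seen neither in training nor in tuning, whether the chosen parameters still hold up, and touch test only for the final report.

```python
import random
from collections import defaultdict


def class_disjoint_chunks(samples, fractions, seed=0):
    """Partition (item, label) pairs into named chunks with disjoint class sets.

    `fractions` maps a chunk name to its share of the *classes*; classes left
    over after all listed chunks are filled go to "train".
    """
    classes = sorted({label for _, label in samples})
    random.Random(seed).shuffle(classes)  # seeded, so the split is reproducible
    owner, start = {}, 0
    for name, frac in fractions.items():
        n = max(1, round(len(classes) * frac))  # at least one class per chunk
        for c in classes[start:start + n]:
            owner[c] = name
        start += n
    if start >= len(classes):
        raise ValueError("fractions leave no classes for training")
    chunks = defaultdict(list)
    for item, label in samples:
        chunks[owner.get(label, "train")].append((item, label))
    return dict(chunks)
```

For example, `class_disjoint_chunks(samples, {"val1": 0.1, "val2": 0.1, "test": 0.2})` gives four chunks whose class sets never overlap.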