Combining MulticlassWrapper with sampling wrappers in mlr to get subproblem-specific sampling


I am working on an imbalanced multi-class classification problem using the mlr package:

label samples
A 232
B 657
C 221
D 154

I would like to try different machine learning algorithms on it, including some that do not support multi-class classification out of the box. I therefore checked the mlr documentation and found the MulticlassWrapper, which creates binary "one vs one" problems from the underlying multi-class problem.

I would like the balancing to be performed separately for every binary classification problem. That is, I do not want the same up- or downsampling rate for every problem, but a rate that depends on the actual class distribution of each binary problem.

What I found in the mlr documentation for sampling purposes are the OversampleWrapper, UndersampleWrapper, OverbaggingWrapper, and SMOTEWrapper. The problem is that they all require a fixed sampling rate and do not perform distribution-based sampling. So when I use one of these wrappers as shown in the following snippet, all binary subproblems are sampled with the same rate.

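library(mlr)

learner <- makeLearner("classif.rpart")  # just an example

# Oversample the smaller class by a fixed factor of 2
over.lrn <- makeOversampleWrapper(learner, osw.rate = 2)

# Split the multi-class task into binary "one vs one" subproblems
multiclass.over.lrn <- makeMulticlassWrapper(over.lrn, mcw.method = "onevsone")

# task.train is the multi-class training task (its definition is not shown)
multiclass.over.model <- train(multiclass.over.lrn, task.train)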

For example, the subproblem "A" vs "B" would be sampled to a distribution of 464 "A"s and 657 "B"s.
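To make the effect of a fixed rate concrete, here is a minimal base R sketch that applies the same rate to every one-vs-one pair built from the class counts above. It assumes, as the "A" vs "B" example implies, that the wrapper simply multiplies the size of the smaller class by `osw.rate`; the variable names are illustrative and nothing here calls mlr.

```r
# Class counts taken from the table in the question.
counts <- c(A = 232, B = 657, C = 221, D = 154)
rate <- 2  # same value as osw.rate in the snippet above

# All one-vs-one pairs: a 2 x 6 character matrix, one column per pair.
pairs <- combn(names(counts), 2)

# For each pair, multiply the smaller class by the fixed rate
# (assumed behaviour of fixed-rate oversampling).
fixed <- t(apply(pairs, 2, function(p) {
  n <- counts[p]
  minority <- which.min(n)
  n[minority] <- round(n[minority] * rate)
  unname(n)
}))
rownames(fixed) <- apply(pairs, 2, paste, collapse = " vs ")
colnames(fixed) <- c("first", "second")
print(fixed)
```

Under that assumption, "A" vs "B" stays imbalanced at 464 to 657, while "A" vs "C" overshoots to 232 to 442, so a single rate cannot balance all subproblems at once.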

What I would like to get for "A" vs "B" is 657 "A"s and 657 "B"s. For "A" vs "C" it would be 232 "A"s and 232 "C"s, and so on. So the sampling rate should not be fixed for all binary problems created by the MulticlassWrapper, but dynamic: for oversampling, the minority class is oversampled to match the number of samples of the majority class (and vice versa for downsampling).
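The desired behaviour for a single binary subproblem can be sketched in base R as follows. `balance_by_oversampling` is a hypothetical helper, not an mlr function, and the toy data only mimics the "A" vs "B" class sizes. mlr's `makePreprocWrapper` may be one place to hook such a function in, but whether that composes with `makeMulticlassWrapper` as intended is untested here.

```r
# Hypothetical helper (not part of mlr): oversample the minority class of a
# two-class data frame until both classes have the same number of rows.
balance_by_oversampling <- function(data, target) {
  y <- droplevels(factor(data[[target]]))
  n <- table(y)
  stopifnot(length(n) == 2)
  minority <- names(n)[which.min(n)]
  deficit <- max(n) - min(n)
  minority_rows <- which(y == minority)
  # Draw positions with replacement, then map them to row indices; this
  # avoids sample()'s special handling of a length-one numeric vector.
  extra <- minority_rows[sample.int(length(minority_rows), deficit, replace = TRUE)]
  data[c(seq_len(nrow(data)), extra), , drop = FALSE]
}

# Toy data mimicking the "A" vs "B" subproblem from the question.
set.seed(1)
toy <- data.frame(
  x = rnorm(232 + 657),
  label = rep(c("A", "B"), times = c(232, 657))
)
balanced <- balance_by_oversampling(toy, "label")
print(table(balanced$label))
```

The implied rate is simply the majority count divided by the minority count (657 / 232 for this pair), so it is derived from each subproblem's own distribution instead of being set once for all of them. The downsampling variant would draw without replacement from the majority class instead.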

Is there a way to achieve this with the mlr package?

(And if not, and I were to write my own sampling wrapper to achieve this, would it be of general interest for inclusion in the mlr package, even though the team considers mlr retired? Sorry, I am not experienced in contributing to open source projects.)
