sampling ratio for imbalanced dataset


I have an imbalanced dataset that has two classes (+1, -1). The positives are only 7% of the dataset.

I want to classify using decision trees. I have tried downsampling the negatives to:

  1. The same size as the positives.
  2. Double or triple the size of the positives.

For all of them I got almost the same precision; however, the recall of positives was much better for the first sample (negatives the same size as positives). But I feel I'm missing something here, so what is bad about this sampling?
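For reference, the downsampling described above can be sketched roughly as follows. This is only an illustration, not the exact code used: the helper name `downsample_negatives`, the `ratio` parameter, the fixed seed, and the toy label list are all assumptions made for the example.

```python
import random


def downsample_negatives(labels, ratio, seed=0):
    """Return sorted row indices that keep every positive (+1) and at most
    ratio * n_positives randomly chosen negatives (-1)."""
    rng = random.Random(seed)  # fixed seed so the sample is reproducible
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == -1]
    n_keep = min(len(neg), int(ratio * len(pos)))
    return sorted(pos + rng.sample(neg, n_keep))


# Toy labels with 7% positives, mirroring the class balance in the question.
labels = [1] * 70 + [-1] * 930

for ratio in (1, 2, 3):
    idx = downsample_negatives(labels, ratio)
    n_pos = sum(1 for i in idx if labels[i] == 1)
    print(f"ratio 1:{ratio} -> {n_pos} positives, {len(idx) - n_pos} negatives")
```

The returned indices would be used to select the training rows only; the rows held out for evaluation would normally keep the original class balance.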

There is 1 answer
Has QUIT--Anony-Mousse

It is fairly common to downsample a dominant class.

But you need to make sure to solve your actual problem.

If you downsample your classes to a 1:1 ratio, that may make certain evaluation metrics look good, but does it still reflect reality? Your classifier is trained as if 50% of cases were positive, but only 7% of your data are positive. If false positives cost you a lot of money, this can be a problem.
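To make the gap between a balanced sample and reality concrete, here is a small sketch of how precision shifts with the share of positives when the classifier's true positive rate and false positive rate stay fixed. The function name `precision_at_prevalence` and the rates 0.80 and 0.20 are made up for illustration; only the 7% prevalence comes from the question.

```python
def precision_at_prevalence(tpr, fpr, prevalence):
    """Precision implied by a true positive rate, a false positive rate,
    and the fraction of positives in the population being scored."""
    true_pos = tpr * prevalence            # share of all cases that are hits
    false_pos = fpr * (1.0 - prevalence)   # share of all cases that are false alarms
    return true_pos / (true_pos + false_pos)


# Hypothetical rates, not measured on the asker's data.
tpr, fpr = 0.80, 0.20

print("balanced 1:1 sample:", round(precision_at_prevalence(tpr, fpr, 0.50), 3))
print("original 7% positives:", round(precision_at_prevalence(tpr, fpr, 0.07), 3))
```

With these assumed rates, precision is 0.80 on the balanced sample but only about 0.23 at the original 7% prevalence, so a precision figure measured on downsampled data can look far better than what you would see in production.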