sampling ratio for imbalanced dataset

Question

207 views Asked by Sara At 04 December 2017 at 11:49

I have an imbalanced dataset with two classes (+1, -1). The positives make up only 7% of the dataset.

I want to classify it using decision trees. I have tried downsampling the negatives to:

- the same size as the positives
- double or triple the size of the positives

In all cases I got almost the same precision, but the recall for the positives was much better with the first sample (negatives the same size as the positives). I feel I'm missing something here, so what is the downside of this sampling?

There is 1 answer

**Has QUIT--Anony-Mousse** · Answer 1 · 2017-12-16T14:09:50+00:00

It is fairly common to downsample a dominant class.
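As a rough illustration of what such downsampling looks like, here is a minimal sketch using only the standard library. The helper name `downsample_negatives`, the `ratio` parameter, and the toy labels are all made up for this example; they are not from the question.

```python
import random

def downsample_negatives(labels, ratio=1.0, seed=0):
    """Return sorted indices that keep every positive (+1) and a random
    subset of negatives (-1), with `ratio` negatives per positive."""
    rng = random.Random(seed)
    pos_idx = [i for i, y in enumerate(labels) if y == 1]
    neg_idx = [i for i, y in enumerate(labels) if y == -1]
    # Never ask for more negatives than actually exist.
    n_keep = min(len(neg_idx), round(ratio * len(pos_idx)))
    kept_neg = rng.sample(neg_idx, n_keep)  # sampling without replacement
    return sorted(pos_idx + kept_neg)

# Toy labels (assumed): 70 positives and 930 negatives, i.e. 7% positive.
labels = [1] * 70 + [-1] * 930

balanced_idx = downsample_negatives(labels, ratio=1.0)  # 1:1
triple_idx = downsample_negatives(labels, ratio=3.0)    # 3 negatives per positive
print(len(balanced_idx), len(triple_idx))
```

One reasonable way to use this is to apply it to the training split only and keep the test split at its original class ratio, so the evaluation still reflects the real distribution.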

But you need to make sure to solve your actual problem.

If you downsample your classes to a 1:1 ratio, that may make certain evaluation metrics look good, but does it still reflect reality? Your classifier is trained on data where 50% of cases are positive, but in the real data only 7% are positive. If false positives cost you a lot of money, this can be a problem.
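To make the gap concrete, here is a small worked calculation. It assumes a hypothetical classifier with a true positive rate of 0.80 and a false positive rate of 0.20 (invented numbers, not measured results) and uses the 7% positive rate stated in the question. Precision at a given prevalence p follows from Bayes' rule: TPR·p / (TPR·p + FPR·(1 − p)).

```python
def precision_at_prevalence(tpr, fpr, prevalence):
    """Precision of a classifier with fixed TPR/FPR when a fraction
    `prevalence` of the evaluated cases is truly positive."""
    true_pos = tpr * prevalence
    false_pos = fpr * (1.0 - prevalence)
    return true_pos / (true_pos + false_pos)

# Hypothetical operating point, chosen only for illustration.
tpr, fpr = 0.80, 0.20

balanced = precision_at_prevalence(tpr, fpr, 0.50)   # evaluated on a 1:1 sample
realistic = precision_at_prevalence(tpr, fpr, 0.07)  # evaluated at 7% positives

print(f"precision on 1:1 sample:   {balanced:.3f}")   # 0.800
print(f"precision at 7% positives: {realistic:.3f}")  # 0.231
```

Under these assumed rates, the same classifier that looks 80% precise on a balanced sample would be right on fewer than a quarter of its positive predictions at the real class ratio, which is why metrics should be computed on data with the original distribution.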
