I understand imbalance in an image classification problem such as cat vs. dog classification: there are too many cat images and too few dog images. But I don't know how to address imbalance in a segmentation problem.
For example, my task is to mask cloud cover in satellite images, so I frame the problem as two-class segmentation: one class is cloud, the other is background. The dataset has 5800 4-band, 16-bit images of size 256×256. The architecture is SegNet and the loss function is binary cross-entropy.
Consider two hypothetical cases:
- Half of the samples are fully covered by clouds, and half have no cloud at all.
- In every image, half of the area is covered by cloud and half is not.
So, case 2 is balanced, I guess, but what about case 1?
In reality, and in my task, neither case can occur in the source satellite images, since cloud cover is always relatively small compared to the background. But the source images are large, so the samples are cropped from them, and new cases emerge.
So, the samples always contain three types of images:
- fully covered by clouds (254 of 5800 samples).
- without any cloud (1241 of 5800 samples).
- partly covered by clouds (4305 of 5800 samples; I don't know the cloud percentage, which may be very high in some samples and very low in others).
My question:
Are the samples imbalanced and what should I do?
Thanks in advance.
 
                        
Usually, in segmentation tasks, samples are considered "balanced" if, for each image, the number of pixels belonging to each class/segment is roughly the same (case 2 in your question).
In practice, samples are rarely balanced, as in your example.
What can go wrong? When one segment/class dominates the samples, the model may find it easier to label all pixels as belonging to the dominant class/segment. This constant prediction, although uninformative, can still yield high accuracy and a small loss.
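To make this concrete, here is a minimal NumPy sketch. It assumes the ground-truth masks are available as an array of 0 (background) and 1 (cloud). The toy `masks` array and the helper names `cloud_fraction` and `constant_prediction_accuracy` are hypothetical, made up for illustration only.

```python
import numpy as np


def cloud_fraction(masks):
    """Fraction of all pixels labeled cloud (1) across the given masks."""
    return float(np.asarray(masks).mean())


def constant_prediction_accuracy(masks, constant=0):
    """Pixel accuracy of a model that predicts `constant` for every pixel."""
    return float((np.asarray(masks) == constant).mean())


# Toy data (assumption): four 8x8 masks, with clouds only in the
# top two rows of the first mask.
masks = np.zeros((4, 8, 8), dtype=np.uint8)
masks[0, :2, :] = 1

print("cloud fraction:", cloud_fraction(masks))
print("all-background accuracy:", constant_prediction_accuracy(masks, 0))
```

On this toy data only 6.25% of the pixels are cloud, so a model that predicts background everywhere still reaches 93.75% pixel accuracy. Running `cloud_fraction` on your real masks tells you how imbalanced the pixels actually are.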
How can I detect such a faulty result? You can make the "Accuracy" layer output not only the overall accuracy but also the per-class accuracy. If your model is "locked" on a single class, the per-class accuracy of all other classes will be very low.
What can I do? You can use the "InfogainLoss" layer to give more weight to errors on the other classes, countering the effect of the dominant class.