Given a dataset with a non-uniform (highly peaked) distribution, I want to resample it to create a new dataset with an approximately uniform distribution. My approach:
- Divide the data into bins.
- Target bin level = Smallest number of samples per bin, among all bins.
- Randomly delete samples until each bin count = target bin level.
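For concreteness, the steps above could be sketched roughly as follows. This is only a minimal illustration using NumPy for 1-D data; the function name `undersample_to_uniform`, the equal-width bins, the default of 10 bins, and the choice to ignore empty bins (so the target level is not zero) are all assumptions of the sketch, not part of the original description.

```python
import numpy as np

def undersample_to_uniform(data, n_bins=10, seed=0):
    """Randomly drop samples so every non-empty bin keeps the same count."""
    rng = np.random.default_rng(seed)
    data = np.asarray(data)

    # Step 1: divide the data into equal-width bins.
    edges = np.histogram_bin_edges(data, bins=n_bins)
    # Use interior edges only, so indices run 0 .. n_bins-1
    # and the maximum value falls in the last bin.
    bin_idx = np.digitize(data, edges[1:-1])
    counts = np.bincount(bin_idx, minlength=n_bins)

    # Step 2: target level = smallest count among bins.
    # Assumption: empty bins are skipped, otherwise the target would be 0.
    target = counts[counts > 0].min()

    # Step 3: keep `target` randomly chosen samples per non-empty bin.
    keep = [
        rng.choice(np.flatnonzero(bin_idx == b), size=target, replace=False)
        for b in np.flatnonzero(counts)
    ]
    # Sort the kept indices to preserve the original sample order.
    return data[np.sort(np.concatenate(keep))]
```

With a strongly peaked input, the sparsest bin sets the target, so most of the data may be discarded; the number of bins trades flatness against how many samples survive.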
Is there a better technique?
We know that for a uniform distribution on [a, b] we have
mean = (a+b) / 2
variance = (b-a)^2 / 12
So you could just choose a and b and sample from a uniform distribution with those parameters. For example, set a = min(data) and b = max(data), or maybe a = mean(lowest_bin) and b = mean(highest_bin), or something like that. How you set a and b depends on your data and what you want to accomplish.
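As a rough sketch of this idea with NumPy (the function name `sample_uniform_like` and its `bounds` argument are made up for illustration): the `"minmax"` option uses a = min(data) and b = max(data), while the `"moments"` option is one possible reading of the formulas above, solving them for a and b to get a = mean - sqrt(3)*sd and b = mean + sqrt(3)*sd. Note that either way this draws new synthetic values rather than selecting from the original samples.

```python
import numpy as np

def sample_uniform_like(data, n_samples=None, bounds="minmax", seed=0):
    """Draw fresh samples from Uniform(a, b), with a and b taken from the data."""
    rng = np.random.default_rng(seed)
    data = np.asarray(data, dtype=float)

    if bounds == "minmax":
        # a and b are the observed extremes.
        a, b = data.min(), data.max()
    elif bounds == "moments":
        # Solve mean = (a+b)/2 and variance = (b-a)^2/12 for a and b.
        half_width = np.sqrt(3.0) * data.std()
        a, b = data.mean() - half_width, data.mean() + half_width
    else:
        raise ValueError("bounds must be 'minmax' or 'moments'")

    # Default: produce as many samples as the original dataset has.
    n = len(data) if n_samples is None else n_samples
    return rng.uniform(a, b, size=n)
```

For peaked data the moment-based interval is usually much narrower than [min(data), max(data)], so which option is appropriate depends on what the uniform dataset is meant to cover.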