I have a data set of n = 1000 realizations of a univariate random variable X: X = {x1, x2, ..., xn}. The data is generated by varying a parameter on which the random variable depends. For example, let the r.v. be the area of a circle. By varying the radius (keeping the dimension fixed, say a 2-dimensional circle), I generate n areas for radii in the range r = 5 to n.
Using the fitdist command, I can fit a distribution to the data set, choosing from distributions such as Normal, Kernel, Binomial, etc. Fitting the data set to k candidate distributions gives me k fitted distributions. How do I select the best-fit distribution, and hence the pdf?
Also, do I always need to normalize (preprocess) the data to the range [0,1] before fitting?
If I understand correctly, you are asking how to decide which distribution to choose once you have a few fits.
There are three major metrics (IMO) for measuring "goodness-of-fit": the chi-squared test, the Kolmogorov-Smirnov test, and the Anderson-Darling test.
Which to choose depends on a large number of factors; you can randomly pick one or read the Wiki pages to figure out which suits your need. These tests are also a part of MATLAB.
For instance, you can use kstest for the Kolmogorov-Smirnov test. You provide the data and the hypothesized distribution to the function, and then compare the candidate distributions based on the KS test results. Alternatively, you can use the Anderson-Darling test through adtest or the chi-squared test through chi2gof.
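As a rough sketch of how that comparison could look in practice, the snippet below fits a few candidate distributions with fitdist and runs all three tests on each fit. The sample x, the list of candidates, and the variable names are made up for illustration; substitute your own data and distributions. It assumes the Statistics and Machine Learning Toolbox is available.

```matlab
% Hypothetical sample standing in for the real data (assumption: positive values,
% so that Lognormal, Gamma and Weibull fits are all valid).
rng(0);                          % reproducible example
x = gamrnd(2, 3, 1000, 1);       % 1000 draws from a Gamma(2,3), purely illustrative

% Candidate distribution names accepted by fitdist (illustrative choice).
candidates = {'Normal', 'Lognormal', 'Gamma', 'Weibull'};
pvals = zeros(numel(candidates), 3);   % columns: KS, AD, chi-squared p-values

for k = 1:numel(candidates)
    pd = fitdist(x, candidates{k});    % fit the k-th candidate to the data

    % Each test takes the fitted distribution object as the hypothesized distribution.
    [~, pKS]  = kstest(x, 'CDF', pd);
    [~, pAD]  = adtest(x, 'Distribution', pd);
    [~, pChi] = chi2gof(x, 'CDF', pd, 'NParams', pd.NumParameters);

    pvals(k, :) = [pKS, pAD, pChi];
    fprintf('%-10s  KS p = %.4f   AD p = %.4f   chi2 p = %.4f\n', ...
            candidates{k}, pKS, pAD, pChi);
end
```

A larger p-value means the test finds less evidence against that candidate, so the fits can be ranked on that basis. One caveat: because the parameters are estimated from the same data being tested, the KS and AD p-values computed this way tend to be optimistic, so treat them as a relative ranking rather than exact significance levels.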