MATLAB: What are the ways to determine the distribution of the data?

I have a data set of n = 1000 realizations of a univariate random variable X, i.e. X = {x1, x2, ..., xn}. The data are generated by varying a parameter on which the random variable depends. For example, let the r.v. be the area of a circle: by varying the radius (keeping the dimension fixed, say a 2-dimensional circle), I generate n areas for radii in the range r = 5 to n.

Using the fitdist command, I can fit distributions such as Normal, Kernel, Binomial, etc. to the data set. Fitting k candidate distributions gives me k fitted distributions. How do I select the best-fit distribution, and hence the pdf?

Also, do I always need to normalize (post-process) the data to the range [0,1] before fitting?

There is 1 answer

Nitish On BEST ANSWER

If I understand correctly, you are asking how to decide which distribution to choose once you have a few fits.

There are three major metrics (IMO) for measuring "goodness-of-fit": the Kolmogorov-Smirnov test, the Anderson-Darling test, and the chi-squared test.

Which one to choose depends on a large number of factors; you can start with any one of them, or read the Wikipedia pages to figure out which suits your needs. These tests are also available in MATLAB (Statistics and Machine Learning Toolbox).

For instance, you can use kstest for the Kolmogorov-Smirnov test. You pass the data and the hypothesized distribution to the function, and then compare the candidate fits based on the KS test results.
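
A minimal sketch of that comparison, assuming the Statistics and Machine Learning Toolbox is installed. The sample x (drawn here with gamrnd) and the candidate list are placeholders chosen for illustration, not part of the original question; substitute your own data and candidates.

```matlab
% Sketch: rank several fitdist candidates by their one-sample KS statistic.
rng(0);                                   % reproducible placeholder data
x = gamrnd(2, 3, 1000, 1);                % placeholder sample; replace with your data

candidates = {'Normal', 'Gamma', 'Lognormal'};   % assumed candidate list
ksStat = zeros(numel(candidates), 1);
pValue = zeros(numel(candidates), 1);

for k = 1:numel(candidates)
    pd = fitdist(x, candidates{k});       % fit the k-th candidate to the data
    % Test the data against the fitted distribution's CDF
    [~, pValue(k), ksStat(k)] = kstest(x, 'CDF', pd);
end

[~, best] = min(ksStat);                  % smallest KS statistic = closest empirical CDF
fprintf('Smallest KS statistic: %s\n', candidates{best});
```

Because each distribution's parameters are estimated from the same data that is then tested, the p-values are better read as a rough ranking of the candidates than as exact significance levels.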

Alternatively, you can use the Anderson-Darling test through adtest or the chi-squared test through chi2gof.
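
The same loop can be sketched with these two functions. As before, the sample and the candidate list are illustrative placeholders, and the Statistics and Machine Learning Toolbox is assumed. The 'NParams' option tells chi2gof how many parameters were estimated from the data so that it can adjust the degrees of freedom.

```matlab
% Sketch: Anderson-Darling and chi-squared p-values for each fitted candidate.
rng(0);                                   % reproducible placeholder data
x = gamrnd(2, 3, 1000, 1);                % placeholder sample; replace with your data

candidates = {'Normal', 'Gamma', 'Lognormal'};   % assumed candidate list
pAD   = zeros(numel(candidates), 1);
pChi2 = zeros(numel(candidates), 1);

for k = 1:numel(candidates)
    pd = fitdist(x, candidates{k});       % fit the k-th candidate
    % Anderson-Darling test against the fitted distribution object
    [~, pAD(k)] = adtest(x, 'Distribution', pd);
    % Chi-squared test; NParams accounts for the estimated parameters
    [~, pChi2(k)] = chi2gof(x, 'CDF', pd, 'NParams', pd.NumParameters);
end

results = table(candidates(:), pAD, pChi2, ...
    'VariableNames', {'Distribution', 'p_AD', 'p_Chi2'});
disp(results);
```

A larger p-value means the test found less evidence against that candidate, so the table gives a side-by-side view of which fits are plausible.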