As the title suggests, I am trying to find a function that takes an array of floats and fits a distribution to my data.
I will then use the fitted distribution to compute the CDF of new data that I pass to it.
I have installed and looked through the SciRuby Distribution and NArray docs, but nothing appears to match the 'fit' method.
The Python code looks like this:
from scipy import stats

# Approach 2: Model-based percentiles.
# Step 1: Find a Gamma distribution that fits your data
alpha, _, beta = stats.gamma.fit(data, floc=0.)
# Step 2: Use that distribution's CDF to get percentiles.
scores = 100 - 100 * stats.gamma.cdf(new_data, a=alpha, scale=beta)
print(scores)
Thank you in advance
After a deep dive into other packages and a lot of help from someone on the 'Cross Validated' forum, I have the answer I needed.
To obtain the 'alpha' and 'beta' values, which give the shape and rate of the gamma distribution, you first need the variance of the data (along with its mean).
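One way to get from the mean and variance to 'alpha' and 'beta' is the method of moments. For a gamma distribution with shape alpha and rate beta, the mean is alpha / beta and the variance is alpha / beta^2, so alpha = mean^2 / variance and beta = mean / variance. The sketch below assumes that this is the approach intended here; the helper name `gamma_moments` and the sample numbers are made up for illustration.

```ruby
# Method-of-moments estimates for a gamma distribution.
# Assumption: shape/rate parameterisation (mean = alpha / beta,
# variance = alpha / beta**2). The method name is hypothetical.
def gamma_moments(mean, variance)
  alpha = mean**2 / variance # shape
  beta  = mean / variance    # rate
  theta = variance / mean    # scale, equal to 1 / beta
  { alpha: alpha, beta: beta, theta: theta }
end

# Illustrative values only.
params = gamma_moments(5.0, 4.0)
puts params
```

Note that scipy's `fit` uses maximum likelihood by default, so its estimates may differ somewhat from moment-based ones on the same data.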
There are a few ways to compute the variance. See here for more information.
Code examples:
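As a rough sketch of the first approach (not necessarily the exact code from the original answer), the variance can be computed in plain Ruby. The method name `variance` and the `bessel:` keyword are made up for this example; `net_square` and `minus_one` follow the names used in the note that comes next.

```ruby
# Variance of an array of floats.
# With bessel: true the sum of squares is divided by (n - 1),
# otherwise by n. Method and keyword names are hypothetical.
def variance(data, bessel: true)
  mean = data.sum / data.size.to_f
  net_square = data.sum { |x| (x - mean)**2 } # sum of squared deviations
  minus_one = data.size - 1                   # divisor with Bessel's correction
  bessel ? net_square / minus_one : net_square / data.size
end

# Illustrative data only.
data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
puts variance(data)
puts variance(data, bessel: false)
```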
The 'minus_one' line isn't strictly necessary. Dividing by data.size - 1 instead of data.size is known as Bessel's correction, and it removes the bias you get when estimating the variance from a sample. Without it, the variance is simply net_square / data.size.
Second option using the 'descriptive_statistics' gem
Once you have these values, you can use the cdf function from the Distribution gem (docs here).
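The next stage is to pass the values into this function, which returns the percentile as a cumulative probability between 0 and 1.
Make sure to pass '1 over beta' (1 / beta) rather than beta itself, or it won't work: beta is a rate, so 1 / beta is the corresponding scale.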
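To show why the '1 over beta' step matters, here is a self-contained sketch that does not need any gems. `gamma_cdf` and `regularized_gamma_p` are hypothetical pure-Ruby stand-ins for the Distribution gem's cdf, written with a simple power series; the gem's own implementation may well differ, and the parameter values are illustrative.

```ruby
# Regularized lower incomplete gamma function P(a, x), via the series
# P(a, x) = x**a * exp(-x) / Gamma(a + 1) * sum_{n>=0} x**n / ((a+1)...(a+n)).
# Intended for moderate x only; this is a sketch, not a tuned routine.
def regularized_gamma_p(a, x)
  return 0.0 if x <= 0.0

  term = 1.0
  sum = 1.0
  n = 1
  while n < 10_000
    term *= x / (a + n)
    sum += term
    break if term < sum * 1e-15

    n += 1
  end
  # Work in logs to avoid overflow in x**a and Gamma(a + 1).
  log_prefactor = a * Math.log(x) - x - Math.lgamma(a + 1.0).first
  Math.exp(log_prefactor) * sum
end

# Gamma CDF taking shape and SCALE (not rate) as parameters.
def gamma_cdf(x, alpha, scale)
  regularized_gamma_p(alpha, x / scale)
end

alpha = 2.0 # shape (illustrative)
beta  = 0.5 # rate (illustrative)
new_value = 3.0

# Convert the rate to a scale with 1 / beta before calling the CDF.
score = 100 - 100 * gamma_cdf(new_value, alpha, 1.0 / beta)
puts score
```

The last two lines mirror the `100 - 100 * cdf` score from the Python snippet in the question.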
You may have noticed that I also calculated @theta.
This is for a separate function that does the reverse: given a percentile, it returns the corresponding value from my gamma distribution. It is used like so:
This function is also known as the 'inverse CDF', 'inverse cumulative distribution function', 'probability point function' or 'percent point function'. Here it is simply named 'quantile'.
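If your library does not provide a quantile function, one generic fallback is to invert the CDF numerically. The sketch below uses bisection on any increasing CDF passed in as a lambda; the `quantile` signature here is made up for illustration and is not the Distribution gem's API. To keep it checkable, the demo uses a gamma distribution with alpha = 1, whose CDF has the closed form 1 - exp(-x / theta).

```ruby
# Invert a monotonically increasing CDF by bisection.
# `cdf` is any callable mapping x >= 0 to a probability in [0, 1].
def quantile(p, cdf, upper: 1.0)
  raise ArgumentError, "p must be in [0, 1)" unless p >= 0.0 && p < 1.0

  lo = 0.0
  hi = upper
  hi *= 2.0 while cdf.call(hi) < p # widen the bracket until it contains p
  100.times do
    mid = (lo + hi) / 2.0
    if cdf.call(mid) < p
      lo = mid
    else
      hi = mid
    end
  end
  (lo + hi) / 2.0
end

theta = 0.8 # scale (illustrative)
# Gamma CDF for the special case alpha = 1 (an exponential distribution).
exponential_cdf = ->(x) { 1.0 - Math.exp(-x / theta) }

median = quantile(0.5, exponential_cdf)
puts median
```

For this special case the result can be compared against the exact median, theta * ln(2).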
For more information on gamma distributions, please see the wiki: Gamma Distribution