Ruby version of gamma.fit from scipy.stats

As the title suggests, I am trying to find a function that can take an array of floats and find a distribution that fits my data.

From here I'll use it to find the CDF of new data I am passing it.

I have installed the SciRuby Distribution and NArray gems and looked through their docs, but nothing appears to match the 'fit' method.

The Python code looks like this:

from scipy import stats

# Approach 2: Model-based percentiles.
# Step 1: Find a Gamma distribution that fits your data.
# fit returns (shape, loc, scale), so `beta` here is the scale parameter.
alpha, _, beta = stats.gamma.fit(data, floc=0.)

# Step 2: Use that distribution's CDF to get percentiles.
scores = 100 - 100 * stats.gamma.cdf(new_data, a=alpha, scale=beta)
print(scores)

Thank you in advance

Answer by bubbaspaarx (best answer)

After a deep dive into other packages and a lot of help from someone from the 'Cross Validated' forum, I have the answer needed.

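To obtain the 'alpha' and 'beta' values, which are the shape and rate of the gamma distribution, you first need the mean and the variance of the data. Then alpha = mean² / variance and beta = mean / variance. This is a method-of-moments estimate, so the values will usually be close to, but not identical to, what scipy's maximum-likelihood gamma.fit returns. Note also that 'beta' in the Python snippet is the scale, whereas here it is the rate (1 / scale).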

There are a few ways to compute the variance. See here for more information.

Code examples:

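data = [<insert your numbers>]
sum = data.sum(0.0)  # float sum, so the divisions below are never integer divisions
sum_square_mean = (sum**2) / data.size
all_square = data.map { |n| n**2 }.sum
net_square = all_square - sum_square_mean  # sum of squared deviations from the mean
minus_one = data.size - 1  # Bessel's correction
variance = net_square / minus_one  # sample variance
mean = sum / data.size
mean_squared = mean**2
alpha = mean_squared / variance  # shape
beta = mean / variance  # rate
theta = variance / mean  # scale, equal to 1 / beta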

The 'minus_one' step (dividing by n - 1 instead of n) isn't strictly necessary, but it is standard in statistics because it removes the bias in a variance estimated from a sample. Look up Bessel's correction. Without it, you can just get the variance from net_square / data.size.
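The same calculation can be wrapped in a small method. This is only a sketch: the name gamma_moments_fit and the bessel: keyword are made up for illustration, and it assumes the data are positive numbers with non-zero variance.

```ruby
# Method-of-moments estimate of gamma parameters (illustrative sketch;
# the method name and the `bessel:` keyword are hypothetical).
def gamma_moments_fit(data, bessel: true)
  n = data.size
  raise ArgumentError, 'need at least two values' if n < 2

  mean = data.sum(0.0) / n
  # Sum of squared deviations from the mean.
  squared_deviations = data.sum(0.0) { |x| (x - mean)**2 }
  # Divide by n - 1 (Bessel's correction) or by n (population variance).
  variance = squared_deviations / (bessel ? n - 1 : n)
  raise ArgumentError, 'variance is zero' if variance.zero?

  {
    alpha: mean**2 / variance, # shape
    beta: mean / variance,     # rate
    theta: variance / mean     # scale, equal to 1 / beta
  }
end

fit = gamma_moments_fit([1.0, 2.0, 3.0, 4.0, 5.0])
puts fit
```

Passing bessel: false should give the same numbers as the population-variance shortcut mentioned above.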

Second option, using the 'descriptive_statistics' gem:

require 'descriptive_statistics'
# Note: the gem's variance does not apply Bessel's correction

@alpha = (data.mean**2) / data.variance
@beta = data.mean / data.variance
@theta = data.variance / data.mean

Once you have these values, you can use the cdf function from the Distribution gem (docs here).

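The next stage is to pass the values into this function, which returns a cumulative probability that you can turn into a percentile.

Make sure to pass 1 / beta (the scale, which is the same value as theta above) as the third argument rather than beta itself, or it won't work.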

require 'distribution'
percentile = 100 - (100 * Distribution::Gamma::Ruby_.cdf(x, alpha, 1.0 / beta))
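To sanity-check the scale-versus-rate point without the gem, here is a rough pure-Ruby sketch of a gamma CDF. It is not the Distribution gem's implementation: the method names are invented for this example, and it uses the plain power series for the regularized incomplete gamma function, which is only reasonable for moderate values of x / scale.

```ruby
# Regularized lower incomplete gamma function P(a, x) via its power series.
# Illustrative sketch only: slow, and may overflow, when x is far larger than a.
def regularized_gamma_p(a, x, max_iter: 10_000, eps: 1e-14)
  return 0.0 if x <= 0.0

  term = 1.0 / a
  sum = term
  (1..max_iter).each do |n|
    term *= x / (a + n)
    sum += term
    break if term < sum * eps
  end
  # Prefactor x^a * e^(-x) / Gamma(a), computed in log space.
  sum * Math.exp(a * Math.log(x) - x - Math.lgamma(a).first)
end

# Gamma CDF parameterised by shape and scale (scale = 1 / rate).
def gamma_cdf(x, shape, scale)
  regularized_gamma_p(shape, x / scale)
end

alpha = 3.6           # shape (example value)
beta = 1.2            # rate (example value)
x = 3.0
percentile = 100 - (100 * gamma_cdf(x, alpha, 1.0 / beta))
puts percentile
```

With shape 1 the gamma distribution reduces to the exponential distribution, so gamma_cdf(x, 1.0, scale) can be compared against 1 - Math.exp(-x / scale) as a quick check.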

You may have noticed that I also calculated theta, the scale (variance / mean, which equals 1 / beta).

I use it in a separate function that goes the other way: given a probability such as 0.5, it returns the corresponding value from my gamma distribution. It is used like so:

value = Distribution::Gamma.quantile(0.5, alpha, theta)

This function is also known as the 'inverse cdf', 'inverse cumulative distribution function', 'percent point function' or 'percentile point function'. Here it is simply named 'quantile'.
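To make 'inverse cdf' concrete, the sketch below inverts a CDF numerically by bisection. It is not how the Distribution gem computes its quantile; the method name and keyword arguments are hypothetical, and it assumes the block is a non-decreasing CDF on non-negative values.

```ruby
# Invert a CDF numerically by bisection (illustrative sketch; the method
# name and keyword arguments are hypothetical).
# The block must return a non-decreasing CDF value for x >= lo.
def quantile_by_bisection(p, lo: 0.0, hi: 1.0, tol: 1e-10)
  raise ArgumentError, 'p must be strictly between 0 and 1' unless p > 0.0 && p < 1.0

  # Grow the upper bound until it brackets the target probability.
  hi *= 2.0 while yield(hi) < p

  # Halve the bracket until it is narrower than the tolerance.
  200.times do
    break if hi - lo <= tol

    mid = (lo + hi) / 2.0
    if yield(mid) < p
      lo = mid
    else
      hi = mid
    end
  end
  (lo + hi) / 2.0
end

theta = 2.0 # scale (example value)
# A gamma distribution with alpha = 1 is exponential, so its CDF has a closed form.
exponential_cdf = ->(x) { 1.0 - Math.exp(-x / theta) }
median = quantile_by_bisection(0.5) { |x| exponential_cdf.call(x) }
puts median
```

For the exponential case the median has the closed form theta * Math.log(2), which gives an easy check that the inversion round-trips.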

For more information on gamma distributions, please see the wiki

Gamma Distribution