I was trying to learn using SciPy and I want to create a custom distribution with the pdf defined as :

f(x) = k * x ^ (k-1) if 0 < x < 1 else 0

Based on the documentation on the scipy website I've defined the custom distribution as:

k = 7

class custom_distro(stats.rv_continuous):
  def _pdf(self, x):
    return k*math.pow(x,k-1) if (0<=x<=1) else 0

Now the custom_distro.pdf() and custom_distro.cdf() functions are working fine, but using the function .rvs() gives the error:

The function value at x=nan is NaN; solver cannot continue.

I've tried to search for what the issue is but I cannot find any helpful suggestion.

I'd also request that if you have some good resources for the library please do suggest.

1

There are 1 answers

1
Matt Haberland On BEST ANSWER

Now the custom_distro.pdf() and custom_distro.cdf() functions are working fine

This is surprising, because custom_distro is a subclass of rv_continuous, but each SciPy (continuous) distributions is an instance of a subclasses of rv_continuous. For example norm_gen is a subclass of rv_continuous, and scipy.stats.norm is an instance of norm_gen. (It's weird, I know.)

Another part of the problem may have been that your distribution has finite support (your _pdf returned 0 when x was outside the interval [0, 1]), but it is better to define the support of the distribution using the appropriate feature of the infrastructure. The infrastructure will take care of ensuring that the pdf method returns 0 when the argument is outside the support, etc.

Here is code for which the rvs method works.

from scipy import stats
import math

# note that the class is named with _gen appended
class custom_distro_gen(stats.rv_continuous):
  def _pdf(self, x, k):
    return k*math.pow(x, k-1)
    # I'd recommend using operators, e.g.
    # k * x**(k-1)
    # Then you can use NumPy arrays
    # But if `k` can be 0 or less, you'll need to make
    # sure `x` is not of integer type or you could get an
    # error (can't evaluate integer to negative integer power)

  # consider defining the `_argcheck` method if there
  # are restrictions on the parameter `k`

# `custom_distro` should be an instance of the `custom_distro_gen` class
# `a` and `b` define the support of the distribution; the value of the pdf
# is zero outside of the support. 
custom_distro = custom_distro_gen(name='custom_distro', a=0, b=1)
custom_distro.rvs(k=2.)  # 0.875097877164333

# freeze the distribution with k=2
dist = custom_distro(k=2.)
dist.rvs(size=5)  # array([0.89701868, 0.65968075, 0.29795727, 0.56256978, 0.25515445])

There is some additional information about adding new distributions at Adding a New Statistics Distribution. Looking at PRs in which distributions were added will also be helpful, e.g. gh-12694.

I was trying to learn using SciPy...

Creating custom distributions is rather advanced. The process is poorly documented, and there are many issues with the infrastructure, so I would not recommend starting with this. Consider following one of the other tutorials.