I am want to sample from the binomial distribution B(n,p) but with an additional constraint that the sampled value belongs in the range [a,b] (instead of the normal 0 to n range). In other words, I have to sample a value from binomial distribution given that it lies in the range [a,b]. Mathematically, I can write the pmf of this distribution (f(x)
) in terms of the pmf of binomial distribution bin(x) = [(nCx)*(p)^x*(1-p)^(n-x)]
as
sum = 0
for i in range(a,b+1):
sum += bin(i)
f(x) = bin(x)/sum
One way of sampling from this distribution is to sample a uniformly distributed number and apply the inverse of the CDF(obtained using the pmf). However, I don't think this is a good idea as the pmf calculation would easily get very time-consuming.
The values of n,x,a,b
are quite large in my case and this way of computing pmf and then using a uniform random variable to generate the sample seems extremely inefficient due to the factorial terms in nCx
.
What's a nice/efficient way to achieve this?
This is a way to collect all the values of
bin
in a pretty short time:It can be done in a quarter of microsecond for
n=1000
.Sample run:
You can sum specific parts of this array like so:
Remark: For
n>1000
middle values of this distribution requires to use extremely large numbers in multiplication thereforeRuntimeWarning
is raised.Bugfix
You can use
scipy.stats.binom
equivalently:This does the same as above mentioned method quite efficiently (
n=1000000
in a third of second). Alternatively, you can usebinom.cdf(np.arange(n+1), n, p)
which calculate cumulative sum ofbinom.pmf
. Then subtraction ofb
th anda
th items of this array gives an output which is very close to what you expect.