scipy.stats.nhypergeom = <scipy.stats._discrete_distns.nhypergeom_gen object>

A negative hypergeometric discrete random variable.

Consider a box containing \(M\) balls: \(n\) red and \(M-n\) blue. We randomly sample balls from the box, one at a time and without replacement, until we have picked \(r\) blue balls. nhypergeom is the distribution of the number of red balls \(k\) we have picked.
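The sampling scheme above can be checked with a small simulation. The following is a minimal sketch, not part of the library: the helper name `simulate_red_count`, the seed, and the number of trials are assumptions chosen for illustration, and the parameter values match the dog example used later on this page.

```python
import numpy as np
from scipy.stats import nhypergeom


def simulate_red_count(M, n, r, rng):
    """Draw without replacement until r blue balls appear; return the red count.

    Hypothetical helper for illustration; assumes 1 <= r <= M - n.
    """
    # 1 marks a red ball, 0 marks a blue ball
    box = np.array([1] * n + [0] * (M - n))
    order = rng.permutation(box)
    # 0-based position of the r-th blue ball in the shuffled draw order
    stop = np.flatnonzero(order == 0)[r - 1]
    # Every ball drawn before that position that is not blue is red
    return int(order[:stop].sum())


rng = np.random.default_rng(12345)  # arbitrary seed
M, n, r = 20, 7, 12
draws = np.array([simulate_red_count(M, n, r, rng) for _ in range(20000)])

# Empirical frequencies of k = 0, ..., n compared with the exact PMF
empirical = np.bincount(draws, minlength=n + 1) / draws.size
exact = nhypergeom.pmf(np.arange(n + 1), M, n, r)
max_abs_diff = np.max(np.abs(empirical - exact))
```

With this many trials, `max_abs_diff` should be small, although the exact value depends on the seed.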

As an instance of the rv_discrete class, the nhypergeom object inherits from it a collection of generic methods (see below for the full list), and completes them with details specific to this particular distribution.

See also

hypergeom, binom, nbinom


The symbols used to denote the shape parameters (M, n, and r) are not universally accepted. See the Examples for a clarification of the definitions used here.

The probability mass function is defined as

\[f(k; M, n, r) = \frac{{{k+r-1}\choose{k}}{{M-r-k}\choose{n-k}}} {{M \choose n}}\]

for \(k \in [0, n]\), \(n \in [0, M]\), \(r \in [0, M-n]\), where the binomial coefficient is defined as

\[\binom{n}{k} \equiv \frac{n!}{k! (n - k)!}.\]

This is equivalent to observing \(k\) successes in the first \(k+r-1\) samples, with the \((k+r)\)-th sample being a failure. The former can be modelled as a hypergeometric distribution. The probability of the latter is simply the number of failures remaining, \(M-n-(r-1)\), divided by the size of the remaining population, \(M-(k+r-1)\). This relationship can be written as:

\[NHG(k;M,n,r) = HG(k;M,n,k+r-1)\frac{(M-n-(r-1))}{(M-(k+r-1))}\]

where \(NHG\) is the probability mass function (PMF) of the negative hypergeometric distribution and \(HG\) is the PMF of the hypergeometric distribution.
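As a sanity check, the closed-form PMF can be evaluated directly from binomial coefficients and compared with nhypergeom.pmf. This is a minimal sketch: the helper `nhypergeom_pmf_formula` is a hypothetical name introduced here, and it is not how the library computes the PMF internally.

```python
import numpy as np
from scipy.special import comb
from scipy.stats import nhypergeom


def nhypergeom_pmf_formula(k, M, n, r):
    """Evaluate C(k+r-1, k) * C(M-r-k, n-k) / C(M, n) (illustrative helper)."""
    return comb(k + r - 1, k) * comb(M - r - k, n - k) / comb(M, n)


M, n, r = 20, 7, 12
k = np.arange(0, n + 1)  # full support 0, ..., n

by_formula = nhypergeom_pmf_formula(k, M, n, r)
by_scipy = nhypergeom.pmf(k, M, n, r)

# The two evaluations should agree up to floating-point rounding,
# and the probabilities over the support should sum to one.
agree = np.allclose(by_formula, by_scipy)
total = by_formula.sum()
```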

The probability mass function above is defined in the “standardized” form. To shift the distribution, use the loc parameter. Specifically, nhypergeom.pmf(k, M, n, r, loc) is identically equivalent to nhypergeom.pmf(k - loc, M, n, r).
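The effect of loc can be illustrated with a short check. The value `loc = 3` and the shape parameters below are arbitrary choices made for this sketch.

```python
import numpy as np
from scipy.stats import nhypergeom

M, n, r = 20, 7, 12
loc = 3  # arbitrary shift chosen for illustration

# The shifted distribution has support loc, ..., loc + n
k = np.arange(loc, loc + n + 1)

shifted = nhypergeom.pmf(k, M, n, r, loc)
standard = nhypergeom.pmf(k - loc, M, n, r)
same_pmf = np.allclose(shifted, standard)

# Shifting by loc moves the mean by the same amount
mean_shift = nhypergeom.mean(M, n, r, loc=loc) - nhypergeom.mean(M, n, r)
```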



Negative Hypergeometric Distribution on Wikipedia


Negative Hypergeometric Distribution from


>>> import numpy as np
>>> from scipy.stats import nhypergeom
>>> import matplotlib.pyplot as plt

Suppose we have a collection of 20 animals, of which 7 are dogs. Then if we want to know the probability of finding a given number of dogs (successes) in a sample with exactly 12 animals that aren’t dogs (failures), we can initialize a frozen distribution and plot the probability mass function:

>>> M, n, r = [20, 7, 12]
>>> rv = nhypergeom(M, n, r)
>>> x = np.arange(0, n+2)
>>> pmf_dogs = rv.pmf(x)
>>> fig = plt.figure()
>>> ax = fig.add_subplot(111)
>>> ax.plot(x, pmf_dogs, 'bo')
>>> ax.vlines(x, 0, pmf_dogs, lw=2)
>>> ax.set_xlabel('# of dogs in our group with given 12 failures')
>>> ax.set_ylabel('nhypergeom PMF')
>>> plt.show()

Instead of using a frozen distribution, we can also use nhypergeom methods directly. For example, to obtain the probability mass function, use:

>>> prb = nhypergeom.pmf(x, M, n, r)

And to generate random numbers:

>>> R = nhypergeom.rvs(M, n, r, size=10)

To verify the relationship between hypergeom and nhypergeom, use:

>>> from scipy.stats import hypergeom, nhypergeom
>>> M, n, r = 45, 13, 8
>>> k = 6
>>> nhypergeom.pmf(k, M, n, r)
>>> hypergeom.pmf(k, M, n, k+r-1) * (M - n - (r-1)) / (M - (k+r-1))


rvs(M, n, r, loc=0, size=1, random_state=None)

Random variates.

pmf(k, M, n, r, loc=0)

Probability mass function.

logpmf(k, M, n, r, loc=0)

Log of the probability mass function.

cdf(k, M, n, r, loc=0)

Cumulative distribution function.

logcdf(k, M, n, r, loc=0)

Log of the cumulative distribution function.

sf(k, M, n, r, loc=0)

Survival function (also defined as 1 - cdf, but sf is sometimes more accurate).

logsf(k, M, n, r, loc=0)

Log of the survival function.

ppf(q, M, n, r, loc=0)

Percent point function (inverse of cdf — percentiles).

isf(q, M, n, r, loc=0)

Inverse survival function (inverse of sf).

stats(M, n, r, loc=0, moments='mv')

Mean('m'), variance('v'), skew('s'), and/or kurtosis('k').

entropy(M, n, r, loc=0)

(Differential) entropy of the RV.

expect(func, args=(M, n, r), loc=0, lb=None, ub=None, conditional=False)

Expected value of a function (of one argument) with respect to the distribution.

median(M, n, r, loc=0)

Median of the distribution.

mean(M, n, r, loc=0)

Mean of the distribution.

var(M, n, r, loc=0)

Variance of the distribution.

std(M, n, r, loc=0)

Standard deviation of the distribution.

interval(confidence, M, n, r, loc=0)

Confidence interval with equal areas around the median.
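Several of the methods listed above can be combined on a frozen distribution. The following is a minimal sketch using the parameters from the dog example; the variable names and the closed-form mean \(rn/(M-n+1)\) used for comparison are assumptions of this sketch, not something stated on this page.

```python
from scipy.stats import nhypergeom

M, n, r = 20, 7, 12
rv = nhypergeom(M, n, r)  # frozen distribution

# First two moments
mean, var = rv.stats(moments='mv')
# Standard closed form for the mean, included here as a cross-check
expected_mean = r * n / (M - n + 1)

# cdf and sf are complementary: P(K <= 5) + P(K > 5) = 1
p_at_most_5 = rv.cdf(5)
p_more_than_5 = rv.sf(5)

# ppf(0.5) gives the median (smallest k with cdf(k) >= 0.5)
median = rv.ppf(0.5)

# Interval with equal tail areas containing 90% of the distribution
low, high = rv.interval(0.9)
```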