fisher_exact
- scipy.stats.fisher_exact(table, alternative='two-sided')
Perform a Fisher exact test on a 2x2 contingency table.
The null hypothesis is that the true odds ratio of the populations underlying the observations is one, and the observations were sampled from these populations under a condition: the marginals of the resulting table must equal those of the observed table. The statistic returned is the unconditional maximum likelihood estimate of the odds ratio, and the p-value is the probability under the null hypothesis of obtaining a table at least as extreme as the one that was actually observed. There are other possible choices of statistic and two-sided p-value definition associated with Fisher’s exact test; please see the Notes for more information.
- Parameters:
- table : array_like of ints
A 2x2 contingency table. Elements must be non-negative integers.
- alternative : {‘two-sided’, ‘less’, ‘greater’}, optional
Defines the alternative hypothesis. The following options are available (default is ‘two-sided’):
‘two-sided’: the odds ratio of the underlying population is not one
‘less’: the odds ratio of the underlying population is less than one
‘greater’: the odds ratio of the underlying population is greater than one
See the Notes for more details.
- Returns:
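- res : SignificanceResult
An object containing attributes:
- statistic : float
The sample odds ratio, that is, the unconditional maximum likelihood estimate (a*d)/(b*c) for the table [[a, b], [c, d]]. This is the prior odds ratio, not a posterior estimate. See the Notes for details.
- pvalue : float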
The probability under the null hypothesis of obtaining a table at least as extreme as the one that was actually observed.
- Raises:
- ValueError
If table is not a 2x2 contingency table with non-negative entries.
See also
chi2_contingency
Chi-square test of independence of variables in a contingency table. This can be used as an alternative to fisher_exact when the numbers in the table are large.
contingency.odds_ratio
Compute the odds ratio (sample or conditional MLE) for a 2x2 contingency table.
barnard_exact
Barnard’s exact test, which is a more powerful alternative than Fisher’s exact test for 2x2 contingency tables.
boschloo_exact
Boschloo’s exact test, which is a more powerful alternative than Fisher’s exact test for 2x2 contingency tables.
- Fisher’s exact test
Extended example
Notes
Null hypothesis and p-values
The null hypothesis is that the true odds ratio of the populations underlying the observations is one, and the observations were sampled at random from these populations under a condition: the marginals of the resulting table must equal those of the observed table. Equivalently, the null hypothesis is that the input table is from the hypergeometric distribution with parameters (as used in hypergeom) M = a + b + c + d, n = a + b and N = a + c, where the input table is [[a, b], [c, d]]. This distribution has support max(0, N + n - M) <= x <= min(N, n), or, in terms of the values in the input table, max(0, a - d) <= x <= a + min(b, c). x can be interpreted as the upper-left element of a 2x2 table, so the tables in the distribution have form:
[  x            n - x        ]
[N - x    M - (n + N) + x]
For example, if:
table = [6  2]
        [1  4]
then the support is 2 <= x <= 7, and the tables in the distribution are:
[2 6]   [3 5]   [4 4]   [5 3]   [6 2]   [7 1]
[5 0]   [4 1]   [3 2]   [2 3]   [1 4]   [0 5]
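The enumeration above can be reproduced with a few lines of plain Python. This is only an illustrative sketch: the helper names support_bounds and tables_with_same_margins are hypothetical and are not part of SciPy.

```python
def support_bounds(table):
    """Return the (lowest, highest) feasible upper-left entry x."""
    (a, b), (c, d) = table
    return max(0, a - d), a + min(b, c)


def tables_with_same_margins(table):
    """List every 2x2 table sharing the row and column sums of `table`."""
    (a, b), (c, d) = table
    M = a + b + c + d  # grand total
    n = a + b          # first-row sum
    N = a + c          # first-column sum
    lo, hi = support_bounds(table)
    return [[[x, n - x], [N - x, M - (n + N) + x]] for x in range(lo, hi + 1)]


observed = [[6, 2], [1, 4]]
candidates = tables_with_same_margins(observed)
for t in candidates:
    print(t)
```

Each printed table has row sums 8 and 5 and column sums 7 and 6, the same as the observed table.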
The probability of each table is given by the hypergeometric distribution hypergeom.pmf(x, M, n, N). For this example, these are (rounded to three significant digits):
x    2        3       4       5       6        7
p    0.0163   0.163   0.408   0.326   0.0816   0.00466
These can be computed with:
>>> import numpy as np
>>> from scipy.stats import hypergeom
>>> table = np.array([[6, 2], [1, 4]])
>>> M = table.sum()
>>> n = table[0].sum()
>>> N = table[:, 0].sum()
>>> start, end = hypergeom.support(M, n, N)
>>> hypergeom.pmf(np.arange(start, end+1), M, n, N)
array([0.01631702, 0.16317016, 0.40792541, 0.32634033, 0.08158508,
       0.004662  ])
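As a cross-check that does not depend on SciPy, the same probabilities can be written out from the hypergeometric formula C(n, x) * C(M - n, N - x) / C(M, N) using exact rational arithmetic. This is a sketch for illustration; hypergeom_pmf_exact is a hypothetical helper, not a SciPy function.

```python
from fractions import Fraction
from math import comb


def hypergeom_pmf_exact(x, M, n, N):
    """P(X = x) = C(n, x) * C(M - n, N - x) / C(M, N), as an exact fraction."""
    return Fraction(comb(n, x) * comb(M - n, N - x), comb(M, N))


# Parameters for the table [[6, 2], [1, 4]]: M = 13, n = 8, N = 7.
M, n, N = 13, 8, 7
probs = {x: hypergeom_pmf_exact(x, M, n, N) for x in range(2, 8)}
for x, p in probs.items():
    print(x, p, float(p))
```

The fractions sum to exactly one, and their floating-point values agree with the array shown above.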
The two-sided p-value is the probability that, under the null hypothesis, a random table would have a probability equal to or less than the probability of the input table. For our example, the probability of the input table (where x = 6) is 0.0816. The x values where the probability does not exceed this are 2, 6 and 7, so the two-sided p-value is 0.0163 + 0.0816 + 0.00466 ~= 0.10256:
>>> from scipy.stats import fisher_exact
>>> res = fisher_exact(table, alternative='two-sided')
>>> res.pvalue
0.10256410256410257
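The two-sided rule described above can be sketched directly: sum the probability of every table that is no more probable than the observed one. The helper two_sided_pvalue below is hypothetical and is not how SciPy implements the test; it uses exact fractions so that the "equal to or less than" comparison is not affected by floating-point rounding.

```python
from fractions import Fraction
from math import comb


def two_sided_pvalue(table):
    """Sum the probabilities of all tables no more likely than `table`."""
    (a, b), (c, d) = table
    M, n, N = a + b + c + d, a + b, a + c
    lo, hi = max(0, a - d), a + min(b, c)
    pmf = {
        x: Fraction(comb(n, x) * comb(M - n, N - x), comb(M, N))
        for x in range(lo, hi + 1)
    }
    p_observed = pmf[a]
    return sum(p for p in pmf.values() if p <= p_observed)


p = two_sided_pvalue([[6, 2], [1, 4]])
print(p, float(p))
```

For the example table this gives the fraction 4/39, whose decimal expansion begins 0.10256, in line with the p-value reported above.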
The one-sided p-value for alternative='greater' is the probability that a random table has x >= a, which in our example is x >= 6, or 0.0816 + 0.00466 ~= 0.08626:
>>> res = fisher_exact(table, alternative='greater')
>>> res.pvalue
0.08624708624708627
This is equivalent to computing the survival function of the distribution at x = 5 (one less than x from the input table, because we want to include the probability of x = 6 in the sum):
>>> hypergeom.sf(5, M, n, N)
0.08624708624708627
For alternative='less', the one-sided p-value is the probability that a random table has x <= a (i.e. x <= 6 in our example), or 0.0163 + 0.163 + 0.408 + 0.326 + 0.0816 ~= 0.9949:
>>> res = fisher_exact(table, alternative='less')
>>> res.pvalue
0.9953379953379957
This is equivalent to computing the cumulative distribution function of the distribution at x = 6:
>>> hypergeom.cdf(6, M, n, N)
0.9953379953379957
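Both one-sided p-values are tail sums of the same distribution, which the following sketch makes explicit. The helper one_sided_pvalues is hypothetical (not a SciPy function) and uses only the standard library.

```python
from fractions import Fraction
from math import comb


def one_sided_pvalues(table):
    """Return (less, greater): P(X <= a) and P(X >= a) under the null."""
    (a, b), (c, d) = table
    M, n, N = a + b + c + d, a + b, a + c
    lo, hi = max(0, a - d), a + min(b, c)
    pmf = {
        x: Fraction(comb(n, x) * comb(M - n, N - x), comb(M, N))
        for x in range(lo, hi + 1)
    }
    less = sum(p for x, p in pmf.items() if x <= a)
    greater = sum(p for x, p in pmf.items() if x >= a)
    return less, greater


less, greater = one_sided_pvalues([[6, 2], [1, 4]])
print(float(less), float(greater))
```

Because the observed table is counted in both tails, the two values add up to one plus the probability of the observed table. The exact lower-tail value, 427/429, has a decimal expansion beginning 0.9953; the hand sum of rounded terms above gives 0.9949 only because each term was rounded to three significant digits first.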
Odds ratio
The calculated odds ratio is different from the value computed by the R function fisher.test. This implementation returns the “sample” or “unconditional” maximum likelihood estimate, while fisher.test in R uses the conditional maximum likelihood estimate. To compute the conditional maximum likelihood estimate of the odds ratio, use scipy.stats.contingency.odds_ratio.
References
[1] Fisher, Sir Ronald A, “The Design of Experiments: Mathematics of a Lady Tasting Tea.” ISBN 978-0-486-41151-4, 1935.
[2] “Fisher’s exact test”, https://en.wikipedia.org/wiki/Fisher’s_exact_test
Examples
>>> from scipy.stats import fisher_exact
>>> res = fisher_exact([[8, 2], [1, 5]])
>>> res.statistic
20.0
>>> res.pvalue
0.034965034965034975
For a more detailed example, see Fisher’s exact test.
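The statistic of 20.0 in the short example is the sample odds ratio (a*d)/(b*c) = (8*5)/(2*1). A minimal sketch of that arithmetic follows; sample_odds_ratio is a hypothetical helper, and its handling of zero cells is an assumption of this sketch rather than a statement about SciPy's behavior.

```python
import math


def sample_odds_ratio(table):
    """Unconditional (sample) odds ratio (a*d)/(b*c) of a 2x2 table."""
    (a, b), (c, d) = table
    if b * c == 0:
        # Assumption of this sketch: report inf when only the denominator
        # is zero, and nan when numerator and denominator are both zero.
        return math.nan if a * d == 0 else math.inf
    return (a * d) / (b * c)


print(sample_odds_ratio([[8, 2], [1, 5]]))  # table from the Examples section
print(sample_odds_ratio([[6, 2], [1, 4]]))  # table from the Notes section
```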