anderson_ksamp
- scipy.stats.anderson_ksamp(samples, midrank=<object object>, *, variant=<object object>, method=None)
The Anderson-Darling test for k-samples.
The k-sample Anderson-Darling test is a modification of the one-sample Anderson-Darling test. It tests the null hypothesis that k-samples are drawn from the same population without having to specify the distribution function of that population. The critical values depend on the number of samples.
- Parameters:
- samples : sequence of 1-D array_like
Array of sample data in arrays.
- midrank : bool, optional
Variant of Anderson-Darling test which is computed. Default (True) is the midrank test applicable to continuous and discrete populations. If False, the right side empirical distribution is used.
- variant : {‘midrank’, ‘right’, ‘continuous’}
Variant of Anderson-Darling test to be computed. 'midrank' is applicable to both continuous and discrete populations. 'discrete' and 'continuous' perform alternative versions of the test for discrete and continuous populations, respectively. When variant is specified, the return object will not be unpackable as a tuple, and only attributes statistic and pvalue will be present.
- method : PermutationMethod, optional
Defines the method used to compute the p-value. If method is an instance of PermutationMethod, the p-value is computed using scipy.stats.permutation_test with the provided configuration options and other appropriate settings. Otherwise, the p-value is interpolated from tabulated values.
- Returns:
- res : Anderson_ksampResult
An object containing attributes:
- statistic : float
Normalized k-sample Anderson-Darling test statistic.
- critical_values : array
The critical values for significance levels 25%, 10%, 5%, 2.5%, 1%, 0.5%, 0.1%.
- pvalue : float
The approximate p-value of the test. If method is not provided, the value is floored at 0.1% and capped at 25%.
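The attributes described above can be read as in the following minimal sketch. It assumes the call relies on the default arguments (neither midrank nor variant is passed), so that critical_values is present on the result; the seed, the sample sizes, and the `levels` list are illustrative choices, and the "reject when the statistic exceeds the critical value" rule is the usual reading of an Anderson-Darling statistic rather than something this page spells out.

```python
import warnings

import numpy as np
from scipy import stats

# Seeded generator so the sketch is reproducible; the seed is arbitrary.
rng = np.random.default_rng(12345)
samples = [rng.normal(size=50), rng.normal(loc=0.5, size=30)]

with warnings.catch_warnings():
    # Silence warnings about floored/capped p-values or changed defaults.
    warnings.simplefilter("ignore")
    res = stats.anderson_ksamp(samples)

# Significance levels from the Returns section, assumed to be in the
# same order as res.critical_values.
levels = [0.25, 0.10, 0.05, 0.025, 0.01, 0.005, 0.001]

for level, critical_value in zip(levels, res.critical_values):
    decision = "reject" if res.statistic > critical_value else "do not reject"
    print(f"alpha = {level}: critical value {critical_value:.3f} -> {decision}")
```

Comparing the statistic with each critical value gives a decision per significance level, which can be more informative than a p-value that has hit one of its bounds.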
- Raises:
- ValueError
If fewer than 2 samples are provided, a sample is empty, or no distinct observations are in the samples.
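As a rough illustration of the error condition above, the sketch below wraps the call in a hypothetical helper, `safe_anderson_ksamp` (not part of SciPy), that returns None when the input is rejected. It relies on the default arguments only and passes a single sample, which is one of the documented causes of ValueError.

```python
import numpy as np
from scipy import stats


def safe_anderson_ksamp(samples):
    """Return the test result, or None if SciPy rejects the input."""
    try:
        return stats.anderson_ksamp(samples)
    except ValueError as exc:
        # Raised for fewer than 2 samples, an empty sample, or no
        # distinct observations, per the Raises section.
        print(f"anderson_ksamp rejected the input: {exc}")
        return None


# Only one sample is supplied, but the test needs at least two.
result = safe_anderson_ksamp([np.array([1.0, 2.0, 3.0])])
```

Whether to swallow the exception or let it propagate depends on the caller; the helper is only meant to show where the error surfaces.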
Notes
[1] defines three versions of the k-sample Anderson-Darling test: one for continuous distributions and two for discrete distributions, in which ties between samples may occur. The default of this routine is to compute the version based on the midrank empirical distribution function. This test is applicable to continuous and discrete data. If variant is set to 'discrete', the right side empirical distribution is used for a test for discrete data; if variant is 'continuous', the same test statistic and p-value are computed for data with no ties, but with less computation. According to [1], the two discrete test statistics differ only slightly if a few collisions due to round-off errors occur in the test not adjusted for ties between samples.
The critical values corresponding to the significance levels from 0.01 to 0.25 are taken from [1]. p-values are floored at 0.1% and capped at 25%. Since the range of critical values might be extended in future releases, it is recommended not to test p == 0.25, but rather p >= 0.25 (analogously for the lower bound).
Added in version 0.14.0.
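The recommendation to compare against the bounds with inequalities rather than equality can be sketched as below. The helper `describe_pvalue` and its `lower` and `upper` defaults are hypothetical, chosen to match the 0.1% and 25% bounds stated in the Notes; they are not part of SciPy.

```python
def describe_pvalue(p, lower=0.001, upper=0.25):
    """Classify a tabulated p-value as capped, floored, or interpolated."""
    # Use >= and <= so the check keeps working if the tabulated range
    # is extended in a later release.
    if p >= upper:
        return f"p >= {upper} (capped)"
    if p <= lower:
        return f"p <= {lower} (floored)"
    return f"p = {p:.4f}"


print(describe_pvalue(0.25))                  # p >= 0.25 (capped)
print(describe_pvalue(0.013106682406720973))  # p = 0.0131
print(describe_pvalue(0.001))                 # p <= 0.001 (floored)
```

Reporting a bound as an inequality makes it explicit that the true p-value may lie outside the tabulated range.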
Array API Standard Support
anderson_ksamp has experimental support for Python Array API Standard compatible backends in addition to NumPy. Please consider testing these features by setting an environment variable SCIPY_ARRAY_API=1 and providing CuPy, PyTorch, JAX, or Dask arrays as array arguments. The following combinations of backend and device (or other capability) are supported.
Library   CPU   GPU
NumPy     ✅    n/a
CuPy      n/a   ⛔
PyTorch   ⛔    ⛔
JAX       ⛔    ⛔
Dask      ⛔    n/a
See Support for the array API standard for more information.
References
Examples
>>> import numpy as np
>>> from scipy import stats
>>> rng = np.random.default_rng()
>>> res = stats.anderson_ksamp([rng.normal(size=50), rng.normal(loc=0.5, size=30)],
...                            variant='midrank')
>>> res.statistic, res.pvalue
(3.4444310693448936, 0.013106682406720973)
The null hypothesis that the two random samples come from the same distribution can be rejected at the 5% level because the returned p-value is less than 0.05, but not at the 1% level.
>>> samples = [rng.normal(size=50), rng.normal(size=30),
...            rng.normal(size=20)]
>>> res = stats.anderson_ksamp(samples, variant='continuous')
>>> res.statistic, res.pvalue
(-0.6309662273193832, 0.25)
As we might expect, the null hypothesis cannot be rejected here for three samples from an identical distribution. The reported p-value (25%) has been capped at the maximum value for which pre-computed p-values are available.
When the p-value is capped or the sample sizes are small, a permutation test may be more accurate.
>>> method = stats.PermutationMethod(n_resamples=9999, random_state=rng)
>>> res = stats.anderson_ksamp(samples, variant='continuous', method=method)
>>> res.pvalue
0.699