scipy.stats.cramervonmises_2samp

scipy.stats.cramervonmises_2samp(x, y, method='auto', *, axis=0, nan_policy='propagate', keepdims=False)

Perform the two-sample Cramér-von Mises test for goodness of fit.

This is the two-sample version of the Cramér-von Mises test ([1]): for two independent samples \(X_1, ..., X_n\) and \(Y_1, ..., Y_m\), the null hypothesis is that the samples come from the same (unspecified) continuous distribution.

The test statistic \(T\) is defined as in [1]:

\[T = \frac{nm}{n+m}\omega^2 = \frac{U}{n m (n+m)} - \frac{4 m n - 1}{6(m+n)}\]

where \(U\) is defined below and \(\omega^2\) is the Cramér-von Mises criterion. Here \(x_1 \le \dots \le x_n\) and \(y_1 \le \dots \le y_m\) denote the observed values of each sample sorted in increasing order, and \(r(\cdot)\) denotes the rank of an observed value within the pooled sample of size \(n + m\), with ties assigned mid-rank values:

\[U = n \sum_{i=1}^n (r(x_i)-i)^2 + m \sum_{j=1}^m (r(y_j)-j)^2\]
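As an illustration, the two formulas above can be evaluated directly from the pooled ranks. The following is a minimal sketch, not SciPy's implementation; the helper name `cvm_2samp_statistic` and the sample values are assumptions made for this example.

```python
import numpy as np
from scipy import stats


def cvm_2samp_statistic(x, y):
    """Evaluate T from the rank-based formulas (illustrative helper)."""
    # The formula for U assumes each sample is sorted in increasing order.
    x = np.sort(np.asarray(x, dtype=float))
    y = np.sort(np.asarray(y, dtype=float))
    n, m = len(x), len(y)
    # Ranks within the pooled sample; ties receive mid-rank values.
    r = stats.rankdata(np.concatenate([x, y]), method="average")
    rx, ry = r[:n], r[n:]
    u = (n * np.sum((rx - np.arange(1, n + 1)) ** 2)
         + m * np.sum((ry - np.arange(1, m + 1)) ** 2))
    return u / (n * m * (n + m)) - (4 * m * n - 1) / (6 * (m + n))


# Arbitrary example data (not taken from the documentation).
x = [0.1, 1.3, -0.4, 2.2, 0.9]
y = [0.7, -1.1, 0.5, 1.8]
t_manual = cvm_2samp_statistic(x, y)
t_scipy = stats.cramervonmises_2samp(x, y).statistic
print(t_manual, t_scipy)
```

The two printed values should agree up to floating-point rounding.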

Parameters:

x : array_like

A 1-D array of observed values of the random variables \(X_i\). Must contain at least two observations.

y : array_like

A 1-D array of observed values of the random variables \(Y_j\). Must contain at least two observations.

method : {‘auto’, ‘asymptotic’, ‘exact’}, optional

The method used to compute the p-value, see Notes for details. The default is ‘auto’.

axis : int or None, default: 0

If an int, the axis of the input along which to compute the statistic. The statistic of each axis-slice (e.g. row) of the input will appear in a corresponding element of the output. If None, the input will be raveled before computing the statistic.

nan_policy : {‘propagate’, ‘omit’, ‘raise’}

Defines how to handle input NaNs.

propagate: if a NaN is present in the axis slice (e.g. row) along which the statistic is computed, the corresponding entry of the output will be NaN.
omit: NaNs will be omitted when performing the calculation. If insufficient data remains in the axis slice along which the statistic is computed, the corresponding entry of the output will be NaN.
raise: if a NaN is present, a ValueError will be raised.

keepdims : bool, default: False

If this is set to True, the axes which are reduced are left in the result as dimensions with size one. With this option, the result will broadcast correctly against the input array.

Returns:

res : object with attributes

statistic : float, the Cramér-von Mises statistic \(T\).
pvalue : float, the p-value.

See also

cramervonmises, anderson_ksamp, epps_singleton_2samp, ks_2samp

Notes

Added in version 1.7.0.

The statistic is computed according to equation 9 in [2]. The calculation of the p-value depends on the keyword method:

asymptotic: The p-value is approximated by using the limiting distribution of the test statistic.
exact: The exact p-value is computed by enumerating all possible combinations of the test statistic, see [2].
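To make the idea behind the exact method concrete, the sketch below enumerates every assignment of the pooled ranks to the two samples and counts how often \(U\) is at least as large as the observed value. This brute-force enumeration is an assumption-laden illustration for tiny, tie-free samples, not the algorithm SciPy uses; the helper names `u_statistic` and `exact_pvalue_bruteforce` are hypothetical.

```python
from itertools import combinations

import numpy as np
from scipy import stats


def u_statistic(rx, ry):
    """U computed from the pooled ranks of each sample (illustrative helper)."""
    rx, ry = np.sort(rx), np.sort(ry)
    n, m = len(rx), len(ry)
    return (n * np.sum((rx - np.arange(1, n + 1)) ** 2)
            + m * np.sum((ry - np.arange(1, m + 1)) ** 2))


def exact_pvalue_bruteforce(x, y):
    """Permutation p-value P(U >= u_obs) by full enumeration (small samples only)."""
    n, m = len(x), len(y)
    ranks = stats.rankdata(np.concatenate([x, y]))
    u_obs = u_statistic(ranks[:n], ranks[n:])
    count = total = 0
    # Every way of choosing which n pooled ranks belong to the first sample.
    for idx in combinations(range(n + m), n):
        mask = np.zeros(n + m, dtype=bool)
        mask[list(idx)] = True
        count += u_statistic(ranks[mask], ranks[~mask]) >= u_obs
        total += 1
    return count / total


# Tie-free data, so all ranks are integers and comparisons of U are exact.
x = [0.1, 1.3, -0.4, 2.2]
y = [0.7, -1.1, 0.5]
p_brute = exact_pvalue_bruteforce(x, y)
p_scipy = stats.cramervonmises_2samp(x, y, method="exact").pvalue
print(p_brute, p_scipy)
```

Because \(T\) is an increasing function of \(U\) for fixed sample sizes, comparing \(U\) values is equivalent to comparing \(T\) values. The number of assignments grows combinatorially, so this approach is only practical for very small samples.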

If method='auto', the exact approach is used if both samples contain 20 or fewer observations; otherwise, the asymptotic distribution is used.
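The selection rule for method='auto' can be checked empirically. In this sketch the sample sizes and the seed are arbitrary assumptions; the default call is compared against explicit calls with each method.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(12345)

# Both samples have at most 20 observations: 'auto' should match 'exact'.
xs, ys = rng.normal(size=8), rng.normal(size=10)
p_auto_small = stats.cramervonmises_2samp(xs, ys).pvalue
p_exact = stats.cramervonmises_2samp(xs, ys, method="exact").pvalue

# One sample has more than 20 observations: 'auto' should match 'asymptotic'.
xl, yl = rng.normal(size=25), rng.normal(size=10)
p_auto_large = stats.cramervonmises_2samp(xl, yl).pvalue
p_asymptotic = stats.cramervonmises_2samp(xl, yl, method="asymptotic").pvalue

print(p_auto_small, p_exact)
print(p_auto_large, p_asymptotic)
```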

If the underlying distribution is not continuous, the p-value is likely to be conservative (Section 6.2 in [3]). When ranking the data to compute the test statistic, midranks are used if there are ties.
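As a small illustration of midranks, `scipy.stats.rankdata` with method="average" assigns tied values the mean of the ranks they would otherwise occupy. The data below are made up for the example.

```python
from scipy import stats

pooled = [1.0, 2.0, 2.0, 3.0]
# The two tied values would occupy ranks 2 and 3, so each receives 2.5.
ranks = stats.rankdata(pooled, method="average")
print(ranks)
```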

Beginning in SciPy 1.9, np.matrix inputs (not recommended for new code) are converted to np.ndarray before the calculation is performed. In this case, the output will be a scalar or np.ndarray of appropriate shape rather than a 2D np.matrix. Similarly, while masked elements of masked arrays are ignored, the output will be a scalar or np.ndarray rather than a masked array with mask=False.

Array API Standard Support

cramervonmises_2samp has experimental support for Python Array API Standard compatible backends in addition to NumPy. Please consider testing these features by setting an environment variable SCIPY_ARRAY_API=1 and providing CuPy, PyTorch, JAX, or Dask arrays as array arguments. The following combinations of backend and device (or other capability) are supported.

Library	CPU	GPU
NumPy	✅	n/a
CuPy	n/a	⛔
PyTorch	✅	⛔
JAX	⚠️ no JIT	⛔
Dask	⛔	n/a

cramervonmises_2samp also accepts MArrays backed by the backends indicated above; masked values will be treated as though they were not present. Only method='exact' is compatible with MArray input.

See Support for the array API standard for more information.

References

[1] https://en.wikipedia.org/wiki/Cramer-von_Mises_criterion

[2] Anderson, T.W. (1962). On the distribution of the two-sample Cramer-von-Mises criterion. The Annals of Mathematical Statistics, pp. 1148-1159.

[3] Conover, W.J. (1971). Practical Nonparametric Statistics.

Examples

Suppose we wish to test whether two samples generated by scipy.stats.norm.rvs have the same distribution. We choose a significance level of alpha=0.05.

>>> import numpy as np
>>> from scipy import stats
>>> rng = np.random.default_rng()
>>> x = stats.norm.rvs(size=100, random_state=rng)
>>> y = stats.norm.rvs(size=70, random_state=rng)
>>> res = stats.cramervonmises_2samp(x, y)
>>> res.statistic, res.pvalue
(0.29376470588235293, 0.1412873014573014)

The p-value exceeds our chosen significance level, so we do not reject the null hypothesis that the observed samples are drawn from the same distribution.

For small sample sizes, one can compute the exact p-values:

>>> x = stats.norm.rvs(size=7, random_state=rng)
>>> y = stats.t.rvs(df=2, size=6, random_state=rng)
>>> res = stats.cramervonmises_2samp(x, y, method='exact')
>>> res.statistic, res.pvalue
(0.197802197802198, 0.31643356643356646)

The p-value based on the asymptotic distribution is a good approximation even though the sample size is small.

>>> res = stats.cramervonmises_2samp(x, y, method='asymptotic')
>>> res.statistic, res.pvalue
(0.197802197802198, 0.2966041181527128)

Independent of the method, one would not reject the null hypothesis at the chosen significance level in this example.