scipy.stats.

spearmanr#

scipy.stats.spearmanr(a, b=None, axis=0, nan_policy='propagate', alternative='two-sided')[source]#

Calculate a Spearman correlation coefficient with associated p-value.

The Spearman rank-order correlation coefficient is a nonparametric measure of the monotonicity of the relationship between two datasets. Like other correlation coefficients, this one varies between -1 and +1 with 0 implying no correlation. Correlations of -1 or +1 imply an exact monotonic relationship. Positive correlations imply that as x increases, so does y. Negative correlations imply that as x increases, y decreases.

The p-value roughly indicates the probability of an uncorrelated system producing datasets that have a Spearman correlation at least as extreme as the one computed from these datasets. Although calculation of the p-value does not make strong assumptions about the distributions underlying the samples, it is only accurate for very large samples (>500 observations). For smaller sample sizes, consider a permutation test (see Examples section below).

Parameters:

a, b1D or 2D array_like, b is optional

One or two 1-D or 2-D arrays containing multiple variables and observations. When these are 1-D, each represents a vector of observations of a single variable. For the behavior in the 2-D case, see under axis, below. Both arrays need to have the same length in the axis dimension.

axisint or None, optional

If axis=0 (default), then each column represents a variable, with observations in the rows. If axis=1, the relationship is transposed: each row represents a variable, while the columns contain observations. If axis=None, then both arrays will be raveled.

nan_policy{‘propagate’, ‘raise’, ‘omit’}, optional

Defines how to handle when input contains nan. The following options are available (default is ‘propagate’):

‘propagate’: returns nan
‘raise’: throws an error
‘omit’: performs the calculations ignoring nan values

alternative{‘two-sided’, ‘less’, ‘greater’}, optional

Defines the alternative hypothesis. Default is ‘two-sided’. The following options are available:

‘two-sided’: the correlation is nonzero
‘less’: the correlation is negative (less than zero)
‘greater’: the correlation is positive (greater than zero)

Added in version 1.7.0.

Returns:

resSignificanceResult

An object containing attributes:

statisticfloat or ndarray (2-D square): Spearman correlation matrix or correlation coefficient (if only 2 variables are given as parameters). Correlation matrix is square with length equal to total number of variables (columns or rows) in a and b combined.
pvaluefloat: The p-value for a hypothesis test whose null hypothesis is that two samples have no ordinal correlation. See alternative above for alternative hypotheses. pvalue has the same shape as statistic.

Raises:

ValueError: If axis is not 0, 1 or None, or if the number of dimensions of a is greater than 2, or if b is None and the number of dimensions of a is less than 2.

Warns:

ConstantInputWarning: Raised if an input is a constant array. The correlation coefficient is not defined in this case, so np.nan is returned.

See also

Spearman correlation coefficient: Extended example

Notes

Array API Standard Support

spearmanr has experimental support for Python Array API Standard compatible backends in addition to NumPy. Please consider testing these features by setting an environment variable SCIPY_ARRAY_API=1 and providing CuPy, PyTorch, JAX, or Dask arrays as array arguments. The following combinations of backend and device (or other capability) are supported.

Library	CPU	GPU
NumPy	✅	n/a
CuPy	n/a	⛔
PyTorch	⛔	⛔
JAX	⛔	⛔
Dask	⛔	n/a

See Support for the array API standard for more information.

References

[1]

Zwillinger, D. and Kokoska, S. (2000). CRC Standard Probability and Statistics Tables and Formulae. Chapman & Hall: New York. 2000. Section 14.7

[2]

Kendall, M. G. and Stuart, A. (1973). The Advanced Theory of Statistics, Volume 2: Inference and Relationship. Griffin. 1973. Section 31.18

Examples

>>> import numpy as np
>>> from scipy import stats
>>> res = stats.spearmanr([1, 2, 3, 4, 5], [5, 6, 7, 8, 7])
>>> res.statistic
0.8207826816681233
>>> res.pvalue
0.08858700531354381

>>> rng = np.random.default_rng()
>>> x2n = rng.standard_normal((100, 2))
>>> y2n = rng.standard_normal((100, 2))
>>> res = stats.spearmanr(x2n)
>>> res.statistic, res.pvalue
(-0.07960396039603959, 0.4311168705769747)

>>> res = stats.spearmanr(x2n[:, 0], x2n[:, 1])
>>> res.statistic, res.pvalue
(-0.07960396039603959, 0.4311168705769747)

>>> res = stats.spearmanr(x2n, y2n)
>>> res.statistic
array([[ 1. , -0.07960396, -0.08314431, 0.09662166],
       [-0.07960396, 1. , -0.14448245, 0.16738074],
       [-0.08314431, -0.14448245, 1. , 0.03234323],
       [ 0.09662166, 0.16738074, 0.03234323, 1. ]])
>>> res.pvalue
array([[0. , 0.43111687, 0.41084066, 0.33891628],
       [0.43111687, 0. , 0.15151618, 0.09600687],
       [0.41084066, 0.15151618, 0. , 0.74938561],
       [0.33891628, 0.09600687, 0.74938561, 0. ]])

>>> res = stats.spearmanr(x2n.T, y2n.T, axis=1)
>>> res.statistic
array([[ 1. , -0.07960396, -0.08314431, 0.09662166],
       [-0.07960396, 1. , -0.14448245, 0.16738074],
       [-0.08314431, -0.14448245, 1. , 0.03234323],
       [ 0.09662166, 0.16738074, 0.03234323, 1. ]])

>>> res = stats.spearmanr(x2n, y2n, axis=None)
>>> res.statistic, res.pvalue
(0.044981624540613524, 0.5270803651336189)

>>> res = stats.spearmanr(x2n.ravel(), y2n.ravel())
>>> res.statistic, res.pvalue
(0.044981624540613524, 0.5270803651336189)

>>> rng = np.random.default_rng()
>>> xint = rng.integers(10, size=(100, 2))
>>> res = stats.spearmanr(xint)
>>> res.statistic, res.pvalue
(0.09800224850707953, 0.3320271757932076)

For small samples, consider performing a permutation test instead of relying on the asymptotic p-value. Note that to calculate the null distribution of the statistic (for all possibly pairings between observations in sample x and y), only one of the two inputs needs to be permuted.

>>> x = [1.76405235, 0.40015721, 0.97873798,
... 2.2408932, 1.86755799, -0.97727788]
>>> y = [2.71414076, 0.2488, 0.87551913,
... 2.6514917, 2.01160156, 0.47699563]

>>> def statistic(x): # permute only `x`
...     return stats.spearmanr(x, y).statistic
>>> res_exact = stats.permutation_test((x,), statistic,
...     permutation_type='pairings')
>>> res_asymptotic = stats.spearmanr(x, y)
>>> res_exact.pvalue, res_asymptotic.pvalue # asymptotic pvalue is too low
(0.10277777777777777, 0.07239650145772594)

For a more detailed example, see Spearman correlation coefficient.