scipy.stats.spearmanr¶

scipy.stats.
spearmanr
(a, b=None, axis=0, nan_policy='propagate')[source]¶ Calculate a Spearman rankorder correlation coefficient and the pvalue to test for noncorrelation.
The Spearman correlation is a nonparametric measure of the monotonicity of the relationship between two datasets. Unlike the Pearson correlation, the Spearman correlation does not assume that both datasets are normally distributed. Like other correlation coefficients, this one varies between 1 and +1 with 0 implying no correlation. Correlations of 1 or +1 imply an exact monotonic relationship. Positive correlations imply that as x increases, so does y. Negative correlations imply that as x increases, y decreases.
The pvalue roughly indicates the probability of an uncorrelated system producing datasets that have a Spearman correlation at least as extreme as the one computed from these datasets. The pvalues are not entirely reliable but are probably reasonable for datasets larger than 500 or so.
Parameters: a, b : 1D or 2D array_like, b is optional
One or two 1D or 2D arrays containing multiple variables and observations. When these are 1D, each represents a vector of observations of a single variable. For the behavior in the 2D case, see under
axis
, below. Both arrays need to have the same length in theaxis
dimension.axis : int or None, optional
If axis=0 (default), then each column represents a variable, with observations in the rows. If axis=1, the relationship is transposed: each row represents a variable, while the columns contain observations. If axis=None, then both arrays will be raveled.
nan_policy : {‘propagate’, ‘raise’, ‘omit’}, optional
Defines how to handle when input contains nan. ‘propagate’ returns nan, ‘raise’ throws an error, ‘omit’ performs the calculations ignoring nan values. Default is ‘propagate’.
Returns: correlation : float or ndarray (2D square)
Spearman correlation matrix or correlation coefficient (if only 2 variables are given as parameters. Correlation matrix is square with length equal to total number of variables (columns or rows) in a and b combined.
pvalue : float
The twosided pvalue for a hypothesis test whose null hypothesis is that two sets of data are uncorrelated, has same dimension as rho.
Notes
Changes in scipy 0.8.0: rewrite to add tiehandling, and axis.
References
[R667] Zwillinger, D. and Kokoska, S. (2000). CRC Standard Probability and Statistics Tables and Formulae. Chapman & Hall: New York. 2000. Section 14.7 Examples
>>> from scipy import stats >>> stats.spearmanr([1,2,3,4,5], [5,6,7,8,7]) (0.82078268166812329, 0.088587005313543798) >>> np.random.seed(1234321) >>> x2n = np.random.randn(100, 2) >>> y2n = np.random.randn(100, 2) >>> stats.spearmanr(x2n) (0.059969996999699973, 0.55338590803773591) >>> stats.spearmanr(x2n[:,0], x2n[:,1]) (0.059969996999699973, 0.55338590803773591) >>> rho, pval = stats.spearmanr(x2n, y2n) >>> rho array([[ 1. , 0.05997 , 0.18569457, 0.06258626], [ 0.05997 , 1. , 0.110003 , 0.02534653], [ 0.18569457, 0.110003 , 1. , 0.03488749], [ 0.06258626, 0.02534653, 0.03488749, 1. ]]) >>> pval array([[ 0. , 0.55338591, 0.06435364, 0.53617935], [ 0.55338591, 0. , 0.27592895, 0.80234077], [ 0.06435364, 0.27592895, 0. , 0.73039992], [ 0.53617935, 0.80234077, 0.73039992, 0. ]]) >>> rho, pval = stats.spearmanr(x2n.T, y2n.T, axis=1) >>> rho array([[ 1. , 0.05997 , 0.18569457, 0.06258626], [ 0.05997 , 1. , 0.110003 , 0.02534653], [ 0.18569457, 0.110003 , 1. , 0.03488749], [ 0.06258626, 0.02534653, 0.03488749, 1. ]]) >>> stats.spearmanr(x2n, y2n, axis=None) (0.10816770419260482, 0.1273562188027364) >>> stats.spearmanr(x2n.ravel(), y2n.ravel()) (0.10816770419260482, 0.1273562188027364)
>>> xint = np.random.randint(10, size=(100, 2)) >>> stats.spearmanr(xint) (0.052760927029710199, 0.60213045837062351)