quantile
- scipy.stats.quantile(x, p, *, method='linear', axis=0, nan_policy='propagate', keepdims=None)
Compute the p-th quantile of the data along the specified axis.
- Parameters:
- x : array_like of real numbers
Data array.
- p : array_like of float
Probability or sequence of probabilities of the quantiles to compute. Values must be between 0 and 1 (inclusive). Must have length 1 along axis unless keepdims=True.
- method : str, default: ‘linear’
The method to use for estimating the quantile. The available options, numbered as they appear in [1], are:
1. ‘inverted_cdf’
2. ‘averaged_inverted_cdf’
3. ‘closest_observation’
4. ‘interpolated_inverted_cdf’
5. ‘hazen’
6. ‘weibull’
7. ‘linear’ (default)
8. ‘median_unbiased’
9. ‘normal_unbiased’
‘harrell-davis’ is also available to compute the quantile estimate according to [2]. See Notes for details.
- axis : int or None, default: 0
Axis along which the quantiles are computed. None ravels both x and p before performing the calculation, without checking whether the original shapes were compatible.
- nan_policy : str, default: ‘propagate’
Defines how to handle NaNs in the input data x.
propagate: if a NaN is present in the axis slice (e.g. row) along which the statistic is computed, the corresponding slice of the output will contain NaN(s).
omit: NaNs will be omitted when performing the calculation. If insufficient data remains in the axis slice along which the statistic is computed, the corresponding slice of the output will contain NaN(s).
raise: if a NaN is present, a ValueError will be raised.
If NaNs are present in p, a ValueError will be raised.
- keepdims : bool, optional
Consider the case in which x is 1-D and p is a scalar: the quantile is a reducing statistic, and the default behavior is to return a scalar. If keepdims is set to True, the axis will not be reduced away, and the result will be a 1-D array with one element.
The general case is more subtle, since multiple quantiles may be requested for each axis-slice of x. For instance, if both x and p are 1-D and p.size > 1, no axis can be reduced away; there must be an axis to contain the number of quantiles given by p.size. Therefore:
By default, the axis will be reduced away if possible (i.e. if there is exactly one element of p per axis-slice of x).
If keepdims is set to True, the axis will not be reduced away.
If keepdims is set to False, the axis will be reduced away if possible, and an error will be raised otherwise.
- Returns:
- quantile : scalar or ndarray
The resulting quantile(s). The dtype is the result dtype of x and p.
Notes
Given a sample x from an underlying distribution, quantile provides a nonparametric estimate of the inverse cumulative distribution function.
By default, this is done by interpolating between adjacent elements in y, a sorted copy of x:
(1-g)*y[j] + g*y[j+1]
where the index j and coefficient g are the integral and fractional components of p * (n-1), and n is the number of elements in the sample.
This is a special case of Equation 1 of H&F [1]. More generally,
j = (p*n + m - 1) // 1, and
g = (p*n + m - 1) % 1,
where m may be defined according to several different conventions. The preferred convention may be selected using the method parameter:
method                      number in H&F   m
interpolated_inverted_cdf   4               0
hazen                       5               1/2
weibull                     6               p
linear (default)            7               1 - p
median_unbiased             8               p/3 + 1/3
normal_unbiased             9               p/4 + 3/8
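As a rough illustration of how the table maps onto the formulas for j and g, the following NumPy sketch handles a 1-D sample and a scalar probability. It is not SciPy's implementation: the names M_OFFSETS and hf_quantile are hypothetical, and out-of-range indices are clipped to 0 and n - 1, which is assumed here to be the intended boundary handling.

```python
import numpy as np

# Offsets m for the continuous estimators (H&F numbers 4-9), copied from the table.
M_OFFSETS = {
    "interpolated_inverted_cdf": lambda p: 0.0,
    "hazen": lambda p: 0.5,
    "weibull": lambda p: p,
    "linear": lambda p: 1.0 - p,
    "median_unbiased": lambda p: p / 3 + 1 / 3,
    "normal_unbiased": lambda p: p / 4 + 3 / 8,
}


def hf_quantile(x, p, method="linear"):
    """Illustrative estimate for 1-D x and scalar p (hypothetical helper)."""
    y = np.sort(np.asarray(x, dtype=float))
    n = y.size
    index = p * n + M_OFFSETS[method](p) - 1
    j = int(np.floor(index))  # integral part, i.e. index // 1
    g = index - j             # fractional part, i.e. index % 1
    lo = min(max(j, 0), n - 1)      # clip j to the valid index range
    hi = min(max(j + 1, 0), n - 1)  # clip j + 1 to the valid index range
    return (1 - g) * y[lo] + g * y[hi]


sample = [10, 8, 7, 5, 4]
print(hf_quantile(sample, 0.5))            # default 'linear' method
print(hf_quantile(sample, 0.25, "hazen"))
```

Because these six estimators are continuous in p, small floating-point differences in the computed index change the result only slightly.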
Note that indices j and j + 1 are clipped to the range 0 to n - 1 when the results of the formula would be outside the allowed range of non-negative indices. The -1 in the formulas for j and g accounts for Python’s 0-based indexing.
The table above includes only the estimators from [1] that are continuous functions of probability p (estimators 4-9). SciPy also provides the three discontinuous estimators from [1] (estimators 1-3), where j is defined as above, m is defined as follows, and g is 0 when index = p*n + m - 1 is less than 0 and otherwise is defined below.
inverted_cdf: m = 0 and g = int(index - j > 0)
averaged_inverted_cdf: m = 0 and g = (1 + int(index - j > 0)) / 2
closest_observation: m = -1/2 and g = 1 - int((index == j) & (j%2 == 1))
A different strategy for computing quantiles from [2], method='harrell-davis', uses a weighted combination of all elements. The weights are computed as:
\[w_{n, i} = I_{i/n}(a, b) - I_{(i - 1)/n}(a, b)\]
where \(n\) is the number of elements in the sample, \(i\) are the indices \(1, 2, \ldots, n-1, n\) of the sorted elements, \(a = p (n + 1)\), \(b = (1 - p)(n + 1)\), \(p\) is the probability of the quantile, and \(I\) is the regularized, lower incomplete beta function (scipy.special.betainc).
References
Examples
>>> import numpy as np
>>> from scipy import stats
>>> x = np.asarray([[10, 8, 7, 5, 4],
...                 [0, 1, 2, 3, 5]])
Take the median along the last axis.
>>> stats.quantile(x, 0.5, axis=-1)
array([7., 2.])
Take a different quantile for each axis-slice (here, each row).
>>> stats.quantile(x, [[0.25], [0.75]], axis=-1, keepdims=True)
array([[5.],
       [3.]])
Take multiple quantiles of each axis-slice.
>>> stats.quantile(x, [0.25, 0.75], axis=-1)
array([[5., 8.],
       [1., 3.]])
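The Harrell-Davis weights described in the Notes can also be sketched directly from scipy.special.betainc. The helper below is hypothetical, written only to illustrate the weight formula for a 1-D sample and a scalar probability strictly between 0 and 1; it is not SciPy's implementation.

```python
import numpy as np
from scipy import special


def harrell_davis_sketch(x, p):
    """Return (estimate, weights) for 1-D x and scalar 0 < p < 1 (hypothetical helper)."""
    y = np.sort(np.asarray(x, dtype=float))
    n = y.size
    a = p * (n + 1)
    b = (1 - p) * (n + 1)
    edges = np.arange(n + 1) / n        # 0, 1/n, ..., 1
    cdf = special.betainc(a, b, edges)  # I_{i/n}(a, b) for i = 0..n
    weights = np.diff(cdf)              # w_{n, i} for i = 1..n
    return np.sum(weights * y), weights


estimate, weights = harrell_davis_sketch([1, 2, 3, 4, 5], 0.5)
print(estimate, weights.sum())
```

The weights are differences of a cumulative distribution function evaluated from 0 to 1, so they sum to 1 and every sorted element contributes to the estimate.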