scipy.spatial.distance.cdist

scipy.spatial.distance.cdist(XA, XB, metric='euclidean', *args, **kwargs)
Compute distance between each pair of the two collections of inputs.
See Notes for common calling conventions.
 Parameters
 XA : ndarray
An \(m_A\) by \(n\) array of \(m_A\) original observations in an \(n\)-dimensional space. Inputs are converted to float type.
 XB : ndarray
An \(m_B\) by \(n\) array of \(m_B\) original observations in an \(n\)-dimensional space. Inputs are converted to float type.
 metric : str or callable, optional
The distance metric to use. If a string, the distance function can be ‘braycurtis’, ‘canberra’, ‘chebyshev’, ‘cityblock’, ‘correlation’, ‘cosine’, ‘dice’, ‘euclidean’, ‘hamming’, ‘jaccard’, ‘jensenshannon’, ‘kulsinski’, ‘mahalanobis’, ‘matching’, ‘minkowski’, ‘rogerstanimoto’, ‘russellrao’, ‘seuclidean’, ‘sokalmichener’, ‘sokalsneath’, ‘sqeuclidean’, ‘wminkowski’, ‘yule’.
 *args : tuple. Deprecated.
Additional arguments should be passed as keyword arguments.
 **kwargs : dict, optional
Extra arguments to metric: refer to each metric documentation for a list of all possible arguments.
Some possible arguments:
p : scalar
The p-norm to apply for Minkowski, weighted and unweighted. Default: 2.
w : ndarray
The weight vector for metrics that support weights (e.g., Minkowski).
V : ndarray
The variance vector for standardized Euclidean. Default: var(vstack([XA, XB]), axis=0, ddof=1)
VI : ndarray
The inverse of the covariance matrix for Mahalanobis. Default: inv(cov(vstack([XA, XB]).T)).T
out : ndarray
The output array. If not None, the distance matrix Y is stored in this array. Note: this argument is metric independent; it will become a regular keyword argument in a future SciPy version.
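The defaults for V and VI listed above can be reproduced by hand. The sketch below uses small made-up inputs, computes both defaults with NumPy from the stacked observations, and checks that passing them explicitly gives the same matrices as omitting them. It assumes SciPy computes the defaults exactly as documented here.

```python
import numpy as np
from scipy.spatial.distance import cdist

# Hypothetical example data: 3 + 3 points in 2-D (not from the original docs).
XA = np.array([[0.0, 0.0], [1.0, 2.0], [2.0, 1.0]])
XB = np.array([[3.0, 1.0], [0.5, 2.5], [1.5, 0.0]])

# Both defaults are computed over all observations from XA and XB together.
X = np.vstack([XA, XB])

# Default V for 'seuclidean': per-column sample variance (ddof=1).
V = np.var(X, axis=0, ddof=1)

# Default VI for 'mahalanobis': inverse covariance matrix. np.cov treats rows
# as variables, so the observation matrix is transposed first.
VI = np.linalg.inv(np.cov(X.T)).T

Y_se_default = cdist(XA, XB, 'seuclidean')
Y_se_explicit = cdist(XA, XB, 'seuclidean', V=V)
Y_mh_default = cdist(XA, XB, 'mahalanobis')
Y_mh_explicit = cdist(XA, XB, 'mahalanobis', VI=VI)

print(np.allclose(Y_se_default, Y_se_explicit))
print(np.allclose(Y_mh_default, Y_mh_explicit))
```

Supplying V or VI yourself is mainly useful when the variances or covariance should come from a reference data set rather than from the two inputs being compared.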
 Returns
 Y : ndarray
An \(m_A\) by \(m_B\) distance matrix is returned. For each \(i\) and \(j\), the metric
dist(u=XA[i], v=XB[j])
is computed and stored in the \(ij\)-th entry.
 Raises
 ValueError
An exception is thrown if XA and XB do not have the same number of columns.
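The return value and the column check described above amount to a double loop over rows. The following is a plain-Python sketch of that contract, not SciPy's implementation; the helper name `cdist_reference` is hypothetical.

```python
import numpy as np


def cdist_reference(XA, XB, dist):
    """Hypothetical helper: Y[i, j] = dist(XA[i], XB[j]) for all row pairs."""
    XA = np.asarray(XA, dtype=float)  # inputs are converted to float
    XB = np.asarray(XB, dtype=float)
    if XA.ndim != 2 or XB.ndim != 2:
        raise ValueError("XA and XB must be 2-D arrays")
    if XA.shape[1] != XB.shape[1]:
        # Mirrors the documented ValueError for mismatched column counts.
        raise ValueError("XA and XB must have the same number of columns")
    mA, mB = XA.shape[0], XB.shape[0]
    Y = np.empty((mA, mB))
    for i in range(mA):
        for j in range(mB):
            Y[i, j] = dist(XA[i], XB[j])
    return Y


def euclidean(u, v):
    return np.sqrt(np.sum((u - v) ** 2))


XA = [[0.0, 0.0], [3.0, 4.0]]
XB = [[0.0, 0.0], [6.0, 8.0], [3.0, 0.0]]
Y = cdist_reference(XA, XB, euclidean)
print(Y.shape)  # (2, 3): one row per point of XA, one column per point of XB
print(Y)
```

The row-by-column layout is the key point: `Y[i]` holds the distances from `XA[i]` to every point of `XB`.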
Notes
The following are common calling conventions:
Y = cdist(XA, XB, 'euclidean')
Computes the distance between points using Euclidean distance (2-norm) as the distance metric between the points. The points are arranged as \(n\)-dimensional row vectors in the matrices XA and XB.
Y = cdist(XA, XB, 'minkowski', p=2.)
Computes the distances using the Minkowski distance \(\|u-v\|_p\) (\(p\)-norm) where \(p \geq 1\).
Y = cdist(XA, XB, 'cityblock')
Computes the city block or Manhattan distance between the points.
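Euclidean and city block distances are the \(p = 2\) and \(p = 1\) cases of the Minkowski distance. The sketch below writes the unweighted Minkowski formula with NumPy broadcasting; `minkowski_matrix` is a hypothetical name, and the approach builds an \(m_A \times m_B \times n\) intermediate array, so it is only meant for small inputs.

```python
import numpy as np


def minkowski_matrix(XA, XB, p=2.0):
    """Hypothetical sketch of unweighted Minkowski distances between all row pairs."""
    XA = np.asarray(XA, dtype=float)
    XB = np.asarray(XB, dtype=float)
    # Broadcast to shape (mA, mB, n): every row of XA against every row of XB.
    diff = np.abs(XA[:, None, :] - XB[None, :, :])
    return np.sum(diff ** p, axis=-1) ** (1.0 / p)


XA = [[0.0, 0.0], [1.0, 1.0]]
XB = [[3.0, 4.0]]
print(minkowski_matrix(XA, XB, p=1))  # city block: 7 and 5
print(minkowski_matrix(XA, XB, p=2))  # Euclidean: 5 and sqrt(13)
```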
Y = cdist(XA, XB, 'seuclidean', V=None)
Computes the standardized Euclidean distance. The standardized Euclidean distance between two n-vectors u and v is
\[\sqrt{\sum_i {(u_i-v_i)^2 / V[i]}}.\]
V is the variance vector; V[i] is the variance computed over all the i’th components of the points. If not passed, it is automatically computed.
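The formula above divides each squared coordinate difference by that coordinate's variance, so coordinates with a large spread count for less. A minimal sketch for a single pair of vectors, with made-up values and an explicitly supplied V (the function name is hypothetical):

```python
import numpy as np


def seuclidean(u, v, V):
    """Hypothetical sketch: sqrt(sum_i (u_i - v_i)**2 / V[i])."""
    u, v, V = (np.asarray(a, dtype=float) for a in (u, v, V))
    return np.sqrt(np.sum((u - v) ** 2 / V))


u = [1.0, 10.0]
v = [2.0, 30.0]
V = [1.0, 100.0]  # the second coordinate has a much larger variance
d = seuclidean(u, v, V)
print(d)  # sqrt(1/1 + 400/100) = sqrt(5)
```

Without standardization, the second coordinate would dominate (a difference of 20 versus 1); after dividing by V both contribute on comparable scales.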
Y = cdist(XA, XB, 'sqeuclidean')
Computes the squared Euclidean distance \(\|u-v\|_2^2\) between the vectors.
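A common way to compute all squared Euclidean distances at once uses the expansion \(\|u-v\|_2^2 = \|u\|_2^2 + \|v\|_2^2 - 2\,u \cdot v\), which turns the pairwise work into one matrix product. This is a general technique, not necessarily how SciPy computes 'sqeuclidean'; it is fast but can lose precision, so tiny negative results are clipped to zero. The function name is hypothetical.

```python
import numpy as np


def sqeuclidean_matrix(XA, XB):
    """Hypothetical sketch: squared Euclidean distances via a matrix product."""
    XA = np.asarray(XA, dtype=float)
    XB = np.asarray(XB, dtype=float)
    sq_a = np.sum(XA ** 2, axis=1)[:, None]  # ||u||^2 as a column, shape (mA, 1)
    sq_b = np.sum(XB ** 2, axis=1)[None, :]  # ||v||^2 as a row, shape (1, mB)
    D = sq_a + sq_b - 2.0 * XA @ XB.T
    # Floating-point cancellation can produce values like -1e-16.
    return np.maximum(D, 0.0)


XA = [[0.0, 0.0], [1.0, 2.0]]
XB = [[3.0, 4.0], [1.0, 2.0]]
print(sqeuclidean_matrix(XA, XB))  # [[25, 5], [8, 0]]
```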
Y = cdist(XA, XB, 'cosine')
Computes the cosine distance between vectors u and v,
\[1 - \frac{u \cdot v}{\|u\|_2 \|v\|_2}\]
where \(\|*\|_2\) is the 2-norm of its argument *, and \(u \cdot v\) is the dot product of \(u\) and \(v\).
Y = cdist(XA, XB, 'correlation')
Computes the correlation distance between vectors u and v. This is
\[1 - \frac{(u - \bar{u}) \cdot (v - \bar{v})}{\|(u - \bar{u})\|_2 \|(v - \bar{v})\|_2}\]
where \(\bar{v}\) is the mean of the elements of vector v, and \(x \cdot y\) is the dot product of \(x\) and \(y\).
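Comparing the two formulas above, the correlation distance is the cosine distance applied to mean-centered vectors. A minimal sketch of both for single pairs of vectors, with made-up data (function names are hypothetical):

```python
import numpy as np


def cosine(u, v):
    """Hypothetical sketch: 1 - u.v / (||u||_2 * ||v||_2)."""
    u = np.asarray(u, dtype=float)
    v = np.asarray(v, dtype=float)
    return 1.0 - np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))


def correlation(u, v):
    """Hypothetical sketch: cosine distance after subtracting each vector's mean."""
    u = np.asarray(u, dtype=float)
    v = np.asarray(v, dtype=float)
    return cosine(u - u.mean(), v - v.mean())


u = [1.0, 2.0, 3.0]
v = [2.0, 4.0, 6.0]  # same direction as u
w = [3.0, 2.0, 1.0]  # opposite trend to u
print(cosine(u, v))                        # approximately 0: parallel vectors
print(correlation(u, w))                   # approximately 2: perfectly anti-correlated
print(correlation(u, [11.0, 12.0, 13.0]))  # approximately 0: a constant shift is removed
```

Both distances range from 0 to 2; the centering step is what makes correlation distance insensitive to adding a constant to a vector.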
Y = cdist(XA, XB, 'hamming')
Computes the normalized Hamming distance, or the proportion of those vector elements between two n-vectors u and v which disagree. To save memory, the input matrices XA and XB can be of type boolean.
Y = cdist(XA, XB, 'jaccard')
Computes the Jaccard distance between the points. Given two vectors, u and v, the Jaccard distance is the proportion of those elements u[i] and v[i] that disagree where at least one of them is nonzero.
Y = cdist(XA, XB, 'chebyshev')
Computes the Chebyshev distance between the points. The Chebyshev distance between two n-vectors u and v is the maximum norm-1 distance between their respective elements. More precisely, the distance is given by
\[d(u,v) = \max_i {|u_i-v_i|}.\]
Y = cdist(XA, XB, 'canberra')
Computes the Canberra distance between the points. The Canberra distance between two points u and v is
\[d(u,v) = \sum_i \frac{|u_i-v_i|}{|u_i|+|v_i|}.\]
Y = cdist(XA, XB, 'braycurtis')
Computes the Bray-Curtis distance between the points. The Bray-Curtis distance between two points u and v is
\[d(u,v) = \frac{\sum_i |u_i-v_i|}{\sum_i |u_i+v_i|}\]
Y = cdist(XA, XB, 'mahalanobis', VI=None)
Computes the Mahalanobis distance between the points. The Mahalanobis distance between two points u and v is \(\sqrt{(u-v)(1/V)(u-v)^T}\) where \((1/V)\) (the VI variable) is the inverse covariance. If VI is not None, VI will be used as the inverse covariance matrix.
Y = cdist(XA, XB, 'yule')
Computes the Yule distance between the boolean vectors. (See the yule function documentation.)
Y = cdist(XA, XB, 'matching')
Synonym for ‘hamming’.
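The Hamming (also available as ‘matching’) and Jaccard definitions given earlier differ only in their denominator: Hamming counts disagreements over all positions, while Jaccard counts them over positions where at least one vector is nonzero. A minimal sketch with made-up boolean data; the function names are hypothetical, and returning 0 for two all-zero vectors is an assumption made here to avoid dividing by zero.

```python
import numpy as np


def hamming(u, v):
    """Hypothetical sketch: fraction of positions where u and v disagree."""
    u = np.asarray(u)
    v = np.asarray(v)
    return np.mean(u != v)


def jaccard(u, v):
    """Hypothetical sketch: disagreements among positions where u or v is nonzero."""
    u = np.asarray(u, dtype=bool)
    v = np.asarray(v, dtype=bool)
    nonzero = u | v
    if not nonzero.any():
        return 0.0  # assumption: two all-zero vectors are at distance 0
    return np.sum(u != v) / np.sum(nonzero)


u = [1, 0, 1, 1, 0]
v = [1, 1, 0, 1, 0]
print(hamming(u, v))  # 2 disagreements / 5 positions = 0.4
print(jaccard(u, v))  # 2 disagreements / 4 positions with a nonzero = 0.5
```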
Y = cdist(XA, XB, 'dice')
Computes the Dice distance between the boolean vectors. (See the dice function documentation.)
Y = cdist(XA, XB, 'kulsinski')
Computes the Kulsinski distance between the boolean vectors. (See the kulsinski function documentation.)
Y = cdist(XA, XB, 'rogerstanimoto')
Computes the Rogers-Tanimoto distance between the boolean vectors. (See the rogerstanimoto function documentation.)
Y = cdist(XA, XB, 'russellrao')
Computes the Russell-Rao distance between the boolean vectors. (See the russellrao function documentation.)
Y = cdist(XA, XB, 'sokalmichener')
Computes the Sokal-Michener distance between the boolean vectors. (See the sokalmichener function documentation.)
Y = cdist(XA, XB, 'sokalsneath')
Computes the Sokal-Sneath distance between the vectors. (See the sokalsneath function documentation.)
Y = cdist(XA, XB, 'wminkowski', p=2., w=w)
Computes the weighted Minkowski distance between the vectors. (See the wminkowski function documentation.) ‘wminkowski’ is deprecated and will be removed in SciPy 1.8.0. Use ‘minkowski’ instead.
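When switching from ‘wminkowski’ to ‘minkowski’, the weights may need converting. Assuming the usual definitions, \(\left(\sum_i |w_i (u_i - v_i)|^p\right)^{1/p}\) for ‘wminkowski’ and \(\left(\sum_i w_i |u_i - v_i|^p\right)^{1/p}\) for weighted ‘minkowski’, non-negative old weights carry over as \(w^p\). The sketch below checks that identity numerically with NumPy only, using made-up vectors; the function names are hypothetical. Under those assumptions the corresponding call would be `cdist(XA, XB, 'minkowski', p=p, w=w**p)`, which is worth confirming against the installed SciPy version's documentation.

```python
import numpy as np


def wminkowski_formula(u, v, p, w):
    """Assumed old 'wminkowski' definition: (sum |w_i (u_i - v_i)|**p) ** (1/p)."""
    return np.sum(np.abs(w * (u - v)) ** p) ** (1.0 / p)


def weighted_minkowski_formula(u, v, p, w):
    """Assumed weighted 'minkowski' definition: (sum w_i |u_i - v_i|**p) ** (1/p)."""
    return np.sum(w * np.abs(u - v) ** p) ** (1.0 / p)


u = np.array([1.0, 4.0, 0.0])
v = np.array([2.0, 1.0, 2.0])
w_old = np.array([0.5, 2.0, 1.0])  # non-negative weights
p = 3.0

old = wminkowski_formula(u, v, p, w_old)
# Raising the old weights to the p-th power reproduces the old value.
new = weighted_minkowski_formula(u, v, p, w_old ** p)
print(np.isclose(old, new))  # True
```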
Y = cdist(XA, XB, f)
Computes the distance between all pairs of vectors from XA and XB using the user-supplied 2-arity function f. For example, Euclidean distance between the vectors could be computed as follows:
dm = cdist(XA, XB, lambda u, v: np.sqrt(((u-v)**2).sum()))
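A self-contained version of the callable example above, with made-up inputs, confirms that the user-supplied function gives the same matrix as the built-in ‘euclidean’ metric:

```python
import numpy as np
from scipy.spatial.distance import cdist

XA = np.array([[0.0, 0.0], [1.0, 1.0]])
XB = np.array([[3.0, 4.0], [1.0, 0.0], [0.0, 2.0]])

# User-supplied 2-arity function, called once per (XA row, XB row) pair.
dm_callable = cdist(XA, XB, lambda u, v: np.sqrt(((u - v) ** 2).sum()))
# Built-in metric selected by name.
dm_builtin = cdist(XA, XB, 'euclidean')

print(dm_callable.shape)                     # (2, 3)
print(np.allclose(dm_callable, dm_builtin))  # True
```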
Note that you should avoid passing a reference to one of the distance functions defined in this library. For example,
dm = cdist(XA, XB, sokalsneath)
would calculate the pairwise distances between the vectors in XA and XB using the Python function sokalsneath. This would result in sokalsneath being called \(m_A \cdot m_B\) times, which is inefficient. Instead, the optimized C version is more efficient, and we call it using the following syntax:
dm = cdist(XA, XB, 'sokalsneath')
Examples
Find the Euclidean distances between four 2-D coordinates:
>>> from scipy.spatial import distance
>>> coords = [(35.0456, -85.2672),
...           (35.1174, -89.9711),
...           (35.9728, -83.9422),
...           (36.1667, -86.7833)]
>>> distance.cdist(coords, coords, 'euclidean')
array([[ 0.    ,  4.7044,  1.6172,  1.8856],
       [ 4.7044,  0.    ,  6.0893,  3.3561],
       [ 1.6172,  6.0893,  0.    ,  2.8477],
       [ 1.8856,  3.3561,  2.8477,  0.    ]])
Find the Manhattan distance from a 3-D point to the corners of the unit cube:
>>> import numpy as np
>>> a = np.array([[0, 0, 0],
...               [0, 0, 1],
...               [0, 1, 0],
...               [0, 1, 1],
...               [1, 0, 0],
...               [1, 0, 1],
...               [1, 1, 0],
...               [1, 1, 1]])
>>> b = np.array([[ 0.1,  0.2,  0.4]])
>>> distance.cdist(a, b, 'cityblock')
array([[ 0.7],
       [ 0.9],
       [ 1.3],
       [ 1.5],
       [ 1.5],
       [ 1.7],
       [ 2.1],
       [ 2.3]])
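A common follow-up is to reduce the distance matrix along one axis, for example with `np.argmin`, to find nearest neighbors. Reusing the unit-cube data from the example above, the sketch below picks out the closest and farthest corners from the query point:

```python
import numpy as np
from scipy.spatial.distance import cdist

# Unit-cube corners and query point from the example above.
a = np.array([[0, 0, 0], [0, 0, 1], [0, 1, 0], [0, 1, 1],
              [1, 0, 0], [1, 0, 1], [1, 1, 0], [1, 1, 1]])
b = np.array([[0.1, 0.2, 0.4]])

D = cdist(a, b, 'cityblock')        # shape (8, 1): one row per corner
nearest = int(np.argmin(D[:, 0]))   # index of the closest corner
farthest = int(np.argmax(D[:, 0]))  # index of the farthest corner
print(a[nearest], a[farthest])      # [0 0 0] [1 1 1]
```

For large inputs, a spatial index such as a k-d tree avoids building the full \(m_A \times m_B\) matrix; `cdist` is simplest when that matrix fits comfortably in memory.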