scipy.spatial.distance.jaccard

scipy.spatial.distance.jaccard(u, v, w=None)

Compute the Jaccard-Needham dissimilarity between two boolean 1-D arrays.

The Jaccard-Needham dissimilarity between 1-D boolean arrays u and v is defined as

$\frac{c_{TF} + c_{FT}} {c_{TT} + c_{FT} + c_{TF}}$

where $c_{ij}$ is the number of indices $k < n$ at which $\mathtt{u[k]} = i$ and $\mathtt{v[k]} = j$, and $n$ is the common length of u and v.
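
As an illustration of the definition, the counts can be tallied directly with NumPy and compared against the library result. This is only a sketch: `jaccard_from_counts` is a hypothetical helper written for this example, not part of SciPy, and it assumes unweighted boolean input.

```python
import numpy as np
from scipy.spatial import distance


def jaccard_from_counts(u, v):
    """Hypothetical helper: evaluate the formula from the c_ij counts."""
    u = np.asarray(u, dtype=bool)
    v = np.asarray(v, dtype=bool)
    c_tt = np.count_nonzero(u & v)    # u[k] True,  v[k] True
    c_tf = np.count_nonzero(u & ~v)   # u[k] True,  v[k] False
    c_ft = np.count_nonzero(~u & v)   # u[k] False, v[k] True
    denom = c_tt + c_tf + c_ft
    # The 0/0 case is mapped to 0, following the Notes section.
    return (c_tf + c_ft) / denom if denom else 0.0


u = [True, False, False, True]
v = [True, True, False, False]

manual = jaccard_from_counts(u, v)   # c_TT = 1, c_TF = 1, c_FT = 1
library = distance.jaccard(u, v)
print(manual, library)
```

Here the two mismatching positions out of three positions where at least one array is True give a dissimilarity of 2/3 from both computations.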

Parameters:
u : (N,) array_like, bool

Input array.

v : (N,) array_like, bool

Input array.

w : (N,) array_like, optional

The weights for each value in u and v. Default is None, which gives each value a weight of 1.0.

Returns:
jaccard : double

The Jaccard distance between vectors u and v.

Notes

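When u and v lead to a 0/0 division, i.e. neither vector contains any True entry so the denominator $c_{TT} + c_{FT} + c_{TF}$ is zero, the returned distance is 0. Note that this is different from vectors that have True entries but share none of them, for which the distance is 1. See the Wikipedia page on the Jaccard index [1], and this paper [2].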

Changed in version 1.2.0: Previously, when u and v led to a 0/0 division, the function returned NaN. It now returns 0 instead.
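
The following sketch contrasts the 0/0 case with the case of disjoint but non-empty vectors. It assumes SciPy 1.2.0 or later, as described in the version note above; the variable names are chosen only for this example.

```python
from scipy.spatial import distance

empty = [False, False, False]

# Neither vector has a True entry, so the formula is 0/0;
# per the note above, SciPy >= 1.2.0 returns 0 here.
d_empty = distance.jaccard(empty, empty)

# Both vectors have a True entry but at different positions,
# so every counted position is a mismatch.
d_disjoint = distance.jaccard([True, False, False], [False, True, False])

print(d_empty, d_disjoint)
```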

References

[1]

Wikipedia, “Jaccard index”, https://en.wikipedia.org/wiki/Jaccard_index

[2]

S. Kosub, “A note on the triangle inequality for the Jaccard distance”, 2016, arXiv:1612.02696

Examples

>>> from scipy.spatial import distance
>>> distance.jaccard([1, 0, 0], [0, 1, 0])
1.0
>>> distance.jaccard([1, 0, 0], [1, 1, 0])
0.5
>>> distance.jaccard([1, 0, 0], [1, 2, 0])
0.5
>>> distance.jaccard([1, 0, 0], [1, 1, 1])
0.6666666666666666
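
The examples above do not use the w parameter. The sketch below assumes that each weight scales the contribution of its position to both the numerator and the denominator, so that integer weights behave like repeating the corresponding entries; the data and variable names are made up for illustration.

```python
import numpy as np
from scipy.spatial import distance

u = np.array([True, False, False])
v = np.array([True, True, False])
w = [1, 3, 1]  # illustrative weights: the mismatching position counts three times

unweighted = distance.jaccard(u, v)
weighted = distance.jaccard(u, v, w=w)

# Under the stated assumption, integer weights should match the result
# obtained by repeating each entry w[k] times.
repeated = distance.jaccard(np.repeat(u, w), np.repeat(v, w))

print(unweighted, weighted, repeated)
```

With these inputs the weighted mismatch total is 3 out of a weighted total of 4, so the weighted result is larger than the unweighted 0.5.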