scipy.stats.wasserstein_distance

scipy.stats.wasserstein_distance(u_values, v_values, u_weights=None, v_weights=None)
Compute the Wasserstein-1 distance between two discrete distributions.
The Wasserstein distance, also called the Earth mover’s distance or the optimal transport distance, is a distance metric between two probability distributions. In the discrete case, the Wasserstein distance can be understood as the cost of an optimal transport plan to convert one distribution into the other. The cost is calculated as the product of the amount of probability mass being moved and the distance it is being moved. A brief and intuitive introduction can be found in [2].
New in version 1.0.0.
Parameters:
u_values : 1d or 2d array_like
    A sample from a probability distribution or the support (set of all possible values) of a probability distribution. Each element along axis 0 is an observation or possible value. If two-dimensional, axis 1 represents the dimensionality of the distribution; i.e., each row is a vector observation or possible value.
v_values : 1d or 2d array_like
    A sample from or the support of a second distribution.
u_weights, v_weights : 1d array_like, optional
    Weights or counts corresponding to the sample values, or probability masses corresponding to the support values. The sum of the elements must be positive and finite. If unspecified, each value is assigned the same weight.
Returns:
distance : float
    The computed distance between the distributions.
Notes
Given two probability mass functions, \(u\) and \(v\), the first Wasserstein distance between the distributions is:
\[l_1 (u, v) = \inf_{\pi \in \Gamma (u, v)} \int_{\mathbb{R} \times \mathbb{R}} |x-y| \, \mathrm{d} \pi (x, y)\]

where \(\Gamma (u, v)\) is the set of (probability) distributions on \(\mathbb{R} \times \mathbb{R}\) whose marginals are \(u\) and \(v\) on the first and second factors, respectively. For a given value \(x\), \(u(x)\) gives the probability mass of \(u\) at position \(x\), and likewise for \(v(x)\).
In the 1-dimensional case, if \(U\) and \(V\) denote the respective CDFs of \(u\) and \(v\), this distance also equals

\[l_1(u, v) = \int_{-\infty}^{+\infty} |U(x) - V(x)| \, \mathrm{d} x\]

See [3] for a proof of the equivalence of both definitions.
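To make the CDF identity concrete, the integral can be evaluated exactly for discrete samples by summing \(|U - V|\) over the intervals of the pooled, sorted support. The following is a minimal sketch for unweighted samples, not the library’s internal implementation, cross-checked against wasserstein_distance:

>>> import numpy as np
>>> from scipy.stats import wasserstein_distance
>>> def w1_via_cdfs(u_values, v_values):
...     # Sort each sample; the empirical CDFs are piecewise constant.
...     u = np.sort(np.asarray(u_values, dtype=float))
...     v = np.sort(np.asarray(v_values, dtype=float))
...     # Pooled support: U and V only change value at these points.
...     all_values = np.sort(np.concatenate([u, v]))
...     deltas = np.diff(all_values)
...     # CDFs evaluated just after each pooled point except the last.
...     U = np.searchsorted(u, all_values[:-1], side='right') / u.size
...     V = np.searchsorted(v, all_values[:-1], side='right') / v.size
...     # Integral of |U - V| as a sum over the constant pieces.
...     return float(np.sum(np.abs(U - V) * deltas))
>>> w1_via_cdfs([0, 1, 3], [5, 6, 8])
5.0
>>> wasserstein_distance([0, 1, 3], [5, 6, 8])
5.0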
In the more general (higher-dimensional) and discrete case, it is also called the optimal transport problem or the Monge problem. Let the finite point sets \(\{x_i\}\) and \(\{y_j\}\) denote the support sets of the probability mass functions \(u\) and \(v\), respectively. The Monge problem can then be expressed as follows.
Let \(\Gamma\) denote the transport plan, \(D\) denote the distance matrix, and

\[\begin{split}x = \text{vec}(\Gamma) \\ c = \text{vec}(D) \\ b = \begin{bmatrix} u \\ v \end{bmatrix}\end{split}\]

The \(\text{vec}()\) function transforms a matrix into a column vector by vertically stacking its columns. The transport plan \(\Gamma\) is a matrix \([\gamma_{ij}]\) in which \(\gamma_{ij}\) is a nonnegative value representing the amount of probability mass transported from \(u(x_i)\) to \(v(y_j)\). Summing over the rows of \(\Gamma\) must give the source distribution \(u\): \(\sum_j \gamma_{ij} = u(x_i)\) holds for all \(i\), and summing over the columns of \(\Gamma\) must give the target distribution \(v\): \(\sum_i \gamma_{ij} = v(y_j)\) holds for all \(j\). The distance matrix \(D\) is a matrix \([d_{ij}]\) in which \(d_{ij} = d(x_i, y_j)\). These constraints are illustrated numerically in the sketch below.
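As an illustration with hypothetical values (not taken from the text above), the marginal constraints and the cost of one feasible plan can be checked directly with NumPy:

>>> import numpy as np
>>> x = np.array([0.0, 1.0])               # support of u
>>> y = np.array([1.0, 2.0])               # support of v
>>> u = np.array([0.5, 0.5])               # probability masses of u
>>> v = np.array([0.5, 0.5])               # probability masses of v
>>> gamma = np.array([[0.5, 0.0],          # a feasible transport plan:
...                   [0.0, 0.5]])         # move each atom one unit right
>>> bool(np.allclose(gamma.sum(axis=1), u))  # row sums recover u
True
>>> bool(np.allclose(gamma.sum(axis=0), v))  # column sums recover v
True
>>> D = np.abs(x[:, None] - y[None, :])      # d_ij = |x_i - y_j|
>>> float(np.sum(gamma * D))                 # cost of this plan
1.0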
Given \(\Gamma\), \(D\), and \(b\), the Monge problem can be transformed into a linear programming problem with \(A x = b\) as constraints and \(z = c^T x\) as the minimization target (the total cost), where the matrix \(A\) has the form
\[\begin{array}{rrrr|rrrr|r|rrrr} 1 & 1 & \dots & 1 & 0 & 0 & \dots & 0 & \dots & 0 & 0 & \dots & 0 \\ 0 & 0 & \dots & 0 & 1 & 1 & \dots & 1 & \dots & 0 & 0 & \dots & 0 \\ \vdots & \vdots & \ddots & \vdots & \vdots & \vdots & \ddots & \vdots & \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \dots & 0 & 0 & 0 & \dots & 0 & \dots & 1 & 1 & \dots & 1 \\ \hline 1 & 0 & \dots & 0 & 1 & 0 & \dots & \dots & \dots & 1 & 0 & \dots & 0 \\ 0 & 1 & \dots & 0 & 0 & 1 & \dots & \dots & \dots & 0 & 1 & \dots & 0 \\ \vdots & \vdots & \ddots & \vdots & \vdots & \vdots & \ddots & \vdots & \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \dots & 1 & 0 & 0 & \dots & 1 & \dots & 0 & 0 & \dots & 1 \end{array}\]

By solving the dual form of the above linear programming problem (with solution \(y^*\)), the Wasserstein distance \(l_1 (u, v)\) can be computed as \(b^T y^*\).
The above solution is inspired by Vincent Herrmann’s blog [5]. For a more thorough explanation, see [4].
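The construction above can be reproduced with scipy.optimize.linprog. The sketch below solves the primal problem directly (which has the same optimal value as the dual) and flattens \(\Gamma\) row by row, differing from the column-stacking \(\text{vec}()\) convention only by a permutation of the variables:

>>> import numpy as np
>>> from scipy.optimize import linprog
>>> from scipy.stats import wasserstein_distance
>>> def w1_via_linprog(x, y, u, v):
...     # u and v are probability masses (each summing to 1) on the
...     # support points x and y; gamma is flattened row by row.
...     x, y, u, v = map(np.asarray, (x, y, u, v))
...     m, n = u.size, v.size
...     c = np.abs(x[:, None] - y[None, :]).ravel()  # flattened D
...     A = np.zeros((m + n, m * n))
...     for i in range(m):                  # sum_j gamma_ij = u_i
...         A[i, i * n:(i + 1) * n] = 1.0
...     for j in range(n):                  # sum_i gamma_ij = v_j
...         A[m + j, j::n] = 1.0
...     b = np.concatenate([u, v])
...     return linprog(c, A_eq=A, b_eq=b, bounds=(0, None)).fun
>>> w1_via_linprog([0, 1], [0, 1], [0.75, 0.25], [0.5, 0.5])
0.25
>>> wasserstein_distance([0, 1], [0, 1], [3, 1], [2, 2])
0.25

The weights [3, 1] and [2, 2] normalize to the masses [0.75, 0.25] and [0.5, 0.5], so both calls agree.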
The input distributions can be empirical, i.e., coming from samples whose values are effectively the inputs of the function, or they can be seen as generalized functions, in which case they are weighted sums of Dirac delta functions located at the specified values.
References
[1] “Wasserstein metric”, https://en.wikipedia.org/wiki/Wasserstein_metric
[2] Lilian Weng, “What is Wasserstein distance?”, Lil’Log, https://lilianweng.github.io/posts/2017-08-20-gan/#what-is-wasserstein-distance
[3] Ramdas, Garcia, Cuturi, “On Wasserstein Two Sample Testing and Related Families of Nonparametric Tests” (2015). arXiv:1509.02237.
[4] Peyré, Gabriel, and Marco Cuturi. “Computational optimal transport.” Center for Research in Economics and Statistics Working Papers 2017-86 (2017).
[5] Herrmann, Vincent. “Wasserstein GAN and the Kantorovich-Rubinstein Duality”. https://vincentherrmann.github.io/blog/wasserstein/
Examples
>>> from scipy.stats import wasserstein_distance
>>> wasserstein_distance([0, 1, 3], [5, 6, 8])
5.0
>>> wasserstein_distance([0, 1], [0, 1], [3, 1], [2, 2])
0.25
>>> wasserstein_distance([3.4, 3.9, 7.5, 7.8], [4.5, 1.4],
...                      [1.4, 0.9, 3.1, 7.2], [3.2, 3.5])
4.0781331438047861
Compute the Wasserstein distance between two three-dimensional samples, each with two observations.
>>> wasserstein_distance([[0, 2, 3], [1, 2, 5]], [[3, 2, 3], [4, 2, 5]])
3.0
Compute the Wasserstein distance between two two-dimensional distributions with three and two weighted observations, respectively.
>>> wasserstein_distance([[0, 2.75], [2, 209.3], [0, 0]],
...                      [[0.2, 0.322], [4.5, 25.1808]],
...                      [0.4, 5.2, 0.114], [0.8, 1.5])
174.15840245217169
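For vector-valued observations like those above, the cost matrix in the Monge formulation uses pairwise Euclidean distances. As a cross-check of the three-dimensional example, the linear-programming sketch from the Notes gives the same value, with scipy.spatial.distance.cdist supplying the distance matrix (this check is an illustration, not the function’s internal algorithm):

>>> import numpy as np
>>> from scipy.spatial.distance import cdist
>>> from scipy.optimize import linprog
>>> x = np.array([[0, 2, 3], [1, 2, 5]], dtype=float)
>>> y = np.array([[3, 2, 3], [4, 2, 5]], dtype=float)
>>> u = np.full(2, 0.5)           # equal weights: uniform over the rows
>>> v = np.full(2, 0.5)
>>> c = cdist(x, y).ravel()       # Euclidean cost matrix, flattened row by row
>>> A = np.zeros((4, 4))
>>> A[0, 0:2] = 1.0               # sum_j gamma_0j = u_0
>>> A[1, 2:4] = 1.0               # sum_j gamma_1j = u_1
>>> A[2, 0::2] = 1.0              # sum_i gamma_i0 = v_0
>>> A[3, 1::2] = 1.0              # sum_i gamma_i1 = v_1
>>> res = linprog(c, A_eq=A, b_eq=np.concatenate([u, v]), bounds=(0, None))
>>> float(res.fun)                # matches wasserstein_distance(x, y)
3.0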