scipy.sparse.csgraph.

min_weight_full_bipartite_matching#

scipy.sparse.csgraph.min_weight_full_bipartite_matching(biadjacency, maximize=False)#

Returns the minimum weight full matching of a bipartite graph.

Added in version 1.6.0.

Parameters:

biadjacencysparse array or matrix: Biadjacency matrix of the bipartite graph: A sparse array in CSR, CSC, or COO format whose rows represent one partition of the graph and whose columns represent the other partition. An edge between two vertices is indicated by the corresponding entry in the matrix, and the weight of the edge is given by the value of that entry. This should not be confused with the full adjacency matrix of the graph, as we only need the submatrix defining the bipartite structure.
maximizebool (default: False): Calculates a maximum weight matching if true.

Returns:

row_ind, col_indarray: An array of row indices and one of corresponding column indices giving the optimal matching. The total weight of the matching can be computed as graph[row_ind, col_ind].sum(). The row indices will be sorted; in the case of a square matrix they will be equal to numpy.arange(graph.shape[0]).

Notes

Let \(G = ((U, V), E)\) be a weighted bipartite graph with non-zero weights \(w : E \to \mathbb{R} \setminus \{0\}\). This function then produces a matching \(M \subseteq E\) with cardinality

\[\lvert M \rvert = \min(\lvert U \rvert, \lvert V \rvert),\]

which minimizes the sum of the weights of the edges included in the matching, \(\sum_{e \in M} w(e)\), or raises an error if no such matching exists.

When \(\lvert U \rvert = \lvert V \rvert\), this is commonly referred to as a perfect matching; here, since we allow \(\lvert U \rvert\) and \(\lvert V \rvert\) to differ, we follow Karp [1] and refer to the matching as full.

This function implements the LAPJVsp algorithm [2], short for “Linear assignment problem, Jonker–Volgenant, sparse”.

The problem it solves is equivalent to the rectangular linear assignment problem. [3] As such, this function can be used to solve the same problems as scipy.optimize.linear_sum_assignment. That function may perform better when the input is dense, or for certain particular types of inputs, such as those for which the \((i, j)\)’th entry is the distance between two points in Euclidean space.

If no full matching exists, this function raises a ValueError. For determining the size of the largest matching in the graph, see maximum_bipartite_matching.

We require that weights are non-zero only to avoid issues with the handling of explicit zeros when converting between different sparse representations. Zero weights can be handled by adding a constant to all weights, so that the resulting matrix contains no zeros.

If multiple valid solutions are possible, output may vary with SciPy and Python version.

References

[1]

Richard Manning Karp: An algorithm to Solve the m x n Assignment Problem in Expected Time O(mn log n). Networks, 10(2):143-152, 1980.

[2]

Roy Jonker and Anton Volgenant: A Shortest Augmenting Path Algorithm for Dense and Sparse Linear Assignment Problems. Computing 38:325-340, 1987.

[3]

https://en.wikipedia.org/wiki/Assignment_problem

Examples

>>> from scipy.sparse import csr_array
>>> from scipy.sparse.csgraph import min_weight_full_bipartite_matching

Let us first consider an example in which all weights are equal:

>>> biadjacency = csr_array([[1, 1, 1], [1, 0, 0], [0, 1, 0]])

Here, all we get is a perfect matching of the graph:

>>> print(min_weight_full_bipartite_matching(biadjacency)[1])
[2 0 1]

That is, the first, second, and third rows are matched with the third, first, and second column respectively. Note that in this example, the 0 in the input matrix does not correspond to an edge with weight 0, but rather a pair of vertices not paired by an edge.

Note also that in this case, the output matches the result of applying maximum_bipartite_matching:

>>> from scipy.sparse.csgraph import maximum_bipartite_matching
>>> biadjacency = csr_array([[1, 1, 1], [1, 0, 0], [0, 1, 0]])
>>> print(maximum_bipartite_matching(biadjacency, perm_type='column'))
[2 0 1]

When multiple edges are available, the ones with lowest weights are preferred:

>>> biadjacency = csr_array([[3, 3, 6], [4, 3, 5], [10, 1, 8]])
>>> row_ind, col_ind = min_weight_full_bipartite_matching(biadjacency)
>>> print(col_ind)
[0 2 1]

The total weight in this case is \(3 + 5 + 1 = 9\):

>>> print(biadjacency[row_ind, col_ind].sum())
9

When the matrix is not square, i.e. when the two partitions have different cardinalities, the matching is as large as the smaller of the two partitions:

>>> biadjacency = csr_array([[0, 1, 1], [0, 2, 3]])
>>> row_ind, col_ind = min_weight_full_bipartite_matching(biadjacency)
>>> print(row_ind, col_ind)
[0 1] [2 1]
>>> biadjacency = csr_array([[0, 1], [3, 1], [1, 4]])
>>> row_ind, col_ind = min_weight_full_bipartite_matching(biadjacency)
>>> print(row_ind, col_ind)
[0 2] [1 0]

When one or both of the partitions are empty, the matching is empty as well:

>>> biadjacency = csr_array((2, 0))
>>> row_ind, col_ind = min_weight_full_bipartite_matching(biadjacency)
>>> print(row_ind, col_ind)
[] []

In general, we will always reach the same sum of weights as if we had used scipy.optimize.linear_sum_assignment but note that for that one, missing edges are represented by a array entry of float('inf'). Let us generate a random sparse array with integer entries between 1 and 10:

>>> import numpy as np
>>> from scipy.sparse import random_array
>>> from scipy.optimize import linear_sum_assignment
>>> sparse = random_array((10, 10), rng=42, density=.5, format='coo') * 10
>>> sparse.data = np.ceil(sparse.data)
>>> dense = sparse.toarray()
>>> dense = np.full(sparse.shape, np.inf)
>>> dense[sparse.row, sparse.col] = sparse.data
>>> sparse = sparse.tocsr()
>>> row_ind, col_ind = linear_sum_assignment(dense)
>>> print(dense[row_ind, col_ind].sum())
25.0
>>> row_ind, col_ind = min_weight_full_bipartite_matching(sparse)
>>> print(sparse[row_ind, col_ind].sum())
25.0