Return the root nodes in a hierarchical clustering.

Returns the root nodes in a hierarchical clustering corresponding to a cut defined by a flat cluster assignment vector T. See the fcluster function for more information on the format of T.

For each flat cluster $$j$$ of the $$k$$ flat clusters represented in the n-sized flat cluster assignment vector T, this function finds the lowest cluster node $$i$$ in the linkage tree Z such that:

• leaf descendants of $$i$$ belong only to flat cluster $$j$$ (i.e., T[p]==j for all $$p$$ in $$S(i)$$, where $$S(i)$$ is the set of ids of the leaves descending from cluster node $$i$$)

• there does not exist a leaf that is not a descendant of $$i$$ that also belongs to cluster $$j$$ (i.e., T[q]!=j for all $$q$$ not in $$S(i)$$). If this condition is violated, T is not a valid cluster assignment vector, and an exception will be thrown.
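As a hedged sketch (not the library's internal implementation), the two conditions above can be checked directly by walking the tree with to_tree; the helper names leaf_ids and is_leader are illustrative:

```python
from scipy.cluster.hierarchy import ward, fcluster, leaders, to_tree
from scipy.spatial.distance import pdist

def leaf_ids(Z, i):
    """Return S(i): the ids of the leaves descending from node i."""
    _, nodes = to_tree(Z, rd=True)  # nodes[i] is the node with id i
    return set(nodes[i].pre_order(lambda nd: nd.id))

def is_leader(Z, T, i, j):
    """Check both leader conditions for node i and flat cluster j."""
    S = leaf_ids(Z, i)
    cluster_j = {p for p in range(len(T)) if T[p] == j}
    # i leads j iff the leaves under i are exactly the members of j.
    return S == cluster_j

# Two well-separated groups of three points each (example data).
X = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]]
Z = ward(pdist(X))
T = fcluster(Z, 2, criterion='maxclust')
L, M = leaders(Z, T)
checks = [is_leader(Z, T, int(i), int(j)) for i, j in zip(L, M)]
```

Every pair (L[j], M[j]) returned by leaders satisfies both conditions, so every entry of checks is True.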

Parameters
Z : ndarray

The hierarchical clustering encoded as a matrix. See linkage for more information.

T : ndarray

The flat cluster assignment vector.

Returns
L : ndarray

The leader linkage node ids stored as a k-element 1-D array, where k is the number of flat clusters found in T.

L[j]=i is the linkage cluster node id that is the leader of the flat cluster with id M[j]. If i < n, i corresponds to an original observation; otherwise, it corresponds to a non-singleton cluster.

M : ndarray

The flat cluster ids, stored as a k-element 1-D array: M[j] is the id of the flat cluster led by node L[j]. This allows the set of flat cluster ids to be any arbitrary set of k integers.

For example: if L[3]=2 and M[3]=8, the flat cluster with id 8’s leader is linkage node 2.
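The two arrays can be zipped into a lookup table from flat cluster id to leader node id (a hedged sketch; the values below are illustrative, matching the example later in this page):

```python
# Pair L and M into a mapping from flat cluster id to leader node id.
L = [16, 17, 18, 19]         # leader linkage node ids (example values)
M = [1, 2, 3, 4]             # corresponding flat cluster ids
leader_of = dict(zip(M, L))  # flat cluster id -> leader node id
print(leader_of[2])          # -> 17
```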

See Also

fcluster

for the creation of flat cluster assignments.

Examples

>>> from scipy.cluster.hierarchy import ward, fcluster, leaders
>>> from scipy.spatial.distance import pdist


Given a linkage matrix Z - obtained after applying a clustering method to a dataset X - and a flat cluster assignment array T:

>>> X = [[0, 0], [0, 1], [1, 0],
...      [0, 4], [0, 3], [1, 4],
...      [4, 0], [3, 0], [4, 1],
...      [4, 4], [3, 4], [4, 3]]

>>> Z = ward(pdist(X))
>>> Z
array([[ 0.        ,  1.        ,  1.        ,  2.        ],
       [ 3.        ,  4.        ,  1.        ,  2.        ],
       [ 6.        ,  7.        ,  1.        ,  2.        ],
       [ 9.        , 10.        ,  1.        ,  2.        ],
       [ 2.        , 12.        ,  1.29099445,  3.        ],
       [ 5.        , 13.        ,  1.29099445,  3.        ],
       [ 8.        , 14.        ,  1.29099445,  3.        ],
       [11.        , 15.        ,  1.29099445,  3.        ],
       [16.        , 17.        ,  5.77350269,  6.        ],
       [18.        , 19.        ,  5.77350269,  6.        ],
       [20.        , 21.        ,  8.16496581, 12.        ]])

>>> T = fcluster(Z, 3, criterion='distance')
>>> T
array([1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4], dtype=int32)


scipy.cluster.hierarchy.leaders returns the indices of the nodes in the dendrogram that are the leaders of each flat cluster:

>>> L, M = leaders(Z, T)
>>> L
array([16, 17, 18, 19], dtype=int32)


(Remember that indices 0-11 refer to the 12 data points in X, whereas indices 12-22 refer to the 11 rows of Z.)

scipy.cluster.hierarchy.leaders also returns, in M, the ids of the flat clusters in T, in the same order as their leaders in L:

>>> M
array([1, 2, 3, 4], dtype=int32)
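To tie the two arrays together, the hedged sketch below (rebuilding the same X, Z, and T as above) confirms that the leaves under the first leader, node 16, are exactly the observations assigned to flat cluster 1; to_tree is used here only for illustration:

```python
import numpy as np
from scipy.cluster.hierarchy import ward, fcluster, leaders, to_tree
from scipy.spatial.distance import pdist

# Same dataset as in the example above.
X = [[0, 0], [0, 1], [1, 0],
     [0, 4], [0, 3], [1, 4],
     [4, 0], [3, 0], [4, 1],
     [4, 4], [3, 4], [4, 3]]
Z = ward(pdist(X))
T = fcluster(Z, 3, criterion='distance')
L, M = leaders(Z, T)

# Leaves descending from leader L[0] (node 16 in the example output).
_, nodes = to_tree(Z, rd=True)
leaves = sorted(nodes[L[0]].pre_order(lambda nd: nd.id))
# Observations assigned to the flat cluster M[0] (id 1).
members = sorted(np.flatnonzero(T == M[0]))
print(leaves, members)  # both are [0, 1, 2]
```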