scipy.cluster.hierarchy.fclusterdata(X, t, criterion='inconsistent', metric='euclidean', depth=2, method='single', R=None)

Cluster observation data using a given metric.

With the default arguments, this function clusters the original observations in the n-by-m data matrix X (n observations in m dimensions) in three steps. It uses the Euclidean distance metric to calculate distances between the original observations, performs hierarchical clustering with the single linkage algorithm, and forms flat clusters with the inconsistency method, using t as the cut-off threshold.
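The default behaviour can be reproduced step by step with the lower-level functions in scipy.cluster.hierarchy and scipy.spatial.distance. The following is a minimal sketch, assuming the default arguments and the same 12-point dataset as the example at the end of this page. It chains pdist, linkage, inconsistent and fcluster, then compares the result with a direct fclusterdata call; the two should agree.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, fclusterdata, inconsistent, linkage
from scipy.spatial.distance import pdist

# Four groups of three points, one group near each corner of a 4x4 square.
X = np.array([[0, 0], [0, 1], [1, 0],
              [0, 4], [0, 3], [1, 4],
              [4, 0], [3, 0], [4, 1],
              [4, 4], [3, 4], [4, 3]], dtype=float)

# Step 1: condensed pairwise distances with the default metric.
Y = pdist(X, metric='euclidean')
# Step 2: hierarchical clustering with the default linkage method.
Z = linkage(Y, method='single')
# Step 3: inconsistency statistics computed to the default depth.
R = inconsistent(Z, d=2)
# Step 4: flat clusters with the default criterion and cut-off t=1.
T_manual = fcluster(Z, t=1, criterion='inconsistent', depth=2, R=R)

# The one-call convenience version with the same defaults.
T_auto = fclusterdata(X, t=1)
print(np.array_equal(T_manual, T_auto))
```

Writing the steps out separately can be handy when you want to reuse the linkage matrix Z, for example to draw a dendrogram or to try several thresholds without recomputing the distances.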

A one-dimensional array T of length n is returned. T[i] is the index of the flat cluster to which the original observation i belongs.

Parameters

X : (N, M) ndarray

N by M data matrix with N observations in M dimensions.

t : float

The threshold to apply when forming flat clusters.

criterion : str, optional

Specifies the criterion for forming flat clusters. Valid values are ‘inconsistent’ (default), ‘distance’, and ‘maxclust’. See fcluster for descriptions of these cluster formation algorithms.
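The meaning of t depends on the criterion. The sketch below, assuming the same 12-point dataset as the example at the end of this page, uses ‘distance’ (t is a cophenetic-distance cut-off) and ‘maxclust’ (t is the maximum number of flat clusters). Both calls are expected to recover the four corner groups.

```python
import numpy as np
from scipy.cluster.hierarchy import fclusterdata

X = np.array([[0, 0], [0, 1], [1, 0],
              [0, 4], [0, 3], [1, 4],
              [4, 0], [3, 0], [4, 1],
              [4, 4], [3, 4], [4, 3]], dtype=float)

# 'distance': with single linkage, points inside a corner group merge at
# height 1, while the closest points of different groups are 2 apart,
# so a cut at 1.5 keeps the groups separate.
T_dist = fclusterdata(X, t=1.5, criterion='distance')

# 'maxclust': t is the largest number of flat clusters allowed.
T_max = fclusterdata(X, t=4, criterion='maxclust')

print(T_dist)
print(T_max)
```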

metric : str, optional

The distance metric for calculating pairwise distances. See distance.pdist for descriptions of the available metrics, and linkage to verify that the metric is compatible with the chosen linkage method.
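Any metric name accepted by distance.pdist can be passed as a string. As an illustration, assuming the same 12-point dataset as the example at the end of this page, the following sketch switches to the Manhattan (‘cityblock’) metric and cuts the tree with the ‘distance’ criterion.

```python
import numpy as np
from scipy.cluster.hierarchy import fclusterdata

X = np.array([[0, 0], [0, 1], [1, 0],
              [0, 4], [0, 3], [1, 4],
              [4, 0], [3, 0], [4, 1],
              [4, 4], [3, 4], [4, 3]], dtype=float)

# Manhattan distances instead of Euclidean ones. Each corner group still
# merges at height 1, and the nearest points of neighbouring groups are
# 2 apart, so a cut at 1.5 separates the four groups.
T = fclusterdata(X, t=1.5, criterion='distance', metric='cityblock')
print(T)
```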

depth : int, optional

The maximum depth for the inconsistency calculation. See inconsistent for more information.
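depth is forwarded to the inconsistency calculation. The sketch below, assuming the same 12-point dataset as the example at the end of this page, builds the single-linkage tree directly and computes the inconsistency matrix at several depths, alongside the flat clustering fclusterdata produces with each depth. The number of clusters may change with depth, so no particular result is assumed.

```python
import numpy as np
from scipy.cluster.hierarchy import fclusterdata, inconsistent, linkage
from scipy.spatial.distance import pdist

X = np.array([[0, 0], [0, 1], [1, 0],
              [0, 4], [0, 3], [1, 4],
              [4, 0], [3, 0], [4, 1],
              [4, 4], [3, 4], [4, 3]], dtype=float)

Z = linkage(pdist(X), method='single')

for depth in (1, 2, 3):
    # One row per merge in Z: mean and standard deviation of the link
    # heights considered, number of links considered, and the
    # inconsistency coefficient.
    R = inconsistent(Z, d=depth)
    T = fclusterdata(X, t=1, depth=depth)
    print(depth, R.shape, len(np.unique(T)))
```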

method : str, optional

The linkage method to use (single, complete, average, weighted, median, centroid, ward). See linkage for more information. Default is “single”.
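Different linkage methods can be compared on the same data. A minimal sketch, assuming the same 12-point dataset as the example at the end of this page: each method is asked for at most four clusters with the ‘maxclust’ criterion. Because the corner groups are compact and well separated, every method listed here is expected to return the same grouping, possibly with different cluster numbers.

```python
import numpy as np
from scipy.cluster.hierarchy import fclusterdata

X = np.array([[0, 0], [0, 1], [1, 0],
              [0, 4], [0, 3], [1, 4],
              [4, 0], [3, 0], [4, 1],
              [4, 4], [3, 4], [4, 3]], dtype=float)

results = {}
for method in ('single', 'complete', 'average', 'ward'):
    # 'ward' assumes Euclidean distances, which is the default metric here.
    results[method] = fclusterdata(X, t=4, criterion='maxclust', method=method)
    print(method, results[method])
```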

R : ndarray, optional

The inconsistency matrix. If it is not passed, it is computed when necessary.
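A precomputed inconsistency matrix can be supplied through R. The sketch below assumes the same 12-point dataset as the example at the end of this page. It also assumes, rather than relying on any check by fclusterdata, that R only makes sense if it was computed from the same linkage fclusterdata builds internally, i.e. with the same metric, method and depth. Under that assumption the result should match the default call.

```python
import numpy as np
from scipy.cluster.hierarchy import fclusterdata, inconsistent, linkage
from scipy.spatial.distance import pdist

X = np.array([[0, 0], [0, 1], [1, 0],
              [0, 4], [0, 3], [1, 4],
              [4, 0], [3, 0], [4, 1],
              [4, 4], [3, 4], [4, 3]], dtype=float)

# Assumption: build R from the same metric, method and depth that
# fclusterdata uses by default, so the statistics describe the same tree.
Z = linkage(pdist(X, metric='euclidean'), method='single')
R = inconsistent(Z, d=2)

T_given = fclusterdata(X, t=1, R=R)
T_default = fclusterdata(X, t=1)
print(np.array_equal(T_given, T_default))
```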

Returns

fclusterdata : ndarray

A vector of length n. T[i] is the flat cluster number to which original observation i belongs.

See also

scipy.spatial.distance.pdist: pairwise distance metrics

Notes

This function is similar to the MATLAB function clusterdata.

Examples

>>> from scipy.cluster.hierarchy import fclusterdata

This is a convenience function that bundles all the steps of a typical SciPy hierarchical clustering workflow into a single call.

>>> X = [[0, 0], [0, 1], [1, 0],
...      [0, 4], [0, 3], [1, 4],
...      [4, 0], [3, 0], [4, 1],
...      [4, 4], [3, 4], [4, 3]]
>>> fclusterdata(X, t=1)
array([3, 3, 3, 4, 4, 4, 2, 2, 2, 1, 1, 1], dtype=int32)

For the dataset X, the threshold t=1 (an inconsistency cut-off under the default criterion), and the default settings, the output is four clusters with three data points each.
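The cluster sizes and memberships can be read off the returned vector with NumPy. A minimal sketch using the same X and t as the example above; members maps each flat cluster number to the indices of the observations assigned to it.

```python
import numpy as np
from scipy.cluster.hierarchy import fclusterdata

X = [[0, 0], [0, 1], [1, 0],
     [0, 4], [0, 3], [1, 4],
     [4, 0], [3, 0], [4, 1],
     [4, 4], [3, 4], [4, 3]]

T = fclusterdata(X, t=1)

# Cluster numbers and how many observations fall into each one.
labels, counts = np.unique(T, return_counts=True)
# Indices of the original observations in each flat cluster.
members = {int(k): np.flatnonzero(T == k).tolist() for k in labels}
print(counts)
print(members)
```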
