SciPy

Hierarchical clustering (scipy.cluster.hierarchy)

These functions cut hierarchical clusterings into flat clusterings or find the roots of the forest formed by a cut by providing the flat cluster ids of each observation.

fcluster(Z, t[, criterion, depth, R, monocrit]) Form flat clusters from the hierarchical clustering defined by the given linkage matrix.
fclusterdata(X, t[, criterion, metric, ...]) Cluster observation data using a given metric.
leaders(Z, T) Return the root nodes in a hierarchical clustering.

These are routines for agglomerative clustering.

linkage(y[, method, metric]) Perform hierarchical/agglomerative clustering.
single(y) Perform single/min/nearest linkage on the condensed distance matrix y.
complete(y) Perform complete/max/farthest point linkage on a condensed distance matrix.
average(y) Perform average/UPGMA linkage on a condensed distance matrix.
weighted(y) Perform weighted/WPGMA linkage on the condensed distance matrix.
centroid(y) Perform centroid/UPGMC linkage.
median(y) Perform median/WPGMC linkage.
ward(y) Perform Ward’s linkage on a condensed distance matrix.

These routines compute statistics on hierarchies.

cophenet(Z[, Y]) Calculate the cophenetic distances between each observation in the hierarchical clustering defined by the linkage Z.
from_mlab_linkage(Z) Convert a linkage matrix generated by MATLAB(TM) to a new linkage matrix compatible with this module.
inconsistent(Z[, d]) Calculate inconsistency statistics on a linkage matrix.
maxinconsts(Z, R) Return the maximum inconsistency coefficient for each non-singleton cluster and its descendents.
maxdists(Z) Return the maximum distance between any non-singleton cluster.
maxRstat(Z, R, i) Return the maximum statistic for each non-singleton cluster and its descendents.
to_mlab_linkage(Z) Convert a linkage matrix to a MATLAB(TM) compatible one.

Routines for visualizing flat clusters.

dendrogram(Z[, p, truncate_mode, ...]) Plot the hierarchical clustering as a dendrogram.

These are data structures and routines for representing hierarchies as tree objects.

ClusterNode(id[, left, right, dist, count]) A tree node class for representing a cluster.
leaves_list(Z) Return a list of leaf node ids.
to_tree(Z[, rd]) Convert a linkage matrix into an easy-to-use tree object.
cut_tree(Z[, n_clusters, height]) Given a linkage matrix Z, return the cut tree.

These are predicates for checking the validity of linkage and inconsistency matrices as well as for checking isomorphism of two flat cluster assignments.

is_valid_im(R[, warning, throw, name]) Return True if the inconsistency matrix passed is valid.
is_valid_linkage(Z[, warning, throw, name]) Check the validity of a linkage matrix.
is_isomorphic(T1, T2) Determine if two different cluster assignments are equivalent.
is_monotonic(Z) Return True if the linkage passed is monotonic.
correspond(Z, Y) Check for correspondence between linkage and condensed distance matrices.
num_obs_linkage(Z) Return the number of original observations of the linkage matrix passed.

Utility routines for plotting:

set_link_color_palette(palette) Set list of matplotlib color codes for use by dendrogram.

References

[R1]“Statistics toolbox.” API Reference Documentation. The MathWorks. http://www.mathworks.com/access/helpdesk/help/toolbox/stats/. Accessed October 1, 2007.
[R2]“Hierarchical clustering.” API Reference Documentation. The Wolfram Research, Inc. http://reference.wolfram.com/mathematica/HierarchicalClustering/tutorial/ HierarchicalClustering.html. Accessed October 1, 2007.
[R3]Gower, JC and Ross, GJS. “Minimum Spanning Trees and Single Linkage Cluster Analysis.” Applied Statistics. 18(1): pp. 54–64. 1969.
[R4]Ward Jr, JH. “Hierarchical grouping to optimize an objective function.” Journal of the American Statistical Association. 58(301): pp. 236–44. 1963.
[R5]Johnson, SC. “Hierarchical clustering schemes.” Psychometrika. 32(2): pp. 241–54. 1966.
[R6]Sneath, PH and Sokal, RR. “Numerical taxonomy.” Nature. 193: pp. 855–60. 1962.
[R7]Batagelj, V. “Comparing resemblance measures.” Journal of Classification. 12: pp. 73–90. 1995.
[R8]Sokal, RR and Michener, CD. “A statistical method for evaluating systematic relationships.” Scientific Bulletins. 38(22): pp. 1409–38. 1958.
[R9]Edelbrock, C. “Mixture model tests of hierarchical clustering algorithms: the problem of classifying everybody.” Multivariate Behavioral Research. 14: pp. 367–84. 1979.
[10]Jain, A., and Dubes, R., “Algorithms for Clustering Data.” Prentice-Hall. Englewood Cliffs, NJ. 1988.
[11]Fisher, RA “The use of multiple measurements in taxonomic problems.” Annals of Eugenics, 7(2): 179-188. 1936
  • MATLAB and MathWorks are registered trademarks of The MathWorks, Inc.
  • Mathematica is a registered trademark of The Wolfram Research, Inc.