This repository has been archived by the owner on Jun 2, 2022. It is now read-only.

dedupeio/hcluster

hcluster

This library provides Python functions for hierarchical clustering. Its features include:

  • generating hierarchical clusters from distance matrices
  • computing distance matrices from observation vectors
  • computing statistics on clusters
  • cutting linkages to generate flat clusters
  • and visualizing clusters with dendrograms.

The interface is very similar to MATLAB's Statistics Toolbox API, which makes code easier to port from MATLAB to Python/NumPy. The core implementation of this library is in C for efficiency.
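To make the workflow in the feature list concrete, here is a minimal pure-Python sketch of its first steps: computing pairwise distances from observation vectors, building a single-linkage hierarchy, and cutting it into flat clusters. This is an illustration only, not the library's code or API. The function names `pairwise_distances`, `single_linkage`, and `flat_clusters` are hypothetical. In the hcluster 0.2 API the corresponding steps are `pdist`, `linkage`, and `fcluster`, which are implemented far more efficiently.

```python
from itertools import combinations
from math import dist  # Euclidean distance, Python 3.8+


def pairwise_distances(points):
    """Map each index pair (i, j) with i < j to its Euclidean distance."""
    return {(i, j): dist(points[i], points[j])
            for i, j in combinations(range(len(points)), 2)}


def single_linkage(points):
    """Naive single-linkage clustering (hypothetical helper, for illustration).

    Returns one (cluster_a, cluster_b, height, size) tuple per merge.
    The original points are clusters 0..n-1; merge number k creates cluster n + k.
    """
    n = len(points)
    d = pairwise_distances(points)
    clusters = {i: [i] for i in range(n)}  # cluster id -> member indices

    def gap(pair):
        # Single linkage: distance between the closest members of two clusters.
        a, b = pair
        return min(d[min(i, j), max(i, j)]
                   for i in clusters[a] for j in clusters[b])

    merges = []
    while len(clusters) > 1:
        a, b = min(combinations(sorted(clusters), 2), key=gap)
        height = gap((a, b))
        members = clusters.pop(a) + clusters.pop(b)
        clusters[n + len(merges)] = members
        merges.append((a, b, height, len(members)))
    return merges


def flat_clusters(n, merges, threshold):
    """Cut the hierarchy: apply only merges whose height is <= threshold."""
    members = {i: [i] for i in range(n)}
    for step, (a, b, height, _size) in enumerate(merges):
        if height > threshold:
            break  # single-linkage heights never decrease, so we can stop here
        members[n + step] = members.pop(a) + members.pop(b)
    labels = [0] * n
    # Number the clusters 1, 2, ... in order of their smallest member index.
    for label, group in enumerate(sorted(members.values(), key=min), start=1):
        for i in group:
            labels[i] = label
    return labels


points = [(0, 0), (0, 1), (5, 5), (5, 6), (20, 20)]
merges = single_linkage(points)
labels = flat_clusters(len(points), merges, threshold=2.0)
print(labels)  # → [1, 1, 2, 2, 3]
```

Each merge tuple mirrors the rows of a MATLAB-style linkage matrix: the two clusters joined, the distance at which they were joined, and the size of the new cluster. Raising the threshold merges more of the tree into each flat cluster.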

It is a fork of the clustering and distance functions from SciPy that removes all dependencies on SciPy. It preserves the API of hcluster 0.2.

Part of the Dedupe.io cloud service and open source toolset for de-duplicating and finding fuzzy matches in your data.