GitHub - shitian-ni/Same-Size-K-Means: A k-means variation that produces clusters of the same size utilizing the scikit-learn API and related utilities

Equal Groups K-Means Clustering

This is a k-means variation that produces clusters of the same size, using the scikit-learn KMeans methods and associated utilities.

The same-size k-means logic follows the ELKI Same-size k-Means Variation tutorial:

https://elki-project.github.io/tutorial/same-size_k_means
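As a rough illustration of the idea, the sketch below performs a greedy same-size assignment loosely modeled on the initialization phase described in the ELKI tutorial: points are ordered by how much they gain from their best cluster over their worst, and each point goes to its nearest cluster that still has room. It is not this repository's code. The function name `equal_size_assign` and the starting centers are assumptions made for the example, and the sketch omits both the resorting step after a cluster fills up and the swap-based refinement iterations.

```python
import numpy as np


def equal_size_assign(X, centers):
    """Greedy same-size assignment of the rows of X to the given centers.

    Simplified illustration: no resorting after a cluster fills up and
    no swap-based refinement afterwards.
    """
    X = np.asarray(X, dtype=float)
    centers = np.asarray(centers, dtype=float)
    n, k = len(X), len(centers)
    max_size = -(-n // k)  # ceil(n / k): upper bound on each cluster's size

    # Pairwise Euclidean distances, shape (n, k)
    dist = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)

    # Points that gain most from their best over their worst cluster go first
    order = np.argsort(dist.min(axis=1) - dist.max(axis=1))

    labels = np.full(n, -1)
    sizes = np.zeros(k, dtype=int)
    for i in order:
        # Walk this point's clusters from nearest to farthest
        for c in np.argsort(dist[i]):
            if sizes[c] < max_size:
                labels[i] = c
                sizes[c] += 1
                break
    return labels


X = np.array([[1, 2], [1, 4], [1, 0], [4, 2], [4, 4], [4, 0]])
centers = np.array([[1.0, 2.0], [4.0, 2.0]])  # hypothetical starting centers
labels = equal_size_assign(X, centers)
```

When the number of points is not divisible by the number of clusters, this sketch only caps each cluster at the rounded-up size, so the final sizes can differ.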

Please note that this implementation only works with scikit-learn 0.17.x, as later versions introduce changes that break it. Sparse matrices are not yet supported.
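Until sparse input is supported, one possible workaround is to convert the data to a dense array before fitting. The helper below is a minimal sketch, not part of this package; the name `to_dense` is hypothetical. It relies only on the `toarray()` method that SciPy sparse matrices provide, and it assumes the dense array fits in memory.

```python
import numpy as np


def to_dense(X):
    """Return X as a dense NumPy array (hypothetical helper)."""
    # SciPy sparse matrices expose .toarray(); plain arrays and lists do not
    if hasattr(X, "toarray"):
        X = X.toarray()
    return np.asarray(X)
```

The result can then be passed to `fit` in place of the sparse matrix.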

Usage

Use it just as you would the scikit-learn KMeans class:

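    import numpy as np

    from clustering.equal_groups import EqualGroupsKMeans

    # Six 2-D points; with n_clusters=2, each cluster should receive three
    X = np.array([[1, 2], [1, 4], [1, 0], [4, 2], [4, 4], [4, 0]])

    clf = EqualGroupsKMeans(n_clusters=2)
    clf.fit(X)

    # Cluster index assigned to each row of X
    clf.labels_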
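To confirm that the clusters really are the same size, the labels can be counted with `np.bincount`. The snippet below is a small sketch: `cluster_sizes` is a hypothetical helper, and the `labels` array is an assumed stand-in for `clf.labels_`, not output recorded from a run.

```python
import numpy as np


def cluster_sizes(labels, n_clusters):
    """Count how many points were assigned to each cluster index."""
    return np.bincount(np.asarray(labels), minlength=n_clusters)


labels = np.array([0, 0, 0, 1, 1, 1])  # assumed stand-in for clf.labels_
sizes = cluster_sizes(labels, n_clusters=2)

# All clusters have the same size when the counts are identical
balanced = len(set(sizes.tolist())) == 1
```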

Performance

This implementation is slow on large inputs. It is relatively quick when the number of observations is below 500.

Optimizations are welcome via pull requests.

To Dos

More test coverage
Add support for sparse matrices
Package for PyPI
Potentially speed up with Cython
scikit-learn 0.18.x implementation

