You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
{{ message }}
This repository has been archived by the owner on Aug 31, 2021. It is now read-only.
Currently, the top-level items are selected randomly [1], and are then pruned by removing items that are too similar [2]. This procedure might result in suboptimal top-level items being selected.
One trick for selecting the cluster centroids that has worked exceptionally well for k-means is the k-means++ initialisation [3]. The idea is to start with a randomly chosen first centroid, and then to select following centroids with a probability proportional to the distance to the already selected centroids.
An implementation of this algorithm should be relatively straightforward, with potentially large benefits. Would there be any interesting in adding such an initialisation to PySparNN?
@lsorber - apologies for missing this. if you still have interest in submitting a PR I would happily integrate it. I think @kchaliki has a very good alternative which I plan on getting integrated over the next few days.
Sign up for freeto subscribe to this conversation on GitHub.
Already have an account?
Sign in.
Currently, the top-level items are selected randomly [1], and are then pruned by removing items that are too similar [2]. This procedure might result in suboptimal top-level items being selected.
One trick for selecting the cluster centroids that has worked exceptionally well for k-means is the k-means++ initialisation [3]. The idea is to start with a randomly chosen first centroid, and then to select following centroids with a probability proportional to the distance to the already selected centroids.
An implementation of this algorithm should be relatively straightforward, with potentially large benefits. Would there be any interesting in adding such an initialisation to PySparNN?
[1] https://github.com/facebookresearch/pysparnn/blob/master/pysparnn/cluster_index.py#L127
[2] https://github.com/facebookresearch/pysparnn/blob/master/pysparnn/cluster_index.py#L135
[3] https://en.wikipedia.org/wiki/K-means%2B%2B#Improved_initialization_algorithm
The text was updated successfully, but these errors were encountered: