Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[FEA] UMAP Python API to accept kNN graph directly #1733

Closed
cjnolet opened this issue Feb 22, 2020 · 0 comments
Closed

[FEA] UMAP Python API to accept kNN graph directly #1733

cjnolet opened this issue Feb 22, 2020 · 0 comments
Labels
Algorithm API Change For tracking changes to algorithms that might effect the API Cython / Python Cython or Python issue feature request New feature or request

Comments

@cjnolet
Copy link
Member

cjnolet commented Feb 22, 2020

Both t-SNE and UMAP compute KNN indices and distances in the first stage of their computation and it is the kNN stage that needs to support different distance measures and metrics.

There are several reasons why being able to accept the kNN distances and indices is beneficial:

  1. It allows the user to compute new distance measures that we may not necessarily support right away.
  2. It gives the user an option of using other kNN algorithms, such as approximate variants
  3. While FAISS is working on integrating most of the distance measures found in scipy to the brute force kNN, which is dense, this allows users to construct a KNN externally based on their own distance metrics, use a CPU-based KNN, or use any other GPU-accelerated KNN.
  4. While we don't currently have a Sparse kNN option in cuML or on RAPIDS, this will enable sparse support in UMAP but giving users the option to use the outputs from libraries like SnapML (https://ibmsoe.github.io/snap-ml-doc/v1.4.0/knndoc.html) which accept sparse inputs.
  5. This allows visualization tools to compute the kNN once for a specific metric and n_neighbors setting, only requiring the embeddings to be re-optimized when other parameters change. This offers a significant performance boost.
  6. For the same reason as fix notebook bugs and add running results #5, this allows the user to compute the kNN once for a specific metric and n_neighbors setting, affording its use in parameter optimization or allowing the user to build many different layouts with the same NearestNeighbors inputs.

There is also a project on pypi called “umap-learn-modified” (https://pypi.org/project/umap-learn-modified/) that we might be able to use for API inspiration: https://github.com/lilab-bcb/umap/blob/master/umap/umap_.py#L1316.

This should mostly require python changes, as the input would just be bypassing the building of the kNN on the c++ side. It’s important that if fit() takes a kNN, the transform() also take a kNN, and assume the same metric was used.

@cjnolet cjnolet added feature request New feature or request ? - Needs Triage Need team to review and classify Cython / Python Cython or Python issue Algorithm API Change For tracking changes to algorithms that might effect the API labels Feb 22, 2020
@cjnolet cjnolet removed the ? - Needs Triage Need team to review and classify label Mar 2, 2020
@cjnolet cjnolet changed the title [FEA] UMAP and t-SNE Python APIs to accept kNN graph directly [FEA] UMAP (and t-SNE) Python APIs to accept kNN graph directly Mar 2, 2020
@cjnolet cjnolet changed the title [FEA] UMAP (and t-SNE) Python APIs to accept kNN graph directly [FEA] UMAP Python API to accept kNN graph directly Mar 2, 2020
@cjnolet cjnolet closed this as completed Mar 17, 2020
rapids-bot bot pushed a commit that referenced this issue Dec 11, 2020
Closes #1780

Adding kNN graph input functionality to t-SNE, a request broken off of the issue #1733. t-SNE gathers kNN indices and distances in the first stage of it's computation, by allowing the user to input their own kNN graph, they can skip this step. This should follow #1815 as closely as possible.

**Benefits of this**:
- allow user custom run of kNN algorithm
- can use different distance function instead of t-SNE euclidean default
- allows for speedup if performing grid search by storing and reusing kNN graph

**Includes:**
- [x] Abstracted `extract_knn_graph` so it can be used for both UMAP and t-SNE
- [x] Implemented kNN graph input to Python/Cython layer and C++/CUDA layer
- [x] C++/CUDA Barnes Hut and Exact t-SNE tests
- [x] Python t-SNE tests
- [x] General code cleanup wherever needed

Authors:
  - Aleksander Ficek <[email protected]>
  - Corey J. Nolet <[email protected]>
  - Ray Douglass <[email protected]>
  - Corey J. Nolet <[email protected]>

Approvers:
  - Corey J. Nolet

URL: #2592
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Algorithm API Change For tracking changes to algorithms that might effect the API Cython / Python Cython or Python issue feature request New feature or request
Projects
None yet
Development

No branches or pull requests

1 participant