nmslib: efficient cross-platform similarity search library and a toolkit for evaluation #913

Open
1 task
ShellLM opened this issue Aug 22, 2024 · 1 comment
Labels
Algorithms Sorting, Learning or Classifying. All algorithms go here. Git-Repo Source code repository like gitlab or gh github gh tools like cli, Actions, Issues, Pages human-verified <INST>NEVER PICK THIS LABEL</INST> Knowledge-Dataset MachineLearning ML Models, Training and Inference Papers Research papers python Python code, tools, info RAG Retrieval Augmented Generation for LLMs

Comments

@ShellLM
Collaborator

ShellLM commented Aug 22, 2024

Non-Metric Space Library (NMSLIB)

Important Notes

  • NMSLIB is generic but fast, see the results of ANN benchmarks.
  • A standalone implementation of our fastest method HNSW also exists as a header-only library.
  • All the documentation (including using the Python bindings and the query server, descriptions of methods and spaces, building the library, etc.) can be found on this page.
  • For general questions and inquiries, please use the Gitter chat; the GitHub issues page is for bugs and feature requests.

Objectives

Non-Metric Space Library (NMSLIB) is an efficient cross-platform similarity search library and a toolkit for evaluation of similarity search methods. The core library does not have any third-party dependencies. It has been gaining popularity recently; in particular, it has become a part of Amazon Elasticsearch Service.

The goal of the project is to create an effective and comprehensive toolkit for searching in generic and non-metric spaces. Even though the library contains a variety of metric-space access methods, our main focus is on generic and approximate search methods, in particular, on methods for non-metric spaces. NMSLIB is possibly the first library with principled support for non-metric space searching.
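To make "non-metric space" concrete, here is a minimal sketch in plain Python. It does not use NMSLIB or its API; the helper names `kl_divergence` and `brute_force_knn` are hypothetical and exist only for this illustration. It runs an exact (brute-force) k-nearest-neighbor search under the KL divergence, a dissimilarity that is not symmetric and does not obey the triangle inequality, so it is not a metric.

```python
import heapq
import math


def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions with strictly positive entries."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))


def brute_force_knn(data, query, k, distance):
    """Return the k (distance, index) pairs closest to `query`, nearest first.

    This is an exhaustive scan, not an approximate index: every data point
    is compared against the query.
    """
    scored = ((distance(point, query), i) for i, point in enumerate(data))
    return heapq.nsmallest(k, scored)


# Toy data set: three probability distributions over three outcomes.
data = [
    [0.7, 0.2, 0.1],
    [0.1, 0.8, 0.1],
    [0.3, 0.3, 0.4],
]
query = [0.6, 0.3, 0.1]

neighbors = brute_force_knn(data, query, k=2, distance=kl_divergence)
for dist, idx in neighbors:
    print(f"index={idx} KL={dist:.4f}")

# The divergence is asymmetric, which is one reason it is not a metric.
forward = kl_divergence(data[1], query)
backward = kl_divergence(query, data[1])
print(f"KL(p||q)={forward:.4f} KL(q||p)={backward:.4f}")
```

An exhaustive scan like this costs one distance computation per data point for every query. Approximate methods trade a small loss in recall for far fewer distance computations.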

NMSLIB is an extensible library, which means that it is possible to add new search methods and distance functions. NMSLIB can be used directly in C++ and Python (via Python bindings). In addition, it is possible to build a query server, which can be used from Java (or other languages supported by Apache Thrift, version 0.12). Java has a native client, i.e., it works on many platforms without requiring a C++ library to be installed.

Authors: Bilegsaikhan Naidan, Leonid Boytsov, Yury Malkov, David Novak. With contributions from Ben Frederickson, Lawrence Cayton, Wei Dong, Avrelin Nikita, Dmitry Yashunin, Bob Poekert, @orgoro, @gregfriedland, Scott Gigante, Maxim Andreev, Daniel Lemire, Nathan Kurz, Alexander Ponomarenko.

Brief History

NMSLIB started as a personal project of Bilegsaikhan Naidan, who created the initial code base, the Python bindings, and participated in earlier evaluations. The most successful class of methods--neighborhood/proximity graphs--is represented by the Hierarchical Navigable Small World Graph (HNSW) due to Malkov and Yashunin (see the publications below). Other highly useful methods include a modification of the VP-tree due to Boytsov and Naidan (2013), a Neighborhood APProximation index (NAPP) proposed by Tellez et al. (2013) and improved by David Novak, as well as a vanilla uncompressed inverted file.

Credits and Citing

If you find this library useful, feel free to cite our SISAP paper [BibTex] as well as other papers listed at the end. One crucial contribution to cite is the fast Hierarchical Navigable Small World graph (HNSW) method [BibTex]. Please also check out the stand-alone HNSW implementation by Yury Malkov, which is released as a header-only HNSWLib library.

License

The code is released under the Apache License Version 2.0 http://www.apache.org/licenses/. Older versions of the library include additional components, which have different licenses (but this does not apply to NMSLIB 2.x).

Older versions of the library included the following components:

  • The LSHKIT, which is embedded in our library, is distributed under the GNU General Public License, see http://www.gnu.org/licenses/.
  • The k-NN graph construction algorithm NN-Descent due to Dong et al. 2011 (see the links below), which is also embedded in our library, seems to be covered by a free-to-use license, similar to Apache 2.
  • The FALCONN library is distributed under the MIT license.

Funding

Leonid Boytsov was supported by the Open Advancement of Question Answering Systems (OAQA) group and by NSF grant #1618159: "Matching and Ranking via Proximity Graphs: Applications to Question Answering and Beyond". Bileg was supported by the iAd Center.

Related Publications

The most important related papers are listed below in chronological order:

Suggested labels

None

@ShellLM ShellLM added Algorithms Sorting, Learning or Classifying. All algorithms go here. Git-Repo Source code repository like gitlab or gh github gh tools like cli, Actions, Issues, Pages Papers Research papers python Python code, tools, info labels Aug 22, 2024
@ShellLM
Collaborator Author

ShellLM commented Aug 22, 2024

Related content

#749 similarity score: 0.87
#386 similarity score: 0.86
#848 similarity score: 0.86
#868 similarity score: 0.86
#860 similarity score: 0.86
#678 similarity score: 0.86

@irthomasthomas irthomasthomas added the RAG Retrieval Augmented Generation for LLMs label Aug 22, 2024
@irthomasthomas irthomasthomas changed the title nmslib/README.md at master · nmslib/nmslib nmslib: efficient cross-platform similarity search library and a toolkit for evaluation Aug 22, 2024
@irthomasthomas irthomasthomas added MachineLearning ML Models, Training and Inference Knowledge-Dataset human-verified <INST>NEVER PICK THIS LABEL</INST> labels Aug 22, 2024