Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Approximate Nearest Neighbors algorithms #73

Closed
Datseris opened this issue Sep 11, 2018 · 14 comments
Closed

Approximate Nearest Neighbors algorithms #73

Datseris opened this issue Sep 11, 2018 · 14 comments

Comments

@Datseris
Copy link

Datseris commented Sep 11, 2018

Hello!

I am guessing @KristofferC already knows this, but there is a "new" approach to nearest neighbors called "approximate nearest neigbhors" for the kNN problem. For many applications these approaches are just as good as the "exact" methods. JuliaDynamics is very interested on this!

From the discussions that lead to #69 , I was told that the "state of the art"/"fastest" algorithm (faster than the one proposed in #69), is the one described here: https://arxiv.org/abs/1603.09320 that uses "Hierarchical Navigable Small World" graphs (hnsw). So that if I wanted to implement a faster kNN approach, I should do this one instead of the one in #69 .

In addition it is very interesting to share these benchmarks: https://github.com/erikbern/ann-benchmarks which are in python but they have a lot of different algortithms. This gives a nice perspective.

What is the consensus on having the hnsw method in this repo? Do we agree on that (because of the "approximate")? I think it fits, but a new repo in JuliaDynamics can also be made if putting it here is a problem.

By the way, there is already some code for this method: (C++, header only https://github.com/nmslib/hnsw )

@andyferris I am tagging you as well because you were interested in the discussion in #69 . @JonasIsensee was also mentioned in the proposal of this method.


p.s.: I intend to use some funding I got to get someone contribute a faster kNN method

@KristofferC
Copy link
Owner

Makes sense to have it here in my opinion.

@KristofferC
Copy link
Owner

Would be nice to run that benchmark here as well.

@JonasIsensee
Copy link

They should surely be connected and share as much of the same interface as possible.
However I really like NearestNeighbors.jl not having a binary dependency...
Maybe a separate extension package would be better here?

@KristofferC
Copy link
Owner

Hm, I didn't realize there was any thought of adding any binary dependencies to this repo.

@JonasIsensee
Copy link

Oh, maybe it was just me.
I figured, we would first try a wrapper for the existing C++ implementation.
But if someone wants to reimplement the whole thing...

@KristofferC
Copy link
Owner

A wrapper should definitely be in a separate package. A native version can live here.

@zgornel
Copy link

zgornel commented Oct 20, 2018

I can attest that nmslib is extremely fast - takes on the order of a few milliseconds to perform a search in millions of samples for a dimensionality >100. That's a couple of orders of magnitude faster than this implementation, which is fast only for a small number of dimensions.

@zgornel
Copy link

zgornel commented Oct 20, 2018

Thanks @Datseris for the links awesome :) Reffering to #69, the simplest (and maybe fastest) approach would be to implement a minimalistic ann directly in Julia ... I will look into this in the coming months if time allows.

@dillondaudert
Copy link

I am working on an implementation of the NN Descent algorithm from this paper: https://github.com/dillondaudert/NNDescent.jl . If it's something that people think is appropriate for NearestNeighbors.jl, I can look to merge it into this repo once I have a reasonably efficient implementation done.

@Datseris
Copy link
Author

There seems to be many pieces of work happening in parallel; I think it is more fitting for a JuliaNeighbors to be created (hopefully with a better name, as this JuliaProperty org name is getting beyond boring).

@JonasIsensee
Copy link

@zgornel
I've already invested some time into my own implementation of the hnsw algorithm in Julia.
(Working, but further optimization required )
I plan to put it online within the next few weeks.

@Datseris : This naming convention may be boring but I'd say that they're at least quite telling.

@Datseris
Copy link
Author

booooooooooooooooooooooooooring. Why don't we name it TheHoodOfJulia?

@zgornel
Copy link

zgornel commented Oct 21, 2018

@JonasIsensee great, I can help with testing and further optimization if needed.

@KristofferC
Copy link
Owner

For now I think this package will be kept to exact searches to not bloat its scope too much.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants