Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Omitting results to points themselves #49

Open
kskyten opened this issue Jun 11, 2017 · 8 comments
Open

Omitting results to points themselves #49

kskyten opened this issue Jun 11, 2017 · 8 comments

Comments

@kskyten
Copy link

kskyten commented Jun 11, 2017

I would like to get the k nearest points that are not the querying points when using the same data for building the tree and querying it. Is it possible to do this with the skip function? The skip function seems to take integer arguments corresponding to tree indices, but it is unclear to me what this means.

@KristofferC
Copy link
Owner

Yes it should be documented better (I didn't add that functionality hehe). skip takes one argument which is the index of a point. If the function returns true, the point is skipped. Maybe https://github.com/KristofferC/NearestNeighbors.jl/blob/master/src/tree_ops.jl#L101 can help you see how it is used.

@kskyten
Copy link
Author

kskyten commented Jun 11, 2017

Seems to me like the easiest way would be to pass tree.data[inx] and point to the skip function.

@KristofferC
Copy link
Owner

Yeah, passing the point itself as an argument makes sense. Hmm, not sure it can be done on a backwards compatible way.

@kskyten
Copy link
Author

kskyten commented Jun 11, 2017

How about checking if the distance is zero? Testing equality of the points seems more robust though.

@KristofferC
Copy link
Owner

It would be up to the user to define how equality should be checked by constructing the skip function. The package only need to provide the necessary data to that function.

@kskyten
Copy link
Author

kskyten commented Jun 11, 2017

Adding arguments to the skip function works for omitting the equal points. However, now a wrong number of points are returned and I'm not sure where to fix this.

@kskyten
Copy link
Author

kskyten commented Jul 11, 2017

What I actually needed was the distances to the k-th nearest neighbours. Here's what I ended up with

k = 3
X = rand(3, 10);
tree = KDTree(X)
ind, dist = knn(tree, X, k+1, true)
kdist = [x[end] for x in dist]

I couldn't figure out how to make a consistent API for the skip function though.

@ghyatzo
Copy link

ghyatzo commented Jul 1, 2024

Bumping here, although this could probably be an issue on its own regarding the skip function.
I could not find a way to exclude the current point being evaluated when calling the function over multiple points.
For example

M = # matrix data (column points)
d, n = size(M)
tree = KDTree(M)
j = 10
_, dists = nn(tree, M[:, j], i -> begin println(i); return false end )

will print out all column indexes, which means that invoking the same function for multiple points will make it impossible to know which point is currently being evaluated.

_, dists = nn(tree, M, i -> ???)

and

M = # matrix data (column points)
d, n = size(M)
tree = KDTree(M)
_, dists = nn(tree, M)

in this case dists == zeros(n) since every point recognizes himself as the closest one, not very helpful.
The only workaround I found was

M = # matrix data (column points)
d, n = size(M)
tree = KDTree(M)

dists = zeros(n)
for j in 1:n
  _ , dist = nn(tree, M[:, j], i -> i == j) 
  dists[j] = dist
end

which makes the multiple points API a bit useless, and requires extra allocations/manipulation by the user to get the same results.
So maybe having the skip function pass over also the current index/point being processed when considering multiple points is a good first solution, maybe for 1.0 if it's breaking?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants