Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

switch to Euclidean squared distance #61

Merged
merged 1 commit into from
Dec 1, 2023
Merged

Conversation

robfitzgerald
Copy link
Collaborator

while working on #58, i discovered i had the wrong understanding of the API for rstar::RTree. it turns out the distance metric should be the squared euclidean distance.

our points are in lat/lon, which is a spherical 2D plane aka non-Euclidean. i would assume a haversine distance formula would produce more accurate values. however, after making this change, a simple test case produced an invalid result.

this PR changes the edge rtree distance function to match the vertex rtree distance function, using the squared Euclidean (lat/lon) distance between points. we can have further discussion in #58 as to whether these should be calculated directly as this is or via some transform for accuracy.

Copy link
Collaborator

@nreinicke nreinicke left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good. It is weird that the haversine calculation work in this context..

@robfitzgerald
Copy link
Collaborator Author

It is weird that the haversine calculation work in this context..

no, that's the point, it doesn't. but i figured it would, i did something like this:

let dist = haversine::get_distance_meters(this, that).unwrap_or(Distance::new(f64::MAX)).as_f64();
return dist * dist

because i think of

$dy * dy + dx * dx == dz * dz$ (Pythagoras)

and

$dz ~= haversine(a, b)$

but it didn't. the test case had 3 points, (0,0), (1,1) and (2,2), and matched to query o/d (0.1,0.1) and (1.9,2.1). for some reason, it thought (0,0) was closest to both o and d. even when i swapped the coordinates with real lat/lon coords (the Denver beer run query), it still didn't work, so it wasn't on account of some edge condition in the spherical approximation. so i'm clearly not getting something about this. i mean, ya, the haversine function is on a spherical plane and not strictly Euclidean, but, it's like, approximately so. but enough difference to blow things up? 🤔

@robfitzgerald robfitzgerald merged commit d6c2365 into main Dec 1, 2023
5 checks passed
@robfitzgerald robfitzgerald deleted the rjf/edge-rtree-dist branch December 1, 2023 19:30
@nreinicke
Copy link
Collaborator

nreinicke commented Dec 1, 2023

It is weird that the haversine calculation work in this context..

oops, that should have included the word doesn't, my bad!

But, yeah I also tried your branch with actual lat lon values that were very far from each other and the squared haversine distance failed to match the right point which is suspicious..

I suppose another approach would be to project everything into euclidean space for these types of calculations. I wonder if there are any edge cases where we might not return the true nearest neighbor for euclidean distance using lat/lon?

@robfitzgerald
Copy link
Collaborator Author

robfitzgerald commented Dec 1, 2023

But, yeah I also tried your branch with actual lat lon values that were very far from each other and the squared haversine distance failed to match the right point which is suspicious

something i just thought of, maybe the rtree algorithm needs the distance unit to match the rtree's units. that would explain why using lat/lon degrees and meters may not have worked. it doesn't say that's a requirement; i just figured the pairs were getting dropped in a priority queue somewhere with my distances, keeping the tree neighbor lookup and eventual distance metric separate. but i'm probably wrong about that.

as a fix, we would want to convert incoming coordinates into a metric projection and use those to build the tree. but given our recent issues with the rust proj library, maybe we hold off on that for now.

if there were an angle we could still improve, it is how we match queries to edges. as @jakeholden mentioned, there are probably more accurate ways to compute point-to-edge distances than going from point to the linestring "centroid".

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants