Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reach-based pruning for bidirectional astar #3257

Merged
merged 10 commits into from
Aug 12, 2021
Merged

Conversation

genadz
Copy link
Contributor

@genadz genadz commented Aug 10, 2021

Issue

closes #2928 (incorrect stop criterion in the bidirectional astar algorithm).

There are following changes were made.

  • Fixed cost threshold in bidirectional astar: use full route cost instead of sortcost.
  • Implemented reach-based pruning technique described in the article from the comment Interrupt bidirectional search as soon as we found all alternates #2884 (comment) . The main idea is to estimate lower bound cost for the path that goes through the current edge using min sortcost in the queue of the opposing search.
  • Added logic that exhausts hierarchy limits more or less simultaneously for both directions (if we reached the limit in the forward/reverse direction - wait until the opposing direction also will expand this hierarchy level). The main intention of this logic is to provide (almost) valid conditions for the reach-based pruning.
  • "Synchronize" shortcuts usage. Apply the same behavior for the same shortcut in both directions, i.e., if the shortcut was skipped by the forward/reverse search - it will be also skipped by the opposing search. And, if the shortcut was used by the forward/reverse search - it will be also used by the opposing search. This logic also helps to avoid false positive prunings.

Tasklist

  • Add tests
  • Add #fixes with the issue number that this PR addresses
  • Update the docs with any new request parameters or changes to behavior described
  • Update the changelog
  • If you made changes to the lua files, update the taginfo too.

Requirements / Relations

Link any requirements here. Other pull requests this PR is based on?


graph_tile_ptr t2 = nullptr;
baldr::GraphId opp_edge_id;
const auto get_opp_edge_data = [&t2, &opp_edge_id, &graphreader, &meta, &tile]() {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we call this logic in several places, so moved out this logic into a separate lambda function

// encountered on the opposing search we should do the same now: skip or traverse.
if ((opp_edge_set != EdgeSet::kSkipped &&
hierarchy_limits[meta.edge_id.level() + 1].StopExpanding(pred.distance())) ||
opp_edge_set == EdgeSet::kPermanent || opp_edge_set == EdgeSet::kTemporary) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

choose to skip or use the shortcut based on the opposing search results

const auto opp_status = edgestatus_reverse_.Get(fwd_pred.opp_edgeid());
if (opp_status.set() == EdgeSet::kPermanent ||
(opp_status.set() == EdgeSet::kTemporary &&
edgelabels_reverse_[opp_status.index()].predecessor() == kInvalidLabel)) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we can pass through the destination edge (in case we have several snapping edges) and prune all branches earlier than pull out this edge from the queue. In order to prevent loosing such paths , we can add this connection even if the opposing edge marked as "temporary"

for (size_t level = TileHierarchy::levels().size() - 1; level > 0; --level) {
if (hierarchy_limits_reverse_[level].StopExpanding(rev_pred.distance()) &&
!hierarchy_limits_forward_[level].StopExpanding(fwd_pred.distance())) {
force_forward = true;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

force forward pass in case the reverse search exhausted hierarchy limits for this level

break;
} else if (hierarchy_limits_forward_[level].StopExpanding(fwd_pred.distance()) &&
!hierarchy_limits_reverse_[level].StopExpanding(rev_pred.distance())) {
force_reverse = true;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

force reverse pass in case the forward search exhausted hierarchy limits for this level

// Estimate lower bound cost for the shortest path that goes through the current edge.
float route_lower_bound =
fwd_pred_pred.cost().cost + fwd_pred.transition_cost().cost + rev_pred.sortcost() -
astarheuristic_reverse_.Get(pred_tile->get_node_ll(fwd_pred_pred.endnode()));
Copy link
Contributor Author

@genadz genadz Aug 10, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Estimate lower bound cost for the route that goes through the current edge. The technique from the article was adopted to the edges (in the article used vertices). Let me describe the formula for vertices. Let's suppose the current vertex we pulled out of the forward queue is v (v' - a vertex with min sortcost in the reverse queue), so the are following equations are true:

cost_f(v) + sortcost_r(v') - h_r(v) <= cost_f(v) + sortcost_r(v) - h_r(v) =
= cost_f(v) + cost_r(v) + h_r(v) - h_r(v) =
= cost_f(v)  + cost_r(v) = cost(shortest path through v)

where
cost, sortcost, h - costing function, sortcost, heuristic function;
*_f, *_r - indicates forward and reverse directions respectively.

these equations say that cost_f(v) + sortcost_r(v') - h_r(v) is a lower bound cost for the route through v. Using this logic we can replace vertices by edges and get approximately the same formula for our case.

if (desired_paths_count_ == 1) {
cost_threshold_ = sortcost + kThresholdDelta;
cost_threshold_ = c + kThresholdDelta;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

correct cost threshold. it guarantees that the optimal route will not be missed

Copy link
Contributor Author

@genadz genadz Aug 11, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dnesbitt61 looks like we can skip kThresholdDelta in case of ignore_hierarchy_limits=true (no hierarchy prunings, no shortcuts). But it should be tested on some big bicycle/pedestrian dataset that I don't have right now. It can be done in a separate PR

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hm, on my local planet I didn't notice any difference on pedestrian_roundabout_routes.txt, bicycle_routes.txt, and pedestrian_routes.txt from test_requests folder

@kevinkreiser
Copy link
Member

i have to ask it 😄 whats the performance change, is it similar to the comment you posted before when you first started this approach?

also thank you so much for the extensive description it goes a long way to helping make the changes reviewable. i will take an in depth look at the end of the day

@genadz
Copy link
Contributor Author

genadz commented Aug 10, 2021

Comparing

Shortest path

Performance

passed requests with response_time field: 13631
master

5%	10%	50%	90%	95%	99%	100%	
0.0068	0.0098	0.0625	0.2281	0.4471	1.5088	3.1784	
mean: 0.1261

branch

5%	10%	50%	90%	95%	99%	100%	
0.0065	0.0095	0.0644	0.2288	0.5087	1.7072	3.3404	
mean: 0.1363

branch-with-no-pruning

5%	10%	50%	90%	95%	99%	100%	
0.0070	0.0101	0.0765	0.3554	0.8619	2.1825	3.9858	
mean: 0.1851

I measured timings for the master branch, current branch and current branch with disabled pruning to showcase the influence of pruning technique on performance. As we can see, the average response time increased only by 8% for the branch and by 46% for the branch with disabled pruning.

Routes quality

Number of routes to analyze: 13797

This branch affected 687 (out of 13797) routes, it's about 5%. Final cost of the optimal path was improved in 581 cases and degraded in 106 cases (in terms of ETA, the numbers are the following: 532 <-> 156). So, we can say that in 85% of cases changes were positive.

It's important to notice, that I got only 10 routes (out of 13797) that were different for the branch with pruning and without pruning. So, in 99.9% of cases we can be sure that the pruning technique didn't cut a branch with more optimal route.

The reasonable question may be here: why we actually got a difference ? - after analyzing differences I found out that the main reason is this one #2979 . We can't completely "synchronize" shortcuts usage for bidirectional search, there may be a cases when the shortcut was expanded in forward direction but we came into the middle of the shortcut using reverse direction. To solve this we probably should recover shortcuts while traversing trees, but it's too expensive now.

Alternatives

Performance

passed requests with response_time field: 13631
master

5%	10%	50%	90%	95%	99%	100%	
0.0076	0.0112	0.0862	0.3995	0.7794	1.8486	6.1558	
mean: 0.1826

branch

5%	10%	50%	90%	95%	99%	100%	
0.0072	0.0104	0.0896	0.3531	0.7460	1.8462	4.2131	
mean: 0.1788

We see that the timings in case of alternatives routes search became even a little bit faster.

Alternatives count

branch 1-st alternative count 2-nd alternative count
master 7327 4105
branch 8025 (+10%) 4817 (+17%)

In the table above we can see the difference in the number of alternatives that were found for test routes. Significant improvement in case of the branch.

@genadz
Copy link
Contributor Author

genadz commented Aug 10, 2021

i have to ask it whats the performance change, is it similar to the comment you posted before when you first started this approach?

also thank you so much for the extensive description it goes a long way to helping make the changes reviewable. i will take an in depth look at the end of the day

Thank you! I posted results for the "shortest path" case, also will add results for alternatives.

@dnesbitt61
Copy link
Member

I tried some sort of reach based pruning a couple of years ago and saw some performance gains for bicycle routes but there were too many changes to routes for me to feel comfortable.

I tried this branch for my long, test bicycle route and performance decreased by 21%. I would advocate creating a simpler, bidirectional A* algorithm for use where no hierarchies exist (that is how I create my data for bicycling) or are needed (e.g. bicycle and pedestrian routes).

Comment on lines +875 to +880
// Keep the best ones at the front all others to the back
best_connections_.emplace_back(CandidateConnection{fwd_edge_id, rev_pred.edgeid(), c});

if (c < best_connections_.front().cost)
std::swap(best_connections_.front(), best_connections_.back());

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

any significance to moving this down here?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh yes i see it, in the first if second part of the or clause, you want ot check the current best connection against this one

} else {
cost_threshold_ = sortcost + std::max(kAlternativeCostExtend * sortcost, kThresholdDelta);
cost_threshold_ = (1.f + kAlternativeCostExtend) * c + kThresholdDelta;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why not just set the constant to 1.1 and explain its a multiplier in the comment for it?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also i notice the semantic difference of always adding the thresholdDelta rather than using it only if its greater than the sortcost. maybe this is worth a description? i guess its because now we dont double the sortcost we use the actual cost of both sides of the route added together, scale that by 1.1 to do the extension and then we add some fluff (delta) just in case the route is super short and the scaling doesnt do much. anyway a comment to that effect would make this less magic. i know we have some above but i feel like they dont capture the full picture

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah, for short routes scale 1.1 is not enough to find alternatives. Ok, will add a comment about that

@@ -617,11 +643,31 @@ BidirectionalAStar::GetBestPath(valhalla::Location& origin,
return {};
}

// Exhaust hierarchy limits simultaneously in both directions. As soon as forward/reverse
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dnesbitt61 i am wondering about your comment on performance for routes that dont do hierarchy culling. where do you think that the slow down is coming from? the way this code reads to me is that we are trying to balance the searches, but wouldnt this never happen for bike and ped routes? since stop expanding would never return true? i'm just wondering where the heck the performance drop is coming, maybe its all just the updated heuristic using the actual cost. did you notice if it was more iterations?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

on my local machine I noticed that iterations increased by ~6%

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we can also take into account the fact if we use hierarchies or not

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is more iterations on an already long search: about 6% increase on forward direction and 10% on reverse. Bicycle and pedestrian routes need a way to decrease the search space/# of iterations and I was hoping this might since the title has the word pruning!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

as I already mentioned this PR actually fixes a "bug". If it wasn't, we would discuss iterations decrease now

(opp_status.set() == EdgeSet::kTemporary &&
edgelabels_reverse_[opp_status.index()].predecessor() == kInvalidLabel)) {
if (SetForwardConnection(graphreader, fwd_pred) &&
opp_status.set() == EdgeSet::kPermanent) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i am wondering what this added check does its not clear to me why we check for permanent here. is this the part you are refering to when you say

wasn't still pulled out of the queue

Copy link
Contributor Author

@genadz genadz Aug 11, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this just says that if the opposing edge marked as "temporary" we should keep searching. We can skip this branch only when both forward and reverse searches marked the edge as "permanent"

return false;

const auto& opp_edgestatus = FORWARD ? edgestatus_reverse_ : edgestatus_forward_;
const auto opp_edge_set = opp_edgestatus.Get(opp_edge_id).set();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i guess this hash lookup isnt 100% free 😄 maybe for modes of transport that dont use shortcuts we can disable this to help bike and ped performance

what do you think @dnesbitt61 and @genadz ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

make sense

@kevinkreiser
Copy link
Member

kevinkreiser commented Aug 10, 2021

honestly that wasnt too bad to review, i had a couple small questions but it made sense to me mostly. the performance numbers for the auto routes look fine to me. the basic gist is that the 99% percentile of an already skewed request set towards very long routes, shows a pretty big performance drop. honestly i dont care too much about p99 but i guess my quesiton is, is @dnesbitt61 's route a p99 route or a p50 route? is bike routing particularly impacted or is it just as bad as auto and its that we are focused on the 99th percentile? i feel like we shouldnt focus on the 99th percentile so much tbh. i mean i understand we want to be performant on very long routes, but if those constitute 1% of the traffic that would come to a running service it doesnt make sense to focus so much effort on making them efficient at the cost of getting worse routes.

another track to take with that is we could relax some of the optimality when we know the route is going to be very long. one could argue that on a very long route optimality of somethign that is already a combination of imperfect measurements isnt that important when we are talking about costs that differ by 1% or something over thousands of miles. i'm not sure if implementing this wouldnt make the code spaghetti but it is a possibility. the problem here is that we cant really just measure performance in the typical way to see what we need to optimize. here the algorithm itself causes more iterations, we're probably already optimized on the cpu cycles that an iteration takes so the only thing we can do is intelligently remove iterations by culling search space.

to avoid useless mainpulation with hiearchies in case of bicycle and pedestrian costs
@genadz
Copy link
Contributor Author

genadz commented Aug 11, 2021

@dnesbitt61

I tried some sort of reach based pruning a couple of years ago and saw some performance gains for bicycle routes but there were too many changes to routes for me to feel comfortable.

That's why I did some stuff about shortcuts "synchronization" and balancing hierarchies during trees expansion. It eliminated false positive prunings.

I tried this branch for my long, test bicycle route and performance decreased by 21%. I would advocate creating a simpler, bidirectional A* algorithm for use where no hierarchies exist (that is how I create my data for bicycling) or are needed (e.g. bicycle and pedestrian routes).

As I mentioned above, on my local machine I noticed increase about 6% in number of iterations. But, the same time this test

./valhalla_run_route --config ../../data/config.json --multi-run 100 -j '{"locations":[{"lat":39.64225,"lon":-76.10536,"type":"break"},{"lat":38.899677,"lon":-77.050916,"type":"break"}],"costing":"bicycle","costing_options":{"bicycle":{"bicycle_type":"Road","use_roads":"0.0","cycling_speed":"25.0","use_hills":"0.5"}}}' | grep GetBestPath

that you proposed some time ago showed greater performance drop. With last changes (using ignore_hierarchy_limits flag) it's about 10%.

I think, taking into account all described here #3257 (comment) is acceptable drop. (we shouldn't forget that this PR fixes actually a bug)

@genadz
Copy link
Contributor Author

genadz commented Aug 11, 2021

@kevinkreiser

another track to take with that is we could relax some of the optimality when we know the route is going to be very long. one could argue that on a very long route optimality of somethign that is already a combination of imperfect measurements isnt that important when we are talking about costs that differ by 1% or something over thousands of miles. i'm not sure if implementing this wouldnt make the code spaghetti but it is a possibility. the problem here is that we cant really just measure performance in the typical way to see what we need to optimize. here the algorithm itself causes more iterations, we're probably already optimized on the cpu cycles that an iteration takes so the only thing we can do is intelligently remove iterations by culling search space.

interesting thoughts. Probably we can do something similar to what alternatives routes search does: after we found a route - limit number of additional iterations. should it be implemented in this PR ?

@genadz
Copy link
Contributor Author

genadz commented Aug 11, 2021

@kevinkreiser

honestly that wasnt too bad to review,

Thank you! I will consider this a compliment)))

@kevinkreiser
Copy link
Member

kevinkreiser commented Aug 11, 2021

I saw you did some of the optimizations where certain costings dont do hierarchy culling, did you mean that neither of them had any effect? thats too bad. from my perspective this is ready to be shipped. @dnesbitt61 do you think we should further investigate ped and bike performance before merging? i think it would be ok to do a follow up where we look into reducing the number of iterations for ped and bike for longer routes specifically where accuracy isnt as important

@genadz
Copy link
Contributor Author

genadz commented Aug 11, 2021

@kevinkreiser

I saw you did some of the optimizations where certain costings dont do hierarchy culling, did you mean that neither of them had any effect?

not really, here #3257 (comment) I mentioned that it's still slower than master by ~10% (on my local machine and planet) using David's test. (before using ignore_hierarchy_limits flag it was about 15%)

./valhalla_run_route --config ../../data/config.json --multi-run 100 -j '{"locations":[{"lat":39.64225,"lon":-76.10536,"type":"break"},{"lat":38.899677,"lon":-77.050916,"type":"break"}],"costing":"bicycle","costing_options":{"bicycle":{"bicycle_type":"Road","use_roads":"0.0","cycling_speed":"25.0","use_hills":"0.5"}}}' | grep GetBestPath

@dnesbitt61 could you check if performance changed after introducing ignore_hierarchy_limits flag in your local environment ?

@kevinkreiser
Copy link
Member

@dnesbitt61 "use_roads": 0.0 ouch!

@dnesbitt61
Copy link
Member

maybe it should be "prefer_roads": 0.0 - meaning the bicycle route should try to stay on cycleways and bicycle lanes (basically a safer route if possible).

@kevinkreiser
Copy link
Member

@dnesbitt61 yep that bit is clear i was just pointing it out because i assume it puts a lot of penalties which would i think make a worst case for expansion (big costing numbers)

Copy link
Member

@kevinkreiser kevinkreiser left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this looks good to me but i'd like @dnesbitt61 to weigh in on whether we need to do performance optimization for bike and ped now or we can do it in a separate pr

@dnesbitt61
Copy link
Member

I am fine doing performance optimization in another PR. I may use a simplified bidirectional A* for cycling in the meantime, selecting it if the data set does not have hierarchies and shortcuts (as noted in the mjolnir config).

@genadz genadz merged commit 2369460 into master Aug 12, 2021
@purew purew deleted the kgv_yet_another_bidir branch August 23, 2021 16:22
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Bidirectional astar finds nonoptimal route
3 participants