Reach-based pruning for bidirectional astar #3257

genadz · 2021-08-10T13:00:09Z

Issue

closes #2928 (incorrect stop criterion in the bidirectional astar algorithm).

The following changes were made.

Fixed the cost threshold in bidirectional astar: use the full route cost instead of sortcost.
Implemented the reach-based pruning technique described in the article from the comment Interrupt bidirectional search as soon as we found all alternates #2884 (comment). The main idea is to estimate a lower bound on the cost of any path that goes through the current edge, using the min sortcost in the queue of the opposing search.
Added logic that exhausts hierarchy limits more or less simultaneously in both directions (if we reached the limit in the forward/reverse direction, wait until the opposing direction has also finished expanding this hierarchy level). The main intention of this logic is to provide (almost) valid conditions for the reach-based pruning.
"Synchronize" shortcut usage. Apply the same behavior to the same shortcut in both directions, i.e., if the shortcut was skipped by the forward/reverse search, it will also be skipped by the opposing search, and if the shortcut was used by the forward/reverse search, it will also be used by the opposing search. This logic also helps to avoid false positive prunings.

Tasklist

Add tests
Add #fixes with the issue number that this PR addresses
Update the docs with any new request parameters or changes to behavior described
Update the changelog
If you made changes to the lua files, update the taginfo too.

Requirements / Relations

Link any requirements here. Other pull requests this PR is based on?

genadz · 2021-08-10T13:05:02Z

src/thor/bidirectional_astar.cc

+
+  graph_tile_ptr t2 = nullptr;
+  baldr::GraphId opp_edge_id;
+  const auto get_opp_edge_data = [&t2, &opp_edge_id, &graphreader, &meta, &tile]() {


we call this logic in several places, so I moved it out into a separate lambda function

genadz · 2021-08-10T13:05:59Z

src/thor/bidirectional_astar.cc

+    // encountered on the opposing search we should do the same now: skip or traverse.
+    if ((opp_edge_set != EdgeSet::kSkipped &&
+         hierarchy_limits[meta.edge_id.level() + 1].StopExpanding(pred.distance())) ||
+        opp_edge_set == EdgeSet::kPermanent || opp_edge_set == EdgeSet::kTemporary) {


choose to skip or use the shortcut based on the opposing search results
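A minimal sketch of the decision above, written as a standalone predicate. The enum mirrors the `EdgeSet` names visible in the diff, but `kUnreached` and the function `UseShortcut` are hypothetical names used only for illustration, and the "condition true means traverse the shortcut" reading is an assumption based on the comment in the diff.

```cpp
// Simplified stand-in for the edge states seen in the diff.
// kUnreached is an assumed name for "not seen by the opposing search yet".
enum class EdgeSet { kUnreached, kPermanent, kTemporary, kSkipped };

// Returns true when the shortcut should be traversed, false when it should be skipped.
// stop_expanding_next_level stands for
// hierarchy_limits[level + 1].StopExpanding(pred.distance()) in the diff.
inline bool UseShortcut(EdgeSet opp_edge_set, bool stop_expanding_next_level) {
  // The opposing search already used this shortcut: do the same.
  if (opp_edge_set == EdgeSet::kPermanent || opp_edge_set == EdgeSet::kTemporary) {
    return true;
  }
  // The opposing search skipped it: skip it here as well.
  if (opp_edge_set == EdgeSet::kSkipped) {
    return false;
  }
  // Not seen by the opposing search: fall back to the hierarchy limits.
  return stop_expanding_next_level;
}
```

The three branches are equivalent to the single boolean expression in the diff; splitting them only makes the "copy the opposing decision first" rule easier to read.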

genadz · 2021-08-10T13:10:56Z

src/thor/bidirectional_astar.cc

+        const auto opp_status = edgestatus_reverse_.Get(fwd_pred.opp_edgeid());
+        if (opp_status.set() == EdgeSet::kPermanent ||
+            (opp_status.set() == EdgeSet::kTemporary &&
+             edgelabels_reverse_[opp_status.index()].predecessor() == kInvalidLabel)) {


we can pass through the destination edge (in case we have several snapping edges) and prune all branches before this edge is pulled out of the queue. To avoid losing such paths, we can add this connection even if the opposing edge is marked as "temporary"

genadz · 2021-08-10T13:12:10Z

src/thor/bidirectional_astar.cc

+    for (size_t level = TileHierarchy::levels().size() - 1; level > 0; --level) {
+      if (hierarchy_limits_reverse_[level].StopExpanding(rev_pred.distance()) &&
+          !hierarchy_limits_forward_[level].StopExpanding(fwd_pred.distance())) {
+        force_forward = true;


force forward pass in case the reverse search exhausted hierarchy limits for this level

genadz · 2021-08-10T13:12:23Z

src/thor/bidirectional_astar.cc

+        break;
+      } else if (hierarchy_limits_forward_[level].StopExpanding(fwd_pred.distance()) &&
+                 !hierarchy_limits_reverse_[level].StopExpanding(rev_pred.distance())) {
+        force_reverse = true;


force reverse pass in case the forward search exhausted hierarchy limits for this level
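The two fragments above can be read together as one balancing step. Below is a rough sketch of that step under stated assumptions: `LevelLimit` is a simplified, hypothetical stand-in for the real hierarchy limits type (only the `StopExpanding(distance)` call is taken from the diff), and `ChooseForcedDirection` and `Forced` are illustrative names.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical, simplified hierarchy limit for one level of one search direction.
struct LevelLimit {
  std::uint32_t up_transitions;      // upward transitions taken so far
  std::uint32_t max_up_transitions;  // allowed transitions before expansion may stop
  float expand_within_dist;          // always expand within this distance

  bool StopExpanding(float dist) const {
    return up_transitions > max_up_transitions && dist > expand_within_dist;
  }
};

enum class Forced { kNone, kForward, kReverse };

// Walk the levels from the most local one upwards (level 0 is never limited here).
// If exactly one direction has exhausted a level, force the other direction to
// expand next so that both searches exhaust the level at roughly the same time.
inline Forced ChooseForcedDirection(const std::vector<LevelLimit>& fwd,
                                    const std::vector<LevelLimit>& rev,
                                    float fwd_dist, float rev_dist) {
  if (fwd.empty() || fwd.size() != rev.size()) {
    return Forced::kNone;
  }
  for (std::size_t level = fwd.size() - 1; level > 0; --level) {
    const bool fwd_stop = fwd[level].StopExpanding(fwd_dist);
    const bool rev_stop = rev[level].StopExpanding(rev_dist);
    if (rev_stop && !fwd_stop) {
      return Forced::kForward;
    }
    if (fwd_stop && !rev_stop) {
      return Forced::kReverse;
    }
  }
  return Forced::kNone;
}
```

When neither or both directions have exhausted every level, nothing is forced and the usual queue-based choice of direction applies.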

genadz · 2021-08-10T13:31:40Z

src/thor/bidirectional_astar.cc

+          // Estimate lower bound cost for the shortest path that goes through the current edge.
+          float route_lower_bound =
+              fwd_pred_pred.cost().cost + fwd_pred.transition_cost().cost + rev_pred.sortcost() -
+              astarheuristic_reverse_.Get(pred_tile->get_node_ll(fwd_pred_pred.endnode()));


Estimate a lower bound on the cost of any route that goes through the current edge. The technique from the article was adapted to edges (the article uses vertices). Let me describe the formula for vertices. Suppose the vertex we just pulled out of the forward queue is v, and v' is the vertex with the min sortcost in the reverse queue. Then the following relations hold (the inequality holds because v' has the minimum sortcost in the reverse queue):

cost_f(v) + sortcost_r(v') - h_r(v)
  <= cost_f(v) + sortcost_r(v) - h_r(v)
   = cost_f(v) + cost_r(v) + h_r(v) - h_r(v)
   = cost_f(v) + cost_r(v)
   = cost(shortest path through v)

where cost, sortcost, and h are the cost function, the sortcost, and the heuristic function; the suffixes _f and _r indicate the forward and reverse directions respectively.

These relations say that cost_f(v) + sortcost_r(v') - h_r(v) is a lower bound on the cost of the route through v. Using this logic we can replace vertices with edges and get approximately the same formula for our case.
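For illustration, the vertex form of the bound can be sketched as two small helpers. The function names are hypothetical, and the pruning rule (drop the branch when the bound exceeds the cost threshold) is an assumption about how the bound is used rather than a copy of the PR code.

```cpp
#include <cassert>

// Lower bound on the cost of any route through the vertex v that was just
// popped from the forward queue:
//   cost_f(v) + sortcost_r(v') - h_r(v)
// where v' is the entry with the minimum sortcost in the reverse queue and
// h_r(v) is the reverse heuristic evaluated at v.
inline float RouteLowerBound(float fwd_cost_at_v, float rev_min_sortcost,
                             float rev_heuristic_at_v) {
  assert(fwd_cost_at_v >= 0.f && rev_min_sortcost >= 0.f);
  return fwd_cost_at_v + rev_min_sortcost - rev_heuristic_at_v;
}

// Assumed pruning rule: if even the optimistic estimate is above the
// threshold, no route through v can be accepted, so the branch is dropped.
inline bool ShouldPrune(float route_lower_bound, float cost_threshold) {
  return route_lower_bound > cost_threshold;
}
```

With a forward cost of 100, a minimum reverse sortcost of 80 and a reverse heuristic of 30 at v, the bound is 150, so the branch would be pruned for a threshold of 140 and kept for a threshold of 200.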

genadz · 2021-08-10T13:33:59Z

src/thor/bidirectional_astar.cc

    if (desired_paths_count_ == 1) {
-      cost_threshold_ = sortcost + kThresholdDelta;
+      cost_threshold_ = c + kThresholdDelta;


correct cost threshold. it guarantees that the optimal route will not be missed

@dnesbitt61 looks like we can skip kThresholdDelta in case of ignore_hierarchy_limits=true (no hierarchy prunings, no shortcuts). But it should be tested on some big bicycle/pedestrian dataset that I don't have right now. It can be done in a separate PR

hm, on my local planet I didn't notice any difference on pedestrian_roundabout_routes.txt, bicycle_routes.txt, and pedestrian_routes.txt from test_requests folder

kevinkreiser · 2021-08-10T14:34:18Z

i have to ask it 😄 whats the performance change, is it similar to the comment you posted before when you first started this approach?

also thank you so much for the extensive description it goes a long way to helping make the changes reviewable. i will take an in depth look at the end of the day

genadz · 2021-08-10T14:34:45Z

Comparing

Shortest path

Performance

passed requests with response_time field: 13631
master

5%	10%	50%	90%	95%	99%	100%	
0.0068	0.0098	0.0625	0.2281	0.4471	1.5088	3.1784	
mean: 0.1261

branch

5%	10%	50%	90%	95%	99%	100%	
0.0065	0.0095	0.0644	0.2288	0.5087	1.7072	3.3404	
mean: 0.1363

branch-with-no-pruning

5%	10%	50%	90%	95%	99%	100%	
0.0070	0.0101	0.0765	0.3554	0.8619	2.1825	3.9858	
mean: 0.1851

I measured timings for the master branch, the current branch, and the current branch with pruning disabled, to showcase the influence of the pruning technique on performance. As we can see, the average response time increased by only about 8% for the branch (0.1261 -> 0.1363) and by about 47% for the branch with pruning disabled (0.1261 -> 0.1851).

Routes quality

Number of routes to analyze: 13797

This branch affected 687 (out of 13797) routes, which is about 5%. The final cost of the optimal path improved in 581 cases and degraded in 106 cases (in terms of ETA, the numbers are the following: 532 <-> 156). So we can say that in 85% of the affected cases the changes were positive.

It's important to note that I got only 10 routes (out of 13797) that differed between the branch with pruning and the branch without pruning. So in 99.9% of cases we can be sure that the pruning technique didn't cut a branch containing a more optimal route.

A reasonable question here: why did we get a difference at all? After analyzing the differences I found out that the main reason is this one: #2979. We can't completely "synchronize" shortcut usage for the bidirectional search; there may be cases when the shortcut was expanded in the forward direction but we came into the middle of the shortcut from the reverse direction. To solve this we probably should recover shortcuts while traversing the trees, but that's too expensive now.

Alternatives

Performance

passed requests with response_time field: 13631
master

5%	10%	50%	90%	95%	99%	100%	
0.0076	0.0112	0.0862	0.3995	0.7794	1.8486	6.1558	
mean: 0.1826

branch

5%	10%	50%	90%	95%	99%	100%	
0.0072	0.0104	0.0896	0.3531	0.7460	1.8462	4.2131	
mean: 0.1788

We see that the alternative routes search even became a little bit faster.

Alternatives count

branch	1-st alternative count	2-nd alternative count
master	7327	4105
branch	8025 (+10%)	4817 (+17%)

In the table above we can see the difference in the number of alternatives that were found for test routes. Significant improvement in case of the branch.

genadz · 2021-08-10T14:36:39Z

> i have to ask it whats the performance change, is it similar to the comment you posted before when you first started this approach?
>
> also thank you so much for the extensive description it goes a long way to helping make the changes reviewable. i will take an in depth look at the end of the day

Thank you! I posted results for the "shortest path" case, also will add results for alternatives.

dnesbitt61 · 2021-08-10T16:28:29Z

I tried some sort of reach based pruning a couple of years ago and saw some performance gains for bicycle routes but there were too many changes to routes for me to feel comfortable.

I tried this branch for my long, test bicycle route and performance decreased by 21%. I would advocate creating a simpler, bidirectional A* algorithm for use where no hierarchies exist (that is how I create my data for bicycling) or are needed (e.g. bicycle and pedestrian routes).

kevinkreiser · 2021-08-10T22:56:45Z

src/thor/bidirectional_astar.cc

+  // Keep the best ones at the front all others to the back
+  best_connections_.emplace_back(CandidateConnection{fwd_edge_id, rev_pred.edgeid(), c});
+
+  if (c < best_connections_.front().cost)
+    std::swap(best_connections_.front(), best_connections_.back());
+


any significance to moving this down here?

oh yes i see it, in the first if second part of the or clause, you want to check the current best connection against this one

kevinkreiser · 2021-08-10T23:00:01Z

src/thor/bidirectional_astar.cc

    } else {
-      cost_threshold_ = sortcost + std::max(kAlternativeCostExtend * sortcost, kThresholdDelta);
+      cost_threshold_ = (1.f + kAlternativeCostExtend) * c + kThresholdDelta;


why not just set the constant to 1.1 and explain its a multiplier in the comment for it?

also i notice the semantic difference of always adding the thresholdDelta rather than using it only if its greater than the sortcost. maybe this is worth a description? i guess its because now we dont double the sortcost we use the actual cost of both sides of the route added together, scale that by 1.1 to do the extension and then we add some fluff (delta) just in case the route is super short and the scaling doesnt do much. anyway a comment to that effect would make this less magic. i know we have some above but i feel like they dont capture the full picture

yeah, for short routes scale 1.1 is not enough to find alternatives. Ok, will add a comment about that
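A small sketch of the two threshold formulas from the diff may make the role of the delta clearer. The constant names come from the diff, but their values here are assumptions: 0.1 is implied by the "scale 1.1" remark above, and 420 is only a placeholder for the delta. `CostThreshold` is a hypothetical helper, not a function from the PR.

```cpp
#include <cassert>

// Assumed values, for illustration only.
constexpr float kThresholdDelta = 420.0f;       // placeholder for the fixed slack
constexpr float kAlternativeCostExtend = 0.1f;  // implied by the 1.1 scale

// c is the full cost of the connection just found (forward + reverse parts),
// not the sortcost that was used before this PR.
inline float CostThreshold(float c, unsigned desired_paths_count) {
  assert(c >= 0.f);
  if (desired_paths_count == 1) {
    // Single route: keep searching only slightly past the found cost.
    return c + kThresholdDelta;
  }
  // Alternatives: extend by a relative margin, then always add the delta so
  // that short routes, where 10% of the cost is tiny, still get some slack.
  return (1.f + kAlternativeCostExtend) * c + kThresholdDelta;
}
```

Under these assumed constants, a short route with c = 60 gets only 6 extra cost units from the 1.1 scale, so almost all of its slack comes from the delta, while for c = 10000 the scale contributes 1000 and dominates.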

kevinkreiser · 2021-08-10T23:24:01Z

src/thor/bidirectional_astar.cc

@@ -617,11 +643,31 @@ BidirectionalAStar::GetBestPath(valhalla::Location& origin,
      return {};
    }

+    // Exhaust hierarchy limits simultaneously in both directions. As soon as forward/reverse


@dnesbitt61 i am wondering about your comment on performance for routes that dont do hierarchy culling. where do you think that the slow down is coming from? the way this code reads to me is that we are trying to balance the searches, but wouldnt this never happen for bike and ped routes? since stop expanding would never return true? i'm just wondering where the heck the performance drop is coming, maybe its all just the updated heuristic using the actual cost. did you notice if it was more iterations?

on my local machine I noticed that iterations increased by ~6%

we can also take into account whether we use hierarchies or not

It is more iterations on an already long search: about 6% increase on forward direction and 10% on reverse. Bicycle and pedestrian routes need a way to decrease the search space/# of iterations and I was hoping this might since the title has the word pruning!

as I already mentioned, this PR actually fixes a "bug". If it didn't, we would be discussing a decrease in iterations now

kevinkreiser · 2021-08-10T23:25:49Z

src/thor/bidirectional_astar.cc

+            (opp_status.set() == EdgeSet::kTemporary &&
+             edgelabels_reverse_[opp_status.index()].predecessor() == kInvalidLabel)) {
+          if (SetForwardConnection(graphreader, fwd_pred) &&
+              opp_status.set() == EdgeSet::kPermanent) {


i am wondering what this added check does its not clear to me why we check for permanent here. is this the part you are refering to when you say

> wasn't still pulled out of the queue

this just says that if the opposing edge is marked as "temporary" we should keep searching. We can skip this branch only when both the forward and reverse searches have marked the edge as "permanent"
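As a rough sketch of the rule described in this thread: a connection is attempted when the opposing edge is permanent, or when it is temporary and has no predecessor (a seed edge at the opposing location), but the branch may only be skipped afterwards when the opposing edge is permanent. The types and the `CheckOpposingEdge` helper below are hypothetical, and the `kInvalidLabel` value is an assumed sentinel.

```cpp
#include <cstdint>
#include <limits>

// Simplified stand-ins; kUnreached is an assumed name.
enum class EdgeSet { kUnreached, kPermanent, kTemporary, kSkipped };
constexpr std::uint32_t kInvalidLabel = std::numeric_limits<std::uint32_t>::max();

struct ConnectionCheck {
  bool try_connection;         // should we try to set a connection here?
  bool may_skip_if_connected;  // may we stop expanding once it is set?
};

inline ConnectionCheck CheckOpposingEdge(EdgeSet opp_set, std::uint32_t opp_predecessor) {
  const bool permanent = opp_set == EdgeSet::kPermanent;
  // Temporary label without predecessor: an origin/destination edge that the
  // opposing search queued but has not pulled out of its queue yet.
  const bool temporary_seed =
      opp_set == EdgeSet::kTemporary && opp_predecessor == kInvalidLabel;
  return {permanent || temporary_seed, permanent};
}
```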

kevinkreiser · 2021-08-10T23:36:00Z

src/thor/bidirectional_astar.cc

+      return false;
+
+    const auto& opp_edgestatus = FORWARD ? edgestatus_reverse_ : edgestatus_forward_;
+    const auto opp_edge_set = opp_edgestatus.Get(opp_edge_id).set();


i guess this hash lookup isnt 100% free 😄 maybe for modes of transport that dont use shortcuts we can disable this to help bike and ped performance

what do you think @dnesbitt61 and @genadz ?

kevinkreiser · 2021-08-10T23:41:19Z

honestly that wasnt too bad to review, i had a couple small questions but it made sense to me mostly. the performance numbers for the auto routes look fine to me. the basic gist is that the 99th percentile of a request set already skewed towards very long routes shows a pretty big performance drop. honestly i dont care too much about p99 but i guess my question is, is @dnesbitt61 's route a p99 route or a p50 route? is bike routing particularly impacted or is it just as bad as auto and its that we are focused on the 99th percentile? i feel like we shouldnt focus on the 99th percentile so much tbh. i mean i understand we want to be performant on very long routes, but if those constitute 1% of the traffic that would come to a running service it doesnt make sense to focus so much effort on making them efficient at the cost of getting worse routes.

another track to take with that is we could relax some of the optimality when we know the route is going to be very long. one could argue that on a very long route optimality of something that is already a combination of imperfect measurements isnt that important when we are talking about costs that differ by 1% or something over thousands of miles. i'm not sure if implementing this wouldnt make the code spaghetti but it is a possibility. the problem here is that we cant really just measure performance in the typical way to see what we need to optimize. here the algorithm itself causes more iterations, we're probably already optimized on the cpu cycles that an iteration takes so the only thing we can do is intelligently remove iterations by culling search space.

genadz · 2021-08-11T09:34:48Z

@dnesbitt61

> I tried some sort of reach based pruning a couple of years ago and saw some performance gains for bicycle routes but there were too many changes to routes for me to feel comfortable.

That's why I did some stuff about shortcuts "synchronization" and balancing hierarchies during trees expansion. It eliminated false positive prunings.

> I tried this branch for my long, test bicycle route and performance decreased by 21%. I would advocate creating a simpler, bidirectional A* algorithm for use where no hierarchies exist (that is how I create my data for bicycling) or are needed (e.g. bicycle and pedestrian routes).

As I mentioned above, on my local machine I noticed an increase of about 6% in the number of iterations. But at the same time, this test

./valhalla_run_route --config ../../data/config.json --multi-run 100 -j '{"locations":[{"lat":39.64225,"lon":-76.10536,"type":"break"},{"lat":38.899677,"lon":-77.050916,"type":"break"}],"costing":"bicycle","costing_options":{"bicycle":{"bicycle_type":"Road","use_roads":"0.0","cycling_speed":"25.0","use_hills":"0.5"}}}' | grep GetBestPath

that you proposed some time ago showed a greater performance drop. With the latest changes (using the ignore_hierarchy_limits flag) it's about 10%.

I think that, taking into account everything described here #3257 (comment), this is an acceptable drop (we shouldn't forget that this PR actually fixes a bug).

genadz · 2021-08-11T09:44:42Z

@kevinkreiser

> another track to take with that is we could relax some of the optimality when we know the route is going to be very long. one could argue that on a very long route optimality of something that is already a combination of imperfect measurements isnt that important when we are talking about costs that differ by 1% or something over thousands of miles. i'm not sure if implementing this wouldnt make the code spaghetti but it is a possibility. the problem here is that we cant really just measure performance in the typical way to see what we need to optimize. here the algorithm itself causes more iterations, we're probably already optimized on the cpu cycles that an iteration takes so the only thing we can do is intelligently remove iterations by culling search space.

interesting thoughts. Probably we can do something similar to what the alternative routes search does: after we have found a route, limit the number of additional iterations. Should it be implemented in this PR?

genadz · 2021-08-11T09:47:12Z

@kevinkreiser

> honestly that wasnt too bad to review,

Thank you! I will consider this a compliment)))

kevinkreiser · 2021-08-11T13:43:47Z

I saw you did some of the optimizations where certain costings dont do hierarchy culling, did you mean that neither of them had any effect? thats too bad. from my perspective this is ready to be shipped. @dnesbitt61 do you think we should further investigate ped and bike performance before merging? i think it would be ok to do a follow up where we look into reducing the number of iterations for ped and bike for longer routes specifically where accuracy isnt as important

genadz · 2021-08-11T13:52:58Z

@kevinkreiser

> I saw you did some of the optimizations where certain costings dont do hierarchy culling, did you mean that neither of them had any effect?

not really, here #3257 (comment) I mentioned that it's still slower than master by ~10% (on my local machine and planet) using David's test. (before using ignore_hierarchy_limits flag it was about 15%)

./valhalla_run_route --config ../../data/config.json --multi-run 100 -j '{"locations":[{"lat":39.64225,"lon":-76.10536,"type":"break"},{"lat":38.899677,"lon":-77.050916,"type":"break"}],"costing":"bicycle","costing_options":{"bicycle":{"bicycle_type":"Road","use_roads":"0.0","cycling_speed":"25.0","use_hills":"0.5"}}}' | grep GetBestPath

@dnesbitt61 could you check if performance changed after introducing ignore_hierarchy_limits flag in your local environment ?

kevinkreiser · 2021-08-11T14:01:07Z

@dnesbitt61 "use_roads": 0.0 ouch!

dnesbitt61 · 2021-08-11T14:34:24Z

maybe it should be "prefer_roads": 0.0 - meaning the bicycle route should try to stay on cycleways and bicycle lanes (basically a safer route if possible).

kevinkreiser · 2021-08-11T15:07:31Z

@dnesbitt61 yep that bit is clear i was just pointing it out because i assume it puts a lot of penalties which would i think make a worst case for expansion (big costing numbers)

kevinkreiser

this looks good to me but i'd like @dnesbitt61 to weigh in on whether we need to do performance optimization for bike and ped now or we can do it in a separate pr

dnesbitt61 · 2021-08-12T12:14:10Z

I am fine doing performance optimization in another PR. I may use a simplified bidirectional A* for cycling in the meantime, selecting it if the data set does not have hierarchies and shortcuts (as noted in the mjolnir config).

genadz force-pushed the kgv_yet_another_bidir branch from 51a326e to 00eb289 Compare August 10, 2021 14:04

genadz added 5 commits August 10, 2021 17:07

Fix cost threshold for the bidirectional astar

6a7b62d

Reach-based pruning for bidirectional astar

01b6868

Synchronize shortcuts usage in the bidirectional astar

a58190b

Set connection if we encountered the destination edge

7e8141e

that wasn't still pulled out of the queue

Updated changelog

711c629

genadz force-pushed the kgv_yet_another_bidir branch from 00eb289 to 711c629 Compare August 10, 2021 14:07

genadz requested review from merkispavel and dnesbitt61 August 10, 2021 14:37

kevinkreiser reviewed Aug 10, 2021

Use 'ignore_hierarchy_limits' flag in bidirectional astar

a89f1c9

to avoid useless mainpulation with hiearchies in case of bicycle and pedestrian costs

genadz force-pushed the kgv_yet_another_bidir branch from b7943c7 to a89f1c9 Compare August 11, 2021 09:29

Add more comments

5a8383c

genadz requested a review from kevinkreiser August 11, 2021 12:11

Merge branch 'master' into kgv_yet_another_bidir

2474ab1

kevinkreiser approved these changes Aug 11, 2021

genadz added 2 commits August 12, 2021 13:49

Merge branch 'master' into kgv_yet_another_bidir

e4e8d8e

Merge branch 'master' into kgv_yet_another_bidir

8e97a1c

dnesbitt61 approved these changes Aug 12, 2021

genadz merged commit 2369460 into master Aug 12, 2021

genadz mentioned this pull request Aug 19, 2021

Bidirectional astar finds nonoptimal route #2928

Closed

purew deleted the kgv_yet_another_bidir branch August 23, 2021 16:22
