Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SmacPlannerHybrid creates kinematically infeasible path #4271

Closed
tonynajjar opened this issue Apr 22, 2024 · 72 comments · Fixed by #4423
Closed

SmacPlannerHybrid creates kinematically infeasible path #4271

tonynajjar opened this issue Apr 22, 2024 · 72 comments · Fixed by #4423

Comments

@tonynajjar
Copy link
Contributor

tonynajjar commented Apr 22, 2024

Bug report

Required Info:

  • Operating System:
    • Ubuntu 22
  • ROS2 Version:
    • Humble
  • Version or commit hash:
    • It's a fork that tracks commit d509fbf
  • DDS implementation:
    • Fast-RTPS

Steps to reproduce issue

I tried it in my custom simulation but I reckon it should also happen on tb3 simulation. You could try to set the global footprint bigger than the local one and try navigating close to a lethal obstacle on the global costmap.

This is the planner/costmap config:

global_costmap:
  global_costmap:
    ros__parameters:
      use_sim_time: False
      global_frame: map
      map_topic: /no_global_map_topic
      always_send_full_costmap: True
      rolling_window: True 
      track_unknown_space: False
      publish_frequency: 5.0
      update_frequency: 10.0
      resolution: 0.05
      width: 50 
      height: 50 
      lethal_cost_threshold: 99
      # footprint_padding: 0.5
      robot_base_frame: base_footprint
      footprint: '[ [0.215, 0.42], [0.215, -0.42], [-1.01, -0.42], [-1.01, 0.42] ]'
      plugins:
        - 'static_layer'
        - 'inflation_layer'
      static_layer:
        plugin: "nav2_costmap_2d::StaticLayer"
        map_subscribe_transient_local: True
        map_topic: /mission/geofence/global_costmap
      inflation_layer:
        plugin: 'nav2_costmap_2d::InflationLayer'
        cost_scaling_factor: 1.5
        inflation_radius: 0.7 
        inscribe: false 

planner_server:
  ros__parameters:
    use_sim_time: False
    expected_planner_frequency: 20.0
    planner_plugins: ["GridBased"]
    GridBased:
      plugin: "nav2_smac_planner/SmacPlannerHybrid"
      tolerance: 0.2 
      downsample_costmap: false 
      downsampling_factor: 1 
      allow_unknown: true 
      max_iterations: 1000000
      max_on_approach_iterations: 1000 
      max_planning_time: 1.5 
      motion_model_for_search: "REEDS_SHEPP" 
      cost_travel_multiplier: 2.0 
      angle_quantization_bins: 64 
      analytic_expansion_ratio: 3.5 
      minimum_turning_radius: 0.4 
      reverse_penalty: 2.0 
      change_penalty: 0.15
      non_straight_penalty: 1.50 
      cost_penalty: 5.5 
      lookup_table_size: 10.0 
      cache_obstacle_heuristic: False 
      debug_visualizations: true
      smoother:
        max_iterations: 1000
        w_smooth: 0.3
        w_data: 0.2
        tolerance: 1.0e-10

Expected behavior

Create kinematically feasible path

Actual behavior

Resulting path makes planned footprints collide with global costmap

red thin rectangle = global and local footprint
red thicker rectangle = TEB debug visualization of the infeasible projected footprint
red path = SmacPlannerHybrid global plan

image

I also tried with a circular footprint, same result

Additional information

@tonynajjar tonynajjar changed the title SmacPlannerHybrid's path creates kinematically infeasible path SmacPlannerHybrid creates kinematically infeasible path Apr 22, 2024
@SteveMacenski
Copy link
Member

SteveMacenski commented Apr 22, 2024

Questions:

  • Can you verify that the costmap being visualized is the global costmap and those footprints are from the global planner & that in the costmap color map that this really is marked as lethal?
  • What happens if you start further from the wall but have the goal near the wall?
  • Have you made any modifications to Smac locally or perhaps missed some syncs which has introduced a bug by branch management? Follow up would be have you tested Nav2 Humble & Rolling and see this in both?
  • Is there anything going on with a regular update to the footprint topic (for there to potentially be some race condition issues)?

I've not seen this ever in my testing and deployment of Smac so I don't have an immediate thesis of what's going on beyond asking probing follow up questions hoping one will yield some insightful result.

inscribe: false

What is this? Are you perhaps using a custom inflation layer?

Not that I think this is the issue, but what if you use the params from the Global Costmap from nav2_bringup + Smac configuration guide params? Perhaps there's some subtle problem that we see with your setup that could help us narrow into it

@tonynajjar
Copy link
Contributor Author

tonynajjar commented Apr 22, 2024

Can you verify that the costmap being visualized is the global costmap and those footprints are from the global planner

This is for sure the global costmap and footprint.

& that in the costmap color map that this really is marked as lethal?

In another test, with circular footprint, I visualized the costmap with the "costmap color scheme" and it shows like this:
image

Do you have some way of getting the cost of a specific square of the costmap, by e,g, clicking on it to be even more sure?

What happens if you start further from the wall but have the goal near the wall?

In all cases SMAC ignores that the projected footprint collides. It just throws an error if the goal is exactly at the lethal obstacle

Have you made any modifications to Smac locally or perhaps missed some syncs which has introduced a bug by branch management? Follow up would be have you tested Nav2 Humble & Rolling and see this in both?

I do have a particular setup yes, it's a fork in between humble and rolling but with no changes to SMAC. Anyway I get it that we should remove this variable by testing on standard Nav2 Humble and/or Rolling. I hit some blockers when trying to do so:

  • I tried to setup rolling with the devcontainer but that's broken
  • I tried testing on Humble but for some reason the tb3 demo wouldn't launch properly.
  • I was thinking it would be hard to reproduce with TB3's circular footprint because the controller might block it from navigating first but I guess it should be feasible if I make the global footprint bigger.

What is this? Are you perhaps using a custom inflation layer?

oops, yeah artifact of the fork but I also reverted this to the default and have the same issue.

but what if you use the params from the Global Costmap from nav2_bringup + Smac configuration guide params?

Bottom line, I think this has to be done on Rolling to narrow down the issue. Let me know if it's a quick thing you can do if you have a working rolling setup already, otherwise I'll give it another shot soon

@tonynajjar
Copy link
Contributor Author

One question, assuming this is a real bug, is there some kind of test that would detect this?

@SteveMacenski
Copy link
Member

SteveMacenski commented Apr 22, 2024

I visualized the costmap with the "costmap color scheme" and it shows like this

Is the inflation radius you set smaller than the robot's actual minimum radius? I see in the color scheme now that there is no inflation outside of the critically inscribed space. I'm not even sure in the greyscale visualization why that showed a gradient - it looks to be all 253's. Try increasing your inflation radius / decreasing your cost factor to create a potential field. If a 2+m radius (idk exact, just "enough" to test) does the job, I have a thesis on why but I would have expected that you'd see some logging at the warning/error level at least once about a problem from Smac.

Irregardless, you should have a smoother potential field, it makes Smac behavior better and actually runs faster.

I don't understand your image though, considering the footprint is in the middle completely disconnected from the path + why the path is orbiting so many times.

Bottom line, I think this has to be done on Rolling to narrow down the issue

Agreed, or at least to the extent that you can test it to check if its an issue there or not to know if its something in the stack itself or something going on on your managed fork. This should be a Planning Server specific issue, so you should be able to use the simple commander API + a launch/params file to bundle up a reproduction example for me to bring up only the planner server and execute the request the triggers this for working the problem.

One question, assuming this is a real bug, is there some kind of test that would detect this?

Almost certainly, but I think we need some better understanding of what's actually happening first.

@tonynajjar
Copy link
Contributor Author

So, I spend a lot of time on trying to make it work on Rolling but I'm still not there yet. So I tried on Humble and I think I could reproduce, unfortunately without the planned_footprints as these are not on Humble (and I couldn't backport easily). We can see that a lethal part of the global costmap is in the footprint.

image

The screen recording is like 8GB so here is a screenshot instead. I made the global footprint a bit bigger than the local one. These are the changes from the default nav2 params:

image

@SteveMacenski
Copy link
Member

Can you try what I asked? You still have the inflation way too small - there is no potential field established - and I want to know if that is causing the issue

@tonynajjar
Copy link
Contributor Author

Oops I forgot that part. I'lll try it tomorrow, but assuming it works, we can't just call it a day right? I would think the planner should not generate kinematically infeasible path under any circumstances (throw an error instead). Or am I wrong?

@SteveMacenski
Copy link
Member

SteveMacenski commented Apr 23, 2024

If that's the case, I know the ~3 places I need to look and think about and I can come back with a better answer about "why" and "what" and what a solution would look like, if one is needed. If nothing else, some kind of warning or error or rejection of optimizations should occur in that situation but maybe I missed something. However we don't even know if that'll help, so one step at a time 😄 But no, that's not the end of it, it just gives me somewhere to start! Right now, if that isn't the cause, I don't have many other answers :laugh:

@tonynajjar
Copy link
Contributor Author

tonynajjar commented Apr 23, 2024

I see in the color scheme now that there is no inflation outside of the critically inscribed space. I'm not even sure in the greyscale visualization why that showed a gradient - it looks to be all 253's

sorry for the confusion, this was actually a different configuration

Anyway I think I have the results of what you asked for. robot_radius is 0.3 and inflation_radius is 0.55 but it can still happen. With the default values I couldn't reproduce easily but I don't think it's impossible.

image

image

P.S: The localization of the TB3 demo is very jumpy, it's hard to tell for sure if the plan is unfeasible or if it just "jumped" into the lethal obstacle. I'm trying to get more convincing examples

I don't understand your image though, considering the footprint is in the middle completely disconnected from the path + why the path is orbiting so many times.

To answer this, the circles are not a path orbiting, the red circular line is the global footprint I set and the blue circles are the debug /planned_footprints. (the red rectangle is the local footprint, not important here I guess) In this use case the robot radius is much bigger than the inflation layer, which you disadvised, but I was just testing my assumption that "SmacPlannerHybrid should never plan a kinematically unfeasible path".

@SteveMacenski
Copy link
Member

SteveMacenski commented Apr 23, 2024

What's the black? Something seems a little odd. Note that when you use robot radius, it uses center point-checks, not the SE2 footprint. Please use a footprint so that we can have something analog to what you're seeing :-) They're checking different things if circular or non-circular (i.e. center point costs vs full footprint line segment iteration)

P.S: The localization of the TB3 demo is very jumpy, it's hard to tell for sure if the plan is unfeasible or if it just "jumped" into the lethal obstacle. I'm trying to get more convincing examples

Note that we clear the robot's current cell, since it has to be clear if the robot is cough there. So what you showed isn't odd assuming the very next point in the path isn't in collision. That's to prevent deadlocks so that if you're in collision or just up close to something you can never get out of it. What I'd want to see is collisions mid-path!

@tonynajjar
Copy link
Contributor Author

tonynajjar commented Apr 23, 2024

What's the black?

Not sure 😕 but it's gone with my next configuration

Please use a footprint

Alright let's take this example I cooked up

It's a bit harder to cause a collision mid-path so I created this example which is a (potential) collision end-of-path - I think it points the the same thing right?. I positioned the robot in front of an obstacle and sent it a goal forward towards the obstacle. Result, the path is created but the controller, in this case TEB complains that the path is not feasible

image

image
image

@tonynajjar
Copy link
Contributor Author

tonynajjar commented Apr 23, 2024

Note that we clear the robot's current cell, since it has to be clear if the robot is cough there.

hmm I'm assuming you're talking about footprint_clearing_enabled parameter? I see that's enabled for the obstacle and voxel layer but it doesn't seem to exists for the static layer. I ran a simple experiment to confirm this:

image

the cylindrical pillar is part of the static_layer whereas the rectangular one on the left is not (only part of the obstacle layer). You can see that only the obstacle layer is cleared by the footprint.

Maybe it would make sense to also have it for the static layer? Should I create an issue for it?

@SteveMacenski
Copy link
Member

SteveMacenski commented Apr 23, 2024

Lets simplify this, show this happening with your previous environment setup you originally posted with the new configuration - don't try to make it work on the TB3 example / some other situation. Lets focus on the situation you presented. You had showed it happening for many intermediate poses that was unquestionably problematic, show me that again with the new configs so we know if it helped or not. I don't want us to get bogged down in some first/last point details right now - we should be able to see this clear as day as you showed it before with setting A and gone with setting B if we found the problem.

I think some of what you're using as a proxy (the robot's current localized location) is not a good proxy for planning, so lets remove that estimated proxy and look at the exact things as much as possible

I just grabbed the Smac Hybrid params from the docs and ran it against the same 0.3m robot radius + 0.8 inflation radius on the default Nav2 configs. All seems good to me:

image

(not in the left photo that it looks like its colliding; its not, the marker array line thinkness is just set wide. Zooming in its fine.)

When I set the inflation radius to be less than the actual robot radius, it clearly occurs

Screenshot from 2024-04-23 15-26-17

Please show me that you see that too 😉

@SteveMacenski
Copy link
Member

SteveMacenski commented Apr 23, 2024

So thinking about this logically: if you have a 20m wide robot (for example) but set the inflation radius to 0.1m, then yeah, that seems like a problem because the costmap doesn't even know to look for a collision because its all just 0 cost free space, including where you're literally in collision.

So, option:

  • Log some complaint that you set the inflation radius less than your actual robot's radius in the inflation layer. Lame, but at least exposes it
  • Force the inflation layer's radius to be at least the inscribed radius of the robot and also log the warning to users to tell them that they did it wrong and its not safe. I like this option, along probably others in addition to.
  • Insert some checks in the smac planner for this situation and log errors that are missing now for it (maybe good in addition to the bullet above)

But at its root, your configuration had set the inflation radius less than your robot's actual radius - so that's a fix for the problem, but not a fix to make sure its not possible. Point2 makes it no longer possible so that's my recommendation and should be only a few line change to check the 2 radii and take the maximum of the two. I can show that when they're equal, things work again fine (though gets very close to collision, as you'd expect without a potential field incentivizing against it; but still valid)

@tonynajjar
Copy link
Contributor Author

tonynajjar commented Apr 24, 2024

So thinking about this logically: if you have a 20m wide robot (for example) but set the inflation radius to 0.1m, then yeah, that seems like a problem because the costmap doesn't even know to look for a collision because its all just 0 cost free space, including where you're literally in collision.

Ah I see, I thought collision checking works differently: I thought for every potential path, it checks if any of the planned_footprints is in collision (like polygon to polygon intersection), and rejects the path if so. Do I understand correctly that the way it actually works is by checking only the cost at the projected poses and reasons based on that? (Anywhere I could read more about it apart from code?)

If that's the case. I like your fix suggestions. We will come back to this first issue but I think, even with the inflation radius set correctly, SMAC can still plan for infeasible paths, when the starting pose of the robot is already close to an obstacle (which is what the original issue is about). Here are some examples:

image
image
image

Here is the global planner/costmap config for these screenshots:

global_costmap:
  global_costmap:
    ros__parameters:
      use_sim_time: False
      global_frame: map
      map_topic: /no_global_map_topic
      always_send_full_costmap: True
      rolling_window: True
      track_unknown_space: True
      publish_frequency: 5.0
      update_frequency: 10.0
      resolution: 0.05
      width: 50 
      height: 50
      lethal_cost_threshold: 99
      robot_base_frame: base_footprint
      footprint: '[ [0.215, 0.42], [0.215, -0.42], [-1.01, -0.42], [-1.01, 0.42] ]'
      plugins:
        - 'static_layer'
        - 'stvl_mixed_layer'
        - 'inflation_layer'
      static_layer:
        plugin: "nav2_costmap_2d::StaticLayer"
        map_subscribe_transient_local: True
        map_topic: /mission/geofence/global_costmap
      stvl_mixed_layer:
        PERFORM_INCLUDE: layers/stvl_cameras.yaml
      inflation_layer:
        plugin: 'nav2_costmap_2d::InflationLayer'
        cost_scaling_factor: 3.0
        inflation_radius: 1.0

planner_server:
  ros__parameters:
    use_sim_time: False
    expected_planner_frequency: 20.0
    planner_plugins: ["GridBased"]
    GridBased:
      plugin: "nav2_smac_planner/SmacPlannerHybrid"
      tolerance: 0.2
      downsample_costmap: false
      downsampling_factor: 1
      allow_unknown: true 
      max_iterations: 1000000 
      max_on_approach_iterations: 1000 
      max_planning_time: 1.5 
      motion_model_for_search: "REEDS_SHEPP"
      cost_travel_multiplier: 2.0
      angle_quantization_bins: 64 
      analytic_expansion_ratio: 3.5
      minimum_turning_radius: 0.4 
      reverse_penalty: 2.0 
      change_penalty: 0.15 
      non_straight_penalty: 1.50 
      cost_penalty: 5.5 
      lookup_table_size: 10.0 
      cache_obstacle_heuristic: False
      debug_visualizations: true
      smoother:
        max_iterations: 1000
        w_smooth: 0.3
        w_data: 0.2
        tolerance: 1.0e-10

@SteveMacenski
Copy link
Member

SteveMacenski commented Apr 25, 2024

First, these are really pretty pictures! The footprint projections are really cool. But anyway -

What is the green? Curious if you know this wasn't an issue before some update or this is always been an issue. If it was a regression, fast forwarding changes until it happens could be a good way to see where it was introduced if new

This is quite hard for me to debug from afar. What I'm wondering is if there's an issue with collision checking in the analytic expansions. These paths are so trivial that its likely that these are totally formed up of only the analytic expansions without any search. Can you do this again with some more complex request so that search is strictly required at some point and its not trivial to a point that it could be just analytic expansions? That would help answer the following: Do you consistently see this at the start or end of the paths or both?

If it only happens at the end (i.e. when analytic expansions are used), that narrows things down. If it happens at the start (uniquely or combined) that also tells us something.

I cannot replicate this on my side so far, so I feel like this is still a configuration derived issue, but I don't discount a real bug either. However there are a bunch of companies with large non-circular robots using this and it hasn't been reported by them unless its a newly introduced bug.

  rolling_window: True

This is still suspicious to me, can we not do that for testing purposes or can you show that it happens with a normal global map without this? Again, collecting data. If we find that's the case, we can dig into why

    inflation_radius: 1.0 ... footprint: '[ [0.215, 0.42], [0.215, -0.42], [-1.01, -0.42], [-1.01, 0.42] ]'

I'm reading this that your robot is 1.01 from the pivot point but your inflation radius is only 1.0. What if you make the inflation radius 2.0 (just sufficiently larger than 1.0)?

       cost_travel_multiplier: 2.0

This isn't a param for Hybrid-A*, this is making me suspicious that there is something really odd going on with your params. What if you use the params from https://navigation.ros.org/configuration/packages/smac/configuring-smac-hybrid.html#example exactly?


Note: When I say "start" and "end" I don't want to even look at the very first or very last points, just generally collisions are being recorded near the start/end. Lets not miss the forest for the trees.

@tonynajjar
Copy link
Contributor Author

tonynajjar commented Apr 26, 2024

What is the green?

Green is TEB's path/control sequence

Curious if you know this wasn't an issue before some update or this is always been an issue.

I can't pinpoint a moment where that "started" happening, I think it was always there

Can you do this again with some more complex request so that search is strictly required at some point and its not trivial to a point that it could be just analytic expansions? That would help answer the following: Do you consistently see this at the start or end of the paths or both?

Since the bug mostly occurs when being already close to an obstacle (being in circumscribed area), I would say that it occurs mostly at the start or end of a path i.e. first/last 5 poses. It does not happen when going from free point A to free point B while avoiding an obstacle midway.

I cannot replicate this on my side so far

That's odd. Do you have all the information needed to try and reproduce it? Important factors:

  • Send many goals as close as the lethal region as possible
  • it doesn't always reproduce from the first time - maybe try reproducing exactly the start and end goal that you see in the screenshots
  • rectangular footprint
  • use the same configuration as mine (maybe replace STVL with obstacle layer) - I also tried with the default configuration and will try again
  • Are you testing on rolling or humble? I'm using Nav2 rolling but on Humble ROS system with minor changes just so that it builds (e.g QOS)

However there are a bunch of companies with large non-circular robots using this and it hasn't been reported by them
It could be that their controllers workaround the infeasible plan

This is still suspicious to me, can we not do that for testing purposes or can you show that it happens with a normal global map without this?

Will do. I'll test again with as much "standard" config as possible.

What if you make the inflation radius 2.0 (just sufficiently larger than 1.0)?

will do

This isn't a param for Hybrid-A*, this is making me suspicious

that was an oversight, the default config was just copied as is and changed what is needed. Removed

Since you can't reproduce easily, do you have ideas what debug logs I could add inside the code that could give some insight?

@tonynajjar
Copy link
Contributor Author

tonynajjar commented Apr 26, 2024

Let's start with more pictures.

First some successes:

image
image

the failures:

image
image
image
image
image

Default config used:

global_costmap:
  global_costmap:
    ros__parameters:
      use_sim_time: True
      global_frame: map
      map_topic: /no_global_map_topic
      always_send_full_costmap: True
      rolling_window: False
      track_unknown_space: True
      publish_frequency: 1.0
      update_frequency: 1.0
      resolution: 0.05
      width: 50 
      height: 50
      robot_base_frame: base_footprint
      footprint: '[ [0.215, 0.42], [0.215, -0.42], [-1.01, -0.42], [-1.01, 0.42] ]'
      plugins:
        - 'static_layer'
        - 'stvl_mixed_layer'
        - 'inflation_layer'
      static_layer:
        plugin: "nav2_costmap_2d::StaticLayer"
        map_subscribe_transient_local: True
        map_topic: /mission/geofence/global_costmap
      stvl_mixed_layer:
        PERFORM_INCLUDE: layers/stvl_cameras.yaml
      inflation_layer:
        plugin: 'nav2_costmap_2d::InflationLayer'
        inflation_radius: 2.0
        cost_scaling_factor: 0.7

planner_server:
  ros__parameters:
    use_sim_time: True
    planner_plugins: ["GridBased"]
    GridBased:
      plugin: "nav2_smac_planner/SmacPlannerHybrid"
      downsample_costmap: false           # whether or not to downsample the map
      downsampling_factor: 1              # multiplier for the resolution of the costmap layer (e.g. 2 on a 5cm costmap would be 10cm)
      tolerance: 0.25                     # dist-to-goal heuristic cost (distance) for valid tolerance endpoints if exact goal cannot be found.
      allow_unknown: true                 # allow traveling in unknown space
      max_iterations: 1000000             # maximum total iterations to search for before failing (in case unreachable), set to -1 to disable
      max_on_approach_iterations: 1000    # Maximum number of iterations after within tolerances to continue to try to find exact solution
      max_planning_time: 5.0              # max time in s for planner to plan, smooth
      motion_model_for_search: "DUBIN"    # Hybrid-A* Dubin, Redds-Shepp
      angle_quantization_bins: 72         # Number of angle bins for search
      analytic_expansion_ratio: 3.5       # The ratio to attempt analytic expansions during search for final approach.
      analytic_expansion_max_length: 3.0  # For Hybrid/Lattice nodes: The maximum length of the analytic expansion to be considered valid to prevent unsafe shortcutting
      analytic_expansion_max_cost: 200.0  # The maximum single cost for any part of an analytic expansion to contain and be valid, except when necessary on approach to goal
      analytic_expansion_max_cost_override: false  #  Whether or not to override the maximum cost setting if within critical distance to goal (ie probably required)
      minimum_turning_radius: 0.40        # minimum turning radius in m of path / vehicle
      reverse_penalty: 2.0                # Penalty to apply if motion is reversing, must be => 1
      change_penalty: 0.0                 # Penalty to apply if motion is changing directions (L to R), must be >= 0
      non_straight_penalty: 1.2           # Penalty to apply if motion is non-straight, must be => 1
      cost_penalty: 2.0                   # Penalty to apply to higher cost areas when adding into the obstacle map dynamic programming distance expansion heuristic. This drives the robot more towards the center of passages. A value between 1.3 - 3.5 is reasonable.
      retrospective_penalty: 0.015
      lookup_table_size: 20.0             # Size of the dubin/reeds-sheep distance window to cache, in meters.
      cache_obstacle_heuristic: false     # Cache the obstacle map dynamic programming distance expansion heuristic between subsiquent replannings of the same goal location. Dramatically speeds up replanning performance (40x) if costmap is largely static.
      debug_visualizations: false         # For Hybrid nodes: Whether to publish expansions on the /expansions topic as an array of poses (the orientation has no meaning) and the path's footprints on the /planned_footprints topic. WARNING: heavy to compute and to display, for debug only as it degrades the performance.
      use_quadratic_cost_penalty: False
      downsample_obstacle_heuristic: True
      allow_primitive_interpolation: False
      smooth_path: True                   # If true, does a simple and quick smoothing post-processing to the path
      debug_visualizations: true
      smoother:
        max_iterations: 1000
        w_smooth: 0.3
        w_data: 0.2
        tolerance: 1.0e-10
        do_refinement: true
        refinement_num: 2

as a workaround, is there a config that specifies how far away the robot_base_frame or the footprint should stay away from lethal regions?

@SteveMacenski
Copy link
Member

SteveMacenski commented Apr 26, 2024

Hi,

Got it. Huh. Honestly not sure what to say about that off hand. I still can't seem to make that happen. I believe you that there is a problem if that's what you're seeing, but I'm at a loss if I can't reproduce it. Can you make a repository with the costmap, planner configs, launch file, world, and a script to call the server to go from some example A->B that you're using here so that I can try 1:1 what you're doing on my side and use that as a test case to look into?

If I can poke around, that would be ideal. Otherwise, I'd ask you to look into the collision checker object and log information that you think might help point to a particular issue.

I know, for instance, the last point could potentially be slightly in collision due to the exact goal pose added at the end of the path to have precise solution even though the plan is made in a binned space. But any intermediary point should never be possible.

A few of the failures (images 1,2,5) I look at and see that it appears to be the start/end point which is something we should deal with as a separate matter. Image 4 also has that but I see the 2nd to last pose could (?) be in collision. Its hard to tell given the thickness of the line. Image 3 looks less ambiguous but still not overly clear unlike some of your original images

@tonynajjar
Copy link
Contributor Author

Yeah I get that it can be hard to debug if you can't reproduce. Instead of creating a repo from scratch, how about I use what I can from TB3 demo, change to a rectangular footprint + config used above + script and send you my branch?

@SteveMacenski
Copy link
Member

SteveMacenski commented Apr 26, 2024

See the note above I edited, you responded faster than I thought 😉

I would like something standalone with the planner server / demo script so we can look at this in isolation so we're not messing with the full simulator with localization updates and whatnot that can make the waters a little more murky. Usually what you describe would be fine for me (even preferable so I don't have to change my usual workflows), but this looks to be subtle now unlike before, which was a clear and obvious problem, so I don't want to leave anything to interpretation

@tonynajjar
Copy link
Contributor Author

I've been delaying creating the reproducible example because it seems like a lot of work. Instead I tried to debug it more myself and I think I have a good lead: by commenting out this check and consequently making the collision checker run all the time, I could not reproduce the issue anymore. So I'm guessing footprint_cost_ or possible_collision_cost_ is wrong.

I'm finding it hard "proving" that one or the other is wrong but that's where I'm at right now, if you have any leads, shoot

@tonynajjar
Copy link
Contributor Author

tonynajjar commented May 6, 2024

One more insight, by adding this log:

image

I sometimes get a footprint_cost_ of 0 when it's clearly not 0 for any point on the path:

image
image

Any idea how that could happen?

@SteveMacenski
Copy link
Member

SteveMacenski commented May 6, 2024

by commenting out this check and consequently making the collision checker run all the time, I could not reproduce the issue anymore. So I'm guessing footprint_cost_ or possible_collision_cost_ is wrong.

Not shocked by that I suppose, but unclear. For the second image, there are clearly points in free space, so those 0.0s might be real. What processor are you running this on?

I don't have an immediate answer. Some costmap resource not being locked correctly in your fork (or main)? I forget if you're showing this based on your managed fork or Nav2 mainline

  update_frequency: 10.0

What if this is set to 1.0? Curious.

@tonynajjar
Copy link
Contributor Author

Not shocked by that I suppose

It's already useful to know that the problem is likely not in the collision checking itself but in the fact that collision checking is not triggered

What if this is set to 1.0?

Same thing

What processor are you running this on?

AMD Ryzen 7 7840HS w/ Radeon 780M Graphics

there are clearly points in free space, so those 0.0s might be real.

with huge inflation layer still happens:
image

My next step would be to query manually again for the cost at these coordinates and see if it's like you said an issue with locking the costmap. Do you know of an easier way than to create a service that calls costmap_->getCost?

@SteveMacenski
Copy link
Member

Not off hand, but the rviz color scheme seems clear that it is not actually 0. Hence suspecting threading / locking issues getting corrupted data perhaps

@tonynajjar
Copy link
Contributor Author

tonynajjar commented May 6, 2024

Hence suspecting threading / locking issues getting corrupted data perhaps

Looks like it's the case, I plugged in the coordinates in a custom service that get the cost and it doesn't return 0. I'm really going down the rabbit hole with this bug...
Sounds like it's not your first rodeo with /threading/locking issues, anything you could link at from the past that could help me?

@tonynajjar
Copy link
Contributor Author

This is next up for me

It didn't take long, @SteveMacenski try it with: git clone https://github.com/angsa-robotics/planner_playground -b reproduce-collision-at-the-start

@SteveMacenski
Copy link
Member

SteveMacenski commented May 23, 2024

EDIT: Looking at it again, I doubt it's an issue because it looks like a completely different path was selected, as opposed to clipping two final poses from the original path. Will leave my comment there anyway for you to be the judge.

Correct. Because this is no longer finding the exact analytic solution, its finding another solution within your specified tolerance parameter. Its not just removing the last point (which was the second option I set in my write up above).

@tonynajjar
Copy link
Contributor Author

This is next up for me

It didn't take long, @SteveMacenski try it with: git clone https://github.com/angsa-robotics/planner_playground -b reproduce-collision-at-the-start

@SteveMacenski any estimate when you'll be able to take a look at this?

@SteveMacenski
Copy link
Member

SteveMacenski commented May 28, 2024

The Jazzy release stuff has put this as a secondary problem for me until thats all squared away for a release. I hoped to get to this over the weekend but that didn't pan out. I expect that this is a much simpler problem though if you wanted to take a look. You shouldn't have to go too deep since we convert the start to a grid-space early on and then collision check it. I expect we have some clearing under start that we say is "OK" that could be made into an option

@SteveMacenski
Copy link
Member

@tonynajjar this is easily explained.

  // Note: We do not check the if the start is valid because it is cleared
  clearStart();

We clear the start since the robot is there, it is thus "clear" because we are physically there already. It must be clear (though perhaps in contact with something) if the robot is in this location. When I add in while removing clearStart():

  // Check if starting point is valid
  if (!_start->isNodeValid(_traverse_unknown, _collision_checker)) {
    throw std::runtime_error("Starting point in lethal space! Cannot create feasible plan.");
  }

It triggers that there is a collision and refuses to plan. This is the behavior we have in Humble & Iron but not main. I think this is a sensible behavior to have, don't you? Else you could end up in a forever loop of not being able to move when in collision with something in the static map that moved and thus is no longer actually there (or if in a tight space cannot get out).

I suppose we could make this an option about if you want to clear it or check it, that would be straight forward. I however don't see much reason you'd want to be so strict about it unless you were generating starting point paths where the robot physically was not correctly located for testing (or I suppose precomputation).

Let me know what you think. Its an easy resolution either way (closing this as finally completed; and/or adding in the option)

@tonynajjar
Copy link
Contributor Author

That's my bad, I didn't mean for the starting pose to be in collision. The problem I'm trying to reproduce is when the collision is at some pose at the beginning (but not the first pose). I was able to reproduce and colored the first pose in red so you can be sure that it's not the first pose that's in collision:

image

I updated the branch reproduce-collision-at-the-start so you can pull and try again

@tonynajjar
Copy link
Contributor Author

tonynajjar commented Jun 10, 2024

Any chance this could be looked at this week again? (sorry to pressure but deployment is soon and that's still an issue 🙈). I would look at it myself if I didn't already for > 5 hours, I just don't know enough about how this planner works yet.

@SteveMacenski
Copy link
Member

SteveMacenski commented Jun 11, 2024

@tonynajjar I'm going to look into it this afternoon, but I think even before starting to program I know what it is. I've been having this on the back of my mind and I guess my brain was crunching on it subconsciously after you posted the updated image.

I think what's happening is that we set the starting cell to clear. So, if you also happen to rotate in place within the same cell, that would also be free at any orientation, not just the starting pose. And that's what it looks like is happening with your example. Its easy for me (once this compiles and I come back from lunch) to test if that's the case with the same experiment I did above removing clearStart.

Then, I'll just need to change how we handle the starting point as "special" rather than editing the costmap for how we process it. Its a little unclear to me how it is that we have a virtual rotate-in-place for hybrid-A* at the start, since that's not a behavioral option. But, that is a behavioral option for State Lattice so that's a bug nonetheless. I gauge this to be done by EOD. Lunch first.

@SteveMacenski
Copy link
Member

Yup, just removing the clearStart fixes this. I'll just find a workaround for how to handle the first starting pose differently 👍

image

@SteveMacenski
Copy link
Member

SteveMacenski commented Jun 11, 2024

#4418

I think this squares up this ticket :-)

Thanks for filing this @tonynajjar and these are good details to have worked out. The start pose issue that we just handled is actually unique to main so there's no need to backport to iron/humble, those just reject requests where the start is occupied straight-up so there's no problem for existing binary users.

Obviously happy to reopen if there were other topics, but I reviewed the thread and I think we hit all the others

@tonynajjar
Copy link
Contributor Author

Cool! Will try it out tomorrow. I think there is one more thing to do in this issue - warn or block if the inflation layer is not configured big enough. We discussed it in the beginning

@SteveMacenski
Copy link
Member

I thought that was something you were going to open a PR for 🙃

@tonynajjar
Copy link
Contributor Author

I wanted to gather all the TODOs for this issue and tackle them at once but that's not how it panned out 😅 I'll reopen just in case my short-term memory fails me until I create another ticket (or PR) for it tmrw

@tonynajjar tonynajjar reopened this Jun 12, 2024
@SteveMacenski
Copy link
Member

Add a ERROR log instead of a ticket and that should suffice! At least that way someone looking through logs to debug would see that they made a mistake even if we don't actively prevent them from making that error.

@marcusvinicius178
Copy link

marcusvinicius178 commented Jun 20, 2024

Hi @SteveMacenski @tonynajjar @doisyg I am using ROS2 Iron, the clearStart fix is not needed as you have mentioned. However I am still facing collision issues.

I have already increased the radius_inflation parameter either on local and global costmaps to be greater than the robot's footprint and set the resolution of costmaps from small values (0.01) to huge values ( 10.0).
Unfortunately, the result is always the same: A) Collision against obstacle or B) Trucking getting stuck in front of my obstacle reattempting several new paths, as you can check on this Video and on the pictures below:

Can you please assist me to fix this issue?

My nav2 parameters are in this link: truck_nav2_params.yaml
And the issue is also described here: collision with smac planner

Screenshot from 2024-06-20 13-17-04
Screenshot from 2024-06-20 13-16-46
Screenshot from 2024-06-20 13-14-16
Screenshot from 2024-06-20 13-13-55
Screenshot from 2024-06-20 13-13-47
Screenshot from 2024-06-20 13-13-14
Screenshot from 2024-06-20 13-12-46
Screenshot from 2024-06-20 13-12-23

@tonynajjar
Copy link
Contributor Author

Any chance your could provide reproducible examples like I did using the planner_playground?

@marcusvinicius178
Copy link

@tonynajjar I was busy while you were fixing the issue, I am reworking the planner tasks now...I've missed this part, I will check how to use this tool you have mentioned and then I can submit the results here. One question for now, In your case, you don't need to avoid an obstacle in the middle of your trajectory like me?

@tonynajjar
Copy link
Contributor Author

I do, but that was never a problem if the inflation layer is configured correctly, only collisions at the start and the end

@marcusvinicius178
Copy link

marcusvinicius178 commented Jun 20, 2024

I do, but that was never a problem if the inflation layer is configured correctly, only collisions at the start and the end

Oh, that’s concerning 😬. Could I take a look at your configuration file and get the dimensions of your robot (footprint or radius)? For reference, my truck is 10 meters in length and 4 meters in width. I’ve set the inflation layer values to be greater than these dimensions, even using a value of 50.0. However, I’ve noticed that the costmap's obstacle layers seem to have a numerical value threshold. Visually, the inflation seems to stop increasing after setting a value around 5.0.
Curious to see how you managed this avoidance issue (or this is a specific issue from my side having a huge robot hehe).

@tonynajjar
Copy link
Contributor Author

You can get my planner config from my fork of the planner_playground I shared a few comments up

@iradionov
Copy link
Contributor

iradionov commented Jul 11, 2024

Hi, guys!
The problem seems not connected with the inflation radius value (it doesn't guarantee getting a footprint collision-free trajectory). I guess the problem is located in this condition. We compare the shifted (?) point cost with possible_collision_cost_ value and most of the time the method footprintCost(...) is not executed.

The same problem is discussed here and here regarding MPPI algorithm.

Here is how the default behavior looks like:
default planner

Here is the result with this modification:
modified planner

@SteveMacenski
Copy link
Member

SteveMacenski commented Jul 11, 2024

That simply removes the optimization and you pay for that in compute time. The issue is that your inflation layer is not properly setup as discussed at length in the ticket above. If using main / rolling, you should see an error message about this now: https://github.com/ros-navigation/navigation2/pull/4423/files -- though that covers in the inscribed radius.

What happens if you set your inflation radius to be larger than your circumscribed radius? The tuning guide, ROSCon talks, and example configurations all highly suggest that you use a smoother potential field for performance and path quality improvements. While actual collisions shouldn't happen, this is why most people don't see this issue since having a tiny inflation with pure 0-cost space is a bad configuration for those algorithms.

We find the cost to be possibly in collision using the circumscribed cost (https://github.com/iradionov/navigation2/blob/4b112116c5ae728185e7c4a7874c6a5879abf7ce/nav2_smac_planner/src/smac_planner_hybrid.cpp#L226), which in your case should render to 0 since you don't have a sufficient inflation radius to have cost out at that distance. You should actually be seeing that log error https://github.com/iradionov/navigation2/blob/main/nav2_smac_planner/src/collision_checker.cpp#L123-L127

@iradionov
Copy link
Contributor

Thank you for your prompt response.
There wasn't an error message because the inflation layer radius is larger than the inscribed radius.
I understand the idea of compute time optimization but extending an inflation layer radius leads to increasing costs of this region.
So instead of getting trajectories like this
sample1
we increase the inflation layer radius up to the circumscribed radius and can observe something like
sample2
In my case (coverage task) this is not acceptable, but is acceptable for most tasks.

The last question: why do we get the cost of the shifted point?

@SteveMacenski
Copy link
Member

The last question: why do we get the cost of the shifted point?

It has to do with the fact that casting it to an unsigned int floors the value (example: 4.9 will become 4u). Adding 0.5 makes it so that we round to the nearest whole number when down casting (in our example now 4.9 would be come 5u)

we increase the inflation layer radius up to the circumscribed radius and can observe something like

That is a tuning thing with the cost_penalty. You could set that low so that there is no incentive to stay away from higher cost areas (outside of collision, which will always be rejected) :-)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants