adding gamma term on noise smoothing to base implementation (ready for review) #99
Conversation
@artofnothingness can you test this and sanity check that you see the same improvements (or at least non-regressions) I do? You should see basically the same behavior, but it moves 6% faster while doing so. It's potentially a hair smoother as well, but I didn't try to quantify it.
@SteveMacenski This is basically correct. But `gamma/std` should be `gamma/(std^2)`.
@SteveMacenski I'll take a look at this and test after reading the paper.
OK, adding the squared version; now testing the parameters to make sure they're still the right answer with that change.
Adding the squared version significantly damages performance; we no longer hit 0.4 m/s (of the 0.5 m/s desired speed). Instead, we're at 0.2 m/s. I changed gamma to 1.0 and the robot didn't move (honestly not shocking, sounds about right). Dropping it back down to 0.0 restores the original behavior (since the term then has no impact). I'm seeing something similar if I square all of the other critic functions, though. Is there something in the theoretical foundations that requires us to use a particular form of cost function?
That is a valuable observation. First of all, the std in the control cost should be squared in theory. The control cost is theoretically derived from the free energy and importance sampling; its origin is the probability density functions (pdfs) of the Gaussian noise distributions. I also don't see any relation between the form of the cost function and the std. MPPI allows any type of cost function (denoted by S(V) in the papers).
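For reference, a sketch of the weighted-update form this comes from, as I understand the IT-MPC papers (the exact constants vary between them). Here Σ is the covariance of the injected Gaussian noise, so with a diagonal Σ each axis contributes a factor of γ/σ², not γ/σ:

```latex
% Sample weights with the control (noise) cost term, Williams et al. style:
w_k \;\propto\; \exp\!\left(-\frac{1}{\lambda}\Big(S(V_k)
      \;+\; \gamma \sum_{t=0}^{T-1} u_t^{\top} \Sigma^{-1} \epsilon_t^{k}\Big)\right),
\qquad \gamma = \lambda\,(1-\alpha)
```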
OK. Squaring an inverse of a small number makes a very small denominator, which blows out the other critics as currently tuned; e.g., a 0.2 std (1/0.2 = 5) becomes a 0.04 variance (1/0.04 = 25). Our options are to retune every critic to use this value with a target gamma of 0.1, or to drop gamma to something very low (0.01-0.02). The critics have their cost functions squared, whereas this one effectively just has a squared weight. But reading "Information Theoretic Model Predictive Control: Theory and Applications to Autonomous Driving", the term is `gamma * u^T * Sigma^-1 * eps`, where Sigma is the sampling covariance, i.e. the std squared.
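A toy numeric sketch of that blow-up (hypothetical numbers, not our tuned values): switching the denominator from std to std² scales the whole term by 1/std, here 5×.

```python
import numpy as np

# Hypothetical numbers for illustration only.
std = 0.2                                    # sampling std of the vx noise
gamma = 0.1
rng = np.random.default_rng(0)
u = np.full(56, 0.4)                         # a control sequence near cruise speed
eps = rng.normal(0.0, std, u.shape)          # injected noise for one sample

term_std = gamma / std * np.sum(u * eps)     # gamma/std (what the branch had)
term_var = gamma / std**2 * np.sum(u * eps)  # gamma/std^2 (the corrected form)
print(term_var / term_std)                   # 1/std = 5: same term, 5x larger
```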
Not using the bounded noises (e.g. …) is what the derivation assumes. Edit: Though in this paper, I see the other form as well… which is really annoying. I'm not sure which of them is right, if either 🤦
Got it, that's sensible. You're right that it would not impact things after …

My tentative plan tomorrow, if I get enough time, is to retune for squared critic functions to be able to compare against the best performance I can get with the current setup. As part of that, I will play with values of gamma. Though, to be principled, I will probably need to start with a more basic setup and create some performance metrics to tell if/when the control cost term, squared with large error, washes out the linear/quadratic critic cost functions with large errors. This is mostly what will take a long time that I probably can't do pre-IROS, but I'm hoping there are clearly defined performance differences between linear vs. quadratic so I can be a little more hand-wavey. If there aren't, then I'll need to make something quantitative to base a decision on. What I mean by that is that the term added in this PR is … If, for instance, …

Separate from that, I did some thinking today: I think the reason why our lambda/gamma are so much smaller than you're used to is that our cost functions are linear and not quadratic. Since they're linear, the costs are more closely bunched together, so we need to take a smaller band of them to find the valid trajectory. If they were convex, the differences between the best and worst cases would be more defined and we could widen the set of controls that impact our final result a bit (see the back-of-the-envelope sketch below). Also, I think it's because we have more, and more complex, cost functions going on, so small differences in top-performing trajectories are more important. But if we make the cost functions convex, I think we should re-enter the range of values you expect.

Thanks for the information, as always. Let me know if that sparks any thoughts!
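A back-of-the-envelope sketch of that hypothesis (made-up error values): holding the best-to-worst weight ratio fixed in the softmax, the usable lambda scales with the spread of the costs, and quadratic costs spread further than linear ones for the values in this toy example.

```python
import numpy as np

# Made-up per-sample tracking errors, for illustration only.
best, worst = 0.5, 2.0
target_ratio = 100.0   # desired weight ratio w_best / w_worst in the softmax

# w_best / w_worst = exp((c_worst - c_best) / lambda)  =>  lambda = spread / ln(ratio)
lam_linear = (worst - best) / np.log(target_ratio)        # ~0.33
lam_quad = (worst**2 - best**2) / np.log(target_ratio)    # ~0.81
print(lam_linear, lam_quad)  # quadratic costs admit a larger lambda for the
                             # same selectivity, consistent with the hypothesis
```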
That's an insightful opinion. I agree there are no generally accepted golden parameters for lambda/gamma. In my case, I tune the weight of each critic function in …
Is this perhaps something that could become more formalized in the MPPI method (could be one contribution in a paper)? It's a small contribution considering …

Understood on the impulse penalties; that's just like our collision cost penalty. In practice, it looks like the MPPI papers really only have a single consistent critic function penalizing for full speed, whereas we have many. For less trivial applications/constraints, it seems like having some formalization of the weighting between the low-energy/smoothness balance and the state cost balance would be beneficial. We can "just do it" here as part of our …

It's sounding more and more to me that a paper to the effect of Application of MPPI with Multi-Objective State Costs (of mobile robots, or for path tracking, or some specific task?) would be valuable, describing:
That doesn't feel like "a lot", since it's all engineering design rather than adding something new to MPPI theory. This is why it feels like a paper would need some mathematical contribution added to the MPPI method, motivated by our application, with the rest of this added on as experimental description/design fulfilling the application that motivated the MPPI variation. So as we're chatting in these tickets / email, let me know if some method variation comes to mind that could help for our practical mobile robotics application; that could unlock the rest of this for an IROS 2023 paper. Perhaps some variation to help with unsmooth/jittery S(V) critic functions like we have here causing wobbly behavior 😉 Or something else, of course.

Edit: See the other ticket for some discussion on today's progress; these PRs are getting warped together now. Right now, I'm working with this PR's code as the pared-down version of just the energy/smoothness term (which we'll apply over there eventually).
I temporarily closed #98, so that we can resolve the issues around gamma and the control cost first. @artofnothingness please see the latest comments from @SteveMacenski in #98 about the ideas for the critic functions.
I totally agree with this. It would be a valuable contribution even just to write down the discussions we have had here after we solve all the issues. Some mathematical contribution would make it much more important. Some recent variants of MPPI focus on changing the sampling method to effectively avoid obstacles, but that approach is not suitable for our case; we need to narrow it down to the "critic function" part. Perhaps, as an idea that just popped up, we could begin with the goal critics and path critics to see whether there is any decent connection to the mathematical interpretation of MPPI. As mentioned by @SteveMacenski, we may argue that a quadratic speed cost is not acceptable for robots in crowded, human-filled spaces, and that these critic functions (maybe with a mathematical description) are good alternatives. But I haven't thought deeply about it yet.
So I'm able to make things quite a bit smoother if I use more points at a smaller model_dt, but with some retuning to maximize smoothness, it's lowered the maximum speed to about 70% of desired. I can't seem to find a way to get it higher without losing the gains in smoothness. See the updates on this branch.

Basically, without PathAlign higher, we skirt really close to obstacles, since the PathFollow and Goal critics push the robot deeper into the path around corners, causing "short cutting". But we can't really reduce PathFollow or Goal, since those are what's driving forward progress. Pushing PathAlign up will make the system follow the path pretty exactly, which keeps it away from skirting obstacles, but then it can't diverge or do "smooth" turnarounds with prefer-forward. Perhaps this can actually be tuned into balance, but it's a bit fragile, and it would be nice if we didn't have to rely on paths to avoid skirting up against collisions for more free-style navigation (e.g. "follow this object" tasks vs. having a path plan).

Increasing the Obstacle critic doesn't really solve the problem, because we're already setting a very high value for in-collision trajectories, so we end up skirting VERY close to collision (and then often into collision, with realistic following error) where it's technically valid. If we make the Obstacle critic costs high enough to compete with PathFollow/Goal, we drop our speeds and get very wobbly/unstable. Regardless, it doesn't seem to actually solve the problem even with the Obstacle weight at …

It seems like we need a new way to drive the behavior forward outside of GoalCritic. If I try to push that higher, the entire thing starts to blow up, even when using only the Goal critic with no other critics. I'm not sure why it would be so unstable just by itself. We need something else to push the system forward so we can get up to speed, and then something to help with distance from obstacles more smoothly. Those seem like the main tasks from here. I tried to tune the system with only the GoalCritic at 1st, 2nd, and 3rd order powers, with the weights, temperature, and lambda, and was still unable to get up to 90% speed smoothly. This tells me we need something else to drive the robot forward (if none of the reasonable orders for the critic function, nor weights applied to it, nor softmax/energy gains can do this, and the most basic MVP can't accomplish it, then adding in more critics isn't going to do it; something's wrong with the Goal critic, its use, or the inputs to it for driving to full speed).

The goal remains to be able to stay further from obstacles, follow the path (but be able to deviate / do those 3-point turns), and move relatively close to the maximum requested speed.

Edit, minor: @artofnothingness do you have thoughts on how we can improve the collision avoidance behavior without the PathAlign critic? I have new thoughts from an observation I made late this evening on the speed/smoothness topic in the edit below. I'm at this branch with these parameters:
Edit - important discovery: What's especially interesting to me is that if I just increase the speed limit past 0.5 m/s, we actually reach the 0.5 m/s we've been falling short of. For example, with this branch, I showed that if I remove the clamping entirely, the achieved speed climbs above 0.5 m/s.

We have some distribution around zero-mean to start, we find the best controls, then iterate. After a few iterations, we're starting to clamp roughly half of the distribution to max speeds, and those are scored similarly/poorly for whatever reason (maybe because at max speed plus some …).

My speculation is that this is due to the softmax function, but I'm not a traditional ML guy, so I don't have a built-up intuition for its behavior to know whether what I'm describing is consistent with it. However, from this experiment, I can't see any other source of a virtual ceiling on the noised control behavior. It's clearly not critic-related, and the only other operations are clamping and the update-control-sequence stuff with softmax; everything else is just mathematical integration of the content as given.

So, questions @tkkim-robot:
I didn't get a chance to test, but I think this would let me drop the Goal / PathFollow critics a little, which would stop us from short-cutting, since we don't need as strong of a "forward pull" anymore. It might help with the collision stuff above. We still probably need a refinement to the Obstacle critic, but I think this observation can be the thread that resolves the rest. Thoughts? This seems like a plausible path forward to me. *slides over in chair* it's been a long day.
I have looked over your thoughts and results today. The bottom line is that this is valid, but it would be better to add some components here.

First, "clamping" the action values is definitely mandatory when there are physical limits on the actuator. For example, when controlling a servo motor, we should prevent action values larger than the maximum motor angle from being passed through the robot dynamics, because the resulting motions are not physically plausible. This is a "hard" constraint, so we should clamp the action values properly.

Similarly, as we have done before, we clamp the actions before evaluating them. This is reasonable, but at the same time it essentially violates the theoretical basis of MPPI. From both stochastic optimal control theory and information theory, the final form of MPPI originates from the assumption that the injected noises follow multivariate normal distributions. But when we clamp the perturbed action values, the distribution is no longer Gaussian; the probability mass outside the bounds piles up at the clamp limits. Therefore, in this situation, MPPI might not output an optimal result (the control cost and the KL divergence arguments may all go wrong). I don't know how much it degrades the performance, but theoretically it does. This is the first issue.

The second issue with action clamping is that the controller is biased against outputting control values near the constraint boundaries. Let me give you a biased yet straightforward example. When the optimal action value from the previous step was 0.5 m/s (equal to the max speed), roughly half of the parallel samples land above 0.5 m/s and all get clamped to exactly 0.5 m/s, while the other half spread out below it (0.45 m/s, 0.4 m/s, ...).
If the critic functions do not score the 0.5 m/s samples and the 0.4 m/s samples very differently, the resulting output must be smaller than 0.5 m/s after applying the softmax function: the weighted average of samples at or below the bound is strictly below the bound unless all of the weight lands on the clamped samples.
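A quick numeric sketch of that ceiling (hypothetical std and temperature): clamp Gaussian samples at the bound, weight them near-uniformly, and the weighted mean lands well below the bound.

```python
import numpy as np

rng = np.random.default_rng(1)
v_max = 0.5
# Samples around a previous optimum at the bound; ~half get clamped to v_max.
samples = np.clip(rng.normal(v_max, 0.2, 100_000), -v_max, v_max)

costs = np.zeros_like(samples)  # critics score everything equally (worst case)
w = np.exp(-costs / 0.35)       # softmax weights with some temperature lambda
w /= w.sum()
print(np.sum(w * samples))      # ~0.42 m/s: strictly below the 0.5 m/s bound
```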
This may result from the above two perspectives. I think there are two possible solutions to this issue. The first would be using truncated normal distributions; although this might somewhat alleviate the problem, it still is not a normal distribution (see the sampling sketch below). The second would be penalizing constraint violations: impose some penalty on the samples that fall outside of the constraint boundaries (let me call this the constraint critic).
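A sketch of the first option (assuming SciPy is acceptable here): sample from a truncated normal instead of clamping. Note that `truncnorm` takes its bounds in standardized units.

```python
from scipy.stats import truncnorm

mean, std = 0.3, 0.2     # hypothetical previous optimal action and noise std
v_min, v_max = -0.5, 0.5
# truncnorm's a/b are expressed as (bound - loc) / scale:
a, b = (v_min - mean) / std, (v_max - mean) / std
samples = truncnorm.rvs(a, b, loc=mean, scale=std, size=1000)
assert samples.min() >= v_min and samples.max() <= v_max
```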
I think what you did here was the second option, but with zero penalty on the out-of-boundary samples. In our case, the longitudinal speed is an action variable, but at the same time it is a state variable. Although the motor specs may dictate the maximum speed of a robot, there is no "physical" restriction on this value. My point is, we may remove the clamping before evaluation. After that, we clamp the entire action trajectory to ensure that all future optimal actions are within the available action range of the system. Let me go back to the example: if a sample above the bound (say 0.55 m/s) scores well, the softmax-averaged output can actually reach 0.5 m/s, and the final clamp then keeps the executed action feasible.

Anyway, this is a great observation! Thank you for sharing your valuable results.
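A minimal sketch of the pipeline I am proposing (all names and numbers hypothetical): evaluate the raw Gaussian samples, then clamp only the final optimal sequence before execution.

```python
import numpy as np

def mppi_step(u, score, rng, n_samples=1000, std=0.2, lam=0.35, v_max=0.5):
    """One MPPI iteration without pre-evaluation clamping (sketch only)."""
    eps = rng.normal(0.0, std, (n_samples, u.shape[0]))
    v = u + eps                           # raw samples, NOT clamped before scoring
    costs = score(v)                      # critics may see v > v_max and penalize it
    w = np.exp(-(costs - costs.min()) / lam)
    w /= w.sum()
    u_new = u + w @ eps                   # softmax-weighted update over raw noise
    return np.clip(u_new, -v_max, v_max)  # clamp only the executed sequence

# Hypothetical critic: quadratic penalty for deviating from the desired speed.
rng = np.random.default_rng(0)
u = mppi_step(np.zeros(56), lambda v: np.sum((v - 0.5) ** 2, axis=1), rng)
```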
I agree that this feels like a sub-optimal choice, since we're giving potentially impossible trajectories to be scored, which are then weighted into the softmax update.
I think this is what we're actually experiencing - you describe it better 😄
The problem here is that if we penalize the difference between 0.4 and 0.5, then we're incentivizing the robot to always go full speed, which then has problems stopping or slowing down when it needs to reverse. I tried that; it's not good.

Could we only penalize values outside of the feasible range (e.g. only things > 0.5 m/s)? That way, even if we don't clamp the noised velocities, they're weighted lower when we compute the optimal trajectory for the cycle, so that we rely on the … This is actually a half-bridge to what I was doing with the …

tl;dr Is this a good way forward? (A sketch of the critic I have in mind is below.)
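The sketch (hypothetical names and weights): zero cost inside the feasible range, so in-range samples keep their freedom to slow down, and only the overshoot beyond the limit is penalized.

```python
import numpy as np

def constraint_critic(vx, v_max=0.5, weight=0.3):
    """Penalize only the portion of sampled velocities beyond the limit (sketch)."""
    # vx: (n_samples, horizon) array of sampled longitudinal velocities
    overshoot = np.maximum(np.abs(vx) - v_max, 0.0)  # zero for feasible values
    return weight * overshoot.sum(axis=1)            # per-sample added cost
```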
I'll play with this today, but I only have ~1 hour I can give it, so I'm not sure I'll get far.

Edit: Pushed the updates. Things here are working pretty well, though with the added constraint critic I typically see performance get much more unstable when its weight is ~1.0 (which I would have thought to be reasonable, on the same scale as the others); I have it set around 0.3. I tested with and without this critic, with no clamping at all (so it's free to go above 0.5 if it chooses). I found that this critic cuts the velocity overshoot above the threshold roughly in half (0.56 m/s without the critic vs. 0.53 m/s with it) without clamping. Setting it to …

I'll need to do some more tuning based on these changes, but I'm pretty happy with what I'm seeing. I do think, though, that now that we're actually going full speed, we probably do need that 1% sampling in #96 to have the option to slow down closer to collisions. We can definitely push up the temperature now.

Params
This is reasonable.
The goal/path critics will incentivize the robot to move faster and faster, am I right? So my point was, we should adjust the magnitude of …
I think what you did here was the same thing. All in all, everything is clear and reasonable!! The results you mentioned also make sense. It seems like you have taken one step forward!
Fixing comparison for finding path point closest to final trajectories
Adding Doxygen, linting, copyright headers, header guards, and a litany of new unit testing
I merged in the bug fixes, unit tests, and doxygen entries from #104. Running quick CI on a Nav2 branch to get some coverage metrics for staging: https://github.com/ros-planning/navigation2/runs/10002533028. Still more work to do, but I just wanted to see how close I am to 92% from my 2 weeks of unit test writing 😆. It's 91.3%. There are a couple of coverage gaps it showed me, but all are pretty easy to fill.
Major updates
@tkkim-robot is this formulation correct?