Potential MPPI Controller Improvements #3351
You may want to consider Tsallis VI-MPC, an MPPI variant which uses the Tsallis divergence instead of the KL divergence. As noted in artofnothingness/mppic#99, tuning lambda can be difficult when the cost varies across a large range in different scenarios. While adaptive temperature methods can be used to resolve this issue, an easy-to-implement fix is to switch to the Tsallis divergence in place of the KL divergence used in the original work. The Tsallis divergence is a generalization of the KL divergence that allows it to "interpolate" between the MPPI weighting scheme (i.e., exponential) and the CEM weighting scheme (i.e., maintaining a fraction of elite samples). It can be implemented by replacing the exponential weight $w_i \propto \exp(-S_i/\lambda)$ with a q-exponential of the form $w_i \propto [1 - (1-q)\,S_i/\lambda]_+^{1/(1-q)}$, where $S_i$ is the cost of sample $i$ and $q$ controls the interpolation ($q \to 1$ recovers the exponential weighting, while $q < 1$ assigns exactly zero weight to sufficiently bad samples). Intuitively, compared to MPPI, this allows you to ignore a fraction of the worst samples without over-emphasizing the best-performing samples, which was previously not possible. In experiments, we found this added flexibility resulted in smaller variances compared to MPPI in difficult situations.
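For concreteness, here is a minimal sketch of the two weighting schemes side by side (hedged: `costs`, `lambda`, and `q` are illustrative names, and the q-exponential is one common parameterization, not necessarily the exact form from the paper):

```cpp
#include <algorithm>
#include <cmath>
#include <numeric>
#include <vector>

// MPPI weighting: exponential in the (shifted) cost.
std::vector<double> mppiWeights(const std::vector<double> & costs, double lambda)
{
  const double best = *std::min_element(costs.begin(), costs.end());
  std::vector<double> w(costs.size());
  for (size_t i = 0; i < costs.size(); ++i) {
    // Shift by the best cost for numerical stability; normalized below.
    w[i] = std::exp(-(costs[i] - best) / lambda);
  }
  const double sum = std::accumulate(w.begin(), w.end(), 0.0);
  for (auto & wi : w) {wi /= sum;}
  return w;
}

// Tsallis-style weighting: for q < 1, the q-exponential assigns exactly zero
// weight to samples whose shifted cost exceeds lambda / (1 - q).
std::vector<double> tsallisWeights(
  const std::vector<double> & costs, double lambda, double q)
{
  const double best = *std::min_element(costs.begin(), costs.end());
  std::vector<double> w(costs.size());
  for (size_t i = 0; i < costs.size(); ++i) {
    const double x = (costs[i] - best) / lambda;
    w[i] = std::pow(std::max(0.0, 1.0 - (1.0 - q) * x), 1.0 / (1.0 - q));
  }
  const double sum = std::accumulate(w.begin(), w.end(), 0.0);
  for (auto & wi : w) {wi /= sum;}
  return w;
}
```

As q → 1, `tsallisWeights` recovers `mppiWeights`; smaller q truncates a larger fraction of the worst samples.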
Hi, thanks for the note, I have not run into that yet! I have a couple of follow-up questions on the systems side (since that's really the world I come from, less so the optimal control world 😄 )
1. Although…
2. One of the benefits we found from VI-MPC was that the variance of the cost of the optimal control was smaller. The tradeoff in MPPI for λ… Ideally, we would be able to pick (multiple) rollouts that have a low cost without also averaging over too many of them in a way that hinders performance (e.g., not being able to achieve full speed). Ignoring the worst samples…
3. A robust way to pick… I agree that…
You've made a good case for this, I will update my checklist above to investigate this solution for MPPI. While it's not so large that it prevented me from releasing the controller and feeling that it's ready for use, there is some jitter in the output path that could be reduced or removed if we can either (a) reduce the noise characteristics in a way that still results in exploration of the control space or (b) combine results in a way that averages more smoothly. (a) can be done using Log-MPPI (or another similar reformulation of the sampling mechanics) and (b) what you propose sounds very good as well. I don't suppose this is something you'd be interested in contributing? I know there are definitely some users out there that would be more than happy to test it for us and let us know the results on actual robot hardware. If not, it's something I'd probably start looking into next month after I finish my current project set.
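As an aside on option (a), a hedged sketch of the Normal-Log-Normal mixture sampling that Log-MPPI proposes (parameter names are illustrative; consult the Log-MPPI paper for the exact parameterization):

```cpp
#include <random>
#include <vector>

// Each perturbation is a normal draw scaled by a log-normal draw. The product
// concentrates mass near zero (smoother rollouts) while retaining heavy tails
// for occasional aggressive exploration.
std::vector<double> sampleNLNNoise(
  size_t n, double normal_std, double lognormal_std, std::mt19937 & gen)
{
  std::normal_distribution<double> normal(0.0, normal_std);
  std::lognormal_distribution<double> lognormal(0.0, lognormal_std);
  std::vector<double> noise(n);
  for (auto & eps : noise) {
    eps = normal(gen) * lognormal(gen);
  }
  return noise;
}
```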
I'd actually be happy to contribute, but I'm not sure how to run MPPI and benchmark improvements vs the current version. I tried running on the turtlebot sim using the default bringup config but with MPPI for the FollowPath controller, and I got errors regarding the global path having an empty frame. Is there some standard setup for testing MPPI (in sim at least)?
On reasonable values - we typically sample in the thousands of trajectories (between 1000 and 2000). A piece of confusion that I'm sure the paper would address, but since you're here it's probably better to simply ask: for the piecewise function, I assume we apply the…
What errors? I use the…
Yes, you are correct that… If we want to find…
Interesting. Do you have a config file you use with…? Print debugging shows that…

EDIT: I'm not sure what the contract for the… is, even though…
I think an important question is why we should care about including the top…
The default bringup is what I dogfood, so I see what users see. It loads the config file from…
That's not overly surprising that a…

Edit: I just grabbed… @doisyg it looks like an issue was introduced in #3425 - how did your testing not pick that up?

Edit 2: Oh I know how - I actually implement the header frames in the Smac Planners that you're using. Got it. Anyhow, easy enough fix.
I patched it here: #3458 - will backport to Humble as well once merged. It will be merged within an hour, and you should be able to move on from there.
My bad, sorry - yes, tested with the Smac planners but not NavFn.
No worries, simple fix 😄
@SteveMacenski Just curious, is there a way to see what the "jitter in the output path" you mentioned looks like? In some sense, a control trajectory that is more "optimal" may actually enhance the "jitter" that you see in the output path if the jitter comes from the cost function (and hence the optimal control) jumping from timestep to timestep, and you may actually want the optimizer to be "less optimal" for a "smoother path".
That is a very good point. I'm thinking this jitter is the result of the sampling mechanics. I think this because when I reduce the noise's standard deviation…

I can say from experience charting the controls while tuning the cost functions that the cost functions can cause jitter (particularly the obstacle critic), but largely, modifying them did not change the noisy characteristics of the trajectory (again, with the exception of the obstacle critic, which is somewhat unsmooth). I think it is worth some analysis on my side to make sure that I isolate the system away from the critics and see if the reason has more to do with the objective functions than with the sampling or divergence used.

I don't have defensible metrics or experiments on hand right now, but the impression left upon me after tuning the critics was that the remaining noise wasn't largely due to critics being added/removed, but due to the system processes at hand. That impression could be wrong, though, since I wasn't thinking in those terms, so I'm not saying that with a high degree of confidence. Though if changing to the Tsallis divergence as you propose makes things smoother, that would be a marker that at least some of the lack of smoothness is due to KL and/or the necessarily low λ…

Did you get things working after I fixed the path handler issue? I don't suppose you tried with Tsallis?
Another suggestion for improvement here, in the documentation: for each of the critic sections, add a one-sentence summary of what the critic is supposed to do. I realize most are self-explanatory (e.g. "Goal Angle Critic"), but from just the docs I'm not entirely clear on what "Prefer Forward Critic" actually does (does it prefer forward motion? a forward heading?).
@mikeferguson I'll do you one better: #3951
Do you have a specific approach in mind for incorporating the acceleration constraints? Any assistance or guidance on this would be greatly appreciated. @SteveMacenski
There are a few places:

…

These get started, but are probably not sufficient for more "hard" constraints like acceleration. Think of these steps as incentivizing the right thing. We should also:

…
I'll note that previous attempts to take the randomly sampled trajectories and threshold them up front, before going into the critic functions, caused some issues. We used to threshold max velocities, which resulted in us never being able to achieve the maximum velocity: samples near the limit were being cut off, and in the averaging of the softmax function the final trajectory never got up to the limit. That's why the Constraint critic was added and we moved the velocity-constraint clipping to the final trajectory's output. However, according to MPPI theory, the predict function is meant to apply a dynamics and/or AI model that takes the controls and converts them into vehicle velocities - so I think for acceleration constraints specifically that is perfectly A-OK. But I did want to note that, given this prior experience, we should do some testing to make sure we can actually achieve the max acceleration limits in that case, or come close enough to them that it's OK. @RBT22 is this something you're open to contributing? I don't think it's actually all that difficult, but some careful testing and validation will take a few hours.
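To make the predict-function idea concrete, a hedged sketch in plain loop form (`a_max`, `dt`, and the vector names are illustrative, not the controller's actual members):

```cpp
#include <algorithm>
#include <vector>

// Propagate one sampled control sequence through a trivial dynamics model
// that caps commanded velocity changes at the acceleration limit. v0 is the
// robot's current measured velocity (e.g., from odometry).
std::vector<double> predictVelocities(
  const std::vector<double> & controls, double v0, double a_max, double dt)
{
  std::vector<double> v(controls.size() + 1);
  v[0] = v0;  // seed from the current robot state
  const double dv_max = a_max * dt;
  for (size_t t = 0; t < controls.size(); ++t) {
    // The achievable velocity is the command, clamped to what the platform
    // can reach from v[t] within one timestep.
    v[t + 1] = std::clamp(controls[t], v[t] - dv_max, v[t] + dv_max);
  }
  return v;
}
```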
I tried to do some experiments following your notes, but encountered a few challenges. I'd appreciate your input on the following observations:

…

I also attempted modifying the predict function to allow a jump from the odometry value, but not after it, modeling the final applyConstraints. In this case the critic could be useful, but I could not get it to work.
Think of the Constraint critic's function as weighting optimal samples away from ones that violate the acceleration limits -- a soft constraint, the carrot before the stick. So once we apply the softmax function, we should already be approximately following the acceleration limits, assuming that the critic's weight is sufficiently high. That's a soft constraint that might still have some error outside of the acceleration bounds when combined with other critics incentivizing their own behaviors. That's where the hard constraint of the final applyConstraints comes in.
So we send the first new velocity, which we know is valid from the current robot's state (including the velocity we initialized with), and onward we go, round and round. I could be wrong here on some subtle detail, but I did this exact thing with the velocities and thought very closely about it, so I'm reasonably confident this should work fine.

Now, if we also add the acceleration constraints into the prediction function (which is kind of what it's there for, TBH), we further reduce the impact of the soft constraint in the Constraint critic's objective function -- to the point that it's probably not even necessary. But I don't want to say it's not necessary until we test and my hypothesis proves true. It's still possible to have infeasibility introduced by the softmax function, since it's combining multiple trajectories together, but that should be really limited and the…
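For the hard-constraint side, a hedged sketch of sequentially clamping the final averaged control sequence against the acceleration limits, seeded from odometry (the function and variable names are mine, not the codebase's):

```cpp
#include <algorithm>
#include <vector>

// Enforce acceleration limits on the softmax-averaged control sequence as a
// final hard constraint. Clamping must be sequential because each step's
// feasible range depends on the (possibly clamped) previous step.
void applyAccelerationConstraints(
  std::vector<double> & control_sequence, double v_odom, double a_max, double dt)
{
  double prev = v_odom;  // the first command must be reachable from odometry
  const double dv_max = a_max * dt;
  for (auto & u : control_sequence) {
    u = std::clamp(u, prev - dv_max, prev + dv_max);
    prev = u;
  }
}
```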
I suppose I don't understand this comment. I'm representing this logic as a loop because I haven't thought about the vectorized version for a quick remark:
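(The original snippet did not survive here; a minimal reconstruction of the loop being described, with assumed names -- `cv` are the sampled controls, `v` the projected velocities:)

```cpp
#include <vector>

// The first velocity comes from the robot's current state; each subsequent
// velocity is projected from the previous control. With the current
// pass-through model this is just v[t + 1] = cv[t]; a real dynamics model
// would transform cv[t] instead.
std::vector<double> propagate(const std::vector<double> & cv, double v_odom)
{
  std::vector<double> v(cv.size() + 1);
  v[0] = v_odom;       // seed from the robot's measured state
  for (size_t t = 0; t < cv.size(); ++t) {
    v[t + 1] = cv[t];  // pass-through "dynamics"
  }
  return v;
}
```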
The first value is already set to the robot's state, which is used to project from. So when you then predict, you're populating…

The current implementation is just a pass-through: we set cv* to v* and move on without applying a dynamics model. This is exactly the place for a dynamics model to translate control requests into the vehicle's actual responses to those requests -- starting from…
Thanks a lot for the detailed reply! I'm still grappling with some aspects, as I've noticed issues with the first control value. While addressing velocity constraints, it was feasible to clamp it in applyConstraint without requiring knowledge of the previous state; in this scenario, it seems more challenging for me.
As I see it, the first value in the control_sequence might not be guaranteed to be valid, considering that odometry only extends to the state's…

Could you help me understand if there's something crucial I might be overlooking? Appreciate your guidance!
The only place that isn't true is here - which, now that we're talking about it, I'm suspicious shouldn't actually be…

I think from those links you should be able to prove to yourself that all…
That happens before the dynamics model is applied in…
@RBT22 does that make sense?
@SteveMacenski I'm really sorry for not being very active on this topic. Towards the end of last year, I had to address some pressing issues, and honestly I was even a bit more confused after I read your message.
Contrary to this, I actually think the opposite is true. After using the state's…

The sequence of operations follows these major steps:

…
No need for apologies, just following up :-) I know things come up
Nope - let's look at a snippet from the predict function:
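(The quoted snippet did not survive here; this is paraphrased from the nav2_mppi_controller motion model, so treat member names as approximate:)

```cpp
// Approximate shape of MotionModel::predict: each velocity at timestep t + 1
// is copied directly from the control at timestep t -- a pure pass-through.
virtual void predict(models::State & state)
{
  using namespace xt::placeholders;
  xt::noalias(xt::view(state.vx, xt::all(), xt::range(1, _))) =
    xt::view(state.cvx, xt::all(), xt::range(0, -1));
  xt::noalias(xt::view(state.wz, xt::all(), xt::range(1, _))) =
    xt::view(state.cwz, xt::all(), xt::range(0, -1));
}
```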
We're setting the value of each velocity at timestep t + 1 from the control at timestep t.

You can also prove this to yourself by looking at the noise generator:
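(Again paraphrased, as the snippet was lost; names approximate:)

```cpp
// Approximate shape of the noise generator step: each batch of sampled
// controls is the previous optimal control sequence plus fresh noise.
xt::noalias(state.cvx) = control_sequence.vx + noises_vx_;
xt::noalias(state.cwz) = control_sequence.wz + noises_wz_;
```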
We're setting the sampled controls to the previous optimal control sequence plus the sampled noise.
Correct, because these methods are called after…
I mentioned this before - I know this is all complicated, but this is actually an area I want to explore more later. I'm not totally convinced that we did the 100% right thing here for our applications. For now, regardless, it's fine since we just have the pass-through…
Sorry about this, the…

I find your suggestion of using… Wouldn't it be more prudent to guarantee the validity of the predict function for the specific…
Of course - that's what we're discussing right now: replacing the pass-through model with an actual dynamics model. The MPPI research uses this slot to do things like adding DNNs trained on robot data to convert…
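As a hedged illustration of what could live in that slot short of a DNN, a first-order actuator lag is the classic analytic choice (`tau` is an assumed time constant, not an existing parameter):

```cpp
// Map a commanded velocity to a realized one with a first-order lag, a simple
// stand-in for a learned dynamics model in the predict slot.
double firstOrderLag(double v_prev, double cmd, double dt, double tau)
{
  const double alpha = dt / (tau + dt);  // discrete first-order response
  return v_prev + alpha * (cmd - v_prev);
}
```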
I believe that breaks the information-theoretic constraints. But let's game this out, and tell me if you find a gap:

…
Thus, I think it's probably just another way of thinking about the same thing. I think, as I've explained before but in different phrasing: my concern with using…

For the case of wanting to artificially limit the dynamics computed in the controller: if you don't constrain by the acceleration limits you'd like to enforce, then there's no enforcement of them at the controller level - the outputs are unconstrained. However, you could have something like the… That seems pretty roundabout, given that both the velocity smoother and the controller have access to the same data. And it also wouldn't actually result in any acceleration limits applied to the output of the controller server if not paired with a velocity smoother. If the velocity smoother is going to squash it anyway, why not use the…

So, I agree ^ So... I think that could perhaps be a parameter? But honestly, I think for the AMR case,…
I would love to see the acceleration constraints added for MPPI. It would also be nice to incorporate some velocity-saturation controls: a differential-drive robot may not be able to achieve max linear velocity when angular velocities are introduced.
@SteveMacenski I read your comment on another issue (#4126 (comment)) - please don't wait for me. Unfortunately, my current schedule doesn't allow me the time to dedicate to this task. I made some attempts, but none of them were successful. If you see that this can be completed easily, please proceed. When I initially said I would work on this, it seemed like no one besides my team really cared about this topic, and I had more free time. I genuinely hope I can contribute to something else in the future.
@SteveMacenski Is anyone currently working on this? If not, then I can take up this work. Since we will already be covering the acceleration constraints when simulating the future velocities using the motion model (in…
Not currently! I'd love the help :-)
The softmax operation can go beyond those bounds when taking the scoring into account. You don't have to take my word for that either: you can print the output trajectories, and you should see it happen yourself, with slightly infeasible trajectories in some cycles. The violation should be very small (if present at all) if the samples are being limited on generation, but I believe it is technically possible for it to occur.
I wasn't able to build the docker image with the main branch due to missing rosdeps, and source installation of all the missing dependencies would be too painful. So I've started with the iron branch for now and will get back after a few days.
I've tried accounting for the constraints in the predict function, which works fine in most cycles. But as you said, in a few cycles the acceleration violates the constraints. So either we need to add a critic cost penalty to reduce this possibility further, or apply hard constraints to the final output velocities. I've created a PR with the initial commit: #4352.
Reviewed! Let's follow up in the PR :-)
- General improvements
- Improved smoothness options
- Improvements in speed potentially