Benchmark speed for autodiff in RigidBodyTree, MultibodyPlant and AcrobotPlant #8482
Comments
The contrast between the good behavior on AcrobotPlant and the bad behavior with RB/MBPlant makes me think we are misusing AutoDiff there somehow.
Is there some optimization that the compiler is just not able to do or are we really performing more floating point operations?
Some updates, now 2+ years later.
Here are some informal numbers (I haven't put my new benchmark under cassie-level controls yet):
Observations:
Some early casual profiling suggests that heap thrashing (especially free()) is a problem for the MBP calculations. Since this problem is small, it is possible that some version of SBO (small buffer optimization) would help. We have a draft; I'm not sure how much effort it would take to make that actually viable in master.
I think my plan for this is to sharpen up my new benchmark a bit, try to commit it, and then fold further work on this into ongoing AutoDiff work. I have my doubts that MBP autodiff will ever rival (theoretical) numerical integration, owing to the long sad history of Eigen's unsupported autodiff scalar. However, it is useful to have a small-problem benchmark to complement the existing cassie benchmark.
Relevant to: RobotLocomotion#8482 This patch rewrites Hongkai's original program (from an old branch) to use google benchmark, removes some obsolete measurements (RigidBodyTree, AutoDiffUpTo73d), and adds some new ones (MBP vanilla CalcMassMatrix()). This benchmark set is nice because it captures the small-problem (only four derivatives!) end of the autodiff problem space. A possible plan would be to wrap this program with controlled-experiment scripts, similar to those in examples/multibody/cassie_benchmark, and use it to help drive further autodiff optimization work.
Wow, those numbers actually look very good @rpoyner-tri, great work!
@amcastro-tri I did something similar to the UpTo73 case in earlier work, perhaps this: #13902 (comment). The fixed vs. heap tradeoff may be very different for these very small problems; hence my renewed interest in SBO similar to Nimmer's #12583.
My thought is that it is not worth putting a lot of effort into optimizing for small problems -- typically they run fast enough for whatever toy or pedagogical purpose they serve. I would like to focus our efforts on the more-difficult problems encountered by our target users. OTOH, if this little benchmark can teach us something about performance on big systems, that could be useful.
FTR my original purpose for SBO was not for toy problems, but to use it for chunking (#2619) without increasing the number of scalar types we compile to. If we changed AutoDiffXd to use SBO, users who wanted to stripe their compute in chunks (maybe even with openmp) could do so without touching the heap, and without having more compile-time types. (It's convenient to assume that there's only ever one autodiff C++ type within Drake.)
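To make the SBO idea concrete, here is a minimal sketch of a derivative store that keeps up to N partials inline and only falls back to the heap for larger sizes. The names SmallDerivs and Dual are hypothetical and for illustration only; this is neither Drake's AutoDiffXd nor the draft mentioned in this thread. It also assumes that an empty std::vector does not allocate, which holds for the common standard library implementations.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical small-buffer derivative store (illustration only).
// Up to N partials live inline; larger sizes use the heap.
template <std::size_t N>
class SmallDerivs {
 public:
  explicit SmallDerivs(std::size_t size) : size_(size) {
    // Allocate only when the partials do not fit in the inline buffer.
    if (size > N) heap_.assign(size, 0.0);
  }

  std::size_t size() const { return size_; }
  bool on_heap() const { return size_ > N; }

  double& operator[](std::size_t i) {
    assert(i < size_);
    return data()[i];
  }
  const double& operator[](std::size_t i) const {
    assert(i < size_);
    return data()[i];
  }

 private:
  // The storage pointer is recomputed on each access, so the default
  // copy and move operations remain correct.
  double* data() { return on_heap() ? heap_.data() : inline_.data(); }
  const double* data() const {
    return on_heap() ? heap_.data() : inline_.data();
  }

  std::size_t size_;
  std::array<double, N> inline_{};  // Zero-initialized inline buffer.
  std::vector<double> heap_;        // Stays empty for small sizes.
};

// Hypothetical forward-mode scalar: a value plus its partials.
template <std::size_t N>
struct Dual {
  double value;
  SmallDerivs<N> derivs;
};

// Product rule: d(ab) = da * b + a * db.
template <std::size_t N>
Dual<N> operator*(const Dual<N>& a, const Dual<N>& b) {
  assert(a.derivs.size() == b.derivs.size());
  Dual<N> out{a.value * b.value, SmallDerivs<N>(a.derivs.size())};
  for (std::size_t i = 0; i < out.derivs.size(); ++i) {
    out.derivs[i] = a.derivs[i] * b.value + a.value * b.derivs[i];
  }
  return out;
}
```

With four derivatives and N = 4, every temporary produced by operator* stays in inline storage, which is the heap traffic the profiling above points at. The price is a larger object: every scalar carries N doubles even when it has no derivatives.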
Excellent point @jwnimmer-tri, chunking + SBO could still perform better.
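As a rough illustration of chunking, the sketch below computes a gradient C partial derivatives at a time with a fixed-size forward-mode scalar, re-evaluating the function once per chunk. ChunkDual and ChunkedGradient are hypothetical names chosen for this example; they are not Drake APIs, and the actual proposal in #2619 may differ.

```cpp
#include <algorithm>
#include <array>
#include <cstddef>
#include <vector>

// Hypothetical fixed-size forward-mode scalar carrying C partials inline.
template <std::size_t C>
struct ChunkDual {
  double value = 0.0;
  std::array<double, C> d{};
};

template <std::size_t C>
ChunkDual<C> operator+(const ChunkDual<C>& a, const ChunkDual<C>& b) {
  ChunkDual<C> out;
  out.value = a.value + b.value;
  for (std::size_t k = 0; k < C; ++k) out.d[k] = a.d[k] + b.d[k];
  return out;
}

template <std::size_t C>
ChunkDual<C> operator*(const ChunkDual<C>& a, const ChunkDual<C>& b) {
  ChunkDual<C> out;
  out.value = a.value * b.value;
  for (std::size_t k = 0; k < C; ++k) {
    out.d[k] = a.d[k] * b.value + a.value * b.d[k];  // Product rule.
  }
  return out;
}

// Gradient of f at x, computed C partials per pass. The function f is
// evaluated ceil(n / C) times; each scalar it produces holds its partials
// inline. (The input vector itself is still heap-allocated here.)
template <std::size_t C, typename F>
std::vector<double> ChunkedGradient(F f, const std::vector<double>& x) {
  const std::size_t n = x.size();
  std::vector<double> grad(n, 0.0);
  for (std::size_t start = 0; start < n; start += C) {
    std::vector<ChunkDual<C>> xd(n);
    for (std::size_t i = 0; i < n; ++i) xd[i].value = x[i];
    // Seed unit directions for the variables in this chunk only.
    const std::size_t count = std::min(C, n - start);
    for (std::size_t k = 0; k < count; ++k) xd[start + k].d[k] = 1.0;
    const ChunkDual<C> y = f(xd);
    for (std::size_t k = 0; k < count; ++k) grad[start + k] = y.d[k];
  }
  return grad;
}
```

Each pass is independent of the others, so passes could in principle be run in parallel. The tradeoff is that the value computation is repeated once per chunk.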
Good discussion; thanks! Rounding back to "what is this ticket about?"
In the follow-on discussion a lot of work is proposed. I think that is beyond the scope/coherence of this issue as written. Here is what I think should happen:
Merged my benchmark code, linked some tickets, and filed a new one: #14449. Closing this one.
@mposa noticed that the autodiff computation in RigidBodyTree is a lot slower than that in AcrobotPlant (which writes the dynamics equation manually). I did a quick benchmark test on the three classes, and here is the result of computing the mass matrix 1000 times.
From this table, we know that the AutoDiffXd version takes about 100x more time than the double version. Numerical gradient with forward difference would take about 4x the time of the double version, and central difference would take about 8x. So in this case autodiff is significantly slower than numerical differentiation. AutoDiffUpTo73d is about 2 ~ 5x faster than AutoDiffXd.
The benchmark code is in https://github.com/hongkai-dai/drake/blob/benchmark_autodiff/examples/acrobot/benchmark_autodiff.cc
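The 4x and 8x estimates come from counting function evaluations for a problem with four derivatives. A minimal sketch of that count follows; CountingFunction, ForwardDifference and CentralDifference are hypothetical helpers written for this illustration and are not taken from the benchmark code.

```cpp
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical helper: wraps a scalar function of n variables and counts
// how many times it is evaluated.
struct CountingFunction {
  std::function<double(const std::vector<double>&)> f;
  int calls = 0;
  double operator()(const std::vector<double>& x) {
    ++calls;
    return f(x);
  }
};

// Forward difference: one baseline evaluation plus one per variable.
std::vector<double> ForwardDifference(CountingFunction& f,
                                      std::vector<double> x, double h) {
  const double f0 = f(x);
  std::vector<double> grad(x.size());
  for (std::size_t i = 0; i < x.size(); ++i) {
    const double xi = x[i];
    x[i] = xi + h;
    grad[i] = (f(x) - f0) / h;
    x[i] = xi;  // Restore the perturbed coordinate.
  }
  return grad;
}

// Central difference: two evaluations per variable and no baseline.
std::vector<double> CentralDifference(CountingFunction& f,
                                      std::vector<double> x, double h) {
  std::vector<double> grad(x.size());
  for (std::size_t i = 0; i < x.size(); ++i) {
    const double xi = x[i];
    x[i] = xi + h;
    const double f_plus = f(x);
    x[i] = xi - h;
    const double f_minus = f(x);
    grad[i] = (f_plus - f_minus) / (2.0 * h);
    x[i] = xi;  // Restore the perturbed coordinate.
  }
  return grad;
}
```

For four variables, forward difference makes four perturbed evaluations on top of the baseline value, which is usually needed anyway, and central difference makes eight. Those counts match the 4x and 8x figures, against roughly 100x measured for AutoDiffXd.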
@amcastro-tri @mposa @sherm1 @SeanCurtis-TRI @edrumwri