-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Reduce heap allocation in CalcMassMatrix() #13928
Conversation
480ca53
to
d42900c
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+@amcastro-tri for feature review, please
+@rpoyner-tri for platform review per rotation and because it updates cassie_benchmark
Rico, I fixed the heap limit for CalcMassMatrix and attempted to do the same for CalcMassMatrix but I found I could reduce the limit there by more than one without failure. Can you help me pick the right limit for that case (or let me know how to do it)?
Reviewable status: LGTM missing from assignees rpoyner-tri(platform),amcastro-tri (waiting on @amcastro-tri and @rpoyner-tri)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: LGTM missing from assignees rpoyner-tri(platform),amcastro-tri (waiting on @amcastro-tri and @rpoyner-tri)
examples/multibody/cassie_benchmark/test/cassie_bench.cc, line 104 at r1 (raw file):
for (auto _ : state) { // @see LimitMalloc note above. LimitMalloc guard(LimitReleaseOnly(174));
🎊
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
P.S. Changes here are entirely internal with no user-visible impact and no changes in numerical results.
Reviewable status: LGTM missing from assignees rpoyner-tri(platform),amcastro-tri (waiting on @amcastro-tri and @rpoyner-tri)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@sherm1 re new limit: the old hard way is trial and error with limit values. My PR #13930 was written with finding new limits in mind. You can just run the benchmark and read off the new limit from the results. If you like we could do a trial branch with both changes to get the limit and then paste it back in this change set?
Reviewable status: LGTM missing from assignees rpoyner-tri(platform),amcastro-tri (waiting on @amcastro-tri and @rpoyner-tri)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: LGTM missing from assignees rpoyner-tri(platform),amcastro-tri (waiting on @amcastro-tri and @rpoyner-tri)
examples/multibody/cassie_benchmark/test/cassie_bench.cc, line 104 at r1 (raw file):
Previously, rpoyner-tri (Rick Poyner (rico)) wrote…
🎊
From a branch experiment locally, looks like 174 is the lowest ceiling we can set. Curious what other phenomena you saw.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 1 of 6 files at r1.
Reviewable status: LGTM missing from assignees rpoyner-tri(platform),amcastro-tri (waiting on @amcastro-tri and @rpoyner-tri)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: LGTM missing from assignees rpoyner-tri(platform),amcastro-tri (waiting on @amcastro-tri and @rpoyner-tri)
examples/multibody/cassie_benchmark/test/cassie_bench.cc, line 104 at r1 (raw file):
Previously, rpoyner-tri (Rick Poyner (rico)) wrote…
From a branch experiment locally, looks like 174 is the lowest ceiling we can set. Curious what other phenomena you saw.
Yeah, that worked as I expected. It was the AutoDiff version of CalcMassMatrix() which let me have a lower ceiling than I expected.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: LGTM missing from assignees rpoyner-tri(platform),amcastro-tri (waiting on @amcastro-tri and @rpoyner-tri)
examples/multibody/cassie_benchmark/test/cassie_bench.cc, line 104 at r1 (raw file):
Previously, sherm1 (Michael Sherman) wrote…
Yeah, that worked as I expected. It was the AutoDiff version of CalcMassMatrix() which let me have a lower ceiling than I expected.
BTW I can't tell what this emoji is or what it means: 🎊
Please enlighten me!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 4 of 6 files at r1.
Reviewable status: LGTM missing from assignees rpoyner-tri(platform),amcastro-tri (waiting on @amcastro-tri and @rpoyner-tri)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: LGTM missing from assignees rpoyner-tri(platform),amcastro-tri (waiting on @amcastro-tri and @rpoyner-tri)
examples/multibody/cassie_benchmark/test/cassie_bench.cc, line 104 at r1 (raw file):
Previously, sherm1 (Michael Sherman) wrote…
BTW I can't tell what this emoji is or what it means: 🎊
Please enlighten me!
It's supposed to be confetti. Renders very poorly. I'll check the autodiff malloc ceilings in my test branch.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: LGTM missing from assignees rpoyner-tri(platform),amcastro-tri (waiting on @amcastro-tri and @rpoyner-tri)
examples/multibody/cassie_benchmark/test/cassie_bench.cc, line 169 at r1 (raw file):
for (auto _ : state) { // @see LimitMalloc note above. LimitMalloc guard(LimitReleaseOnly(62476));
With this formulation of the benchmark case, the measured malloc limit is 62,430. With the first iteration separated, as in #13930, the steady-state ceiling is 61,197.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 1 of 6 files at r1.
Reviewable status: LGTM missing from assignees rpoyner-tri(platform),amcastro-tri (waiting on @amcastro-tri)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
pending malloc limits adjustments?
Reviewable status: LGTM missing from assignee amcastro-tri (waiting on @amcastro-tri)
d42900c
to
8813e7f
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just waiting for your blessing, @amcastro-tri !
Reviewable status: LGTM missing from assignee amcastro-tri (waiting on @amcastro-tri and @rpoyner-tri)
examples/multibody/cassie_benchmark/test/cassie_bench.cc, line 169 at r1 (raw file):
Previously, rpoyner-tri (Rick Poyner (rico)) wrote…
With this formulation of the benchmark case, the measured malloc limit is 62,430. With the first iteration separated, as in #13930, the steady-state ceiling is 61,197.
Thanks! Changed to 62430 and verified that it fails at 62429. Depending which PR merges first we can make final adjustments.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 1 of 1 files at r2.
Reviewable status: LGTM missing from assignee amcastro-tri (waiting on @amcastro-tri)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: LGTM missing from assignee amcastro-tri (waiting on @amcastro-tri)
a discussion (no related file):
#13930 has landed. It probably requires you to rebase, and may let you tune the malloc limits some more.
…sMatrix(). Modernize cache allocation code.
8813e7f
to
9ab36e2
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: LGTM missing from assignee amcastro-tri (waiting on @amcastro-tri and @rpoyner-tri)
a discussion (no related file):
Previously, rpoyner-tri (Rick Poyner (rico)) wrote…
#13930 has landed. It probably requires you to rebase, and may let you tune the malloc limits some more.
Thanks -- very helpful. Rebased and tuned.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 1 of 1 files at r3.
Reviewable status: LGTM missing from assignee amcastro-tri (waiting on @amcastro-tri)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 5 of 6 files at r1, 1 of 1 files at r3.
Reviewable status: 3 unresolved discussions (waiting on @sherm1)
multibody/tree/multibody_tree.h, line 2640 at r3 (raw file):
} const std::vector<SpatialInertia<T>>& EvalCompositeBodyInertiaInWorldCache(
nit, you used plural in the Calc method, you might want to keep them consistent.
multibody/tree/multibody_tree_system.h, line 135 at r3 (raw file):
/* Returns a reference to the up to date cache of composite-body inertias in the given Context, recalculating it first if necessary. */ const std::vector<SpatialInertia<T>>& EvalCompositeBodyInertiaInWorldCache(
nit, tree calc method uses plural. You might want to keep consistency all across.
multibody/tree/multibody_tree_system.h, line 338 at r3 (raw file):
} void CalcCompositeBodyInertiasInWorld(
nit, notice the usage of plural here but not in the Eval methods.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, comments addressed ptal.
Reviewable status: complete! all discussions resolved, LGTM from assignees rpoyner-tri(platform),amcastro-tri
multibody/tree/multibody_tree.h, line 2640 at r3 (raw file):
Previously, amcastro-tri (Alejandro Castro) wrote…
nit, you used plural in the Calc method, you might want to keep them consistent.
ok, the difference is intentional -- my thinking is that here we are talking about the name of the cache entry, the "composite body inertia cache", which is consistent with the "spatial inertia cache" above and "articulated body inertia cache" below. The Calc methods whose output is just an std::vector<Something>
(rather than a struct specifically for cache) are just outputing a bunch of Something
s which is why I used plurals there. They aren't actually restricted to use with cache entries, could be used for output ports for example. (The fact that we don't use them that way now doesn't matter.)
multibody/tree/multibody_tree_system.h, line 135 at r3 (raw file):
Previously, amcastro-tri (Alejandro Castro) wrote…
nit, tree calc method uses plural. You might want to keep consistency all across.
ok, see explanation above
multibody/tree/multibody_tree_system.h, line 338 at r3 (raw file):
Previously, amcastro-tri (Alejandro Castro) wrote…
nit, notice the usage of plural here but not in the Eval methods.
ok, see explanation above
This is the first in a long series of changes to eliminate heap allocations and other performance issues in MBPlant.
Changes here:
I reran cassie_benchmark to verify that this has no discernable effect on runtimes (as expected).
Relates #13186, #13902
This change is