Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reduce heap allocation in CalcMassMatrix() #13928

Merged
merged 1 commit into from
Aug 26, 2020

Conversation

sherm1
Copy link
Member

@sherm1 sherm1 commented Aug 25, 2020

This is the first in a long series of changes to eliminate heap allocations and other performance issues in MBPlant.

Changes here:

  • Adds a cache entry for composite body inertias so CalcMassMatrix() doesn't have to heap allocate them. That removes just one heap allocation from each CalcMassMatrix() call.
  • Updated the heap limit in cassie_benchmark accordingly (from 175 to 174; I verified it fails at 173).
  • Modernized the MBTreeSystem cache allocation code to make it more readable. That requires having some inline methods defined in MBTreeSystem that can be provided as the cache entries' Calc() methods. Made some (internal) name changes for consistency.
  • Changed terminology for composite body inertias from "R" to "Mc" since R means rotation matrix and "M" is used for spatial inertias (and these are just spatial inertias of composite bodies). Featherstone uses "I" for spatial inertias and "Iᶜ" for composite body inertias so this is similar in spirit to that notation.

I reran cassie_benchmark to verify that this has no discernable effect on runtimes (as expected).

Relates #13186, #13902


This change is Reviewable

@sherm1 sherm1 changed the title [WIP] Eliminate heap allocation in CalcMassMatrix() Reduce heap allocation in CalcMassMatrix() Aug 25, 2020
@sherm1 sherm1 marked this pull request as ready for review August 25, 2020 17:16
Copy link
Member Author

@sherm1 sherm1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+@amcastro-tri for feature review, please
+@rpoyner-tri for platform review per rotation and because it updates cassie_benchmark

Rico, I fixed the heap limit for CalcMassMatrix and attempted to do the same for CalcMassMatrix but I found I could reduce the limit there by more than one without failure. Can you help me pick the right limit for that case (or let me know how to do it)?

Reviewable status: LGTM missing from assignees rpoyner-tri(platform),amcastro-tri (waiting on @amcastro-tri and @rpoyner-tri)

Copy link
Contributor

@rpoyner-tri rpoyner-tri left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: LGTM missing from assignees rpoyner-tri(platform),amcastro-tri (waiting on @amcastro-tri and @rpoyner-tri)


examples/multibody/cassie_benchmark/test/cassie_bench.cc, line 104 at r1 (raw file):

  for (auto _ : state) {
    // @see LimitMalloc note above.
    LimitMalloc guard(LimitReleaseOnly(174));

🎊

Copy link
Member Author

@sherm1 sherm1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P.S. Changes here are entirely internal with no user-visible impact and no changes in numerical results.

Reviewable status: LGTM missing from assignees rpoyner-tri(platform),amcastro-tri (waiting on @amcastro-tri and @rpoyner-tri)

Copy link
Contributor

@rpoyner-tri rpoyner-tri left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@sherm1 re new limit: the old hard way is trial and error with limit values. My PR #13930 was written with finding new limits in mind. You can just run the benchmark and read off the new limit from the results. If you like we could do a trial branch with both changes to get the limit and then paste it back in this change set?

Reviewable status: LGTM missing from assignees rpoyner-tri(platform),amcastro-tri (waiting on @amcastro-tri and @rpoyner-tri)

Copy link
Contributor

@rpoyner-tri rpoyner-tri left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: LGTM missing from assignees rpoyner-tri(platform),amcastro-tri (waiting on @amcastro-tri and @rpoyner-tri)


examples/multibody/cassie_benchmark/test/cassie_bench.cc, line 104 at r1 (raw file):

Previously, rpoyner-tri (Rick Poyner (rico)) wrote…

🎊

From a branch experiment locally, looks like 174 is the lowest ceiling we can set. Curious what other phenomena you saw.

Copy link
Contributor

@rpoyner-tri rpoyner-tri left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 1 of 6 files at r1.
Reviewable status: LGTM missing from assignees rpoyner-tri(platform),amcastro-tri (waiting on @amcastro-tri and @rpoyner-tri)

Copy link
Member Author

@sherm1 sherm1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: LGTM missing from assignees rpoyner-tri(platform),amcastro-tri (waiting on @amcastro-tri and @rpoyner-tri)


examples/multibody/cassie_benchmark/test/cassie_bench.cc, line 104 at r1 (raw file):

Previously, rpoyner-tri (Rick Poyner (rico)) wrote…

From a branch experiment locally, looks like 174 is the lowest ceiling we can set. Curious what other phenomena you saw.

Yeah, that worked as I expected. It was the AutoDiff version of CalcMassMatrix() which let me have a lower ceiling than I expected.

Copy link
Member Author

@sherm1 sherm1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: LGTM missing from assignees rpoyner-tri(platform),amcastro-tri (waiting on @amcastro-tri and @rpoyner-tri)


examples/multibody/cassie_benchmark/test/cassie_bench.cc, line 104 at r1 (raw file):

Previously, sherm1 (Michael Sherman) wrote…

Yeah, that worked as I expected. It was the AutoDiff version of CalcMassMatrix() which let me have a lower ceiling than I expected.

BTW I can't tell what this emoji is or what it means: 🎊
Please enlighten me!

Copy link
Contributor

@rpoyner-tri rpoyner-tri left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 4 of 6 files at r1.
Reviewable status: LGTM missing from assignees rpoyner-tri(platform),amcastro-tri (waiting on @amcastro-tri and @rpoyner-tri)

Copy link
Contributor

@rpoyner-tri rpoyner-tri left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: LGTM missing from assignees rpoyner-tri(platform),amcastro-tri (waiting on @amcastro-tri and @rpoyner-tri)


examples/multibody/cassie_benchmark/test/cassie_bench.cc, line 104 at r1 (raw file):

Previously, sherm1 (Michael Sherman) wrote…

BTW I can't tell what this emoji is or what it means: 🎊
Please enlighten me!

It's supposed to be confetti. Renders very poorly. I'll check the autodiff malloc ceilings in my test branch.

Copy link
Contributor

@rpoyner-tri rpoyner-tri left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: LGTM missing from assignees rpoyner-tri(platform),amcastro-tri (waiting on @amcastro-tri and @rpoyner-tri)


examples/multibody/cassie_benchmark/test/cassie_bench.cc, line 169 at r1 (raw file):

  for (auto _ : state) {
    // @see LimitMalloc note above.
    LimitMalloc guard(LimitReleaseOnly(62476));

With this formulation of the benchmark case, the measured malloc limit is 62,430. With the first iteration separated, as in #13930, the steady-state ceiling is 61,197.

Copy link
Contributor

@rpoyner-tri rpoyner-tri left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 1 of 6 files at r1.
Reviewable status: LGTM missing from assignees rpoyner-tri(platform),amcastro-tri (waiting on @amcastro-tri)

Copy link
Contributor

@rpoyner-tri rpoyner-tri left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm: pending malloc limits adjustments?

Reviewable status: LGTM missing from assignee amcastro-tri (waiting on @amcastro-tri)

Copy link
Member Author

@sherm1 sherm1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just waiting for your blessing, @amcastro-tri !

Reviewable status: LGTM missing from assignee amcastro-tri (waiting on @amcastro-tri and @rpoyner-tri)


examples/multibody/cassie_benchmark/test/cassie_bench.cc, line 169 at r1 (raw file):

Previously, rpoyner-tri (Rick Poyner (rico)) wrote…

With this formulation of the benchmark case, the measured malloc limit is 62,430. With the first iteration separated, as in #13930, the steady-state ceiling is 61,197.

Thanks! Changed to 62430 and verified that it fails at 62429. Depending which PR merges first we can make final adjustments.

Copy link
Contributor

@rpoyner-tri rpoyner-tri left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 1 of 1 files at r2.
Reviewable status: LGTM missing from assignee amcastro-tri (waiting on @amcastro-tri)

Copy link
Contributor

@rpoyner-tri rpoyner-tri left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: LGTM missing from assignee amcastro-tri (waiting on @amcastro-tri)

a discussion (no related file):
#13930 has landed. It probably requires you to rebase, and may let you tune the malloc limits some more.


Copy link
Member Author

@sherm1 sherm1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: LGTM missing from assignee amcastro-tri (waiting on @amcastro-tri and @rpoyner-tri)

a discussion (no related file):

Previously, rpoyner-tri (Rick Poyner (rico)) wrote…

#13930 has landed. It probably requires you to rebase, and may let you tune the malloc limits some more.

Thanks -- very helpful. Rebased and tuned.


Copy link
Contributor

@rpoyner-tri rpoyner-tri left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 1 of 1 files at r3.
Reviewable status: LGTM missing from assignee amcastro-tri (waiting on @amcastro-tri)

Copy link
Contributor

@amcastro-tri amcastro-tri left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm: , minor comments.

Reviewed 5 of 6 files at r1, 1 of 1 files at r3.
Reviewable status: 3 unresolved discussions (waiting on @sherm1)


multibody/tree/multibody_tree.h, line 2640 at r3 (raw file):

  }

  const std::vector<SpatialInertia<T>>& EvalCompositeBodyInertiaInWorldCache(

nit, you used plural in the Calc method, you might want to keep them consistent.


multibody/tree/multibody_tree_system.h, line 135 at r3 (raw file):

  /* Returns a reference to the up to date cache of composite-body inertias
  in the given Context, recalculating it first if necessary. */
  const std::vector<SpatialInertia<T>>& EvalCompositeBodyInertiaInWorldCache(

nit, tree calc method uses plural. You might want to keep consistency all across.


multibody/tree/multibody_tree_system.h, line 338 at r3 (raw file):

  }

  void CalcCompositeBodyInertiasInWorld(

nit, notice the usage of plural here but not in the Eval methods.

Copy link
Member Author

@sherm1 sherm1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, comments addressed ptal.

Reviewable status: :shipit: complete! all discussions resolved, LGTM from assignees rpoyner-tri(platform),amcastro-tri


multibody/tree/multibody_tree.h, line 2640 at r3 (raw file):

Previously, amcastro-tri (Alejandro Castro) wrote…

nit, you used plural in the Calc method, you might want to keep them consistent.

ok, the difference is intentional -- my thinking is that here we are talking about the name of the cache entry, the "composite body inertia cache", which is consistent with the "spatial inertia cache" above and "articulated body inertia cache" below. The Calc methods whose output is just an std::vector<Something> (rather than a struct specifically for cache) are just outputing a bunch of Somethings which is why I used plurals there. They aren't actually restricted to use with cache entries, could be used for output ports for example. (The fact that we don't use them that way now doesn't matter.)


multibody/tree/multibody_tree_system.h, line 135 at r3 (raw file):

Previously, amcastro-tri (Alejandro Castro) wrote…

nit, tree calc method uses plural. You might want to keep consistency all across.

ok, see explanation above


multibody/tree/multibody_tree_system.h, line 338 at r3 (raw file):

Previously, amcastro-tri (Alejandro Castro) wrote…

nit, notice the usage of plural here but not in the Eval methods.

ok, see explanation above

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants