Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Undesirable heap allocations in MultibodyPlant #13186

Closed
edrumwri opened this issue Apr 30, 2020 · 19 comments
Closed

Undesirable heap allocations in MultibodyPlant #13186

edrumwri opened this issue Apr 30, 2020 · 19 comments
Assignees

Comments

@edrumwri
Copy link
Collaborator

Heap allocations make computational run times nondeterministic and should be avoided, if possible to permit real-time applications. In some cases, the heap allocations identified below are likely reducing performance in non-real-time applications as well.

I expect this list will grow over time.

@sherm1
Copy link
Member

sherm1 commented Apr 30, 2020

Thanks for pointing these out, @edrumwri! Some questions:

  • the first example above is a resize of a supplied output argument. As long as the caller keeps using the same object for the output, this should not result in new heap allocation once the max size has been reached. Also, the caller could reserve that amount and get no heap allocation at all.
  • the other two seem like avoidable allocations, though possibly requiring an API change to pass in some necessary temp space. For prioritizing, have you been able to measure an effect from the heap allocation or is the cost speculative at this point?

@jwnimmer-tri
Copy link
Collaborator

jwnimmer-tri commented Apr 30, 2020

The colorful lines of code (tree functions) are a distraction. Look at the method names after the bullet points (plant functions). Those plant function do not offer user-supplied storage, and the entry points that we actually care about. (In particular, GetKinematicPathToWorld is called internally by MbT, not by the user.)

@edrumwri
Copy link
Collaborator Author

Per discussion with @jwnimmer-tri over slack, the key point that I neglected to mention is the typical workflow for real-time programming where one first goes through a "warm up" period where heap can be allocated at will. Afterwards, a real-time phase is entered where no further heap allocations are allowed.

The caching system naturally supports this workflow systems if the cache allocates the memory once and the cache evaluation function doesn't allocate any new memory (@ggould-tri has pointed out concerns with using the caching system in a real-time context; we're still proceeding under the assumption that we can use caching, carefully, to avoid unwanted heap allocations).

* the other two seem like avoidable allocations, though possibly requiring an API change to pass in some necessary temp space. For prioritizing, have you been able to measure an effect from the heap allocation or is the cost speculative at this point?

I don't think API changes are necessary as long as the caching system is used for maintaining those std::vector objects. In any case, our primary concern isn't cost, but deterministic computation times, and memory allocation (among other OS services) is typically not deterministic.

@sherm1
Copy link
Member

sherm1 commented Apr 30, 2020

Ah, yes for the first case the guilty code is here, where that output argument is allocated.

@sherm1
Copy link
Member

sherm1 commented Apr 30, 2020

cc @mitiguy

@mitiguy
Copy link
Contributor

mitiguy commented Jun 16, 2020

Issue #13186 was filed on MBP doing "undesirable heap allocation" by @edrumwri who was also the requester of issue #12592 for the new moving-frame bias acceleration feature (related issue #13453). Heap allocation in CalcBiasSpatialAcceleration() caused concern -- is the new method useful for Evan's application without caching? From email correspondence with Evan (June 16, 2020): "I can wait for the heap allocation to be eliminated at a later date. We're unable to use Drake in a real-time application until all of the heap allocations in MBP have been eliminated: adding a new heap allocation does not make the problem any better, but neither does it make it considerably worse."

@sherm1
Copy link
Member

sherm1 commented Jun 16, 2020

but neither does it make it considerably worse

Not sure I agree with that -- the changes in #13453 double the number of heap allocations in MBP.

@sherm1
Copy link
Member

sherm1 commented Aug 20, 2020

Turns out there are many other heap allocations in the MultibodyPlant computations. We are tracking those now with the instrumented cassie_benchmark and work is in progress to remove them.

@jwnimmer-tri
Copy link
Collaborator

This issue does not seem specific enough to be actionable. There are hundreds of public functions on MbP (and more if you count the supporting public classes like Body, Joint, etc.). What list of functions in particular need to be heap-sensitive? Without a list of functions to tackle (i.e., to write LimitMalloc test cases for), there is no way to move forward here and we should close out the issue.

Is the implication of Sherm's comment above that the only relevant functions are the non-autodiff cassie tests? Which is to say, fixing the 3 heap ops in DoubleInverseDynamics and 22 heap ops in ForwardDynamics is sufficient to close this issue?

@sherm1
Copy link
Member

sherm1 commented May 25, 2022

WDYT @edrumwri ?

@edrumwri
Copy link
Collaborator Author

edrumwri commented May 27, 2022

I think we can discount the parts of the forward dynamics stepping scheme- there is little hope that methods that compute contact forces can be made real-time safe, at least in general, so it doesn't make sense to chase down heap allocs there. But pretty much everything else in MBP*- query functions, slicing functions (i.e., get parts of an array), forward kinematics, inverse dynamics- should be able to avoid heap allocations / be made to be real time safe.

I've viewed removing heap allocs as a win for both real time safety and performance. If there is not interest in maintaining no-heap-allocs on the MBP side (we've been blessed with few heap allocs in the systems framework), we can always move the MBP accesses to a non-real time thread on our end. But it would be a shame to have to do so.

* I haven't been paying attention to the deformable body stuff that I know is coming/already here, so discount this statement as necessary.

@jwnimmer-tri
Copy link
Collaborator

jwnimmer-tri commented May 29, 2022

If there is not interest in maintaining no-heap-allocs on the MbP side ...

For my part I think it's a useful enough goal. We just need to understand what's desired.

... but pretty much everything else in MbP -- query functions, slicing functions (i.e., get parts of an array), forward kinematics, inverse dynamics -- should be able to avoid heap allocations ...

That's at least a start for setting a goal here. I suppose the next actionable step would be for someone to try to turn that general idea into a list of specific function names to repair.

Here's an approximate tally of all functions in scope (based on the pybind coverage report).

 11  drake/multibody/math/spatial_acceleration.h
 11  drake/multibody/math/spatial_force.h
  9  drake/multibody/math/spatial_momentum.h
 30  drake/multibody/math/spatial_vector.h
 11  drake/multibody/math/spatial_velocity.h
  6  drake/multibody/plant/coulomb_friction.h
  4  drake/multibody/plant/externally_applied_spatial_force.h
220  drake/multibody/plant/multibody_plant.h
 14  drake/multibody/plant/propeller.h
 10  drake/multibody/plant/wing.h
 13  drake/multibody/tree/articulated_body_inertia.h
 13  drake/multibody/tree/ball_rpy_joint.h
 41  drake/multibody/tree/body.h
 25  drake/multibody/tree/door_hinge.h
 10  drake/multibody/tree/fixed_offset_frame.h
  9  drake/multibody/tree/force_element.h
 31  drake/multibody/tree/frame.h
  2  drake/multibody/tree/frame_base.h
 63  drake/multibody/tree/joint.h
 21  drake/multibody/tree/joint_actuator.h
 21  drake/multibody/tree/linear_bushing_roll_pitch_yaw.h
 14  drake/multibody/tree/linear_spring_damper.h
 19  drake/multibody/tree/planar_joint.h
 22  drake/multibody/tree/prismatic_joint.h
 23  drake/multibody/tree/revolute_joint.h
 10  drake/multibody/tree/revolute_spring.h
 29  drake/multibody/tree/rigid_body.h
 40  drake/multibody/tree/rotational_inertia.h
 18  drake/multibody/tree/screw_joint.h
 23  drake/multibody/tree/spatial_inertia.h
 11  drake/multibody/tree/uniform_gravity_field_element.h
 30  drake/multibody/tree/unit_inertia.h
 13  drake/multibody/tree/universal_joint.h
  5  drake/multibody/tree/weld_joint.h

We need to cut down that list of ~800 functions into the subset which needs to be heapless. (Some of those 800 are probably obviously irrelevant, e.g., constructors and destructors, etc.)

If you want to propose a few dozen that are more important that the rest, we can look into fixing those first.

@amcastro-tri
Copy link
Contributor

If you could give us a short list (say 3 to 5) methods you'd like us to work on first @edrumwri, that'd help a lot.
We can re-evaluate after that.

@edrumwri
Copy link
Collaborator Author

Will do.

@sherm1
Copy link
Member

sherm1 commented Nov 16, 2022

@edrumwri is this issue still relevant? My thought is to close it now and open more-specific heap issues as they crop up.

@edrumwri
Copy link
Collaborator Author

Still highly relevant. Our engineering cycles have been focused elsewhere, but I'll be able to get you a list of functions this quarter.

@edrumwri
Copy link
Collaborator Author

I think that our remaining heap allocations are limited to CalcInverseDynamics(). It looks like the heap allocations are due to some temporary storage in MultibodyTree::CalcInverseDynamics(), as well as it returning a VectorX, rather than allowing an output vector to be passed in.

@edrumwri
Copy link
Collaborator Author

edrumwri commented Dec 21, 2022

Update on this: we hadn't exercised all code paths. We're also seeing heap allocations in some Jacobian functions as well.

The author (@mitiguy) is aware of two:

// TODO(Mitiguy) One way to avoid memory allocation and speed this up is to

and:

std::vector<SpatialAcceleration<T>> AsBias_WB_all(num_bodies());

We're seeing another here that is not just a heap allocation, but seemingly pretty inefficient as well (why wouldn't calculating the kinematic path to the world take constant time?):

std::vector<BodyNodeIndex> path_to_world;

Another (seems pretty simple to address):

const VectorX<T> vdot = VectorX<T>::Zero(num_velocities());

Another:

Matrix3X<T> p_WoBi_W(3, num_points);

Last one:

p_AQi->template topRows<3>() = X_AB * p_BQi.template topRows<3>();

@jwnimmer-tri
Copy link
Collaborator

We welcome any pull requests from users to fix these kind of problems that they care about. For now, the Drake team doesn't plan to make any changes ourselves here.

@jwnimmer-tri jwnimmer-tri closed this as not planned Won't fix, can't repro, duplicate, stale Mar 20, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

5 participants