
common: Use small buffer optimization for AutoDiffXd #12583

Conversation

jwnimmer-tri
Collaborator

@jwnimmer-tri jwnimmer-tri commented Jan 16, 2020

This may be interesting for profiling.


@jwnimmer-tri
Collaborator Author

@amcastro-tri FYI from our chat a week or two ago. This moves autodiff storage to the stack in cases where we have <= 6 partials, or gracefully overflows to the heap for larger sizes. If you have some benchmarks you want to run, I'd be curious to hear if this helps or not.

The broader idea is that we could switch all of framework + MbP to use this, and then add helpers that do chunking (#7039).

Member

@sherm1 sherm1 left a comment


Cool! I think we usually have a lot more than 6 partials so the chunking would be necessary to exploit this. An alternative would be to exploit the fact that the number of partials is typically constant through a large computation so memory could be doled out and repeatedly reused from a fixed-size pool.

Reviewable status: needs platform reviewer assigned, needs at least two assigned reviewers, labeled "do not merge"

@amcastro-tri
Contributor

Thank you so much @jwnimmer-tri! I'll try to write a quick MBP benchmark with this.

Contributor

@amcastro-tri amcastro-tri left a comment


Reviewed 8 of 8 files at r1.
Reviewable status: 1 unresolved discussion, needs platform reviewer assigned, needs at least two assigned reviewers, labeled "do not merge" (waiting on @jwnimmer-tri)


common/eigen_autodiff_types.h, line 26 at r1 (raw file):

  /* _Cols = */ 1,
  /* _Options = */ 0,
  /* _MaxRows = */ internal::kMaxRowsAtCompileTimeThatTriggersInlineStorage,

btw, I love these inline comments!


common/eigen_dense_storage_sbo.h, line 22 at r1 (raw file):

/** The magic MaxRowsAtCompileTime value that invokes SBO. */
constexpr int kMaxRowsAtCompileTimeThatTriggersInlineStorage = 1234567;

Could this number be negative, so that there's not even a chance someone will trigger this by accident?

Collaborator Author

@jwnimmer-tri jwnimmer-tri left a comment


Reviewable status: 1 unresolved discussion, needs platform reviewer assigned, needs at least two assigned reviewers, labeled "do not merge" (waiting on @amcastro-tri)


common/eigen_dense_storage_sbo.h, line 22 at r1 (raw file):

Previously, amcastro-tri (Alejandro Castro) wrote…

Could this number be negative, so that there's not even a chance someone will trigger this by accident?

Eigen actually ends up using this template argument in various computations, so I would be worried that a negative value would explode on us, even if we lucked into it working for now.

@jwnimmer-tri
Collaborator Author

jwnimmer-tri commented Jan 16, 2020

I think we usually have a lot more than 6 partials so the chunking would be necessary to exploit this.

Yes, in many optimization programs we'd have hundreds or thousands of partials. But @amcastro-tri thought we might have some cases with only a few. The possible win of this approach is that we only have a single autodiff scalar at build-time, to keep build times and bindings sane -- and it adapts to all use cases -- those with only a few partials, those who use chunking, and those who need giant partial arrays on the heap.

We could also play with the max here to be something like 15 instead of 6, in case that opens up a useful amount of more speed. Or maybe even 7 is a magical threshold (a single pose?).

An alternative would be to exploit the fact that the number of partials is typically constant through a large computation so memory could be doled out and repeatedly reused from a fixed-size pool.

Perhaps so. At minimum, this PR shows the trick for how to replace AutoDiff's storage protocol. Someone could rework it to use heap-only arenas or pools, or maybe even add the pool into the current implementation, to have both SBO + pools.

@jwnimmer-tri
Collaborator Author

GitHub will remember the code; removing the PR for now.

@jwnimmer-tri jwnimmer-tri deleted the eigen_sbo_dense_storage branch February 5, 2020 22:23
@amcastro-tri
Contributor

Thanks @jwnimmer-tri. I am hoping to give this a meaningful test spin in the updated contact solver I am working on right now. Not forgotten!

@jwnimmer-tri
Collaborator Author

\CC @calderpg-tri FYI this is how to inject SBO into Eigen::VectorXd.
