
common: Use small buffer optimization for AutoDiffXd #12583

Conversation

jwnimmer-tri
Collaborator

@jwnimmer-tri jwnimmer-tri commented Jan 16, 2020

This may be interesting for profiling.


@jwnimmer-tri
Collaborator Author

@amcastro-tri FYI from our chat a week or two ago. This moves autodiff storage to the stack in cases where we have <= 6 partials, or gracefully overflows to the heap for larger sizes. If you have some benchmarks you want to run, I'd be curious to hear if this helps or not.

The broader idea is that we could switch all of framework + MbP to use this, and then add helpers that do chunking (#7039).

Member

@sherm1 sherm1 left a comment


Cool! I think we usually have a lot more than 6 partials so the chunking would be necessary to exploit this. An alternative would be to exploit the fact that the number of partials is typically constant through a large computation so memory could be doled out and repeatedly reused from a fixed-size pool.

Reviewable status: needs platform reviewer assigned, needs at least two assigned reviewers, labeled "do not merge"

@amcastro-tri
Contributor

Thank you so much @jwnimmer-tri! I'll try to write a quick MBP benchmark with this.

Contributor

@amcastro-tri amcastro-tri left a comment


Reviewed 8 of 8 files at r1.
Reviewable status: 1 unresolved discussion, needs platform reviewer assigned, needs at least two assigned reviewers, labeled "do not merge" (waiting on @jwnimmer-tri)


common/eigen_autodiff_types.h, line 26 at r1 (raw file):

  /* _Cols = */ 1,
  /* _Options = */ 0,
  /* _MaxRows = */ internal::kMaxRowsAtCompileTimeThatTriggersInlineStorage,

btw, I love these inline comments!


common/eigen_dense_storage_sbo.h, line 22 at r1 (raw file):

/** The magic MaxRowsAtCompileTime value that invokes SBO. */
constexpr int kMaxRowsAtCompileTimeThatTriggersInlineStorage = 1234567;

Could this number be negative, so that there's not even a chance someone will trigger this by accident?

Collaborator Author

@jwnimmer-tri jwnimmer-tri left a comment


Reviewable status: 1 unresolved discussion, needs platform reviewer assigned, needs at least two assigned reviewers, labeled "do not merge" (waiting on @amcastro-tri)


common/eigen_dense_storage_sbo.h, line 22 at r1 (raw file):

Previously, amcastro-tri (Alejandro Castro) wrote…

Could this number be negative, so that there's not even a chance someone will trigger this by accident?

Eigen actually ends up using this template argument in various computations, so I would be worried that a negative value would explode on us, even if we lucked into it working for now.

@jwnimmer-tri
Collaborator Author

jwnimmer-tri commented Jan 16, 2020

I think we usually have a lot more than 6 partials so the chunking would be necessary to exploit this.

Yes, in many optimization programs we'd have hundreds or thousands of partials. But @amcastro-tri thought we might have some cases with only a few. The possible win of this approach is that we only have a single autodiff scalar at build-time, to keep build times and bindings sane -- and it adapts to all use cases -- those with only a few partials, those who use chunking, and those who need giant partial arrays on the heap.

We could also play with the max here to be something like 15 instead of 6, in case that opens up a useful amount of more speed. Or maybe even 7 is a magical threshold (a single pose?).

An alternative would be to exploit the fact that the number of partials is typically constant through a large computation so memory could be doled out and repeatedly reused from a fixed-size pool.

Perhaps so. At minimum, this PR shows the trick for how to replace AutoDiff's storage protocol. Someone could rework it to use heap-only arenas or pools, or maybe even add the pool into the current implementation, to have both SBO + pools.

@jwnimmer-tri
Collaborator Author

GitHub will remember the code; removing the PR for now.

@jwnimmer-tri jwnimmer-tri deleted the eigen_sbo_dense_storage branch February 5, 2020 22:23
@amcastro-tri
Contributor

Thanks @jwnimmer-tri. I am hoping to give this a meaningful test spin in the updated contact solver I am working on right now. Not forgotten!

@jwnimmer-tri
Collaborator Author

\CC @calderpg-tri FYI this is how to inject SBO into Eigen::VectorXd.
