Add support for dehydrated runtime data structures #77884
Conversation
So far I assumed we could have method table pointers emitted in the native code and relocations would take care of them when loaded. But how would that work with rehydration of the method tables? I am obviously missing some important parts. I think some more explanation of how this works could be helpful.
The comments are somewhat scattered across the code, but the key part is:
So the executable now defines an extra region within the
I see. Makes sense.
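To make the mechanics discussed above a bit more concrete, here is a rough sketch of the shape of the startup call, with every name made up for illustration (the real region symbols, helper names, and command dispatch live elsewhere in the compiler and StartupCodeHelpers): the dehydrated blob is a compact command stream whose only OS relocations are the entries of a small lookup table, and at startup it is decoded into a writable region that holds the real MethodTables and other structures.

// Rough sketch, not the actual runtime code: expands a dehydrated blob at
// startup. All identifiers here are illustrative assumptions.
internal static unsafe class RehydrationSketch
{
    // dehydratedStart/End: read-only command stream emitted by the compiler;
    // the only OS relocations it needs are the entries of lookupTable.
    // hydratedRegion: zero-initialized writable region reserved by the
    // compiler, sized to hold the fully expanded data structures.
    public static void Rehydrate(byte* dehydratedStart, byte* dehydratedEnd,
                                 byte* hydratedRegion, void** lookupTable)
    {
        byte* read = dehydratedStart;
        byte* write = hydratedRegion;
        while (read < dehydratedEnd)
        {
            // Each command says: copy N literal bytes, zero-fill N bytes,
            // or write a pointer obtained from lookupTable (or inline).
            read = ExecuteCommand(read, ref write, lookupTable);
        }
    }

    // Placeholder for the per-command dispatch discussed later in the thread.
    private static byte* ExecuteCommand(byte* read, ref byte* write, void** lookupTable)
        => throw new System.NotImplementedException();
}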
int command, payload;

byte[] buf = new byte[5];
Debug.Assert(Encode(1, 0, buf) == 1);
This looks like the IfFailGo macro style. Why not just throw on error instead of returning error codes?
This is under #if false - it was my debugging helper to make sure I didn't mess up the bit packing (this file can be compiled on its own with the region enabled).
I'm still debating whether to check it in, delete it, or spend more time converting it to a unit test that is never going to fail because nobody will want to touch this anyway.
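For readers who want a feel for what such a bit-packing check exercises, here is a self-contained sketch of one plausible command/payload encoding, with asserts in the spirit of the snippet above. The layout (3 command bits, 5 payload bits, escape to extra bytes) and the class name are assumptions for illustration, not the actual DehydratedDataCommand format.

using System.Diagnostics;

// Illustrative only: packs a 3-bit command and a payload into as few bytes
// as possible; payloads that don't fit in 5 bits spill into extra bytes,
// 7 bits at a time with a continuation flag.
internal static class CommandCodingSketch
{
    public static int Encode(int command, int payload, byte[] buffer)
    {
        Debug.Assert((uint)command < 8 && payload >= 0);
        int pos = 0;
        if (payload < 0x1F)
        {
            buffer[pos++] = (byte)(command | (payload << 3));
        }
        else
        {
            buffer[pos++] = (byte)(command | (0x1F << 3)); // escape: extended payload follows
            while (payload >= 0x80)
            {
                buffer[pos++] = (byte)(payload | 0x80);    // low 7 bits + continuation bit
                payload >>= 7;
            }
            buffer[pos++] = (byte)payload;
        }
        return pos; // number of bytes written
    }

    public static int Decode(byte[] buffer, out int command, out int payload)
    {
        int pos = 0;
        byte b = buffer[pos++];
        command = b & 0x7;
        payload = b >> 3;
        if (payload == 0x1F) // escape marker: read the extended payload
        {
            payload = 0;
            int shift = 0;
            byte next;
            do
            {
                next = buffer[pos++];
                payload |= (next & 0x7F) << shift;
                shift += 7;
            }
            while ((next & 0x80) != 0);
        }
        return pos; // number of bytes consumed
    }
}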
Neat! LGTM as someone not super familiar with the code. :)
Thanks for walking through the code and doing this work to reduce the size!
I took a look at the Encode and Decode routines on the DehydratedDataCommand class and the testing there. The testing seems to cover the 1-byte boundary for EncodeShort well, and I couldn't think of any missing cases. I also took a look at DehydratedDataNode & StartupCodeHelpers where this is used, and while I wasn't fully able to follow the pointer manipulation in RehydrateData for the last 2 options, I assume the lookup table logic holds (and will blow up the program if it doesn't :-))
Hopefully the dependent update is merged soon so the larger testing matrix can follow.
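As a usage note tied to the hypothetical CommandCodingSketch above (again, not the real EncodeShort, and reusing its using System.Diagnostics;), the 1-byte boundary this kind of testing covers is the point where a payload stops fitting in the first byte:

byte[] buffer = new byte[5];

// Largest payload that still fits in a single byte in the sketch's layout.
Debug.Assert(CommandCodingSketch.Encode(1, 0x1E, buffer) == 1);

// One more, and the encoding spills into an extra payload byte.
Debug.Assert(CommandCodingSketch.Encode(1, 0x1F, buffer) == 2);

// Round-trip check.
Debug.Assert(CommandCodingSketch.Decode(buffer, out int cmd, out int payload) == 2
             && cmd == 1 && payload == 0x1F);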
switch (command)
{
    case DehydratedDataCommand.Copy:
        // TODO: can we do any kind of memcpy here?
Buffer.MemoryCopy?
We can't call that from Test.CoreLib. I'm also a little bit worried - this code runs during very early startup - casting, typeof, newobj, and many other things are not available. Buffer.MemoryCopy feels far enough up the stack that I'm not sure it would be safe to call.
Yes, this runs very early. I am not sure whether Buffer.MemoryCopy does anything complex, but since the data here is unmanaged, it would eventually just call something like InternalCalls.memmove.
Are these copied chunks long enough to involve memmove? If they are typically just 10-20 bytes, it may not get any faster.
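If Buffer.MemoryCopy is off the table that early, a minimal fallback in the spirit of this discussion could be a plain byte loop. This is a sketch, not the actual StartupCodeHelpers code; for 10-20 byte chunks it is likely fine, and a word-sized loop would be the next step if it ever showed up in profiles.

// Sketch: assumes nothing beyond raw pointer access, so it is safe to run
// before casting, typeof, or newobj are available.
internal static unsafe class EarlyCopySketch
{
    public static void CopyChunk(byte* destination, byte* source, int length)
    {
        for (int i = 0; i < length; i++)
            destination[i] = source[i];
    }
}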
LGTM! Nice!
Co-authored-by: Sven Boemer <[email protected]>
/cc @ivanpovazan we should re-run our sample after this is merged and see if we get more size reduction.
I can confirm this change makes a major size reduction on Linux. Publishing a simple ASP.NET WebAPI app with NativeAOT: Before: 29.31 MB. This is a 32.5% size reduction!
This is a follow-up to #77884. In the original pull request, all relocation targets went into a lookup table. This is not a very efficient way to represent rarely used relocs.

In this update, I'm extending the dehydration format to allow representing relocations inline - instead of indirecting through the lookup table, the target immediately follows the instruction. I'm changing the emitter to emit this form if there are fewer than 3 references to the reloc. This produces a ~0.5% size saving. It likely also speeds up the decoding at runtime since there's less cache thrashing. On a hello world, the lookup table originally had about 11k entries. With this change, the lookup table only has 1700 entries.

If multiple relocations follow one another, a single command is generated with the payload specifying the number of subsequent relocations. This saves an additional 0.1%.
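A hedged sketch of what the two relocation forms described above could look like on the decoding side; the flag, the lookup table shape, and the inline target encoding are all illustrative assumptions, and the real format may store inline targets differently.

internal static unsafe class RelocDecodingSketch
{
    // Writes pointer-sized relocation targets into the rehydrated data.
    // inlineReloc == false: payload is an index into the shared lookup table.
    // inlineReloc == true:  payload is a count of targets stored right after
    //                       the command in the dehydrated stream.
    public static byte* ApplyReloc(bool inlineReloc, int payload,
                                   byte* write, ref byte* read, void** lookupTable)
    {
        if (!inlineReloc)
        {
            *(void**)write = lookupTable[payload];
            write += sizeof(void*);
        }
        else
        {
            for (int i = 0; i < payload; i++)
            {
                *(void**)write = *(void**)read; // target follows the command inline
                read += sizeof(void*);
                write += sizeof(void*);
            }
        }
        return write;
    }
}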
Nice! Thank you for measuring it! #78545 should bring maybe another ~1% saving, and I have a list of a couple more savings (not quite 32.5%, but maybe another 5%).
This adds support for dehydrating pointer-rich data structures at compile time and rehydrating them at runtime.
The NativeAOT compiler generates several pointer-heavy data structures (the worst offender being MethodTable). These data structures get emitted at compile time and used at runtime to e.g. support casting or virtual method dispatch. We want to be able to generate structures that have pointers in them because e.g. virtual method dispatch needs to be fast and we don't want to be doing extra math to compute the destination (just dereference a pointer in the data structure the compiler generated and call it).
But pointers are big, and there's extra metadata the OS needs in the executable file on top of that (2 bytes on Windows, 24 (!!) bytes on Linux/ELF).
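As a back-of-the-envelope illustration of those numbers (assuming 8-byte pointers and one relocation record per pointer slot; the size of a dehydrated reference is an assumption for illustration):

// Per pointer slot in the emitted data, on 64-bit Linux/ELF:
const int PointerSize = 8;            // the pointer value itself
const int ElfRelocationRecord = 24;   // relocation metadata mentioned above
const int CostWithRelocations = PointerSize + ElfRelocationRecord; // 32 bytes

// In the dehydrated form, a reference to a frequently used target can be a
// one- or two-byte command that indexes the shared lookup table, so only the
// lookup table entries pay the full 32-byte price.
System.Console.WriteLine($"per-slot cost with relocations: {CostWithRelocations} bytes");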
This adds support for "dehydrating" the data structures with pointers at compile time (representing pointers more efficiently) and "rehydrating" them at runtime.
The rehydration is quite fast - I'm seeing 2.2 GB/s throughput on my machine. A hello world rehydrates in under a millisecond.
The size savings are significant: 7+% on Windows, 30+% on Linux.
Depends on #77972 getting through (with the update from llvm-project repo).
Cc @dotnet/ilc-contrib