Use static AArch64 decoder tables. #633
Other improvements include faster compile time (for affected files), ~1.2s less lazy init time, a ~7.8MiB smaller binary, and ~10MiB less heap usage.
There are a few places where we can clean the code up even more without relying on the compiler to convert the switch statements into indexed lookups. I'll post a code review tomorrow.
Having duplicate instruction entries in the original code is very bizarre. I think we should take a deeper look at that before committing here, if only to assure ourselves we didn't break anything. @mxz297 @sashanicolas could you take a look?
Many duplicated entries were introduced in this commit (https://github.com/dyninst/dyninst/pull/633/files#diff-0c178a3fd0c3ed4506eda2e5532af3c0). I believe some of the duplicated entries are wrong. For example, in the existing code
These are four different system registers, so at least three of the entries were wrong. It is not clear to me where 0x8000 comes from based on the ARM manual.
The key for `sysRegMap` is a clever bitmask composed at dyninst/instructionAPI/src/InstructionDecoder-aarch64.C, lines 435 to 436 in 62cf981.
So |
Still, regardless of how the key is used, multiple values associated with the same key in a `std::map` means only one value will ever be used. While duplicated entries are bad, I doubt they will cause actual issues because system registers are rarely used. And we will switch to Capstone for Power and ARM in the near future. I don't think we should spend too much time fixing this piece of code.
@mxz297 I am perfectly fine with leaving this alone since it will all be replaced with Capstone soon. I just wanted to make sure there wasn't a ticking time bomb waiting to get us.
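For reference, a minimal sketch of how `std::map` treats a repeated key. The key value and the string payloads are made up for illustration, and which of these two paths the original table construction effectively took is not shown in this thread. With `insert`, the first value for a key is kept; with `operator[]` assignment, the last one is.

```cpp
#include <cstdint>
#include <map>
#include <string>

// Illustrative key only; not taken from the real sysRegMap contents.
constexpr std::uint32_t kKey = 0x8000;

// insert() never overwrites an existing entry, so the first value wins.
inline std::map<std::uint32_t, std::string> build_with_insert() {
    std::map<std::uint32_t, std::string> m;
    m.insert({kKey, "first"});
    m.insert({kKey, "second"});  // no effect: key already present
    return m;
}

// operator[] returns a reference to the existing entry, so the
// assignment overwrites it and the last value wins.
inline std::map<std::uint32_t, std::string> build_with_subscript() {
    std::map<std::uint32_t, std::string> m;
    m[kKey] = "first";
    m[kKey] = "second";
    return m;
}
```

Either way the map ends up with a single entry per key, so the other duplicates are silently dead.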
Nothing serious, mostly just a few places where I think we can `const`ify things.
Ready for another review, if it's worth doing before the Capstone stuff is merged.
https://bottle.cs.wisc.edu/branch/PR633 Testing for this PR is a bit all over the place. I'm not sure if it's the test suite being its usual buggy self or if there is a real problem. I will re-run.
It's passing on ARM and PPC, and the regressions on x86 are spurious.
Fixes #630, reduces compile time (enough to notice), ~7.8MiB smaller binary, ~10MiB less heap usage. Performance effects not well known, but for HPCStruct ~1.2s is saved on a smaller input.
The previous implementation used a lazy initialization scheme to generate the various tables used for arm64 decoding. This PR replaces all that with simple `const` tables and switch statements, which the compiler will usually reduce to a blob in `.data` (or `.text`).
For the small(-ish) lookup tables for the state machine in `main_decoder_table`, a linear scan is employed. Any performance drop is likely caused by this in some way.
Apologies for the unreadable diffs, the only change in `aarch_opcode_tables.C` that isn't noted somewhere else is the removal of duplicate keys in `sysRegMap` (switch wouldn't compile otherwise). I used the last one of every duplicate, hopefully that reflects what the `std::map` would have done.
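As a rough sketch of the two patterns described above (not the actual code in this PR): a small `const` table searched with a linear scan, and a `switch` used as a key lookup. The struct layout, table contents, enum, and function names are all hypothetical.

```cpp
#include <cstdint>

// Hypothetical entry layout; the real main_decoder_table differs.
struct BranchEntry {
    std::uint32_t bits;  // masked bit pattern to match
    int next_node;       // index of the next decoder node
};

// Constant-initialized, so no lazy init and no heap allocation.
static constexpr BranchEntry kBranches[] = {
    {0x0, 1},
    {0x1, 2},
    {0x3, 5},
};

// Linear scan over a small table; returns -1 when nothing matches.
inline int find_branch(std::uint32_t bits) {
    for (const BranchEntry& e : kBranches) {
        if (e.bits == bits) return e.next_node;
    }
    return -1;
}

// Hypothetical register identifiers.
enum class SysReg { unknown, reg_a, reg_b };

// switch-based lookup. A second `case 0x8000:` here would be rejected
// by the compiler, which is why duplicate keys have to go.
inline SysReg lookup_sysreg(std::uint32_t key) {
    switch (key) {
        case 0x8000: return SysReg::reg_a;
        case 0x8001: return SysReg::reg_b;
        default:     return SysReg::unknown;
    }
}
```

The switch form makes duplicate keys a compile-time error rather than a silent runtime choice, at the cost of leaving the lookup strategy (jump table, binary search, or compare chain) up to the compiler.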