Optimize decoding of registers #805
FP and X registers are currently limited to MAX_REG, which is 16. Enforce this limitation while decoding and take advantage of it to optimize execution. This brings a 6% speed increase on the pi_test benchmark.

Signed-off-by: Paul Guyot <[email protected]>
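Here is a minimal C sketch of why a limit of 16 helps. It assumes operands use the BEAM compact term encoding. The low 3 bits of each operand are a tag, and tag 3 marks an X register. An index from 0 to 15 fits in a single byte as `index << 4 | tag`, and bit 3 is set only for the longer multi-byte forms. `decode_x_reg`, `x_reg`, `Context` and the `#if` split are hypothetical names for this sketch, not AtomVM's actual code. FP registers, which use an extended encoding, are left out. With `MAX_REG` at 16, the loader rejects any multi-byte index once at load time. Execution is then left with a plain array access.

```c
#include <assert.h>
#include <stdint.h>

#ifndef MAX_REG
#define MAX_REG 16
#endif

#define TAG_MASK 0x07      /* low 3 bits: compact term tag */
#define TAG_X 0x03         /* tag of an X register operand */
#define EXTENDED_FLAG 0x08 /* bit 3 set: value continues past the first byte */

/* Hypothetical load-time decoder: returns the X register index, or -1 if the
 * operand is not an X register or its index is not below MAX_REG. */
static int decode_x_reg(const uint8_t *code)
{
    uint8_t first = code[0];
    if ((first & TAG_MASK) != TAG_X) {
        return -1;
    }
#if MAX_REG <= 16
    /* Every supported index fits in the short form (index << 4 | tag), so any
     * extended form is rejected outright and decoding is a single shift. */
    if (first & EXTENDED_FLAG) {
        return -1;
    }
    return first >> 4;
#else
    /* General path: also accept the two-byte form carrying an 11-bit value. */
    int index;
    if (!(first & EXTENDED_FLAG)) {
        index = first >> 4;
    } else if (!(first & 0x10)) {
        index = ((first >> 5) << 8) | code[1];
    } else {
        return -1; /* larger encodings are not handled in this sketch */
    }
    return index < MAX_REG ? index : -1;
#endif
}

/* Hypothetical execution context; uint64_t stands in for a VM term. */
typedef struct {
    uint64_t x[MAX_REG];
} Context;

/* Execution-time access: the index was validated at load time, so only a
 * debug assert remains (compiled out when NDEBUG is defined). */
static inline uint64_t *x_reg(Context *ctx, int index)
{
    assert(index >= 0 && index < MAX_REG);
    return &ctx->x[index];
}
```

Because the fast path is selected with `#if MAX_REG <= 16`, raising `MAX_REG` (for example to 1024) compiles the general path instead. In that configuration the gain would disappear.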
Looks good. However, I plan to eventually add support for x_regs > 16 (using some kind of sparse structure). Still, declaring fp_regs > 16 as unsupported would be fine for regular usage.
I agree. The main point of this PR is to establish the benefit of #776 (and they actually conflict with each other). If we bump MAX_REG to 1024, this optimization goes away, as it is enabled conditionally at compile time. Also, I think supporting 1024 registers could be done differently. With #795, we will no longer need to store 1024 registers on each context, as the number of used registers is capped with
What I did notice in some previous disassembly is that the compiler sometimes generated code using x[0], x[1], x[2], ... big hole ... x[1023] or something like that, so we need a specific code path with a sparse data structure when dealing with register indices >= 16.
Or maybe not. This should rather be captured in #698.
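The sparse structure mentioned above might look something like this minimal C sketch. The first 16 X registers stay in a dense array, so the common case is still direct indexing. Higher indices such as x[1023] are stored on demand in a small overflow table searched linearly. All names (`XRegs`, `xregs_get`, `xregs_free`, `term`) are hypothetical. This is not necessarily the design #698 will adopt.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define DENSE_REGS 16

typedef uint64_t term; /* hypothetical placeholder for a VM term */

typedef struct {
    unsigned index;
    term value;
} SparseReg;

typedef struct {
    term dense[DENSE_REGS]; /* x[0]..x[15]: direct indexing, the common case */
    SparseReg *sparse;      /* x[16] and above (e.g. x[1023]), created on demand */
    size_t sparse_len;
    size_t sparse_cap;
} XRegs;

/* Returns a pointer to register `index`, creating it (initialized to 0) if it
 * does not exist yet. Returns NULL only if allocation fails. Pointers to sparse
 * registers are invalidated when the overflow table grows. */
static term *xregs_get(XRegs *regs, unsigned index)
{
    if (index < DENSE_REGS) {
        return &regs->dense[index];
    }
    assert(regs->sparse_len <= regs->sparse_cap);
    for (size_t i = 0; i < regs->sparse_len; i++) {
        if (regs->sparse[i].index == index) {
            return &regs->sparse[i].value;
        }
    }
    if (regs->sparse_len == regs->sparse_cap) {
        size_t new_cap = regs->sparse_cap ? regs->sparse_cap * 2 : 4;
        SparseReg *grown = realloc(regs->sparse, new_cap * sizeof *grown);
        if (grown == NULL) {
            return NULL;
        }
        regs->sparse = grown;
        regs->sparse_cap = new_cap;
    }
    regs->sparse[regs->sparse_len] = (SparseReg){ .index = index, .value = 0 };
    return &regs->sparse[regs->sparse_len++].value;
}

/* Releases the overflow table; dense registers need no cleanup. */
static void xregs_free(XRegs *regs)
{
    free(regs->sparse);
    regs->sparse = NULL;
    regs->sparse_len = 0;
    regs->sparse_cap = 0;
}
```

A linear search is reasonable when only a handful of high registers are in use, as in the x[0] ... x[1023] pattern described above. A sorted table or a hash map would be the next step if that assumption does not hold.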
Let's merge it since it improves on what we have now, and let's iterate on this later.
OK. I rebased #776.
FP and X registers are currently limited to MAX_REG, which is 16. Enforce this limitation while decoding and take advantage of it to optimize execution.
This brings a 6% speed increase on the pi_test benchmark (on ESP32 with ESP-IDF 4.4).
These changes are made under both the "Apache 2.0" and the "GNU Lesser General
Public License 2.1 or later" license terms (dual license).
SPDX-License-Identifier: Apache-2.0 OR LGPL-2.1-or-later