Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Optimize decoding of registers #805

Merged
merged 1 commit into from
Sep 17, 2023

Conversation

pguyot
Copy link
Collaborator

@pguyot pguyot commented Sep 12, 2023

FP and X registers are currenltly limited to MAX_REG which is 16. Enforce limitation while decoding and take advantage of this to optimize execution.

This brings a 6% speed increase on pi_test benchmark (with esp32 idf 4.4)

These changes are made under both the "Apache 2.0" and the "GNU Lesser General
Public License 2.1 or later" license terms (dual license).

SPDX-License-Identifier: Apache-2.0 OR LGPL-2.1-or-later

FP and X registers are currenltly limited to MAX_REG which is 16.
Enforce limitation while decoding and take advantage of this to optimize
execution.

This brings a 6% speed increase on pi_test benchmark

Signed-off-by: Paul Guyot <[email protected]>
Copy link
Collaborator

@bettio bettio left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, however in my plans I'd like to finally add support for x_regs > 16 (using some kind of sparse structure). Still declaring fp_regs > 16 as unsupported would be fine in regular usage.

@pguyot
Copy link
Collaborator Author

pguyot commented Sep 15, 2023

Looks good, however in my plans I'd like to finally add support for x_regs > 16 (using some kind of sparse structure). Still declaring fp_regs > 16 as unsupported would be fine in regular usage.

I agree. The main point of this PR is to establish the benefit of #776 (and actually they merge conflict each other).
On top of this PR, #776 is faster with pi_test benchmark with both ESP-IDF 4.4 and ESP-IDF 5.1.

If we bump MAX_REG to 1024, this optimization goes away as it's conditioned at compile time.

Also I think supporting 1024 registers could be done differently. With #795 , we will no longer need to store 1024 registers on each context as the number of used registers is capped with live.

@bettio
Copy link
Collaborator

bettio commented Sep 15, 2023

What I did notice in some previous disassembly is that the compiler generated in some occasions something code using x[0], x[1], x[2], ... big hole ... x[1023] or something like that, so we need some specific code path with a sparse data structure when dealing with >= 16 registers.

@pguyot
Copy link
Collaborator Author

pguyot commented Sep 15, 2023

What I did notice in some previous disassembly is that the compiler generated in some occasions something code using x[0], x[1], x[2], ... big hole ... x[1023] or something like that, so we need some specific code path with a sparse data structure when dealing with >= 16 registers.

Indeed. This is captured in #698

Or maybe not. It should be captured in #698 rather.

Copy link
Collaborator

@bettio bettio left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's merge it since it improves what we have now, and let's iterate over this later.

@bettio bettio merged commit 11d6509 into atomvm:master Sep 17, 2023
80 checks passed
@pguyot pguyot deleted the w37/max_reg-optimization branch September 17, 2023 21:53
@pguyot
Copy link
Collaborator Author

pguyot commented Sep 17, 2023

Let's merge it since it improves what we have now, and let's iterate over this later.

OK. I rebased #776

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants