Optimize decoding of registers #805

pguyot · 2023-09-12T20:37:11Z

FP and X registers are currenltly limited to MAX_REG which is 16. Enforce limitation while decoding and take advantage of this to optimize execution.

This brings a 6% speed increase on pi_test benchmark (with esp32 idf 4.4)

These changes are made under both the "Apache 2.0" and the "GNU Lesser General
Public License 2.1 or later" license terms (dual license).

SPDX-License-Identifier: Apache-2.0 OR LGPL-2.1-or-later

FP and X registers are currenltly limited to MAX_REG which is 16. Enforce limitation while decoding and take advantage of this to optimize execution. This brings a 6% speed increase on pi_test benchmark Signed-off-by: Paul Guyot <[email protected]>

bettio

Looks good, however in my plans I'd like to finally add support for x_regs > 16 (using some kind of sparse structure). Still declaring fp_regs > 16 as unsupported would be fine in regular usage.

pguyot · 2023-09-15T05:39:19Z

Looks good, however in my plans I'd like to finally add support for x_regs > 16 (using some kind of sparse structure). Still declaring fp_regs > 16 as unsupported would be fine in regular usage.

I agree. The main point of this PR is to establish the benefit of #776 (and actually they merge conflict each other).
On top of this PR, #776 is faster with pi_test benchmark with both ESP-IDF 4.4 and ESP-IDF 5.1.

If we bump MAX_REG to 1024, this optimization goes away as it's conditioned at compile time.

Also I think supporting 1024 registers could be done differently. With #795 , we will no longer need to store 1024 registers on each context as the number of used registers is capped with live.

bettio · 2023-09-15T19:17:22Z

What I did notice in some previous disassembly is that the compiler generated in some occasions something code using x[0], x[1], x[2], ... big hole ... x[1023] or something like that, so we need some specific code path with a sparse data structure when dealing with >= 16 registers.

pguyot · 2023-09-15T19:28:00Z

What I did notice in some previous disassembly is that the compiler generated in some occasions something code using x[0], x[1], x[2], ... big hole ... x[1023] or something like that, so we need some specific code path with a sparse data structure when dealing with >= 16 registers.

~~Indeed. This is captured in #698~~

Or maybe not. It should be captured in #698 rather.

bettio

Let's merge it since it improves what we have now, and let's iterate over this later.

pguyot · 2023-09-17T22:00:39Z

Let's merge it since it improves what we have now, and let's iterate over this later.

OK. I rebased #776

Optimize decoding of registers

55cb5f4

FP and X registers are currenltly limited to MAX_REG which is 16. Enforce limitation while decoding and take advantage of this to optimize execution. This brings a 6% speed increase on pi_test benchmark Signed-off-by: Paul Guyot <[email protected]>

pguyot mentioned this pull request Sep 12, 2023

Use a pointer for PC instead of an instruction counter #776

Merged

bettio reviewed Sep 14, 2023

View reviewed changes

bettio approved these changes Sep 17, 2023

View reviewed changes

bettio merged commit 11d6509 into atomvm:master Sep 17, 2023
80 checks passed

pguyot deleted the w37/max_reg-optimization branch September 17, 2023 21:53

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Optimize decoding of registers #805

Optimize decoding of registers #805

pguyot commented Sep 12, 2023

bettio left a comment

pguyot commented Sep 15, 2023

bettio commented Sep 15, 2023

pguyot commented Sep 15, 2023 •

edited

Loading

bettio left a comment

pguyot commented Sep 17, 2023

Optimize decoding of registers #805

Optimize decoding of registers #805

Conversation

pguyot commented Sep 12, 2023

bettio left a comment

Choose a reason for hiding this comment

pguyot commented Sep 15, 2023

bettio commented Sep 15, 2023

pguyot commented Sep 15, 2023 • edited Loading

bettio left a comment

Choose a reason for hiding this comment

pguyot commented Sep 17, 2023

pguyot commented Sep 15, 2023 •

edited

Loading