Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

i#6758: Fix AArch64 FP state at fcache events and attach #6757

Merged
merged 5 commits into from
Apr 5, 2024

Conversation

abhinav92003
Copy link
Contributor

@abhinav92003 abhinav92003 commented Apr 5, 2024

Fixes the slot used to save and restore FP regs at fcache enter and return events. PR #6725 adjusted the slots used during signal handling in core/unix/signal_linux_aarch64.c but did not adjust the same in fcache enter/return and attach events. Prior to that PR, the FP regs were simply stored in a contiguous manner in signal handling code and fcache enter/return routines (instead of in their respective dr_simd_t struct member), which was a bit confusing.

The mismatch between slot usage in signal handling and fcache enter/return code caused failures in the signalNNN1 tests on some systems. Verified that those tests pass with this fix.

Also fixes the same issue in save_priv_mcontext_helper which is used in the dr_app_start API. Unit tests for this scenario will be added as part of #6759.

Issue: #5036, #6755, #5365
Fixes #6758

Fixes the slot used to save and restore FP regs at fcache enter and
return events. PR #6725 adjusted the slots used during signal handling
in core/unix/signal_linux_aarch64.c but did not adjust the same in
fcache enter/exit and attach events. Prior to that the FP regs were
simply stored in a continuous manner in signal handling code, and
fcache enter/exit routines.

The mismatch between slot usage caused failueres in the signalNNN1
tests on some systems.
@abhinav92003 abhinav92003 marked this pull request as draft April 5, 2024 00:24
@abhinav92003 abhinav92003 changed the title Fix AArch64 fp reg state save/restore i#5036: Fix AArch64 fp reg state save/restore Apr 5, 2024
@abhinav92003
Copy link
Contributor Author

This PR is probably not ready completely yet, but would be good to get some quick feedback since this is blocking downstream pulls.

@abhinav92003 abhinav92003 marked this pull request as ready for review April 5, 2024 00:58
@abhinav92003 abhinav92003 changed the title i#5036: Fix AArch64 fp reg state save/restore i#5036: Fix AArch64 fp reg state save/restore at fcache events and attach Apr 5, 2024
@abhinav92003 abhinav92003 changed the title i#5036: Fix AArch64 fp reg state save/restore at fcache events and attach i#5036: Fix AArch64 fp reg state at fcache events and attach Apr 5, 2024
core/arch/aarch64/aarch64.asm Show resolved Hide resolved
core/arch/aarch64/aarch64.asm Show resolved Hide resolved
core/arch/aarch64/aarch64.asm Show resolved Hide resolved
@abhinav92003 abhinav92003 changed the title i#5036: Fix AArch64 fp reg state at fcache events and attach i#6758: Fix AArch64 fp reg state at fcache events and attach Apr 5, 2024
Copy link
Contributor

@derekbruening derekbruening left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The change makes sense to me but probably best to have @AssadHashmi take a look as I am not an expert in a64 SIMD: as you can tell I'm confused by the svep field being a full dr_simd_t and other things.

core/arch/aarch64/aarch64.asm Show resolved Hide resolved
core/arch/aarch64/emit_utils.c Show resolved Hide resolved
@abhinav92003
Copy link
Contributor Author

code_api|tool.drcov.eintr failed on ubuntu-20-arm64-sve. I see it did so also on #6756 though so very likely unrelated to this PR itself.

x86-32 tool.drcachesim.opcode_mix failure is #6303.

@abhinav92003
Copy link
Contributor Author

#6761 is fixed now. Rerunning tests.

@abhinav92003 abhinav92003 changed the title i#6758: Fix AArch64 fp reg state at fcache events and attach i#6758: Fix AArch64 FP reg state at fcache events and attach Apr 5, 2024
@abhinav92003 abhinav92003 changed the title i#6758: Fix AArch64 FP reg state at fcache events and attach i#6758: Fix AArch64 FP state at fcache events and attach Apr 5, 2024
@abhinav92003 abhinav92003 merged commit a0b86cc into master Apr 5, 2024
16 checks passed
@abhinav92003 abhinav92003 deleted the i6755-fix-fp-reg-state branch April 5, 2024 16:24
jackgallagher-arm added a commit that referenced this pull request Apr 12, 2024
PR #6757 fixed the way we read/write SVE register slots but
unfortunately it is now broken on systems with 128-bit vector length.

Both SVE vector and predicate registers use dr_simd_t slots which is a
64-byte type meant to store up to 512-bit vector registers. SVE
predicate registers are always 1/8 the size of the vector register so
for 512-bit vector length systems we only really need 64 / 8 = 8 bytes
to store predicate registers.

The ldr/str instructions we use to read and write the predicate
register slots have a base+offset memory operand where the offset is
a value in the range [-256, 255] scaled based by predicate register
length. We read and write the registers by setting the base address
to the address of the first slot, and setting the offset to
n * sizeof(dr_simd_t) for each register Pn.
For systems with 128-bit vector length, this means the predicate
registers are 16 / 8 = 2 bytes so the maximum offset we can reach
is 2 * 255 = 510 bytes. This means on 128-bit VL systems we can only
reach the first 8 predicate registers (8 * sizeof(dr_simd_t) = 512).

By changing the predicate register and FFR slots to use a new type
dr_svep_t which is 1/8 the size of dr_simd_t we can fix this bug and
save space.

dr_svep_t is currently 8 bytes to correspond to 64 byte vectors, but
even if we extend DynamoRIO to support the maximum SVE vector length of
2048-bits (256 bytes) dr_svep_t will only need to be increased to
256 / 8 = 32 bytes so the maximum offset (15 * 32 = 480 bytes) will
always be in range.

As this changes the size of the predicate register and FFR slots, this
changes the size of the dr_mcontext_t structure and breaks backwards
compatibility with earlier versions of DynamoRIO so the version number
is increased to 10.90.

Issues: #6760, #5365
Fixes: #6760
jackgallagher-arm added a commit that referenced this pull request Apr 15, 2024
…6774)

PR #6757 fixed the way we read/write SVE register slots but
unfortunately it is now broken on systems with 128-bit vector length.

Both SVE vector and predicate registers use dr_simd_t slots which is a
64-byte type meant to store up to 512-bit vector registers. SVE
predicate registers are always 1/8 the size of the vector register so
for 512-bit vector length systems we only really need 64 / 8 = 8 bytes
to store predicate registers.

The ldr/str instructions we use to read and write the predicate register
slots have a base+offset memory operand where the offset is a value in
the range [-256, 255] scaled based by predicate register length. We read
and write the registers by setting the base address to the address of
the first slot, and setting the offset to n * sizeof(dr_simd_t) for each
register Pn.
For systems with 128-bit vector length, this means the predicate
registers are 16 / 8 = 2 bytes so the maximum offset we can reach is 2 *
255 = 510 bytes. This means on 128-bit VL systems we can only reach the
first 8 predicate registers (8 * sizeof(dr_simd_t) = 512).

By changing the predicate register and FFR slots to use a new type
dr_svep_t which is 1/8 the size of dr_simd_t we can fix this bug and
save space.

dr_svep_t is currently 8 bytes to correspond to 64 byte vectors, but
even if we extend DynamoRIO to support the maximum SVE vector length of
2048-bits (256 bytes) dr_svep_t will only need to be increased to 256 /
8 = 32 bytes so the maximum offset (15 * 32 = 480 bytes) will always be
in range.

As this changes the size of the predicate register and FFR slots, this
changes the size of the dr_mcontext_t structure and breaks backwards
compatibility with earlier versions of DynamoRIO so the version number
is increased to 10.90.

Issues: #6760, #5365
Fixes: #6760
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

linux.signalNNN1 test failing due to non-determinism in floating point operations
4 participants