i#5365: Add AArch64 SVE support to the core (part 1) #5835

Merged: 71 commits, Aug 14, 2023

AssadHashmi
Contributor

@AssadHashmi AssadHashmi commented Jan 24, 2023

This patch adds Arm AArch64 Scalable Vector Extension (SVE) support to
the core including related changes to the codec, IR and relevant clients.

SVE and SVE2 are major extensions to Arm's 64-bit architecture.
Developers and users should refer to the relevant documentation at
developer.arm.com (currently
https://developer.arm.com/Architectures/Scalable%20Vector%20Extensions).

The architecture allows hardware implementations to support vector
lengths from 128 to 2048 bits. This patch supports up to 512 bits due
to DynamoRIO's stack size limitation. There is currently no stock SVE
hardware with vector lengths greater than 512 bits. The vector length
is determined by get_processor_specific_info() at runtime on startup
and is available by calling proc_get_vector_length(). For Z registers,
reg_get_size() will return the vector size implemented by the hardware
rather than OPSZ_SCALABLE.
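
To make the size arithmetic behind the 512-bit cap concrete, here is a minimal standalone sketch. It does not use DynamoRIO's API: every macro and function name below is hypothetical, and the cap is modeled only as described in this PR. The architectural facts assumed are that SVE vector lengths are multiples of 128 bits in the range 128 to 2048, that there are 32 Z registers, and that a predicate register holds one bit per vector byte.

```c
#include <assert.h>
#include <stddef.h>

/* Architectural limits from the SVE specification. */
#define SVE_MIN_VL_BITS 128
#define SVE_MAX_VL_BITS 2048
#define SVE_NUM_Z_REGS 32
/* Cap described in this PR. The macro name is hypothetical. */
#define SKETCH_MAX_SUPPORTED_VL_BITS 512

/* Returns 1 if vl_bits is a legal SVE vector length, 0 otherwise. */
static int
sve_vl_is_valid(unsigned vl_bits)
{
    return vl_bits >= SVE_MIN_VL_BITS && vl_bits <= SVE_MAX_VL_BITS &&
        vl_bits % 128 == 0;
}

/* Returns 1 if the length is legal and within the cap modeled here. */
static int
sve_vl_is_supported(unsigned vl_bits)
{
    return sve_vl_is_valid(vl_bits) &&
        vl_bits <= SKETCH_MAX_SUPPORTED_VL_BITS;
}

/* Size in bytes of one Z register at the given vector length. */
static size_t
z_reg_bytes(unsigned vl_bits)
{
    assert(sve_vl_is_valid(vl_bits));
    return vl_bits / 8;
}

/* Size in bytes of one predicate register: one bit per vector byte. */
static size_t
p_reg_bytes(unsigned vl_bits)
{
    return z_reg_bytes(vl_bits) / 8;
}

/* Bytes needed to save all 32 Z registers, e.g. in a stack-resident
 * context structure.
 */
static size_t
z_save_area_bytes(unsigned vl_bits)
{
    return SVE_NUM_Z_REGS * z_reg_bytes(vl_bits);
}
```

Under these assumptions, saving all Z registers takes 2048 bytes at 512 bits but 8192 bytes at 2048 bits, which illustrates why the vector length matters for anything kept on a size-limited stack.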

There will be follow-up patches for:

  • SVE scatter/gather emulation
  • Full SVE signal context support
  • Complete SVE support in sample clients and drcachesim tracer.

Issues: #5365, #3044

This patch adds Arm's Scalable Vector Extension vector length support.
The vector length is determined at runtime on startup in
get_processor_specific_info() and available using
proc_get_vector_length().

Cleancall, machine and signal context code have been updated to handle
SVE registers as have API functions like reg_get_size() which will
return the hardware's vector size rather than OPSZ_SCALABLE.

The SVE specification allows for a maximum vector length of 2048 bits.
We currently support 512 bits maximum due to DR's stack size limitation.
There is currently no stock SVE hardware with vector lengths greater
than 512 bits.

There will be follow-on patches to add:
- Predicate registers.
- Handling of First Fault Register (FFR).
- Targeted SVE tests.

Issue: #5365, #3044
@AssadHashmi
Contributor Author

Hi @derekbruening and @abhinav92003, this PR is only partially complete. I would really appreciate help and advice on the gaps and remaining work. I would also like some confidence that the current changes are not way off the mark and that I have not missed anything significant. The patch has been tested on Graviton 3; testing on A64FX is pending.

Below are the issues I know of. I have put TODO i#5365 comments in places where I think changes need to be made, including for these:

  1. client.signal test fails
    The change in get_clone_record() to handle the larger dr_simd_t on the stack works for the basic clone tests but not when cloning and signalling. As far as I can tell, the code_api|client.signal test fails on thread creation. There appear to be two ways of creating threads: new_thread_setup(), which seems to work with the larger stack, and client_thread_run(), which doesn't. Possibly this is because the former uses the stack spec passed in via dr_clone_args rather than creating it in create_clone_record()? See comments in core/unix/signal.c.

  2. Signal context save/restore
    Is this a case of copying the relevant data structs and macros from system headers and updating save/restore code for SVE registers? See comments in core/unix/include/sigcontext.h.

@derekbruening
Contributor

  2. Signal context save/restore
    Is this a case of copying the relevant data structs and macros from system headers and updating save/restore code for SVE registers? See comments in core/unix/include/sigcontext.h.

Yes, we need our frame structs to match the kernel's.
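
As an illustration of what matching the kernel's frame layout involves, the sketch below walks a chain of signal-frame records looking for the SVE one. The structs are local mirrors of Linux's `struct _aarch64_ctx` and `struct sve_context` as I understand the arm64 uapi `asm/sigcontext.h`; the `sketch_` names are hypothetical, and the field layout should be checked against the kernel headers actually targeted rather than taken from here.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumed value of the kernel's SVE_MAGIC; verify against the headers. */
#define SKETCH_SVE_MAGIC 0x53564501u

/* Mirror of the common record header: each record in the reserved area
 * of the signal frame starts with a magic number and its total size.
 */
typedef struct {
    uint32_t magic;
    uint32_t size;
} sketch_ctx_hdr_t;

/* Mirror of the fixed part of the SVE record (assumed layout). The
 * register data follows this header in the real frame.
 */
typedef struct {
    sketch_ctx_hdr_t head;
    uint16_t vl; /* Vector length in bytes. */
    uint16_t flags;
    uint16_t reserved[2];
} sketch_sve_ctx_t;

/* Walks the record chain in buf and returns the byte offset of the first
 * record whose magic matches, or -1 if the terminator (magic 0), the end
 * of the buffer, or a malformed size is reached first.
 */
static long
sketch_find_record(const unsigned char *buf, size_t len, uint32_t magic)
{
    size_t off = 0;
    while (off + sizeof(sketch_ctx_hdr_t) <= len) {
        sketch_ctx_hdr_t hdr;
        memcpy(&hdr, buf + off, sizeof(hdr));
        if (hdr.magic == 0)
            break; /* Terminator record. */
        if (hdr.magic == magic)
            return (long)off;
        if (hdr.size < sizeof(hdr) || hdr.size > len - off)
            break; /* Malformed size: stop instead of overrunning. */
        off += hdr.size;
    }
    return -1;
}
```

If the local struct definitions drift from the kernel's, a walk like this either misses the record or misreads the vector length, which is why the frame structs need to be copied faithfully.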

- Change int register types to enum reg_type.
- Make vector length getter function SVE and SIMD neutral, i.e.
  proc_get_sve_vector_length_bytes() -> proc_get_vector_length_bytes().
- Function name changes.
- More informative comments.
…release.dox.

Updated version numbering in:
- Top-level CMakeLists.txt
- .github/workflows/ci-docs.yml
- .github/workflows/ci-package.yml
Removed Doxygen link directive (#dr_simd_t to dr_simd_t) in
api/docs/release.dox due to a Doxygen parsing error.
@AssadHashmi AssadHashmi added this pull request to the merge queue Aug 14, 2023
@AssadHashmi AssadHashmi removed this pull request from the merge queue due to a manual request Aug 14, 2023
@AssadHashmi AssadHashmi changed the title i#5365: Add AArch64 SVE vector length support (part 1) i#5365: Add AArch64 SVE support to the core (part 1) Aug 14, 2023
@AssadHashmi AssadHashmi added this pull request to the merge queue Aug 14, 2023
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Aug 14, 2023
@AssadHashmi AssadHashmi merged commit f646a63 into master Aug 14, 2023
15 checks passed
@AssadHashmi AssadHashmi deleted the i5365-aarch64-sve-veclen-part1 branch August 14, 2023 12:44
derekbruening pushed a commit that referenced this pull request Aug 15, 2023
Co-authored-by: Cam Mannett <[email protected]>
@derekbruening
Contributor

This PR broke DR on Mac M1 where XPACI is no longer encodeable. I filed PR #6276 on this.

AssadHashmi added a commit that referenced this pull request Mar 26, 2024
This patch adds SVE support for signals in the core. It is the
follow-on patch to part 1 of the SVE core work, PR #5835 (f646a63), and
includes vector address computation for SVE scatter/gather, enabling
first-fault load handling.

Issue: #5365, #5036

Co-authored-by: Jack Gallagher <[email protected]>
AssadHashmi added a commit that referenced this pull request Apr 3, 2024
jackgallagher-arm added a commit that referenced this pull request Jul 8, 2024
PR #5835 inadvertently broke support for large pages on AArch64 by
introducing code that assumed a 4K page size.

The core options unit_tests were also failing on large page systems
because a lot of options need to be overridden and the test was not
accounting for extra options being set in the options string.

Issues: #1680, #6451
Fixes: #6451
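
The class of bug described above can be sketched in isolation. This is not the code from the fix: the helper names are hypothetical, and it only shows the general technique of querying the page size at run time with POSIX `sysconf()` instead of hard-coding 4096.

```c
#include <stddef.h>
#include <unistd.h>

/* Returns the system page size, queried at run time. The 4096 fallback
 * is used only if the query fails.
 */
static size_t
sketch_runtime_page_size(void)
{
    long sz = sysconf(_SC_PAGESIZE);
    return sz > 0 ? (size_t)sz : 4096;
}

/* Rounds n up to the next multiple of page_size, which must be a power
 * of two.
 */
static size_t
sketch_align_up_to_page(size_t n, size_t page_size)
{
    return (n + page_size - 1) & ~(page_size - 1);
}
```

With a 64K page size, a size that was rounded up assuming 4K pages can end up misaligned, whereas `sketch_align_up_to_page(n, sketch_runtime_page_size())` follows whatever the running system reports.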
jackgallagher-arm added a commit that referenced this pull request Jul 12, 2024