
Add encoder/decoder support for Arm's Scalable Vector Extension #3044

Open
fhahn opened this issue Jun 9, 2018 · 9 comments

Comments

@fhahn
Contributor

fhahn commented Jun 9, 2018

Armv8-A's Scalable Vector Extension (SVE) is a large extension that requires several changes to the AArch64 backend. As a first step, I think we need new register types for the scalable vector (Z) and predicate (P) registers, as well as support for the new SVE encodings.

https://developer.arm.com/docs/ddi0584/latest/arm-architecture-reference-manual-supplement-the-scalable-vector-extension-sve-for-armv8-a

Xref #2626

fhahn added a commit that referenced this issue Jun 28, 2018
This patch adds a new register type for scalable vector (Z) registers
and encoding/decoding support for the
'SVE Integer Arithmetic - Unpredicated Group' encoding group.

The specification can be found at
https://developer.arm.com/docs/ddi0584/latest/arm-architecture-reference-manual-supplement-the-scalable-vector-extension-sve-for-armv8-a

Issue: #3044

Change-Id: I36b1e55b250aca11a9743e12e517edf500fdba4c
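As a rough illustration of what the 'SVE Integer Arithmetic - Unpredicated Group' covers, here is a minimal C model of two of its instructions, ADD and UQADD, on byte elements. It assumes a fixed 256-bit vector held in a plain array; the type and function names are hypothetical and not part of DynamoRIO.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model: one 256-bit Z register viewed as 32 byte elements. */
#define VL_BYTES 32

typedef struct {
    uint8_t b[VL_BYTES];
} zreg_t;

/* ADD <Zd>.B, <Zn>.B, <Zm>.B: every lane is written; results wrap mod 256. */
void sve_add_b(zreg_t *zd, const zreg_t *zn, const zreg_t *zm)
{
    assert(zd != NULL && zn != NULL && zm != NULL);
    for (size_t i = 0; i < VL_BYTES; i++)
        zd->b[i] = (uint8_t)(zn->b[i] + zm->b[i]);
}

/* UQADD <Zd>.B, <Zn>.B, <Zm>.B: unsigned saturating add, clamps at 0xff. */
void sve_uqadd_b(zreg_t *zd, const zreg_t *zn, const zreg_t *zm)
{
    assert(zd != NULL && zn != NULL && zm != NULL);
    for (size_t i = 0; i < VL_BYTES; i++) {
        unsigned sum = (unsigned)zn->b[i] + zm->b[i];
        zd->b[i] = (uint8_t)(sum > 0xffu ? 0xffu : sum);
    }
}
```

Because no predicate governs these instructions, every lane of the destination is written.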
fhahn added a commit that referenced this issue Jun 28, 2018
Issue: #3044

Change-Id: I8af06df778c203498a5c247bdc19b5c2064a1766
fhahn added a commit that referenced this issue Jun 29, 2018
This patch adds a new register type for scalable vector (Z) registers
and encoding/decoding support for the
'SVE Integer Arithmetic - Unpredicated Group' encoding group.

The specification can be found at
https://developer.arm.com/docs/ddi0584/latest/arm-architecture-reference-manual-supplement-the-scalable-vector-extension-sve-for-armv8-a

Issue: #3044
fhahn added a commit that referenced this issue Jun 30, 2018
This patch adds a new register type for SVE predicate registers
(P0-P15) and support for the sve_int_bin_pred_log encoding group.

The macros added follow the SVE assembly syntax, where the shared input
and output register Zdn has to be supplied twice.

Issue #3044

Change-Id: Ib55b5e1160ab41b9c298b7abbef66a5454ed4384
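The repeated Zdn operand reflects that these logical operations are destructive and merging-predicated. The sketch below uses hypothetical names and an assumed 256-bit vector length. It models AND with .B elements and shows the check a macro could apply when Zdn is supplied twice.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define VL_BYTES 32 /* illustrative 256-bit vector length */

typedef struct { uint8_t b[VL_BYTES]; } zreg_t;
/* With .B elements each byte lane is governed by its own predicate bit;
 * modelled here as one bool per lane. */
typedef struct { bool lane[VL_BYTES]; } preg_t;

/* Hypothetical operand check mirroring the assembly syntax
 *   AND <Zdn>.B, <Pg>/M, <Zdn>.B, <Zm>.B
 * where Zdn is written twice and both occurrences must name the same register. */
bool zdn_operands_match(int zdn_as_dst, int zdn_as_src)
{
    return zdn_as_dst == zdn_as_src;
}

/* Merging-predicated AND: active lanes get Zdn & Zm, inactive lanes keep
 * their previous Zdn value. */
void sve_and_pred_b(zreg_t *zdn, const preg_t *pg, const zreg_t *zm)
{
    assert(zdn != NULL && pg != NULL && zm != NULL);
    for (size_t i = 0; i < VL_BYTES; i++)
        if (pg->lane[i])
            zdn->b[i] &= zm->b[i];
}
```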
fhahn added a commit that referenced this issue Jul 3, 2018
This patch adds a new register type for SVE predicate registers
(P0-P15) and support for the sve_int_bin_pred_log encoding group.

The macros added follow the SVE assembly syntax, where the shared input
and output register Zdn has to be supplied twice.

Issue #3044
AssadHashmi added a commit that referenced this issue Apr 8, 2022
This patch is an example of the code which will be generated from a
machine readable specification (MRS) to decode and encode AArch64
instructions from v8.1 onwards.

It is provided for review and discussion purposes, in order to resolve
any issues which may arise and to make visible what changes to expect.

This patch does not include the MRS, the parser or generator, just an
example of the target code which we intend to generate, based on the
v8.1 SQRDMLAH instruction.

Issue: #4393, #3044, #2626
AssadHashmi added a commit that referenced this issue Apr 27, 2022
This patch is an example of the code which will be generated from a
machine readable specification (MRS) to decode and encode AArch64
instructions from v8.1 onwards.

It is provided for review and discussion purposes, in order to resolve
any issues which may arise and to make visible what changes to expect.

This patch does not include the MRS, the parser or generator, just an
example of the target code which we intend to generate, based on the
v8.1 SQRDMLAH instruction.

Issue: #4393, #3044, #2626
@AssadHashmi
Contributor

Currently we can handle 10 SVE opcodes (out of 314) in the codec. Now that we are preparing to implement the rest of SVE, the notion of a variable vector register length in DynamoRIO needs to be addressed. Specifically, how will clients and tools like DrMemory and drcachesim work correctly on SVE hardware with different SVE vector lengths?

Currently we have OPSZ_SCALABLE and OPSZ_SCALABLE_PRED for SVE vector and predicate registers returned by reg_get_size(). This needs to be converted to an OPSZ_ representing the SVE vector and predicate register size on the currently executing AArch64 implementation, i.e. one of:

OPSZ_128 OPSZ_256 OPSZ_384 OPSZ_512 OPSZ_640 OPSZ_768 OPSZ_896 OPSZ_1024 OPSZ_1152 OPSZ_1280 OPSZ_1408 OPSZ_1536 OPSZ_1664 OPSZ_1792 OPSZ_1920 OPSZ_2048

One approach would be for DynamoRIO to execute the RDVL instruction (https://developer.arm.com/documentation/ddi0602/2022-06/SVE-Instructions/RDVL--Read-multiple-of-vector-register-size-to-scalar-register-) on startup so that when reg_get_size() is called for SVE vector or predicate registers, the appropriate OPSZ_ is returned.
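Here is a minimal sketch of the mapping this approach implies. It assumes the value comes from `RDVL <Xd>, #1`, which yields the vector length in bytes. The index into the OPSZ_128 ... OPSZ_2048 list is a hypothetical encoding, not DynamoRIO's actual OPSZ_ enum values.

```c
#include <assert.h>
#include <stdbool.h>

/* SVE vector lengths are multiples of 128 bits from 128 to 2048 bits,
 * which gives the 16 sizes listed above. */
bool sve_vl_bits_valid(unsigned vl_bits)
{
    return vl_bits >= 128 && vl_bits <= 2048 && vl_bits % 128 == 0;
}

/* Map the RDVL result (bytes) to a 0-based index into the
 * OPSZ_128 ... OPSZ_2048 list (hypothetical encoding), or return -1 if
 * the value is not a legal vector length. */
int sve_opsz_index_from_rdvl(unsigned long rdvl_bytes)
{
    unsigned long bits = rdvl_bytes * 8;
    if (bits > 2048 || !sve_vl_bits_valid((unsigned)bits))
        return -1;
    return (int)(bits / 128) - 1;
}
```

Doing this once at startup means later reg_get_size() calls can return a cached value instead of re-reading hardware state.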

@derekbruening is this enough to cover all of DynamoRIO's clients/tools requirements? Is there another approach?

@derekbruening
Contributor

Having the actual register size in the operand in the IR at runtime does seem necessary, but maybe not sufficient. Tools like DrMemory that shadow registers may want to know the maximum register sizes at initialization time to optimize their shadow allocations. Maybe an API routine like dr_mcontext_zmm_fields_valid(), where a client can ask which parts of the mcontext are live for the current hardware, with some way to refer to the different SIMD vector sizes?

What about dr_simd_t which it looks like is currently just 128 bits? Does it need to be increased to handle 2048 bits -- err, is your list from OPSZ_128 to OPSZ_2048 supposed to be /8 to get to bytes?

@AssadHashmi
Contributor

Thanks for the steer @derekbruening. Some more thoughts and questions...

What about dr_simd_t which it looks like is currently just 128 bits? Does it need to be increased to handle 2048 bits...

Yes. The low 128 bits of each SVE Z register overlap the corresponding Advanced SIMD (a.k.a. NEON) registers.
Can we set the size of dr_simd_t to the SVE maximum, 2048 bits, even though fewer bytes will be accessed most of the time? If so, this means we'll have to replace occurrences of sizeof(dr_simd_t) with a runtime get size function for AArch64.
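Here is a sketch of what replacing sizeof(dr_simd_t) with a runtime size could look like. The slot type, the 256-byte maximum constant and the copy helper are hypothetical. The copy length comes from the detected vector length rather than from the struct size, and the low 16 bytes of each slot remain the Advanced SIMD view of the register.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical SIMD slot sized for the architectural maximum
 * (2048 bits = 256 bytes), of which only vl_bytes are live. */
#define SVE_MAX_VL_BYTES 256

typedef struct {
    uint8_t u8[SVE_MAX_VL_BYTES]; /* u8[0..15] alias the NEON V register */
} simd_slot_t;

/* Copy one register using the runtime vector length rather than
 * sizeof(simd_slot_t), so 128-bit hardware copies 16 bytes, not 256.
 * Returns the number of bytes copied. */
size_t simd_slot_copy(simd_slot_t *dst, const simd_slot_t *src, size_t vl_bytes)
{
    assert(dst != NULL && src != NULL);
    if (vl_bytes > sizeof(dst->u8))
        vl_bytes = sizeof(dst->u8);
    memcpy(dst->u8, src->u8, vl_bytes);
    return vl_bytes;
}
```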

Maybe an API routine like dr_mcontext_zmm_fields_valid() where a client can ask which parts of the mcontext are live for the current hardware,

Is this a case of creating a new dr_mcontext_sve_fields_valid() function which behaves like dr_mcontext_zmm_fields_valid() ?

with some way to refer to the different simd vector sizes?

Does the size have to be in the mcontext as well, for direct access from the context? Is reg_get_size() at translation time insufficient?

err, is your list from OPSZ_128 to OPSZ_2048 supposed to be /8 to get to bytes?

Yes, 'twas just a quick brain dump of bit lengths!

Other thoughts:

We don't need the SVE equivalent of e.g.

    #define ZMM_ENABLED() (proc_avx512_enabled())

because all SVE h/w will have full SVE support in the OS.

In terms of testing, are the existing dynamorio and drmemory unit tests enough to guard against regression for such a size change? As part of the first patch I'll update tests/api/opnd-a64.c and tests/client-interface/reg_size_test.dll.c for real SVE h/w.

@derekbruening
Contributor

What about dr_simd_t which it looks like is currently just 128 bits? Does it need to be increased to handle 2048 bits...

Yes. The low 128 bits of each SVE Z register overlap the corresponding Advanced SIMD (a.k.a. NEON) registers. Can we set the size of dr_simd_t to the SVE maximum, 2048 bits, even though fewer bytes will be accessed most of the time?

One concern is that DR puts the mcontext on the stack in a number of places, and has a small stack size. 2048 bits == 256 bytes * 32 registers is 8K which is a lot on DR's small stacks. We've thought about heap-allocating in the past but use in signal handling complicates things. For comparison the x86 512-bit SIMD occupies 2K. I think we may have to move to heap allocation and special heap usage for signals if we add 8K.

If so, this means we'll have to replace occurrences of sizeof(dr_simd_t) with a runtime get size function for AArch64.

I think most copy routines already have logic to copy only certain SIMD fields so this should be doable. E.g., the x86 dr_mcontext_to_priv_mcontext has logic to copy just the ymm parts of the zmm fields.

Maybe an API routine like dr_mcontext_zmm_fields_valid() where a client can ask which parts of the mcontext are live for the current hardware,

Is this a case of creating a new dr_mcontext_sve_fields_valid() function which behaves like dr_mcontext_zmm_fields_valid() ?

But also returning the size that's valid?

with some way to refer to the different simd vector sizes?

Does the size have to be in the mcontext as well, for direct access from the context? Is reg_get_size() at translation time insufficient?

A separate size query is probably ok.

In terms of testing, are the existing dynamorio and drmemory unit tests enough to guard against regression for such a size change? As part of the first patch I'll update tests/api/opnd-a64.c and tests/client-interface/reg_size_test.dll.c for real SVE h/w.

Users must set the total struct size in the size field, so DR code would use that to handle both the old 128-bit and the new larger sizes. One-off tests against binary clients built against the old layout would be one way to add confidence; a true regression test would probably require a duplicated struct definition in the test or something similar.
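Here is a sketch of how a size field lets one routine serve clients built against either layout. Both struct definitions and the fill routine are hypothetical stand-ins for the real mcontext, reduced to one general-purpose register and four SIMD slots.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical old (128-bit slots) and new (512-bit slots) layouts.
 * Clients set .size to sizeof the struct they were compiled against. */
typedef struct { size_t size; uint64_t x0; uint8_t simd[4][16]; } ctx_v1_t;
typedef struct { size_t size; uint64_t x0; uint8_t simd[4][64]; } ctx_v2_t;

/* Fill the client's struct from the full runtime state, using the size
 * field (the common first member) to decide how much SIMD data fits. */
bool fill_client_ctx(void *client, const ctx_v2_t *full)
{
    assert(client != NULL && full != NULL);
    size_t size = *(const size_t *)client;
    if (size == sizeof(ctx_v2_t)) {
        ctx_v2_t *c = client;
        c->x0 = full->x0;
        memcpy(c->simd, full->simd, sizeof(c->simd));
        return true;
    }
    if (size == sizeof(ctx_v1_t)) {
        ctx_v1_t *c = client;
        c->x0 = full->x0;
        for (int i = 0; i < 4; i++) /* low 16 bytes of each register only */
            memcpy(c->simd[i], full->simd[i], sizeof(c->simd[i]));
        return true;
    }
    return false; /* unknown layout: refuse rather than overrun */
}
```

Rejecting unrecognized sizes, rather than guessing, avoids writing past the end of a client's smaller struct.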

@AssadHashmi
Contributor

What about dr_simd_t which it looks like is currently just 128 bits? Does it need to be increased to handle 2048 bits...

Yes. The low 128 bits of each SVE Z register overlap the corresponding Advanced SIMD (a.k.a. NEON) registers. Can we set the size of dr_simd_t to the SVE maximum, 2048 bits, even though fewer bytes will be accessed most of the time?

One concern is that DR puts the mcontext on the stack in a number of places, and has a small stack size. 2048 bits == 256 bytes * 32 registers is 8K which is a lot on DR's small stacks. We've thought about heap-allocating in the past but use in signal handling complicates things. For comparison the x86 512-bit SIMD occupies 2K. I think we may have to move to heap allocation and special heap usage for signals if we add 8K.

In addition to the 32 vector registers (Z0-Z31), there are 16 predicate registers (P0-P15) and a first-fault register (FFR).

Current SVE hardware supports 128, 256 and 512 bit vectors. I suggest that we only implement for 512 bits max to start with, avoiding increasing stack size now. This will require just over 3K for Z, P and FFR registers. I think it's more efficient to get SVE patches rolling into the trunk now for 512 bits and address the larger vector and stack sizes as a separate issue later. Is that ok?
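The "just over 3K" figure is consistent with storing each P register and FFR in a full vector-sized slot. That gives 49 slots of 64 bytes, or 3136 bytes, at 512 bits. A tightly packed layout, where a predicate has one bit per vector byte, would need about 2.1K. The slot layout is an assumption. The sketch below computes both.

```c
#include <assert.h>
#include <stddef.h>

#define SVE_NUM_Z 32   /* Z0-Z31 */
#define SVE_NUM_P 16   /* P0-P15 */
#define SVE_NUM_FFR 1  /* first-fault register */

/* Packed layout: a Z register is VL/8 bytes; a P register or FFR has one
 * bit per vector byte, i.e. VL/64 bytes. */
size_t sve_state_bytes_packed(size_t vl_bits)
{
    assert(vl_bits % 128 == 0);
    return SVE_NUM_Z * (vl_bits / 8) + (SVE_NUM_P + SVE_NUM_FFR) * (vl_bits / 64);
}

/* Slot layout (assumption): every register, including P and FFR, takes
 * one full vector-sized slot, as if all were stored in dr_simd_t-like
 * entries. */
size_t sve_state_bytes_slots(size_t vl_bits)
{
    assert(vl_bits % 128 == 0);
    return (SVE_NUM_Z + SVE_NUM_P + SVE_NUM_FFR) * (vl_bits / 8);
}
```

At 512 bits the slot layout needs 3136 bytes and the packed layout 2184 bytes. At the 2048-bit maximum, even the packed layout needs 8736 bytes.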

Maybe an API routine like dr_mcontext_zmm_fields_valid() where a client can ask which parts of the mcontext are live for the current hardware,

Is this a case of creating a new dr_mcontext_sve_fields_valid() function which behaves like dr_mcontext_zmm_fields_valid() ?

But also returning the size that's valid?

Yes.
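Here is a sketch of the kind of query discussed here. It is modelled loosely on dr_mcontext_zmm_fields_valid() but also reports the valid sizes. The function names and the global holding the vector length are hypothetical. In practice the value would be set once at startup.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical: vector length in bytes detected once at startup
 * (for example via RDVL); 0 means SVE is not available. */
static size_t g_sve_vl_bytes;

void sve_set_vector_length_bytes(size_t vl_bytes)
{
    assert(vl_bytes == 0 || vl_bytes % 16 == 0);
    g_sve_vl_bytes = vl_bytes;
}

/* Hypothetical query in the spirit of dr_mcontext_zmm_fields_valid():
 * returns whether the SVE fields are live and, if so, how many bytes of
 * each Z slot and each P slot hold valid data. Either out pointer may be
 * NULL. */
bool sve_fields_valid(size_t *z_bytes, size_t *p_bytes)
{
    if (g_sve_vl_bytes == 0)
        return false;
    if (z_bytes != NULL)
        *z_bytes = g_sve_vl_bytes;
    if (p_bytes != NULL)
        *p_bytes = g_sve_vl_bytes / 8;
    return true;
}
```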

@derekbruening
Contributor

What about dr_simd_t which it looks like is currently just 128 bits? Does it need to be increased to handle 2048 bits...

Yes. The low 128 bits of each SVE Z register overlap the corresponding Advanced SIMD (a.k.a. NEON) registers. Can we set the size of dr_simd_t to the SVE maximum, 2048 bits, even though fewer bytes will be accessed most of the time?

One concern is that DR puts the mcontext on the stack in a number of places, and has a small stack size. 2048 bits == 256 bytes * 32 registers is 8K which is a lot on DR's small stacks. We've thought about heap-allocating in the past but use in signal handling complicates things. For comparison the x86 512-bit SIMD occupies 2K. I think we may have to move to heap allocation and special heap usage for signals if we add 8K.

In addition to the 32 vector registers (Z0-Z31), there are 16 predicate registers (P0-P15) and a first-fault register (FFR).

Current SVE hardware supports 128, 256 and 512 bit vectors. I suggest that we only implement for 512 bits max to start with, avoiding increasing stack size now. This will require just over 3K for Z, P and FFR registers. I think it's more efficient to get SVE patches rolling into the trunk now for 512 bits and address the larger vector and stack sizes as a separate issue later. Is that ok?

Yes, this sounds like a good plan to me.

joshua-warburton added a commit that referenced this issue Oct 10, 2022
This patch creates a new file for SVE disassembly tests
and another ir_aarch64 test file. Also included is a small
bug fix for the disassembly test sorter that could remove
a test if the file did not end in a new line.

issues: #3044

Change-Id: I8b2ded8cd8d48d160132e96d712d502d30cfd05f
joshua-warburton added a commit that referenced this issue Oct 10, 2022
This patch creates a new file for SVE disassembly tests
and another ir_aarch64 test file. Also included is a small
bug fix for the disassembly test sorter that could remove
a test if the file did not end in a new line.

issues: #3044
joshua-warburton added a commit that referenced this issue Oct 11, 2022
Previously the IR represented vectors as a plain H, S, D or Q
register combined with a faux immediate added after the last
vector to hint at its size.

This patch uses the .size field of the opnd struct to hold the
element size, similar to how partial registers are used for x86.
Element vector registers are distinguished from that use by
setting the DR_OPND_IS_VECTOR (0x40) bit in the .flag field.
The same bit is also used by the DR_OPND_IS_EXTEND flag for
immediates; it was chosen so as not to enlarge the flag field
while remaining unambiguous.

The new element vector operand is printed like z0.b via a check
in disassemble_shared.c that sets a suffix. This patch also adds
the utility functions opnd_is_element_vector_reg,
opnd_create_reg_element_vector and opnd_get_vector_element_size.

issue: #3044

Change-Id: I0cdb78cc1a13c3db7b6c742fb1d5d5c0e54216ff
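The following small operand model illustrates how a single flag bit can carry different meanings depending on operand kind. The struct, kinds and helper names are hypothetical. Only the 0x40 value and the z0.b-style suffixes come from the description above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Mirrors the 0x40 value described above. On a register operand it marks
 * an element vector; the same bit on an immediate means something else
 * (an extend), so the operand kind disambiguates. */
#define OPND_FLAG_IS_VECTOR 0x40

typedef enum { OPND_REG, OPND_IMM } opnd_kind_t;

typedef struct {
    opnd_kind_t kind;
    int reg;        /* e.g. 0 for z0 */
    uint8_t size;   /* element size in bytes for element vectors */
    uint8_t flags;
} opnd_model_t;

bool model_is_element_vector(const opnd_model_t *o)
{
    assert(o != NULL);
    return o->kind == OPND_REG && (o->flags & OPND_FLAG_IS_VECTOR) != 0;
}

/* Disassembly suffix for an element size, e.g. 'b' in "z0.b";
 * returns 0 for sizes with no suffix. */
char element_suffix(uint8_t elem_bytes)
{
    switch (elem_bytes) {
    case 1: return 'b';
    case 2: return 'h';
    case 4: return 's';
    case 8: return 'd';
    case 16: return 'q';
    default: return 0;
    }
}
```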
joshua-warburton added a commit that referenced this issue Oct 17, 2022
Previously the IR represented vectors as a plain H, S, D or Q
register combined with a faux immediate added after the last
vector to hint at its size.

This patch uses the .size field of the opnd struct to hold the
element size, similar to how partial registers are used for x86.
Element vector registers are distinguished from that use by
setting the DR_OPND_IS_VECTOR (0x40) bit in the .flag field.
The same bit is also used by the DR_OPND_IS_EXTEND flag for
immediates; it was chosen so as not to enlarge the flag field
while remaining unambiguous.

The new element vector operand is printed like z0.b via a check
in disassemble_shared.c that sets a suffix. This patch also adds
the utility functions opnd_is_element_vector_reg,
opnd_create_reg_element_vector and opnd_get_vector_element_size.

issue: #3044

Change-Id: I0cdb78cc1a13c3db7b6c742fb1d5d5c0e54216ff
AssadHashmi added a commit that referenced this issue Jul 27, 2023
…6230)

For the current decode/encode functions of:

LDR <Zt>, [<Xn|SP>{, #<imm>, MUL VL}]
LDR <Pt>, [<Xn|SP>{, #<imm>, MUL VL}]
STR <Zt>, [<Xn|SP>{, #<imm>, MUL VL}]
STR <Pt>, [<Xn|SP>{, #<imm>, MUL VL}]
PRFB <prfop>, <Pg>, [<Xn|SP>{, #<imm>, MUL VL}]
PRFH <prfop>, <Pg>, [<Xn|SP>{, #<imm>, MUL VL}]
PRFW <prfop>, <Pg>, [<Xn|SP>{, #<imm>, MUL VL}]
PRFD <prfop>, <Pg>, [<Xn|SP>{, #<imm>, MUL VL}]

These functions use vector indexing in the memory operand at the IR
level. However, the IR must always refer to the address in terms of the
base register value plus a byte offset displacement. This patch changes the
decode/encode functions for these instructions to expect byte offsets
at the IR level, converting to vector length offsets within the codec.

Issues #3044, #5365
ivankyluk pushed a commit to ivankyluk/dynamorio that referenced this issue Jul 28, 2023
This patch adds the appropriate macros, tests and codec entries to
decode and encode the following instructions:
```
LDG     <Xt>, [<Xn|SP>, #<simm>]
ST2G    <Xt>, [<Xn|SP>], #<simm>
ST2G    <Xt>, [<Xn|SP>, #<simm>]!
ST2G    <Xt>, [<Xn|SP>, #<simm>]
STG     <Xt>, [<Xn|SP>], #<simm>
STG     <Xt>, [<Xn|SP>, #<simm>]!
STG     <Xt>, [<Xn|SP>, #<simm>]
STZ2G   <Xt>, [<Xn|SP>], #<simm>
STZ2G   <Xt>, [<Xn|SP>, #<simm>]!
STZ2G   <Xt>, [<Xn|SP>, #<simm>]
STZG    <Xt>, [<Xn|SP>], #<simm>
STZG    <Xt>, [<Xn|SP>, #<simm>]!
STZG    <Xt>, [<Xn|SP>, #<simm>]
STGP    <Xt>, <Xt2>, [<Xn|SP>], #<simm>
STGP    <Xt>, <Xt2>, [<Xn|SP>, #<simm>]!
STGP    <Xt>, <Xt2>, [<Xn|SP>, #<simm>]
```
Issue DynamoRIO#3044

Co-authored-by: Joshua Warburton <[email protected]>
ivankyluk pushed a commit to ivankyluk/dynamorio that referenced this issue Jul 28, 2023
…oRIO#6216)

This patch adds the appropriate macros, tests and codec entries to
encode the following variants:
```
GMI     <Xd>, <Xn|SP>, <Xm>
IRG     <Xd|SP>, <Xn|SP>{, <Xm>}
SUBP    <Xd>, <Xn|SP>, <Xm|SP>
SUBPS   <Xd>, <Xn|SP>, <Xm|SP>
ADDG    <Xd|SP>, <Xn|SP>, #<imm1>, #<imm2>
SUBG    <Xd|SP>, <Xn|SP>, #<imm1>, #<imm2>
DC GVA, <Xt>
DC GZVA, <Xt>
```
Issue DynamoRIO#3044
ivankyluk pushed a commit to ivankyluk/dynamorio that referenced this issue Jul 28, 2023
…ynamoRIO#6230)

For the current decode/encode functions of:

LDR <Zt>, [<Xn|SP>{, #<imm>, MUL VL}]
LDR <Pt>, [<Xn|SP>{, #<imm>, MUL VL}]
STR <Zt>, [<Xn|SP>{, #<imm>, MUL VL}]
STR <Pt>, [<Xn|SP>{, #<imm>, MUL VL}]
PRFB <prfop>, <Pg>, [<Xn|SP>{, #<imm>, MUL VL}]
PRFH <prfop>, <Pg>, [<Xn|SP>{, #<imm>, MUL VL}]
PRFW <prfop>, <Pg>, [<Xn|SP>{, #<imm>, MUL VL}]
PRFD <prfop>, <Pg>, [<Xn|SP>{, #<imm>, MUL VL}]

These functions use vector indexing in the memory operand at the IR
level. However, the IR must always refer to the address in terms of the
base register value plus a byte offset displacement. This patch changes the
decode/encode functions for these instructions to expect byte offsets
at the IR level, converting to vector length offsets within the codec.

Issues DynamoRIO#3044, DynamoRIO#5365
AssadHashmi added a commit that referenced this issue Aug 14, 2023
This patch adds Arm AArch64 Scalable Vector Extension (SVE) support to
the core including related changes to the codec, IR and relevant
clients.

SVE and SVE2 are major extensions to Arm's 64-bit architecture.
Developers and users should refer to the relevant documentation at
developer.arm.com (currently
https://developer.arm.com/Architectures/Scalable%20Vector%20Extensions).

The architecture allows hardware implementations to support vector
lengths from 128 to 2048 bits. This patch supports up to 512 bits due
to DynamoRIO's stack size limitation. There is currently no stock SVE
hardware with vector lengths greater than 512 bits. The vector length
is determined by get_processor_specific_info() at runtime on startup
and is available by calling proc_get_vector_length(). For Z registers,
reg_get_size() will return the vector size implemented by the hardware
rather than OPSZ_SCALABLE.

There will be follow up patches for:
- SVE scatter/gather emulation
- Full SVE signal context support
- Complete SVE support in sample clients and drcachesim tracer.

Issues: #5365, #3044

---------

Co-authored-by: Cam Mannett <[email protected]>
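Here is a sketch of the startup check and per-register sizes described above, with hypothetical names. A vector length is accepted only if it is architecturally valid and no larger than 512 bits. A Z register is reported as one vector long, and a P register as one bit per vector byte.

```c
#include <assert.h>
#include <stdbool.h>

#define SUPPORTED_MAX_VL_BITS 512 /* limit described above */

typedef enum { REG_KIND_Z, REG_KIND_P } sve_reg_kind_t;

/* Hypothetical startup check: a multiple of 128 bits, at least 128 bits,
 * and no larger than the supported maximum. */
bool sve_vl_supported(unsigned vl_bits)
{
    return vl_bits >= 128 && vl_bits <= SUPPORTED_MAX_VL_BITS && vl_bits % 128 == 0;
}

/* Size in bits reported for a register once the vector length is known. */
unsigned sve_reg_size_bits(sve_reg_kind_t kind, unsigned vl_bits)
{
    assert(kind == REG_KIND_Z || kind == REG_KIND_P);
    return kind == REG_KIND_Z ? vl_bits : vl_bits / 8;
}
```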
derekbruening pushed a commit that referenced this issue Aug 15, 2023
This patch adds Arm AArch64 Scalable Vector Extension (SVE) support to
the core including related changes to the codec, IR and relevant
clients.

SVE and SVE2 are major extensions to Arm's 64-bit architecture.
Developers and users should refer to the relevant documentation at
developer.arm.com (currently
https://developer.arm.com/Architectures/Scalable%20Vector%20Extensions).

The architecture allows hardware implementations to support vector
lengths from 128 to 2048 bits. This patch supports up to 512 bits due
to DynamoRIO's stack size limitation. There is currently no stock SVE
hardware with vector lengths greater than 512 bits. The vector length
is determined by get_processor_specific_info() at runtime on startup
and is available by calling proc_get_vector_length(). For Z registers,
reg_get_size() will return the vector size implemented by the hardware
rather than OPSZ_SCALABLE.

There will be follow up patches for:
- SVE scatter/gather emulation
- Full SVE signal context support
- Complete SVE support in sample clients and drcachesim tracer.

Issues: #5365, #3044

---------

Co-authored-by: Cam Mannett <[email protected]>
jackgallagher-arm added a commit that referenced this issue Oct 23, 2023
…ffsets

All SVE scalar+immediate LD[1234]/ST[1234] have a signed 4-bit
immediate value that encodes a vector index offset from the base
register. This value was being used directly in the IR for instructions,
however base+disp memory operands should always use a byte displacement.

This changes the codec to use byte displacements in the IR and updates
the codec unit tests accordingly.

The following instructions are updated:

LD1B    { <Zt>.B }, <Pg>/Z, [<Xn|SP>{, #<simm>, MUL VL}]
LD1B    { <Zt>.H }, <Pg>/Z, [<Xn|SP>{, #<simm>, MUL VL}]
LD1B    { <Zt>.S }, <Pg>/Z, [<Xn|SP>{, #<simm>, MUL VL}]
LD1B    { <Zt>.D }, <Pg>/Z, [<Xn|SP>{, #<simm>, MUL VL}]
LD1D    { <Zt>.D }, <Pg>/Z, [<Xn|SP>{, #<simm>, MUL VL}]
LD1H    { <Zt>.H }, <Pg>/Z, [<Xn|SP>{, #<simm>, MUL VL}]
LD1H    { <Zt>.S }, <Pg>/Z, [<Xn|SP>{, #<simm>, MUL VL}]
LD1H    { <Zt>.D }, <Pg>/Z, [<Xn|SP>{, #<simm>, MUL VL}]
LD1SB   { <Zt>.D }, <Pg>/Z, [<Xn|SP>{, #<simm>, MUL VL}]
LD1SB   { <Zt>.S }, <Pg>/Z, [<Xn|SP>{, #<simm>, MUL VL}]
LD1SB   { <Zt>.H }, <Pg>/Z, [<Xn|SP>{, #<simm>, MUL VL}]
LD1SH   { <Zt>.D }, <Pg>/Z, [<Xn|SP>{, #<simm>, MUL VL}]
LD1SH   { <Zt>.S }, <Pg>/Z, [<Xn|SP>{, #<simm>, MUL VL}]
LD1SW   { <Zt>.D }, <Pg>/Z, [<Xn|SP>{, #<simm>, MUL VL}]
LD1W    { <Zt>.S }, <Pg>/Z, [<Xn|SP>{, #<simm>, MUL VL}]
LD1W    { <Zt>.D }, <Pg>/Z, [<Xn|SP>{, #<simm>, MUL VL}]
LD2B    { <Zt1>.B, <Zt2>.B }, <Pg>/Z, [<Xn|SP>{, #<simm>, MUL VL}]
LD2D    { <Zt1>.D, <Zt2>.D }, <Pg>/Z, [<Xn|SP>{, #<simm>, MUL VL}]
LD2H    { <Zt1>.H, <Zt2>.H }, <Pg>/Z, [<Xn|SP>{, #<simm>, MUL VL}]
LD2W    { <Zt1>.S, <Zt2>.S }, <Pg>/Z, [<Xn|SP>{, #<simm>, MUL VL}]
LD3B    { <Zt1>.B, <Zt2>.B, <Zt3>.B }, <Pg>/Z, [<Xn|SP>{, #<simm>, MUL VL}]
LD3D    { <Zt1>.D, <Zt2>.D, <Zt3>.D }, <Pg>/Z, [<Xn|SP>{, #<simm>, MUL VL}]
LD3H    { <Zt1>.H, <Zt2>.H, <Zt3>.H }, <Pg>/Z, [<Xn|SP>{, #<simm>, MUL VL}]
LD3W    { <Zt1>.S, <Zt2>.S, <Zt3>.S }, <Pg>/Z, [<Xn|SP>{, #<simm>, MUL VL}]
LD4B    { <Zt1>.B, <Zt2>.B, <Zt3>.B, <Zt4>.B }, <Pg>/Z, [<Xn|SP>{, #<simm>, MUL VL}]
LD4D    { <Zt1>.D, <Zt2>.D, <Zt3>.D, <Zt4>.D }, <Pg>/Z, [<Xn|SP>{, #<simm>, MUL VL}]
LD4H    { <Zt1>.H, <Zt2>.H, <Zt3>.H, <Zt4>.H }, <Pg>/Z, [<Xn|SP>{, #<simm>, MUL VL}]
LD4W    { <Zt1>.S, <Zt2>.S, <Zt3>.S, <Zt4>.S }, <Pg>/Z, [<Xn|SP>{, #<simm>, MUL VL}]
LDNF1B  { <Zt>.B }, <Pg>/Z, [<Xn|SP>{, #<simm>, MUL VL}]
LDNF1B  { <Zt>.H }, <Pg>/Z, [<Xn|SP>{, #<simm>, MUL VL}]
LDNF1B  { <Zt>.S }, <Pg>/Z, [<Xn|SP>{, #<simm>, MUL VL}]
LDNF1B  { <Zt>.D }, <Pg>/Z, [<Xn|SP>{, #<simm>, MUL VL}]
LDNF1D  { <Zt>.D }, <Pg>/Z, [<Xn|SP>{, #<simm>, MUL VL}]
LDNF1H  { <Zt>.H }, <Pg>/Z, [<Xn|SP>{, #<simm>, MUL VL}]
LDNF1H  { <Zt>.S }, <Pg>/Z, [<Xn|SP>{, #<simm>, MUL VL}]
LDNF1H  { <Zt>.D }, <Pg>/Z, [<Xn|SP>{, #<simm>, MUL VL}]
LDNF1SB { <Zt>.D }, <Pg>/Z, [<Xn|SP>{, #<simm>, MUL VL}]
LDNF1SB { <Zt>.S }, <Pg>/Z, [<Xn|SP>{, #<simm>, MUL VL}]
LDNF1SB { <Zt>.H }, <Pg>/Z, [<Xn|SP>{, #<simm>, MUL VL}]
LDNF1SH { <Zt>.D }, <Pg>/Z, [<Xn|SP>{, #<simm>, MUL VL}]
LDNF1SH { <Zt>.S }, <Pg>/Z, [<Xn|SP>{, #<simm>, MUL VL}]
LDNF1SW { <Zt>.D }, <Pg>/Z, [<Xn|SP>{, #<simm>, MUL VL}]
LDNF1W  { <Zt>.S }, <Pg>/Z, [<Xn|SP>{, #<simm>, MUL VL}]
LDNF1W  { <Zt>.D }, <Pg>/Z, [<Xn|SP>{, #<simm>, MUL VL}]
LDNT1B  { <Zt>.B }, <Pg>/Z, [<Xn|SP>{, #<simm>, MUL VL}]
LDNT1D  { <Zt>.D }, <Pg>/Z, [<Xn|SP>{, #<simm>, MUL VL}]
LDNT1H  { <Zt>.H }, <Pg>/Z, [<Xn|SP>{, #<simm>, MUL VL}]
LDNT1W  { <Zt>.S }, <Pg>/Z, [<Xn|SP>{, #<simm>, MUL VL}]
ST1B    { <Zt>.<T> }, <Pg>, [<Xn|SP>{, #<simm>, MUL VL}]
ST1D    { <Zt>.D }, <Pg>, [<Xn|SP>{, #<simm>, MUL VL}]
ST1H    { <Zt>.<T> }, <Pg>, [<Xn|SP>{, #<simm>, MUL VL}]
ST1W    { <Zt>.<T> }, <Pg>, [<Xn|SP>{, #<simm>, MUL VL}]
ST2B    { <Zt1>.B, <Zt2>.B }, <Pg>, [<Xn|SP>{, #<simm>, MUL VL}]
ST2D    { <Zt1>.D, <Zt2>.D }, <Pg>, [<Xn|SP>{, #<simm>, MUL VL}]
ST2H    { <Zt1>.H, <Zt2>.H }, <Pg>, [<Xn|SP>{, #<simm>, MUL VL}]
ST2W    { <Zt1>.S, <Zt2>.S }, <Pg>, [<Xn|SP>{, #<simm>, MUL VL}]
ST3B    { <Zt1>.B, <Zt2>.B, <Zt3>.B }, <Pg>, [<Xn|SP>{, #<simm>, MUL VL}]
ST3D    { <Zt1>.D, <Zt2>.D, <Zt3>.D }, <Pg>, [<Xn|SP>{, #<simm>, MUL VL}]
ST3H    { <Zt1>.H, <Zt2>.H, <Zt3>.H }, <Pg>, [<Xn|SP>{, #<simm>, MUL VL}]
ST3W    { <Zt1>.S, <Zt2>.S, <Zt3>.S }, <Pg>, [<Xn|SP>{, #<simm>, MUL VL}]
ST4B    { <Zt1>.B, <Zt2>.B, <Zt3>.B, <Zt4>.B }, <Pg>, [<Xn|SP>{, #<simm>, MUL VL}]
ST4D    { <Zt1>.D, <Zt2>.D, <Zt3>.D, <Zt4>.D }, <Pg>, [<Xn|SP>{, #<simm>, MUL VL}]
ST4H    { <Zt1>.H, <Zt2>.H, <Zt3>.H, <Zt4>.H }, <Pg>, [<Xn|SP>{, #<simm>, MUL VL}]
ST4W    { <Zt1>.S, <Zt2>.S, <Zt3>.S, <Zt4>.S }, <Pg>, [<Xn|SP>{, #<simm>, MUL VL}]
STNT1B  { <Zt>.B }, <Pg>, [<Xn|SP>{, #<simm>, MUL VL}]
STNT1D  { <Zt>.D }, <Pg>, [<Xn|SP>{, #<simm>, MUL VL}]
STNT1H  { <Zt>.H }, <Pg>, [<Xn|SP>{, #<simm>, MUL VL}]
STNT1W  { <Zt>.S }, <Pg>, [<Xn|SP>{, #<simm>, MUL VL}]

Issue: #3044
jackgallagher-arm added a commit that referenced this issue Oct 25, 2023
All SVE scalar+immediate LD[1234]/ST[1234] have a signed 4-bit immediate
value that encodes a vector index offset from the base register. This
value was being used directly in the IR for instructions, however
base+disp memory operands should always use a byte displacement.

This changes the codec to use byte displacements in the IR and updates
the codec unit tests accordingly.

The following instructions are updated:

LD1B    { <Zt>.B }, <Pg>/Z, [<Xn|SP>{, #<simm>, MUL VL}]
LD1B    { <Zt>.H }, <Pg>/Z, [<Xn|SP>{, #<simm>, MUL VL}]
LD1B    { <Zt>.S }, <Pg>/Z, [<Xn|SP>{, #<simm>, MUL VL}]
LD1B    { <Zt>.D }, <Pg>/Z, [<Xn|SP>{, #<simm>, MUL VL}]
LD1D    { <Zt>.D }, <Pg>/Z, [<Xn|SP>{, #<simm>, MUL VL}]
LD1H    { <Zt>.H }, <Pg>/Z, [<Xn|SP>{, #<simm>, MUL VL}]
LD1H    { <Zt>.S }, <Pg>/Z, [<Xn|SP>{, #<simm>, MUL VL}]
LD1H    { <Zt>.D }, <Pg>/Z, [<Xn|SP>{, #<simm>, MUL VL}]
LD1SB   { <Zt>.D }, <Pg>/Z, [<Xn|SP>{, #<simm>, MUL VL}]
LD1SB   { <Zt>.S }, <Pg>/Z, [<Xn|SP>{, #<simm>, MUL VL}]
LD1SB   { <Zt>.H }, <Pg>/Z, [<Xn|SP>{, #<simm>, MUL VL}]
LD1SH   { <Zt>.D }, <Pg>/Z, [<Xn|SP>{, #<simm>, MUL VL}]
LD1SH   { <Zt>.S }, <Pg>/Z, [<Xn|SP>{, #<simm>, MUL VL}]
LD1SW   { <Zt>.D }, <Pg>/Z, [<Xn|SP>{, #<simm>, MUL VL}]
LD1W    { <Zt>.S }, <Pg>/Z, [<Xn|SP>{, #<simm>, MUL VL}]
LD1W    { <Zt>.D }, <Pg>/Z, [<Xn|SP>{, #<simm>, MUL VL}]
LD2B    { <Zt1>.B, <Zt2>.B }, <Pg>/Z, [<Xn|SP>{, #<simm>, MUL VL}]
LD2D    { <Zt1>.D, <Zt2>.D }, <Pg>/Z, [<Xn|SP>{, #<simm>, MUL VL}]
LD2H    { <Zt1>.H, <Zt2>.H }, <Pg>/Z, [<Xn|SP>{, #<simm>, MUL VL}]
LD2W    { <Zt1>.S, <Zt2>.S }, <Pg>/Z, [<Xn|SP>{, #<simm>, MUL VL}]
LD3B    { <Zt1>.B, <Zt2>.B, <Zt3>.B }, <Pg>/Z, [<Xn|SP>{, #<simm>, MUL VL}]
LD3D    { <Zt1>.D, <Zt2>.D, <Zt3>.D }, <Pg>/Z, [<Xn|SP>{, #<simm>, MUL VL}]
LD3H    { <Zt1>.H, <Zt2>.H, <Zt3>.H }, <Pg>/Z, [<Xn|SP>{, #<simm>, MUL VL}]
LD3W    { <Zt1>.S, <Zt2>.S, <Zt3>.S }, <Pg>/Z, [<Xn|SP>{, #<simm>, MUL VL}]
LD4B    { <Zt1>.B, <Zt2>.B, <Zt3>.B, <Zt4>.B }, <Pg>/Z, [<Xn|SP>{, #<simm>, MUL VL}]
LD4D    { <Zt1>.D, <Zt2>.D, <Zt3>.D, <Zt4>.D }, <Pg>/Z, [<Xn|SP>{, #<simm>, MUL VL}]
LD4H    { <Zt1>.H, <Zt2>.H, <Zt3>.H, <Zt4>.H }, <Pg>/Z, [<Xn|SP>{, #<simm>, MUL VL}]
LD4W    { <Zt1>.S, <Zt2>.S, <Zt3>.S, <Zt4>.S }, <Pg>/Z, [<Xn|SP>{, #<simm>, MUL VL}]
LDNF1B  { <Zt>.B }, <Pg>/Z, [<Xn|SP>{, #<simm>, MUL VL}]
LDNF1B  { <Zt>.H }, <Pg>/Z, [<Xn|SP>{, #<simm>, MUL VL}]
LDNF1B  { <Zt>.S }, <Pg>/Z, [<Xn|SP>{, #<simm>, MUL VL}]
LDNF1B  { <Zt>.D }, <Pg>/Z, [<Xn|SP>{, #<simm>, MUL VL}]
LDNF1D  { <Zt>.D }, <Pg>/Z, [<Xn|SP>{, #<simm>, MUL VL}]
LDNF1H  { <Zt>.H }, <Pg>/Z, [<Xn|SP>{, #<simm>, MUL VL}]
LDNF1H  { <Zt>.S }, <Pg>/Z, [<Xn|SP>{, #<simm>, MUL VL}]
LDNF1H  { <Zt>.D }, <Pg>/Z, [<Xn|SP>{, #<simm>, MUL VL}]
LDNF1SB { <Zt>.D }, <Pg>/Z, [<Xn|SP>{, #<simm>, MUL VL}]
LDNF1SB { <Zt>.S }, <Pg>/Z, [<Xn|SP>{, #<simm>, MUL VL}]
LDNF1SB { <Zt>.H }, <Pg>/Z, [<Xn|SP>{, #<simm>, MUL VL}]
LDNF1SH { <Zt>.D }, <Pg>/Z, [<Xn|SP>{, #<simm>, MUL VL}]
LDNF1SH { <Zt>.S }, <Pg>/Z, [<Xn|SP>{, #<simm>, MUL VL}]
LDNF1SW { <Zt>.D }, <Pg>/Z, [<Xn|SP>{, #<simm>, MUL VL}]
LDNF1W  { <Zt>.S }, <Pg>/Z, [<Xn|SP>{, #<simm>, MUL VL}]
LDNF1W  { <Zt>.D }, <Pg>/Z, [<Xn|SP>{, #<simm>, MUL VL}]
LDNT1B  { <Zt>.B }, <Pg>/Z, [<Xn|SP>{, #<simm>, MUL VL}]
LDNT1D  { <Zt>.D }, <Pg>/Z, [<Xn|SP>{, #<simm>, MUL VL}]
LDNT1H  { <Zt>.H }, <Pg>/Z, [<Xn|SP>{, #<simm>, MUL VL}]
LDNT1W  { <Zt>.S }, <Pg>/Z, [<Xn|SP>{, #<simm>, MUL VL}]
ST1B    { <Zt>.<T> }, <Pg>, [<Xn|SP>{, #<simm>, MUL VL}]
ST1D    { <Zt>.D }, <Pg>, [<Xn|SP>{, #<simm>, MUL VL}]
ST1H    { <Zt>.<T> }, <Pg>, [<Xn|SP>{, #<simm>, MUL VL}]
ST1W    { <Zt>.<T> }, <Pg>, [<Xn|SP>{, #<simm>, MUL VL}]
ST2B    { <Zt1>.B, <Zt2>.B }, <Pg>, [<Xn|SP>{, #<simm>, MUL VL}]
ST2D    { <Zt1>.D, <Zt2>.D }, <Pg>, [<Xn|SP>{, #<simm>, MUL VL}]
ST2H    { <Zt1>.H, <Zt2>.H }, <Pg>, [<Xn|SP>{, #<simm>, MUL VL}]
ST2W    { <Zt1>.S, <Zt2>.S }, <Pg>, [<Xn|SP>{, #<simm>, MUL VL}]
ST3B    { <Zt1>.B, <Zt2>.B, <Zt3>.B }, <Pg>, [<Xn|SP>{, #<simm>, MUL VL}]
ST3D    { <Zt1>.D, <Zt2>.D, <Zt3>.D }, <Pg>, [<Xn|SP>{, #<simm>, MUL VL}]
ST3H    { <Zt1>.H, <Zt2>.H, <Zt3>.H }, <Pg>, [<Xn|SP>{, #<simm>, MUL VL}]
ST3W    { <Zt1>.S, <Zt2>.S, <Zt3>.S }, <Pg>, [<Xn|SP>{, #<simm>, MUL VL}]
ST4B    { <Zt1>.B, <Zt2>.B, <Zt3>.B, <Zt4>.B }, <Pg>, [<Xn|SP>{, #<simm>, MUL VL}]
ST4D    { <Zt1>.D, <Zt2>.D, <Zt3>.D, <Zt4>.D }, <Pg>, [<Xn|SP>{, #<simm>, MUL VL}]
ST4H    { <Zt1>.H, <Zt2>.H, <Zt3>.H, <Zt4>.H }, <Pg>, [<Xn|SP>{, #<simm>, MUL VL}]
ST4W    { <Zt1>.S, <Zt2>.S, <Zt3>.S, <Zt4>.S }, <Pg>, [<Xn|SP>{, #<simm>, MUL VL}]
STNT1B  { <Zt>.B }, <Pg>, [<Xn|SP>{, #<simm>, MUL VL}]
STNT1D  { <Zt>.D }, <Pg>, [<Xn|SP>{, #<simm>, MUL VL}]
STNT1H  { <Zt>.H }, <Pg>, [<Xn|SP>{, #<simm>, MUL VL}]
STNT1W  { <Zt>.S }, <Pg>, [<Xn|SP>{, #<simm>, MUL VL}]

Issue: #3044
joshua-warburton added a commit that referenced this issue Oct 27, 2023
This patch makes sure that the size and element size for
predicate and vector registers are always initialised,
which avoids potential issues when comparing registers.

issue: #3044
Change-Id: I28195e22c861ffb8d1d06c6672edb1c170cd291c
joshua-warburton added a commit that referenced this issue Oct 30, 2023
This patch makes sure that the size and element size for predicate and
vector registers are always initialised, which avoids potential
issues when comparing registers.

issue: #3044
AssadHashmi added a commit that referenced this issue Nov 9, 2023
drcachesim's tracer.cpp, sample clients memtrace_simple.c and
memval_simple.c have checks to avoid handling SVE scatter/gather
memory instructions, i.e. use of Z registers in memory address
operands. Now that a significant number of scatter/gather instructions
have been implemented, these checks can be removed.

Issues: #5036, #5365, #3044
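Here is a sketch of the kind of check being removed, using a hypothetical register numbering. A memory operand counts as scatter/gather when a Z register supplies its base or its index.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical register numbering: X0-X30, SP, then Z0-Z31. */
enum { REG_NONE = -1, REG_X0 = 0, REG_SP = 31, REG_Z0 = 32, REG_Z31 = REG_Z0 + 31 };

typedef struct {
    int base;  /* REG_NONE if absent */
    int index; /* REG_NONE if absent */
} mem_opnd_t;

static bool is_z_reg(int r)
{
    return r >= REG_Z0 && r <= REG_Z31;
}

/* True for forms such as [<Zn>.D{, <Xm>}] or [<Xn|SP>, <Zm>.D]. */
bool is_scatter_gather(const mem_opnd_t *m)
{
    assert(m != NULL);
    return is_z_reg(m->base) || is_z_reg(m->index);
}
```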
AssadHashmi added a commit that referenced this issue Nov 15, 2023
…#6431)

drcachesim's tracer.cpp, sample clients memtrace_simple.c and
memval_simple.c have checks to avoid handling SVE scatter/gather memory
instructions, i.e. use of Z registers in memory address operands. Now
that a significant number of scatter/gather instructions have been
implemented, these checks can be removed.

Issues: #5036, #5365, #3044
brettcoon pushed a commit that referenced this issue Nov 16, 2023
joshua-warburton added a commit that referenced this issue Nov 21, 2023
This patch adds the appropriate macros, tests and codec entries
to encode the following variants:
```
    LDNT1B  { <Zt>.D }, <Pg>/Z, [<Zn>.D{, <Xm>}]
    LDNT1D  { <Zt>.D }, <Pg>/Z, [<Zn>.D{, <Xm>}]
    LDNT1H  { <Zt>.D }, <Pg>/Z, [<Zn>.D{, <Xm>}]
    LDNT1W  { <Zt>.D }, <Pg>/Z, [<Zn>.D{, <Xm>}]
    STNT1B  { <Zt>.D }, <Pg>, [<Zn>.D{, <Xm>}]
    STNT1D  { <Zt>.D }, <Pg>, [<Zn>.D{, <Xm>}]
    STNT1H  { <Zt>.D }, <Pg>, [<Zn>.D{, <Xm>}]
    STNT1W  { <Zt>.D }, <Pg>, [<Zn>.D{, <Xm>}]
    LDNT1B  { <Zt>.S }, <Pg>/Z, [<Zn>.S{, <Xm>}]
    LDNT1H  { <Zt>.S }, <Pg>/Z, [<Zn>.S{, <Xm>}]
    LDNT1W  { <Zt>.S }, <Pg>/Z, [<Zn>.S{, <Xm>}]
    STNT1B  { <Zt>.S }, <Pg>, [<Zn>.S{, <Xm>}]
    STNT1H  { <Zt>.S }, <Pg>, [<Zn>.S{, <Xm>}]
    STNT1W  { <Zt>.S }, <Pg>, [<Zn>.S{, <Xm>}]
```
Issue: #3044

Change-Id: I95dd710f95b797e8e53ae69dfcadd430a04abc47
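The vector-plus-scalar form computes each lane's address as `Zn.D[i] + Xm`, and the `/Z` qualifier zeroes inactive destination lanes. A minimal simulated model of the 64-bit LDNT1D variant, assuming a flat byte array stands in for memory and one flag per lane stands in for the predicate; the function name is hypothetical. The non-temporal hint only affects caching, so it is not modelled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Simulated model of
 *   LDNT1D { <Zt>.D }, <Pg>/Z, [<Zn>.D{, <Xm>}]
 * mem is a flat byte array whose first byte sits at address mem_base;
 * mem_size bounds the accesses. pg[i] != 0 marks lane i active. */
static void ldnt1d_vec_plus_scalar(uint64_t *zt, const uint64_t *zn,
                                   const uint8_t *pg, size_t lanes,
                                   uint64_t xm, const uint8_t *mem,
                                   uint64_t mem_base, size_t mem_size)
{
    for (size_t i = 0; i < lanes; i++) {
        if (!pg[i]) {
            zt[i] = 0; /* /Z: inactive lanes of the destination are zeroed */
            continue;
        }
        uint64_t addr = zn[i] + xm; /* omitted <Xm> behaves as XZR, i.e. 0 */
        assert(addr >= mem_base && addr - mem_base + 8 <= mem_size);
        memcpy(&zt[i], mem + (addr - mem_base), sizeof zt[i]);
    }
}
```

The store forms (STNT1*) use the same address computation but leave memory untouched for inactive lanes instead of zeroing anything.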
AssadHashmi added a commit that referenced this issue Nov 22, 2023

This patch adds the appropriate macros, tests and codec entries to
encode the following variants:

    LDNT1B  { <Zt>.D }, <Pg>/Z, [<Zn>.D{, <Xm>}]
    LDNT1D  { <Zt>.D }, <Pg>/Z, [<Zn>.D{, <Xm>}]
    LDNT1H  { <Zt>.D }, <Pg>/Z, [<Zn>.D{, <Xm>}]
    LDNT1W  { <Zt>.D }, <Pg>/Z, [<Zn>.D{, <Xm>}]
    STNT1B  { <Zt>.D }, <Pg>, [<Zn>.D{, <Xm>}]
    STNT1D  { <Zt>.D }, <Pg>, [<Zn>.D{, <Xm>}]
    STNT1H  { <Zt>.D }, <Pg>, [<Zn>.D{, <Xm>}]
    STNT1W  { <Zt>.D }, <Pg>, [<Zn>.D{, <Xm>}]
    LDNT1B  { <Zt>.S }, <Pg>/Z, [<Zn>.S{, <Xm>}]
    LDNT1H  { <Zt>.S }, <Pg>/Z, [<Zn>.S{, <Xm>}]
    LDNT1W  { <Zt>.S }, <Pg>/Z, [<Zn>.S{, <Xm>}]
    STNT1B  { <Zt>.S }, <Pg>, [<Zn>.S{, <Xm>}]
    STNT1H  { <Zt>.S }, <Pg>, [<Zn>.S{, <Xm>}]
    STNT1W  { <Zt>.S }, <Pg>, [<Zn>.S{, <Xm>}]

Issue: #3044

---------

Co-authored-by: Assad Hashmi <[email protected]>
philramsey-arm added a commit that referenced this issue Dec 8, 2023
This patch adds the appropriate macros, tests and codec entries to
decode and encode the following instructions:
```
MUL    <Zd>.<T>, <Zn>.<T>, <Zm>.<T>
MUL    <Zd>.D, <Zn>.D, <Zm>.D[<imm>]
MUL    <Zd>.H, <Zn>.H, <Zm>.H[<imm>]
MUL    <Zd>.S, <Zn>.S, <Zm>.S[<imm>]
```

Issue: #3044
AssadHashmi pushed a commit that referenced this issue Dec 12, 2023
This patch adds the appropriate macros, tests and codec entries to
decode and encode the following instructions:
MUL    <Zd>.<T>, <Zn>.<T>, <Zm>.<T>
MUL    <Zd>.D, <Zn>.D, <Zm>.D[<imm>]
MUL    <Zd>.H, <Zn>.H, <Zm>.H[<imm>]
MUL    <Zd>.S, <Zn>.S, <Zm>.S[<imm>]

Issue: #3044
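The indexed MUL forms select the multiplier independently within each 128-bit segment, which is why the index range shrinks as the element grows (0-7 for .H, 0-3 for .S, 0-1 for .D). A minimal model of the .S form, assuming plain arrays for the registers; the function name is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Model of MUL <Zd>.S, <Zn>.S, <Zm>.S[<imm>] on an array of 32-bit lanes.
 * Each 128-bit segment holds 4 lanes; imm (0..3) picks a lane of Zm inside
 * the same segment, and every Zn lane in that segment is multiplied by it.
 * Only the low 32 bits of each product are kept.
 * Simplification: assumes zd does not alias zm; a real implementation
 * must also handle Zd == Zm. */
static void mul_indexed_s(uint32_t *zd, const uint32_t *zn,
                          const uint32_t *zm, size_t lanes, unsigned imm)
{
    const size_t per_seg = 16 / sizeof(uint32_t);
    assert(imm < per_seg && lanes % per_seg == 0);
    for (size_t i = 0; i < lanes; i++) {
        size_t seg_base = i - i % per_seg;
        zd[i] = (uint32_t)((uint64_t)zn[i] * zm[seg_base + imm]);
    }
}
```

The unindexed `MUL <Zd>.<T>, <Zn>.<T>, <Zm>.<T>` form is the plain lane-by-lane product, with no segment lookup.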
philramsey-arm added a commit that referenced this issue Dec 19, 2023
This patch adds the appropriate macros, tests and codec entries to
decode and encode the following instruction:
```SPLICE    <Zd>.<T>, <Pv>, { <Zn1>.<T>, <Zn2>.<T> }```

Issue: #3044
AssadHashmi pushed a commit that referenced this issue Dec 19, 2023
This patch adds the appropriate macros, tests and codec entries to
decode and encode the following instruction:
SPLICE    <Zd>.<T>, <Pv>, { <Zn1>.<T>, <Zn2>.<T> }

Issue: #3044
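The constructive SPLICE form takes a register pair; the encoding only stores Zn1, and Zn2 is the next register modulo 32. A minimal model on byte lanes, assuming one flag per lane for the predicate; the function name is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Model of the constructive form
 *   SPLICE <Zd>.B, <Pv>, { <Zn1>.B, <Zn2>.B }
 * Copies Zn1 lanes from the first to the last active lane of Pv (inclusive,
 * so inactive lanes in between are copied too) into the lowest lanes of Zd,
 * then fills the rest of Zd from the lowest lanes of Zn2.
 * pv[e] != 0 marks lane e active. A temporary makes any aliasing safe. */
static void splice_b(uint8_t *zd, const uint8_t *pv, const uint8_t *zn1,
                     const uint8_t *zn2, size_t lanes)
{
    uint8_t tmp[256]; /* 2048-bit maximum vector length = 256 byte lanes */
    size_t first = lanes, last = 0, x = 0;
    assert(lanes <= sizeof tmp);
    for (size_t e = 0; e < lanes; e++) {
        if (pv[e]) {
            if (first == lanes)
                first = e;
            last = e;
        }
    }
    if (first < lanes)
        for (size_t e = first; e <= last; e++)
            tmp[x++] = zn1[e];
    for (size_t e = 0; x < lanes; e++)
        tmp[x++] = zn2[e];
    memcpy(zd, tmp, lanes);
}
```

With no active lanes the result is simply a copy of Zn2.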
philramsey-arm added a commit that referenced this issue Dec 20, 2023
This patch adds the appropriate macros, tests and codec entries to
decode and encode the following instruction:
```TBL    <Zd>.<Ts>, { <Zn1>.<Ts>, <Zn2>.<Ts> }, <Zm>.<Ts>```

Issue: #3044
AssadHashmi pushed a commit that referenced this issue Dec 20, 2023
This patch adds the appropriate macros, tests and codec entries to
decode and encode the following instruction:
TBL    <Zd>.<Ts>, { <Zn1>.<Ts>, <Zn2>.<Ts> }, <Zm>.<Ts>

Issue: #3044
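In the two-register TBL form the sources act as one table of twice the lane count, and any index past the end of that table produces zero. A minimal model on byte lanes, assuming plain arrays for the registers; the function name is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Model of TBL <Zd>.B, { <Zn1>.B, <Zn2>.B }, <Zm>.B.
 * Indices 0..lanes-1 select from Zn1, lanes..2*lanes-1 from Zn2,
 * and anything larger yields 0. A temporary makes aliasing safe. */
static void tbl2_b(uint8_t *zd, const uint8_t *zn1, const uint8_t *zn2,
                   const uint8_t *zm, size_t lanes)
{
    uint8_t tmp[256]; /* 2048-bit maximum vector length = 256 byte lanes */
    assert(lanes <= sizeof tmp);
    for (size_t e = 0; e < lanes; e++) {
        size_t idx = zm[e];
        if (idx < lanes)
            tmp[e] = zn1[idx];
        else if (idx < 2 * lanes)
            tmp[e] = zn2[idx - lanes];
        else
            tmp[e] = 0;
    }
    memcpy(zd, tmp, lanes);
}
```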