Add encoder/decoder support for Arm's Scalable Vector Extension #3044

fhahn · 2018-06-09T12:04:44Z

Armv8-A's SVE is a big extension that requires several changes to the AArch64 backend. As a first step, I think we need new register types for the scalable vector (Z) and predicate (P) registers, as well as support for the new SVE encodings.

https://developer.arm.com/docs/ddi0584/latest/arm-architecture-reference-manual-supplement-the-scalable-vector-extension-sve-for-armv8-a

Xref #2626

This patch adds a new register type for scalable vector (Z) registers and encoding/decoding support for the 'SVE Integer Arithmetic - Unpredicated Group' encoding group. The specification can be found at https://developer.arm.com/docs/ddi0584/latest/arm-architecture-reference-manual-supplement-the-scalable-vector-extension-sve-for-armv8-a Issue: #3044 Change-Id: I36b1e55b250aca11a9743e12e517edf500fdba4c

Issue: #3044 Change-Id: I8af06df778c203498a5c247bdc19b5c2064a1766


This patch adds a new register type for SVE predicate registers (P0-P15) and support for the sve_int_bin_pred_log encoding group. The macros added follow the SVE assembly syntax, where the shared input and output register Zdn has to be supplied twice. Issue #3044 Change-Id: Ib55b5e1160ab41b9c298b7abbef66a5454ed4384


This patch is an example of the code which will be generated from a machine readable specification (MRS) to decode and encode AArch64 instructions from v8.1 onwards. It is provided for review and discussion purposes, in order to resolve any issues which may arise and to make visible what changes to expect. This patch does not include the MRS, the parser or generator, just an example of the target code which we intend to generate, based on the v8.1 SQRDMLAH instruction. Issue: #4393, #3044, #2626

AssadHashmi · 2022-08-23T15:01:45Z

Currently we can handle 10 SVE opcodes (out of 314) in the codec. Now that we are preparing to implement the rest of SVE, the notion of a variable vector register length in DynamoRIO needs to be addressed. Specifically, how will clients and tools like DrMemory and drcachesim work correctly on SVE hardware with different SVE vector lengths?

Currently we have OPSZ_SCALABLE and OPSZ_SCALABLE_PRED for SVE vector and predicate registers returned by reg_get_size(). This needs to be converted to an OPSZ_ representing the SVE vector and predicate register size on the currently executing AArch64 implementation, i.e. one of:

OPSZ_128 OPSZ_256 OPSZ_384 OPSZ_512 OPSZ_640 OPSZ_768 OPSZ_896 OPSZ_1024 OPSZ_1152 OPSZ_1280 OPSZ_1408 OPSZ_1536 OPSZ_1664 OPSZ_1792 OPSZ_1920 OPSZ_2048

One approach would be for DynamoRIO to execute the RDVL instruction (https://developer.arm.com/documentation/ddi0602/2022-06/SVE-Instructions/RDVL--Read-multiple-of-vector-register-size-to-scalar-register-) on startup so that when reg_get_size() is called for SVE vector or predicate registers, the appropriate OPSZ_ is returned.
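A minimal sketch of the RDVL idea, assuming the query runs once at startup. `RDVL Xd, #1` writes the vector length in bytes to `Xd`; the Z register size follows directly, and the predicate register size is one eighth of it (one predicate bit per vector byte). All `sketch_*` names are hypothetical, not DynamoRIO API, and the inline assembly is only compiled when the compiler targets SVE (`__ARM_FEATURE_SVE`).

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: read the SVE vector length (VL) in bytes.
 * "RDVL Xd, #1" writes 1 * VL-in-bytes to Xd. Only compiled when targeting
 * SVE; elsewhere this sketch returns 0, meaning "unknown". */
static uint64_t sketch_read_vl_bytes(void)
{
#if defined(__aarch64__) && defined(__ARM_FEATURE_SVE)
    uint64_t vl;
    __asm__ volatile("rdvl %0, #1" : "=r"(vl));
    return vl;
#else
    return 0;
#endif
}

/* As in the list above: any multiple of 128 bits from 128 to 2048 bits. */
static bool sketch_vl_is_valid(uint64_t vl_bytes)
{
    return vl_bytes >= 16 && vl_bytes <= 256 && vl_bytes % 16 == 0;
}

/* Z register size in bits (the bit lengths used in the list above). */
static unsigned sketch_z_reg_bits(uint64_t vl_bytes)
{
    return (unsigned)(vl_bytes * 8);
}

/* P registers (and FFR) hold one bit per byte of a Z register. */
static unsigned sketch_p_reg_bits(uint64_t vl_bytes)
{
    return (unsigned)vl_bytes;
}
```

On a 512-bit implementation this gives 512-bit Z registers and 64-bit P registers; mapping those onto DynamoRIO's `OPSZ_` constants is left out because that mapping is exactly what the thread is deciding.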

@derekbruening is this enough to cover all of DynamoRIO's clients/tools requirements? Is there another approach?

derekbruening · 2022-08-23T18:00:32Z

Having the actual register size in the operand in the IR at runtime does seem necessary but maybe not sufficient. Tools like DrMemory that shadow registers may want to know the maximum register sizes at initialization time to optimize their shadow allocations. Maybe an API routine like dr_mcontext_zmm_fields_valid() where a client can ask which parts of the mcontext are live for the current hardware, with some way to refer to the different SIMD vector sizes?

What about dr_simd_t which it looks like is currently just 128 bits? Does it need to be increased to handle 2048 bits -- err, is your list from OPSZ_128 to OPSZ_2048 supposed to be /8 to get to bytes?

AssadHashmi · 2022-09-05T15:11:57Z

Thanks for the steer @derekbruening. Some more thoughts and questions...

> What about dr_simd_t which it looks like is currently just 128 bits? Does it need to be increased to handle 2048 bits...

Yes. The low 128 bits of each SVE Z register overlap the corresponding Advanced SIMD (a.k.a. NEON) registers.
Can we set the size of dr_simd_t to the SVE maximum, 2048 bits, even though fewer bytes will be accessed most of the time? If so, this means we'll have to replace occurrences of sizeof(dr_simd_t) with a runtime get size function for AArch64.

> Maybe an API routine like dr_mcontext_zmm_fields_valid() where a client can ask which parts of the mcontext are live for the current hardware,

Is this a case of creating a new dr_mcontext_sve_fields_valid() function which behaves like dr_mcontext_zmm_fields_valid()?

> with some way to refer to the different simd vector sizes?

Does the size have to be in the mcontext as well, for direct access from the context? Is reg_get_size() at translation time insufficient?

> err, is your list from OPSZ_128 to OPSZ_2048 supposed to be /8 to get to bytes?

Yes, 'twas just a quick brain dump of bit lengths!

Other thoughts:

We don't need the SVE equivalent of e.g.

#define ZMM_ENABLED() (proc_avx512_enabled())

because all SVE h/w will have full SVE support in the OS.

In terms of testing, are the existing DynamoRIO and DrMemory unit tests enough to guard against regressions from such a size change? As part of the first patch I'll update tests/api/opnd-a64.c and tests/client-interface/reg_size_test.dll.c for real SVE h/w.

derekbruening · 2022-09-07T18:19:03Z

> What about dr_simd_t which it looks like is currently just 128 bits? Does it need to be increased to handle 2048 bits...

> Yes. The low 128 bits of each SVE Z register overlap the corresponding Advanced SIMD (a.k.a NEON) registers. Can we set the size of dr_simd_t to the SVE maximum, 2048 bits, even though fewer bytes will be accessed most of the time?

One concern is that DR puts the mcontext on the stack in a number of places, and has a small stack size. 2048 bits == 256 bytes * 32 registers is 8K which is a lot on DR's small stacks. We've thought about heap-allocating in the past but use in signal handling complicates things. For comparison the x86 512-bit SIMD occupies 2K. I think we may have to move to heap allocation and special heap usage for signals if we add 8K.
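The two figures can be checked with a one-line helper: a 2048-bit Z register is 256 bytes, so 32 of them need 8192 bytes, while 32 registers of 512 bits (the count the 2K x86 figure implies) need 2048 bytes. The helper name is illustrative only and is not taken from DynamoRIO's headers.

```c
#include <stddef.h>

/* Bytes needed to hold num_regs SIMD registers of reg_bits bits each. */
static size_t sketch_simd_bank_bytes(size_t reg_bits, size_t num_regs)
{
    return reg_bits / 8 * num_regs;
}
```

For example, `sketch_simd_bank_bytes(2048, 32)` is 8192 (8K) and `sketch_simd_bank_bytes(512, 32)` is 2048 (2K).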

> If so, this means we'll have to replace occurrences of sizeof(dr_simd_t) with a runtime get size function for AArch64.

I think most copy routines already have logic to copy only certain SIMD fields so this should be doable. E.g., the x86 dr_mcontext_to_priv_mcontext has logic to copy just the ymm parts of the zmm fields.
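A minimal sketch of the same idea for SVE, assuming a hypothetical slot type sized for the 2048-bit architectural maximum: the copy uses the vector length detected at runtime rather than `sizeof` of the slot, so the unused high bytes of each slot are never read or written. `sketch_simd_slot_t` and `sketch_copy_live_simd` are illustrative names, not DynamoRIO's.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_MAX_VL_BYTES 256 /* 2048-bit architectural maximum */

/* Hypothetical slot large enough for any legal vector length. */
typedef struct {
    uint8_t b[SKETCH_MAX_VL_BYTES];
} sketch_simd_slot_t;

/* Copy only the live (low vl_bytes) part of each of num_slots slots. */
static void sketch_copy_live_simd(sketch_simd_slot_t *dst, const sketch_simd_slot_t *src,
                                  size_t num_slots, size_t vl_bytes)
{
    if (vl_bytes > SKETCH_MAX_VL_BYTES)
        vl_bytes = SKETCH_MAX_VL_BYTES; /* never overrun a slot */
    for (size_t i = 0; i < num_slots; i++)
        memcpy(dst[i].b, src[i].b, vl_bytes);
}
```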

> Maybe an API routine like dr_mcontext_zmm_fields_valid() where a client can ask which parts of the mcontext are live for the current hardware,

> Is this a case of creating a new dr_mcontext_sve_fields_valid() function which behaves like dr_mcontext_zmm_fields_valid() ?

But also returning the size that's valid?
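One possible shape for such a routine, sketched with hypothetical names: it reports whether the SVE fields are live and, through an optional out-parameter, how many bytes of each Z slot are valid. The vector length is assumed to have been recorded once at startup; this is not an existing DynamoRIO signature.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical: recorded once at startup (e.g. via RDVL); 0 means no SVE. */
static size_t sketch_startup_vl_bytes;

/* Returns true if SVE state is live in the mcontext. If so, and vl_bytes_out
 * is non-NULL, stores the number of valid bytes per Z register. */
static bool sketch_mcontext_sve_fields_valid(size_t *vl_bytes_out)
{
    if (sketch_startup_vl_bytes == 0)
        return false;
    if (vl_bytes_out != NULL)
        *vl_bytes_out = sketch_startup_vl_bytes;
    return true;
}
```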

> with some way to refer to the different simd vector sizes?

> Does the size have to be in the mcontext as well, for direct access from the context? Is reg_get_size() at translation time insufficient?

A separate size query is probably ok.

> In terms of testing, are the existing dynamorio and drmemory unit tests enough to guard against regression for such a size change? As part of the first patch I'll update tests/api/opnd-a64.c and tests/client-interface/reg_size_test.dll.c for real SVE h/w.

Users must set the total struct size in the size field, so DR code would use that to handle both the old 128-bit and new larger sizes. One-off tests against binary clients built against the old struct would be one way to add confidence; a true regression test would probably require a duplicated struct definition in the test or something similar.
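A sketch of the size-field convention with two hypothetical, heavily simplified layouts (128-bit SIMD slots and 512-bit slots). Code receiving a context reads `size` to tell which layout the caller was compiled against; keeping a copy of the old definition, as done here, is also roughly what a regression test would need. Neither struct is DynamoRIO's real `dr_mcontext_t`.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical old layout: 128-bit SIMD slots. */
typedef struct {
    size_t size; /* set by the user to sizeof(the struct they compiled against) */
    uint64_t gpr[31];
    uint8_t simd[32][16];
} sketch_mcontext_v1_t;

/* Hypothetical new layout: 512-bit SIMD slots, same leading fields. */
typedef struct {
    size_t size;
    uint64_t gpr[31];
    uint8_t simd[32][64];
} sketch_mcontext_v2_t;

/* Bytes per SIMD slot provided by the caller's struct, or 0 if the size
 * field matches neither known layout. Relies on size being the first member. */
static size_t sketch_simd_slot_bytes(const void *mc)
{
    size_t size = *(const size_t *)mc;
    if (size == sizeof(sketch_mcontext_v1_t))
        return 16;
    if (size == sizeof(sketch_mcontext_v2_t))
        return 64;
    return 0;
}
```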

AssadHashmi · 2022-09-08T12:59:53Z

> What about dr_simd_t which it looks like is currently just 128 bits? Does it need to be increased to handle 2048 bits...

> Yes. The low 128 bits of each SVE Z register overlap the corresponding Advanced SIMD (a.k.a NEON) registers. Can we set the size of dr_simd_t to the SVE maximum, 2048 bits, even though fewer bytes will be accessed most of the time?

> One concern is that DR puts the mcontext on the stack in a number of places, and has a small stack size. 2048 bits == 256 bytes * 32 registers is 8K which is a lot on DR's small stacks. We've thought about heap-allocating in the past but use in signal handling complicates things. For comparison the x86 512-bit SIMD occupies 2K. I think we may have to move to heap allocation and special heap usage for signals if we add 8K.

In addition to the 32 vector registers (Z0-Z31), there are 16 predicate registers (P0-P15) and a first-fault register (FFR).

Current SVE hardware supports 128, 256 and 512 bit vectors. I suggest that we only implement for 512 bits max to start with, avoiding increasing stack size now. This will require just over 3K for Z, P and FFR registers. I think it's more efficient to get SVE patches rolling into the trunk now for 512 bits and address the larger vector and stack sizes as a separate issue later. Is that ok?
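The per-register sizes behind this estimate, as a small sketch: at a 512-bit vector length each Z register is 64 bytes and each predicate register (and FFR) is one eighth of that, 8 bytes. Packed at their architectural sizes the three banks total 2184 bytes; the "just over 3K" figure is reproduced if each predicate register and FFR is given a full 64-byte slot. That padded layout is an assumption made here to explain the number, not something stated in the thread.

```c
#include <stddef.h>

/* Bytes for Z0-Z31, P0-P15 and FFR at the given vector length in bits.
 * If pad_predicates is nonzero, each P/FFR slot is assumed to be as large
 * as a Z slot (hypothetical layout). */
static size_t sketch_sve_state_bytes(size_t vl_bits, int pad_predicates)
{
    size_t z_bytes = vl_bits / 8; /* e.g. 64 for 512-bit */
    /* One predicate bit per Z byte: z_bytes bits = z_bytes / 8 bytes. */
    size_t p_bytes = pad_predicates ? z_bytes : z_bytes / 8;
    return 32 * z_bytes + 16 * p_bytes + p_bytes; /* Z, P, FFR */
}
```

So `sketch_sve_state_bytes(512, 0)` is 2184 and `sketch_sve_state_bytes(512, 1)` is 3136.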

> Maybe an API routine like dr_mcontext_zmm_fields_valid() where a client can ask which parts of the mcontext are live for the current hardware,

> Is this a case of creating a new dr_mcontext_sve_fields_valid() function which behaves like dr_mcontext_zmm_fields_valid() ?

> But also returning the size that's valid?

Yes.

derekbruening · 2022-09-08T16:41:58Z

> What about dr_simd_t which it looks like is currently just 128 bits? Does it need to be increased to handle 2048 bits...

> Yes. The low 128 bits of each SVE Z register overlap the corresponding Advanced SIMD (a.k.a NEON) registers. Can we set the size of dr_simd_t to the SVE maximum, 2048 bits, even though fewer bytes will be accessed most of the time?

> One concern is that DR puts the mcontext on the stack in a number of places, and has a small stack size. 2048 bits == 256 bytes * 32 registers is 8K which is a lot on DR's small stacks. We've thought about heap-allocating in the past but use in signal handling complicates things. For comparison the x86 512-bit SIMD occupies 2K. I think we may have to move to heap allocation and special heap usage for signals if we add 8K.

> In addition to the 32 vector registers (Z0-Z31), there are 16 predicate registers (P0-P15) and a first-fault register (FFR).

> Current SVE hardware supports 128, 256 and 512 bit vectors. I suggest that we only implement for 512 bits max to start with, avoiding increasing stack size now. This will require just over 3K for Z, P and FFR registers. I think it's more efficient to get SVE patches rolling into the trunk now for 512 bits and address the larger vector and stack sizes as a separate issue later. Is that ok?

Yes, this sounds like a good plan to me.

This patch creates a new file for SVE disassembly tests and another ir_aarch64 test file. Also included is a small bug fix for the disassembly test sorter, which could remove a test if the file did not end in a newline. issues: #3044 Change-Id: I8b2ded8cd8d48d160132e96d712d502d30cfd05f


Previously, the IR represented vectors by a plain H, S, D or Q register combined with a faux-imm added after the last vector to hint at its size. This patch uses the .size field of the opnd struct to hold the element size, similar to how partial registers are used for x86. Element vector registers are distinguished from that partial-register usage by setting the DR_OPND_IS_VECTOR (0x40) bit in the .flag field. The same bit is also used by the DR_OPND_IS_EXTEND flag for immediates; it was chosen so as not to extend the size of the flag field while remaining unambiguous. The new element vector operand is printed like z0.b, via a check in disassemble_shared.c that sets a suffix. Also added are the utility functions opnd_is_element_vector_reg, opnd_create_reg_element_vector and opnd_get_vector_element_size. issue: #3044 Change-Id: I0cdb78cc1a13c3db7b6c742fb1d5d5c0e54216ff
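A simplified model of this scheme, assuming a cut-down operand struct: the register's `size` field carries the element size, a flag bit marks the operand as an element vector, and the printer picks a `.b`/`.h`/`.s`/`.d`/`.q` suffix from the element size. Only the 0x40 bit value and the `z0.b` output style come from the patch description; the struct, field widths and function names are hypothetical.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define SKETCH_OPND_IS_VECTOR 0x40 /* bit value from the patch description */

typedef struct {
    uint8_t reg_num; /* e.g. 0 for z0 */
    uint8_t size;    /* element size in bytes when IS_VECTOR is set */
    uint8_t flags;
} sketch_opnd_t;

static sketch_opnd_t sketch_create_element_vector(uint8_t reg_num, uint8_t elem_bytes)
{
    sketch_opnd_t o = { reg_num, elem_bytes, SKETCH_OPND_IS_VECTOR };
    return o;
}

/* Suffix letter for the element size, or '\0' if not an element vector. */
static char sketch_element_suffix(const sketch_opnd_t *o)
{
    if (!(o->flags & SKETCH_OPND_IS_VECTOR))
        return '\0';
    switch (o->size) {
    case 1: return 'b';
    case 2: return 'h';
    case 4: return 's';
    case 8: return 'd';
    case 16: return 'q';
    default: return '\0';
    }
}

/* Writes e.g. "z0.b" (or plain "z0" without a suffix) into buf. */
static void sketch_print_z(const sketch_opnd_t *o, char *buf, size_t len)
{
    char suffix = sketch_element_suffix(o);
    if (suffix != '\0')
        snprintf(buf, len, "z%u.%c", (unsigned)o->reg_num, suffix);
    else
        snprintf(buf, len, "z%u", (unsigned)o->reg_num);
}
```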

…6230) For the following instructions, the current decode/encode functions use vector indexing in the memory operand at the IR level:
LDR <Zt>, [<Xn|SP>{, #<imm>, MUL VL}]
LDR <Pt>, [<Xn|SP>{, #<imm>, MUL VL}]
STR <Zt>, [<Xn|SP>{, #<imm>, MUL VL}]
STR <Pt>, [<Xn|SP>{, #<imm>, MUL VL}]
PRFB <prfop>, <Pg>, [<Xn|SP>{, #<imm>, MUL VL}]
PRFH <prfop>, <Pg>, [<Xn|SP>{, #<imm>, MUL VL}]
PRFW <prfop>, <Pg>, [<Xn|SP>{, #<imm>, MUL VL}]
PRFD <prfop>, <Pg>, [<Xn|SP>{, #<imm>, MUL VL}]
However, the IR must always refer to the address in terms of the base register value plus a byte offset displacement. This patch changes the decode/encode functions for these instructions to expect byte offsets at the IR level, converting to vector length offsets within the codec. Issues #3044, #5365
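A sketch of the conversion the codec has to perform for `LDR`/`STR` of Z and P registers, assuming the vector length in bytes is known. The encoded signed 9-bit multiplier (-256 to 255) is scaled by the Z register size for Z, and by the predicate register size (one eighth of that) for P; on encode, a byte displacement is only representable if it is an exact multiple within that range. Function names are hypothetical, and the `PRF*` forms, which use a different immediate range, are left out.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bytes per "MUL VL" step: VL for Z registers, VL/8 for P registers. */
static int64_t sketch_mul_vl_scale(int64_t vl_bytes, bool is_pred)
{
    return is_pred ? vl_bytes / 8 : vl_bytes;
}

/* Decode: sign-extended imm9 -> IR byte displacement. */
static int64_t sketch_imm_to_disp(int64_t imm, int64_t vl_bytes, bool is_pred)
{
    return imm * sketch_mul_vl_scale(vl_bytes, is_pred);
}

/* Encode: IR byte displacement -> imm9. Returns false if not encodable. */
static bool sketch_disp_to_imm(int64_t disp, int64_t vl_bytes, bool is_pred, int64_t *imm_out)
{
    int64_t scale = sketch_mul_vl_scale(vl_bytes, is_pred);
    if (scale == 0 || disp % scale != 0)
        return false; /* not a whole number of vector/predicate lengths */
    int64_t imm = disp / scale;
    if (imm < -256 || imm > 255)
        return false; /* outside the signed 9-bit range */
    *imm_out = imm;
    return true;
}
```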

…oRIO#6216) This patch adds the appropriate macros, tests and codec entries to encode the following variants:
```
GMI <Xd>, <Xn|SP>, <Xm>
IRG <Xd|SP>, <Xn|SP>{, <Xm>}
SUBP <Xd>, <Xn|SP>, <Xm|SP>
SUBPS <Xd>, <Xn|SP>, <Xm|SP>
ADDG <Xd|SP>, <Xn|SP>, #<imm1>, #<imm2>
SUBG <Xd|SP>, <Xn|SP>, #<imm1>, #<imm2>
DC GVA, <Xt>
DC GZVA, <Xt>
```
Issue DynamoRIO#3044


This patch adds Arm AArch64 Scalable Vector Extension (SVE) support to the core, including related changes to the codec, IR and relevant clients. SVE and SVE2 are major extensions to Arm's 64-bit architecture. Developers and users should refer to the relevant documentation at developer.arm.com (currently https://developer.arm.com/Architectures/Scalable%20Vector%20Extensions).
The architecture allows hardware implementations to support vector lengths from 128 to 2048 bits. This patch supports up to 512 bits due to DynamoRIO's stack size limitation; there is currently no stock SVE hardware with vector lengths greater than 512 bits. The vector length is determined at startup by get_processor_specific_info() and is available by calling proc_get_vector_length(). For Z registers, reg_get_size() will return the vector size implemented by the hardware rather than OPSZ_SCALABLE.
There will be follow-up patches for:
- SVE scatter/gather emulation
- Full SVE signal context support
- Complete SVE support in sample clients and drcachesim tracer.
Issues: #5365, #3044
Co-authored-by: Cam Mannett

…ffsets All SVE scalar+immediate LD[1234]/ST[1234] have a signed 4-bit immediate value that encodes a vector index offset from the base register. This value was being used directly in the IR; however, base+disp memory operands should always use a byte displacement. This changes the codec to use byte displacements in the IR and updates the codec unit tests accordingly. The following instructions are updated:
LD1B { <Zt>.B }, <Pg>/Z, [<Xn|SP>{, #<simm>, MUL VL}]
LD1B { <Zt>.H }, <Pg>/Z, [<Xn|SP>{, #<simm>, MUL VL}]
LD1B { <Zt>.S }, <Pg>/Z, [<Xn|SP>{, #<simm>, MUL VL}]
LD1B { <Zt>.D }, <Pg>/Z, [<Xn|SP>{, #<simm>, MUL VL}]
LD1D { <Zt>.D }, <Pg>/Z, [<Xn|SP>{, #<simm>, MUL VL}]
LD1H { <Zt>.H }, <Pg>/Z, [<Xn|SP>{, #<simm>, MUL VL}]
LD1H { <Zt>.S }, <Pg>/Z, [<Xn|SP>{, #<simm>, MUL VL}]
LD1H { <Zt>.D }, <Pg>/Z, [<Xn|SP>{, #<simm>, MUL VL}]
LD1SB { <Zt>.D }, <Pg>/Z, [<Xn|SP>{, #<simm>, MUL VL}]
LD1SB { <Zt>.S }, <Pg>/Z, [<Xn|SP>{, #<simm>, MUL VL}]
LD1SB { <Zt>.H }, <Pg>/Z, [<Xn|SP>{, #<simm>, MUL VL}]
LD1SH { <Zt>.D }, <Pg>/Z, [<Xn|SP>{, #<simm>, MUL VL}]
LD1SH { <Zt>.S }, <Pg>/Z, [<Xn|SP>{, #<simm>, MUL VL}]
LD1SW { <Zt>.D }, <Pg>/Z, [<Xn|SP>{, #<simm>, MUL VL}]
LD1W { <Zt>.S }, <Pg>/Z, [<Xn|SP>{, #<simm>, MUL VL}]
LD1W { <Zt>.D }, <Pg>/Z, [<Xn|SP>{, #<simm>, MUL VL}]
LD2B { <Zt1>.B, <Zt2>.B }, <Pg>/Z, [<Xn|SP>{, #<simm>, MUL VL}]
LD2D { <Zt1>.D, <Zt2>.D }, <Pg>/Z, [<Xn|SP>{, #<simm>, MUL VL}]
LD2H { <Zt1>.H, <Zt2>.H }, <Pg>/Z, [<Xn|SP>{, #<simm>, MUL VL}]
LD2W { <Zt1>.S, <Zt2>.S }, <Pg>/Z, [<Xn|SP>{, #<simm>, MUL VL}]
LD3B { <Zt1>.B, <Zt2>.B, <Zt3>.B }, <Pg>/Z, [<Xn|SP>{, #<simm>, MUL VL}]
LD3D { <Zt1>.D, <Zt2>.D, <Zt3>.D }, <Pg>/Z, [<Xn|SP>{, #<simm>, MUL VL}]
LD3H { <Zt1>.H, <Zt2>.H, <Zt3>.H }, <Pg>/Z, [<Xn|SP>{, #<simm>, MUL VL}]
LD3W { <Zt1>.S, <Zt2>.S, <Zt3>.S }, <Pg>/Z, [<Xn|SP>{, #<simm>, MUL VL}]
LD4B { <Zt1>.B, <Zt2>.B, <Zt3>.B, <Zt4>.B }, <Pg>/Z, [<Xn|SP>{, #<simm>, MUL VL}]
LD4D { <Zt1>.D, <Zt2>.D, <Zt3>.D, <Zt4>.D }, <Pg>/Z, [<Xn|SP>{, #<simm>, MUL VL}]
LD4H { <Zt1>.H, <Zt2>.H, <Zt3>.H, <Zt4>.H }, <Pg>/Z, [<Xn|SP>{, #<simm>, MUL VL}]
LD4W { <Zt1>.S, <Zt2>.S, <Zt3>.S, <Zt4>.S }, <Pg>/Z, [<Xn|SP>{, #<simm>, MUL VL}]
LDNF1B { <Zt>.B }, <Pg>/Z, [<Xn|SP>{, #<simm>, MUL VL}]
LDNF1B { <Zt>.H }, <Pg>/Z, [<Xn|SP>{, #<simm>, MUL VL}]
LDNF1B { <Zt>.S }, <Pg>/Z, [<Xn|SP>{, #<simm>, MUL VL}]
LDNF1B { <Zt>.D }, <Pg>/Z, [<Xn|SP>{, #<simm>, MUL VL}]
LDNF1D { <Zt>.D }, <Pg>/Z, [<Xn|SP>{, #<simm>, MUL VL}]
LDNF1H { <Zt>.H }, <Pg>/Z, [<Xn|SP>{, #<simm>, MUL VL}]
LDNF1H { <Zt>.S }, <Pg>/Z, [<Xn|SP>{, #<simm>, MUL VL}]
LDNF1H { <Zt>.D }, <Pg>/Z, [<Xn|SP>{, #<simm>, MUL VL}]
LDNF1SB { <Zt>.D }, <Pg>/Z, [<Xn|SP>{, #<simm>, MUL VL}]
LDNF1SB { <Zt>.S }, <Pg>/Z, [<Xn|SP>{, #<simm>, MUL VL}]
LDNF1SB { <Zt>.H }, <Pg>/Z, [<Xn|SP>{, #<simm>, MUL VL}]
LDNF1SH { <Zt>.D }, <Pg>/Z, [<Xn|SP>{, #<simm>, MUL VL}]
LDNF1SH { <Zt>.S }, <Pg>/Z, [<Xn|SP>{, #<simm>, MUL VL}]
LDNF1SW { <Zt>.D }, <Pg>/Z, [<Xn|SP>{, #<simm>, MUL VL}]
LDNF1W { <Zt>.S }, <Pg>/Z, [<Xn|SP>{, #<simm>, MUL VL}]
LDNF1W { <Zt>.D }, <Pg>/Z, [<Xn|SP>{, #<simm>, MUL VL}]
LDNT1B { <Zt>.B }, <Pg>/Z, [<Xn|SP>{, #<simm>, MUL VL}]
LDNT1D { <Zt>.D }, <Pg>/Z, [<Xn|SP>{, #<simm>, MUL VL}]
LDNT1H { <Zt>.H }, <Pg>/Z, [<Xn|SP>{, #<simm>, MUL VL}]
LDNT1W { <Zt>.S }, <Pg>/Z, [<Xn|SP>{, #<simm>, MUL VL}]
ST1B { <Zt>.<T> }, <Pg>, [<Xn|SP>{, #<simm>, MUL VL}]
ST1D { <Zt>.D }, <Pg>, [<Xn|SP>{, #<simm>, MUL VL}]
ST1H { <Zt>.<T> }, <Pg>, [<Xn|SP>{, #<simm>, MUL VL}]
ST1W { <Zt>.<T> }, <Pg>, [<Xn|SP>{, #<simm>, MUL VL}]
ST2B { <Zt1>.B, <Zt2>.B }, <Pg>, [<Xn|SP>{, #<simm>, MUL VL}]
ST2D { <Zt1>.D, <Zt2>.D }, <Pg>, [<Xn|SP>{, #<simm>, MUL VL}]
ST2H { <Zt1>.H, <Zt2>.H }, <Pg>, [<Xn|SP>{, #<simm>, MUL VL}]
ST2W { <Zt1>.S, <Zt2>.S }, <Pg>, [<Xn|SP>{, #<simm>, MUL VL}]
ST3B { <Zt1>.B, <Zt2>.B, <Zt3>.B }, <Pg>, [<Xn|SP>{, #<simm>, MUL VL}]
ST3D { <Zt1>.D, <Zt2>.D, <Zt3>.D }, <Pg>, [<Xn|SP>{, #<simm>, MUL VL}]
ST3H { <Zt1>.H, <Zt2>.H, <Zt3>.H }, <Pg>, [<Xn|SP>{, #<simm>, MUL VL}]
ST3W { <Zt1>.S, <Zt2>.S, <Zt3>.S }, <Pg>, [<Xn|SP>{, #<simm>, MUL VL}]
ST4B { <Zt1>.B, <Zt2>.B, <Zt3>.B, <Zt4>.B }, <Pg>, [<Xn|SP>{, #<simm>, MUL VL}]
ST4D { <Zt1>.D, <Zt2>.D, <Zt3>.D, <Zt4>.D }, <Pg>, [<Xn|SP>{, #<simm>, MUL VL}]
ST4H { <Zt1>.H, <Zt2>.H, <Zt3>.H, <Zt4>.H }, <Pg>, [<Xn|SP>{, #<simm>, MUL VL}]
ST4W { <Zt1>.S, <Zt2>.S, <Zt3>.S, <Zt4>.S }, <Pg>, [<Xn|SP>{, #<simm>, MUL VL}]
STNT1B { <Zt>.B }, <Pg>, [<Xn|SP>{, #<simm>, MUL VL}]
STNT1D { <Zt>.D }, <Pg>, [<Xn|SP>{, #<simm>, MUL VL}]
STNT1H { <Zt>.H }, <Pg>, [<Xn|SP>{, #<simm>, MUL VL}]
STNT1W { <Zt>.S }, <Pg>, [<Xn|SP>{, #<simm>, MUL VL}]
Issue: #3044
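For these contiguous forms one "MUL VL" step is not always a full vector. By my reading of the Arm pseudocode (an assumption worth checking against the specification), the byte offset is the immediate times the number of elements per vector (VL / element size) times the memory size of each element, so for a widening load such as `LD1B { <Zt>.H }` one step is VL/2 bytes. For `LD2`-`LD4`/`ST2`-`ST4` the assembly immediate is already a multiple of the register count, so the same formula applies to the assembly value. The function name is hypothetical.

```c
#include <stdint.h>

/* Byte displacement for a contiguous SVE load/store "#<simm>, MUL VL" operand.
 *   asm_imm:  immediate as written in assembly (for LD2-LD4/ST2-ST4 a
 *             multiple of the register count)
 *   vl_bytes: vector length in bytes
 *   esize:    element size of the Z register(s) in bytes (B=1, H=2, S=4, D=8)
 *   msize:    bytes accessed in memory per element (LD1B=1, LD1H=2, ...) */
static int64_t sketch_contig_disp(int64_t asm_imm, int64_t vl_bytes, int64_t esize, int64_t msize)
{
    int64_t elements = vl_bytes / esize; /* elements per Z register */
    return asm_imm * elements * msize;
}
```

At a 512-bit vector length, `LD1B { <Zt>.B }` with `#1, MUL VL` is a 64-byte step, while `LD1B { <Zt>.H }` with the same immediate is a 32-byte step.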


This patch makes sure that the size and element size of predicate and vector registers are always initialised, which avoids potential issues when comparing registers. issue: #3044 Change-Id: I28195e22c861ffb8d1d06c6672edb1c170cd291c


drcachesim's tracer.cpp, sample clients memtrace_simple.c and memval_simple.c have checks to avoid handling SVE scatter/gather memory instructions, i.e. use of Z registers in memory address operands. Now that a significant number of scatter/gather instructions have been implemented, these checks can be removed. Issues: #5036, #5365, #3044


This patch adds the appropriate macros, tests and codec entries to encode the following variants:
```
LDNT1B { <Zt>.D }, <Pg>/Z, [<Zn>.D{, <Xm>}]
LDNT1D { <Zt>.D }, <Pg>/Z, [<Zn>.D{, <Xm>}]
LDNT1H { <Zt>.D }, <Pg>/Z, [<Zn>.D{, <Xm>}]
LDNT1W { <Zt>.D }, <Pg>/Z, [<Zn>.D{, <Xm>}]
STNT1B { <Zt>.D }, <Pg>, [<Zn>.D{, <Xm>}]
STNT1D { <Zt>.D }, <Pg>, [<Zn>.D{, <Xm>}]
STNT1H { <Zt>.D }, <Pg>, [<Zn>.D{, <Xm>}]
STNT1W { <Zt>.D }, <Pg>, [<Zn>.D{, <Xm>}]
LDNT1B { <Zt>.S }, <Pg>/Z, [<Zn>.S{, <Xm>}]
LDNT1H { <Zt>.S }, <Pg>/Z, [<Zn>.S{, <Xm>}]
LDNT1W { <Zt>.S }, <Pg>/Z, [<Zn>.S{, <Xm>}]
STNT1B { <Zt>.S }, <Pg>, [<Zn>.S{, <Xm>}]
STNT1H { <Zt>.S }, <Pg>, [<Zn>.S{, <Xm>}]
STNT1W { <Zt>.S }, <Pg>, [<Zn>.S{, <Xm>}]
```
issue: #3044 Change-Id: I95dd710f95b797e8e53ae69dfcadd430a04abc47

This patch adds the appropriate macros, tests and codec entries to encode the following variants:
LDNT1B { <Zt>.D }, <Pg>/Z, [<Zn>.D{, <Xm>}]
LDNT1D { <Zt>.D }, <Pg>/Z, [<Zn>.D{, <Xm>}]
LDNT1H { <Zt>.D }, <Pg>/Z, [<Zn>.D{, <Xm>}]
LDNT1W { <Zt>.D }, <Pg>/Z, [<Zn>.D{, <Xm>}]
STNT1B { <Zt>.D }, <Pg>, [<Zn>.D{, <Xm>}]
STNT1D { <Zt>.D }, <Pg>, [<Zn>.D{, <Xm>}]
STNT1H { <Zt>.D }, <Pg>, [<Zn>.D{, <Xm>}]
STNT1W { <Zt>.D }, <Pg>, [<Zn>.D{, <Xm>}]
LDNT1B { <Zt>.S }, <Pg>/Z, [<Zn>.S{, <Xm>}]
LDNT1H { <Zt>.S }, <Pg>/Z, [<Zn>.S{, <Xm>}]
LDNT1W { <Zt>.S }, <Pg>/Z, [<Zn>.S{, <Xm>}]
STNT1B { <Zt>.S }, <Pg>, [<Zn>.S{, <Xm>}]
STNT1H { <Zt>.S }, <Pg>, [<Zn>.S{, <Xm>}]
STNT1W { <Zt>.S }, <Pg>, [<Zn>.S{, <Xm>}]
Issue: #3044
Co-authored-by: Assad Hashmi
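The vector-plus-scalar forms compute one address per active element: the element of `<Zn>` (zero-extended in the `.S` forms) plus `<Xm>`. Below is a simplified model of the `.D` form of `LDNT1D`, assuming a byte buffer stands in for memory and a `bool` array stands in for the governing predicate; inactive elements are zeroed, as the `/Z` qualifier indicates. It models the semantics only and is not DynamoRIO code.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Model of LDNT1D { Zt.D }, Pg/Z, [Zn.D, Xm] over a flat memory buffer.
 * Addresses are offsets into mem; lanes = VL / 8. Returns false if an
 * active lane would read outside the buffer (real hardware would fault). */
static bool sketch_ldnt1d_vec_plus_scalar(uint64_t *zt, const bool *pg, const uint64_t *zn,
                                          uint64_t xm, size_t lanes,
                                          const uint8_t *mem, size_t mem_len)
{
    for (size_t e = 0; e < lanes; e++) {
        if (!pg[e]) {
            zt[e] = 0; /* zeroing predication: inactive lanes are cleared */
            continue;
        }
        uint64_t addr = zn[e] + xm; /* per-element address */
        if (addr > mem_len || mem_len - addr < sizeof(uint64_t))
            return false;
        memcpy(&zt[e], mem + addr, sizeof(uint64_t)); /* host byte order */
    }
    return true;
}
```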

This patch adds the appropriate macros, tests and codec entries to decode and encode the following instructions:
```
MUL <Zd>.<T>, <Zn>.<T>, <Zm>.<T>
MUL <Zd>.D, <Zn>.D, <Zm>.D[<imm>]
MUL <Zd>.H, <Zn>.H, <Zm>.H[<imm>]
MUL <Zd>.S, <Zn>.S, <Zm>.S[<imm>]
```
Issue: #3044
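The indexed forms differ from the vector form in that the multiplier is not a whole vector: within each 128-bit segment, every element of `<Zn>` is multiplied by the element at position `<imm>` in the same segment of `<Zm>`. A C model of the `.S` form (four 32-bit elements per segment), for illustration only; the function name is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Model of MUL <Zd>.S, <Zn>.S, <Zm>.S[imm]; lanes = VL / 4 (a multiple of 4),
 * imm in 0..3. Products keep the low 32 bits, as MUL does.
 * zd must not alias zm. */
static void sketch_mul_indexed_s(uint32_t *zd, const uint32_t *zn, const uint32_t *zm,
                                 size_t lanes, unsigned imm)
{
    const size_t per_segment = 16 / sizeof(uint32_t); /* 4 elements per 128 bits */
    for (size_t e = 0; e < lanes; e++) {
        size_t segment_base = e - e % per_segment;
        zd[e] = zn[e] * zm[segment_base + imm];
    }
}
```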


This patch adds the appropriate macros, tests and codec entries to decode and encode the following instruction: ```SPLICE <Zd>.<T>, <Pv>, { <Zn1>.<T>, <Zn2>.<T> }``` Issue: #3044
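`SPLICE` copies the elements of the first source vector from the first to the last active element of `<Pv>` (inclusive, so inactive elements in between are included) to the bottom of the result, then fills the remaining elements from the bottom of the second source vector. This SVE2 form reads the two sources as a register pair instead of overwriting its first source. A byte-element model, for illustration only; the function name is hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Model of SPLICE <Zd>.B, <Pv>, { <Zn1>.B, <Zn2>.B } with `lanes` byte elements.
 * zd must not alias zn1 or zn2. */
static void sketch_splice_b(uint8_t *zd, const bool *pv, const uint8_t *zn1,
                            const uint8_t *zn2, size_t lanes)
{
    size_t first = lanes, last = 0, x = 0;
    for (size_t e = 0; e < lanes; e++) {
        if (pv[e]) {
            if (first == lanes)
                first = e; /* lowest active element */
            last = e;      /* highest active element so far */
        }
    }
    if (first != lanes) { /* copy the active segment, if any */
        for (size_t e = first; e <= last; e++)
            zd[x++] = zn1[e];
    }
    for (size_t e = 0; x < lanes; e++) /* fill the rest from the bottom of zn2 */
        zd[x++] = zn2[e];
}
```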


This patch adds the appropriate macros, tests and codec entries to decode and encode the following instruction: ```TBL <Zd>.<Ts>, { <Zn1>.<Ts>, <Zn2>.<Ts> }, <Zm>.<Ts>``` Issue: #3044
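The two-register `TBL` treats `<Zn1>` and `<Zn2>` as one table with twice the vector's element count: each element of `<Zm>` indexes that table, and an out-of-range index yields zero. A byte-element model, for illustration only; the function name is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Model of TBL <Zd>.B, { <Zn1>.B, <Zn2>.B }, <Zm>.B with `lanes` byte elements.
 * zd must not alias the sources. */
static void sketch_tbl2_b(uint8_t *zd, const uint8_t *zn1, const uint8_t *zn2,
                          const uint8_t *zm, size_t lanes)
{
    for (size_t e = 0; e < lanes; e++) {
        size_t idx = zm[e]; /* unsigned element index */
        if (idx < lanes)
            zd[e] = zn1[idx];
        else if (idx < 2 * lanes)
            zd[e] = zn2[idx - lanes];
        else
            zd[e] = 0; /* out of range */
    }
}
```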


fhahn added the OpSys-AArch64 label Jun 9, 2018

fhahn mentioned this issue Jun 28, 2018

i#3044 AArch64 SVE: Add Z registers and a simple encoding group. #3073

Merged

fhahn added a commit that referenced this issue Jun 28, 2018

i#3044 AArch64 SVE: Add SVE support to generation scripts.

ddfcf00

Issue: #3044 Change-Id: I8af06df778c203498a5c247bdc19b5c2064a1766

fhahn mentioned this issue Jun 30, 2018

i#3044 SVE encoder: Add support for sve_int_bin_pred_log enc group. #3078

Merged

derekbruening mentioned this issue Aug 6, 2020

Finish AArch64 encoder/decoder #2626

Open

derekbruening mentioned this issue Aug 16, 2020

[Aarch64] float instruction fmov unrecognized, src and dst go wrong，float register value not saved #4408

Closed

derekbruening added Component-IR Google-Affecting Priority-Low labels Apr 14, 2021

AssadHashmi mentioned this issue Feb 17, 2022

Build and validate DynamoRIO on AArch64 SVE hardware #5365

Open

AssadHashmi mentioned this issue Apr 8, 2022

i#4393 New AArch64 codec implementation #5453

Merged

joshua-warburton mentioned this issue Oct 10, 2022

i#3044: Split out SVE instruction test files #5680

Merged

joshua-warburton mentioned this issue Oct 11, 2022

i#3044: Add a type to represent aarch64 vectors #5681

Merged

jackgallagher-arm mentioned this issue Oct 23, 2023

i#3044 AArch64 SVE codec: fix scalar+immediate LD/ST offsets #6390

Merged

joshua-warburton mentioned this issue Oct 27, 2023

i#3044: Always initialise sizes for registers #6401

Merged

AssadHashmi mentioned this issue Nov 9, 2023

i#5036 AArch64: Remove Z register checks in tracer and sample clients #6431

Merged

joshua-warburton mentioned this issue Nov 21, 2023

i#3044: AArch64 SVE2 codec: add vector+scalar versions of st/ldnt #6468

Merged

philramsey-arm mentioned this issue Dec 8, 2023

i#3044 AArch64 SVE codec: Add SVE2 MUL variants #6501

Merged

philramsey-arm mentioned this issue Dec 19, 2023

i#3044 AArch64 SVE codec: Add SVE2 SPLICE variant #6517

Merged

philramsey-arm mentioned this issue Dec 20, 2023

i#3044 AArch64 SVE codec: Add SVE2 TBL variant #6521

Merged
