Add encoder/decoder support for Arm's Scalable Vector Extension #3044
This patch adds a new register type for scalable vector (Z) registers and encoding/decoding support for the 'SVE Integer Arithmetic - Unpredicated Group' encoding group. The specification can be found at https://developer.arm.com/docs/ddi0584/latest/arm-architecture-reference-manual-supplement-the-scalable-vector-extension-sve-for-armv8-a Issue: #3044 Change-Id: I36b1e55b250aca11a9743e12e517edf500fdba4c
Issue: #3044 Change-Id: I8af06df778c203498a5c247bdc19b5c2064a1766
This patch adds a new register type for SVE predicate registers (P0-P15) and support for the sve_int_bin_pred_log encoding group. The macros added follow the SVE assembly syntax, where the shared input and output register Zdn has to be supplied twice. Issue #3044 Change-Id: Ib55b5e1160ab41b9c298b7abbef66a5454ed4384
This patch is an example of the code which will be generated from a machine readable specification (MRS) to decode and encode AArch64 instructions from v8.1 onwards. It is provided for review and discussion purposes, in order to resolve any issues which may arise and to make visible what changes to expect. This patch does not include the MRS, the parser or generator, just an example of the target code which we intend to generate, based on the v8.1 SQRDMLAH instruction. Issue: #4393, #3044, #2626
Currently we can handle 10 SVE opcodes (out of 314) in the codec. Now that we are preparing to implement the rest of SVE, the notion of a variable vector register length in DynamoRIO needs to be addressed. Specifically, how will clients and tools like DrMemory and drcachesim work correctly on SVE hardware with different SVE vector lengths? Currently we have
One approach would be for DynamoRIO to execute the @derekbruening is this enough to cover all of DynamoRIO's clients/tools requirements? Is there another approach?
Having the actual register size in the operand in the IR at runtime does seem necessary but maybe not sufficient. Tools like DrMemory which are shadowing registers may want to know the maximum register sizes at initialization time to optimize their shadow allocations. Maybe an API routine like dr_mcontext_zmm_fields_valid() where a client can ask which parts of the mcontext are live for the current hardware, with some way to refer to the different SIMD vector sizes? What about
Thanks for the steer @derekbruening. Some more thoughts and questions...
Yes. The low 128 bits of each SVE Z register overlap the corresponding Advanced SIMD (a.k.a. NEON) registers.
Is this a case of creating a new
Does the size have to be in the mcontext as well, for direct access from the context? Is
Yes, twas just a quick brain dump of bit lengths! Other thoughts: We don't need the SVE equivalent of e.g.
because all SVE hardware will have full SVE support in the OS. In terms of testing, are the existing DynamoRIO and DrMemory unit tests enough to guard against regression for such a size change? As part of the first patch I'll update
One concern is that DR puts the mcontext on the stack in a number of places, and has a small stack size. 2048 bits == 256 bytes * 32 registers is 8K which is a lot on DR's small stacks. We've thought about heap-allocating in the past but use in signal handling complicates things. For comparison the x86 512-bit SIMD occupies 2K. I think we may have to move to heap allocation and special heap usage for signals if we add 8K.
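The arithmetic behind the 8K and 2K figures can be checked with a short sketch. The helper name below is hypothetical and is not a DynamoRIO routine; it only multiplies a register count by the per-register size in bytes.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not a DynamoRIO API): bytes needed to save
 * num_regs SIMD registers that are each vl_bits wide. */
static size_t
simd_save_area_bytes(size_t num_regs, size_t vl_bits)
{
    assert(vl_bits % 8 == 0); /* register widths are whole bytes */
    return num_regs * (vl_bits / 8);
}
```

With 32 registers, `simd_save_area_bytes(32, 2048)` gives 8192 bytes for the largest SVE vector length, and `simd_save_area_bytes(32, 512)` gives 2048 bytes, which matches the x86 512-bit comparison above.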
I think most copy routines already have logic to copy only certain SIMD fields so this should be doable. E.g., the x86 dr_mcontext_to_priv_mcontext has logic to copy just the ymm parts of the zmm fields.
But also returning the size that's valid?
A separate size query is probably ok.
Users must set the total struct size in the size field, so DR code would use that to handle both the old 128-bit and new larger sizes. One-off tests against binary clients built against the old layout would be one way to add confidence; a true regression test would probably require a duplicated struct definition in the test or something similar.
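A minimal sketch of the size-field idea, assuming two hypothetical context layouts (these are not DynamoRIO's real `dr_mcontext_t` definitions, and `fill_context` is an invented name). The filler reads the caller-supplied size to decide how many bytes per register it may write, so a client built against the old layout is never overrun.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum { NUM_REGS = 32, OLD_REG_BYTES = 16, NEW_REG_BYTES = 64 };

/* Hypothetical old layout: 32 x 128-bit registers. */
typedef struct {
    size_t size; /* caller sets this to sizeof() of its own struct */
    unsigned char simd[NUM_REGS][OLD_REG_BYTES];
} ctx_old_t;

/* Hypothetical new layout: 32 x 512-bit registers. */
typedef struct {
    size_t size;
    unsigned char simd[NUM_REGS][NEW_REG_BYTES];
} ctx_new_t;

/* Copies register state into the caller's context, using the size field
 * to pick the layout. Returns bytes written per register, or 0 if the
 * size matches neither layout. */
static size_t
fill_context(void *user_ctx, unsigned char state[NUM_REGS][NEW_REG_BYTES])
{
    assert(user_ctx != NULL);
    size_t user_size = *(size_t *)user_ctx; /* size is the first field in both */
    if (user_size == sizeof(ctx_old_t)) {
        ctx_old_t *ctx = (ctx_old_t *)user_ctx;
        for (int i = 0; i < NUM_REGS; i++)
            memcpy(ctx->simd[i], state[i], OLD_REG_BYTES); /* low 128 bits only */
        return OLD_REG_BYTES;
    }
    if (user_size == sizeof(ctx_new_t)) {
        ctx_new_t *ctx = (ctx_new_t *)user_ctx;
        for (int i = 0; i < NUM_REGS; i++)
            memcpy(ctx->simd[i], state[i], NEW_REG_BYTES);
        return NEW_REG_BYTES;
    }
    return 0;
}
```

A regression test along the lines suggested above would keep a copy of the old struct definition, as `ctx_old_t` does here, and check that only the low 128 bits of each register are written.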
In addition to the 32 vector registers (Z0-Z31), there are 16 predicate registers (P0-P15) and a first-fault register (FFR). Current SVE hardware supports 128, 256 and 512 bit vectors. I suggest that we only implement for 512 bits max to start with, avoiding increasing stack size now. This will require just over 3K for Z, P and FFR registers. I think it's more efficient to get SVE patches rolling into the trunk now for 512 bits and address the larger vector and stack sizes as a separate issue later. Is that ok?
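As a rough check of the sizing, the sketch below computes the state size two ways. The packed figure uses the architectural sizes, where each predicate register and the FFR hold one bit per vector byte. The slot figure assumes every register, including P and FFR, is stored in a full vector-sized slot; that layout is an assumption made here for illustration, and it is the one that yields "just over 3K" at 512 bits. All names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

enum { NUM_Z = 32, NUM_P = 16, NUM_FFR = 1 };

/* Packed size: Z registers are vl_bits wide, P and FFR are vl_bits / 8 wide. */
static size_t
sve_state_bytes_packed(size_t vl_bits)
{
    assert(vl_bits % 128 == 0);
    size_t z_bytes = vl_bits / 8;
    size_t p_bytes = z_bytes / 8;
    return NUM_Z * z_bytes + (NUM_P + NUM_FFR) * p_bytes;
}

/* Assumed layout: every register occupies a full vector-sized slot. */
static size_t
sve_state_bytes_slots(size_t vl_bits)
{
    assert(vl_bits % 128 == 0);
    return (NUM_Z + NUM_P + NUM_FFR) * (vl_bits / 8);
}
```

At 512 bits the packed size is 2184 bytes and the slot-based size is 3136 bytes; at 2048 bits the slot-based size grows to 12544 bytes, which illustrates why the larger vector lengths were deferred.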
Yes.
Yes, this sounds like a good plan to me.
This patch creates a new file for SVE disassembly tests and another ir_aarch64 test file. Also included is a small bug fix for the disassembly test sorter, which could remove a test if the file did not end in a newline. issues: #3044 Change-Id: I8b2ded8cd8d48d160132e96d712d502d30cfd05f
Previously in the IR we have represented vectors by a plain H, S, D or Q register combined with a faux-imm added after the last vector to give a hint to its size. This patch uses the .size field of the opnd struct to include the element size, similar to how partial registers are used for x86. Element Vector registers are differentiated from these by setting the DR_OPND_IS_VECTOR (0x40) bit on the .flag field. This bit is also used by the DR_OPND_IS_EXTEND flag for imms and was chosen so as not to extend the size of the flag field while remaining unambiguous. The new Element Vector operand is printed like z0.b via a check in disassemble_shared.c to set a suffix. Also added are the utility functions opnd_is_element_vector_reg, opnd_create_reg_element_vector and opnd_get_vector_element_size. issue: #3044 Change-Id: I0cdb78cc1a13c3db7b6c742fb1d5d5c0e54216ff
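The idea can be illustrated with a toy operand model. Everything prefixed `toy_` or `TOY_` below is hypothetical and is not DynamoRIO's `opnd_t` or its API; only the 0x40 flag value and the `z0.b` print format come from the description above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

#define TOY_OPND_IS_VECTOR 0x40 /* flag value taken from the description */

typedef enum { TOY_ELSZ_B, TOY_ELSZ_H, TOY_ELSZ_S, TOY_ELSZ_D } toy_elsz_t;

typedef struct {
    unsigned char reg;   /* Z register number, 0 to 31 */
    unsigned char flags; /* TOY_OPND_IS_VECTOR marks an element vector */
    toy_elsz_t size;     /* element size, meaningful when the flag is set */
} toy_opnd_t;

static toy_opnd_t
toy_create_reg_element_vector(unsigned char reg, toy_elsz_t elsz)
{
    assert(reg < 32);
    toy_opnd_t op = { reg, TOY_OPND_IS_VECTOR, elsz };
    return op;
}

static int
toy_is_element_vector_reg(toy_opnd_t op)
{
    return (op.flags & TOY_OPND_IS_VECTOR) != 0;
}

/* Prints "z<N>.<suffix>" for element vectors and "z<N>" otherwise. */
static int
toy_print(char *buf, size_t len, toy_opnd_t op)
{
    static const char suffix[] = { 'b', 'h', 's', 'd' };
    if (toy_is_element_vector_reg(op))
        return snprintf(buf, len, "z%u.%c", (unsigned)op.reg, suffix[op.size]);
    return snprintf(buf, len, "z%u", (unsigned)op.reg);
}
```

Keeping the element size in the existing size field and marking it with one flag bit avoids the extra faux-immediate operand that the earlier representation needed.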
…6230) For the current decode/encode functions of:
LDR <Zt>, [<Xn|SP>{, #<imm>, MUL VL}]
LDR <Pt>, [<Xn|SP>{, #<imm>, MUL VL}]
STR <Zt>, [<Xn|SP>{, #<imm>, MUL VL}]
STR <Pt>, [<Xn|SP>{, #<imm>, MUL VL}]
PRFB <prfop>, <Pg>, [<Xn|SP>{, #<imm>, MUL VL}]
PRFH <prfop>, <Pg>, [<Xn|SP>{, #<imm>, MUL VL}]
PRFW <prfop>, <Pg>, [<Xn|SP>{, #<imm>, MUL VL}]
PRFD <prfop>, <Pg>, [<Xn|SP>{, #<imm>, MUL VL}]
Vector indexing is used in the memory operand at the IR level. However the IR must always refer to the address in terms of the base register value plus a byte offset displacement. This patch changes the decode/encode functions for these instructions to expect byte offsets at the IR level, converting to vector length offsets within the codec. Issues #3044, #5365
This patch adds the appropriate macros, tests and codec entries to decode and encode the following instructions:
```
LDG <Xt>, [<Xn|SP>, #<simm>]
ST2G <Xt>, [<Xn|SP>], #<simm>
ST2G <Xt>, [<Xn|SP>, #<simm>]!
ST2G <Xt>, [<Xn|SP>, #<simm>]
STG <Xt>, [<Xn|SP>], #<simm>
STG <Xt>, [<Xn|SP>, #<simm>]!
STG <Xt>, [<Xn|SP>, #<simm>]
STZ2G <Xt>, [<Xn|SP>], #<simm>
STZ2G <Xt>, [<Xn|SP>, #<simm>]!
STZ2G <Xt>, [<Xn|SP>, #<simm>]
STZG <Xt>, [<Xn|SP>], #<simm>
STZG <Xt>, [<Xn|SP>, #<simm>]!
STZG <Xt>, [<Xn|SP>, #<simm>]
STGP <Xt>, <Xt2>, [<Xn|SP>], #<simm>
STGP <Xt>, <Xt2>, [<Xn|SP>, #<simm>]!
STGP <Xt>, <Xt2>, [<Xn|SP>, #<simm>]
```
Issue DynamoRIO#3044 Co-authored-by: Joshua Warburton <[email protected]>
…oRIO#6216) This patch adds the appropriate macros, tests and codec entries to encode the following variants:
```
GMI <Xd>, <Xn|SP>, <Xm>
IRG <Xd|SP>, <Xn|SP>{, <Xm>}
SUBP <Xd>, <Xn|SP>, <Xm|SP>
SUBPS <Xd>, <Xn|SP>, <Xm|SP>
ADDG <Xd|SP>, <Xn|SP>, #<imm1>, #<imm2>
SUBG <Xd|SP>, <Xn|SP>, #<imm1>, #<imm2>
DC GVA, <Xt>
DC GZVA, <Xt>
```
Issue DynamoRIO#3044
This patch adds Arm AArch64 Scalable Vector Extension (SVE) support to the core including related changes to the codec, IR and relevant clients.

SVE and SVE2 are major extensions to Arm's 64-bit architecture. Developers and users should refer to the relevant documentation at developer.arm.com (currently https://developer.arm.com/Architectures/Scalable%20Vector%20Extensions).

The architecture allows hardware implementations to support vector lengths from 128 to 2048 bits. This patch supports up to 512 bits due to DynamoRIO's stack size limitation. There is currently no stock SVE hardware with vector lengths greater than 512 bits.

The vector length is determined by get_processor_specific_info() at runtime on startup and is available by calling proc_get_vector_length(). For Z registers, reg_get_size() will return the vector size implemented by the hardware rather than OPSZ_SCALABLE.

There will be follow up patches for:
- SVE scatter/gather emulation
- Full SVE signal context support
- Complete SVE support in sample clients and drcachesim tracer.

Issues: #5365, #3044

---------

Co-authored-by: Cam Mannett <[email protected]>
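The limits described in this commit message can be sketched as a small check. The constants restate the 128, 2048 and 512 bit figures from the text; the function names are hypothetical, and the multiple-of-128 rule is an assumption added here rather than something the commit message states.

```c
#include <assert.h>
#include <stdbool.h>

static const unsigned SVE_MIN_VL_BITS = 128;       /* architectural minimum */
static const unsigned SVE_ARCH_MAX_VL_BITS = 2048; /* architectural maximum */
static const unsigned SUPPORTED_MAX_VL_BITS = 512; /* limit described for this patch */

/* Assumption: valid vector lengths are multiples of 128 bits in range. */
static bool
vl_is_architectural(unsigned vl_bits)
{
    return vl_bits >= SVE_MIN_VL_BITS && vl_bits <= SVE_ARCH_MAX_VL_BITS &&
        vl_bits % 128 == 0;
}

/* Hypothetical: the concrete Z register size in bytes for a hardware
 * vector length, or 0 if it exceeds the supported limit. */
static unsigned
z_reg_size_bytes(unsigned vl_bits)
{
    assert(vl_is_architectural(vl_bits));
    return vl_bits <= SUPPORTED_MAX_VL_BITS ? vl_bits / 8 : 0;
}
```

Returning a concrete byte count rather than a symbolic scalable size mirrors the behaviour described for reg_get_size() on Z registers, where tools see the size implemented by the hardware they are running on.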
…ffsets All SVE scalar+immediate LD[1234]/ST[1234] have a signed 4-bit immediate value that encodes a vector index offset from the base register. This value was being used directly in the IR for instructions, however base+disp memory operands should always use a byte displacement. This changes the codec to use byte displacements in the IR and updates the codec unit tests accordingly. The following instructions are updated:
LD1B { <Zt>.B }, <Pg>/Z, [<Xn|SP>{, #<simm>, MUL VL}]
LD1B { <Zt>.H }, <Pg>/Z, [<Xn|SP>{, #<simm>, MUL VL}]
LD1B { <Zt>.S }, <Pg>/Z, [<Xn|SP>{, #<simm>, MUL VL}]
LD1B { <Zt>.D }, <Pg>/Z, [<Xn|SP>{, #<simm>, MUL VL}]
LD1D { <Zt>.D }, <Pg>/Z, [<Xn|SP>{, #<simm>, MUL VL}]
LD1H { <Zt>.H }, <Pg>/Z, [<Xn|SP>{, #<simm>, MUL VL}]
LD1H { <Zt>.S }, <Pg>/Z, [<Xn|SP>{, #<simm>, MUL VL}]
LD1H { <Zt>.D }, <Pg>/Z, [<Xn|SP>{, #<simm>, MUL VL}]
LD1SB { <Zt>.D }, <Pg>/Z, [<Xn|SP>{, #<simm>, MUL VL}]
LD1SB { <Zt>.S }, <Pg>/Z, [<Xn|SP>{, #<simm>, MUL VL}]
LD1SB { <Zt>.H }, <Pg>/Z, [<Xn|SP>{, #<simm>, MUL VL}]
LD1SH { <Zt>.D }, <Pg>/Z, [<Xn|SP>{, #<simm>, MUL VL}]
LD1SH { <Zt>.S }, <Pg>/Z, [<Xn|SP>{, #<simm>, MUL VL}]
LD1SW { <Zt>.D }, <Pg>/Z, [<Xn|SP>{, #<simm>, MUL VL}]
LD1W { <Zt>.S }, <Pg>/Z, [<Xn|SP>{, #<simm>, MUL VL}]
LD1W { <Zt>.D }, <Pg>/Z, [<Xn|SP>{, #<simm>, MUL VL}]
LD2B { <Zt1>.B, <Zt2>.B }, <Pg>/Z, [<Xn|SP>{, #<simm>, MUL VL}]
LD2D { <Zt1>.D, <Zt2>.D }, <Pg>/Z, [<Xn|SP>{, #<simm>, MUL VL}]
LD2H { <Zt1>.H, <Zt2>.H }, <Pg>/Z, [<Xn|SP>{, #<simm>, MUL VL}]
LD2W { <Zt1>.S, <Zt2>.S }, <Pg>/Z, [<Xn|SP>{, #<simm>, MUL VL}]
LD3B { <Zt1>.B, <Zt2>.B, <Zt3>.B }, <Pg>/Z, [<Xn|SP>{, #<simm>, MUL VL}]
LD3D { <Zt1>.D, <Zt2>.D, <Zt3>.D }, <Pg>/Z, [<Xn|SP>{, #<simm>, MUL VL}]
LD3H { <Zt1>.H, <Zt2>.H, <Zt3>.H }, <Pg>/Z, [<Xn|SP>{, #<simm>, MUL VL}]
LD3W { <Zt1>.S, <Zt2>.S, <Zt3>.S }, <Pg>/Z, [<Xn|SP>{, #<simm>, MUL VL}]
LD4B { <Zt1>.B, <Zt2>.B, <Zt3>.B, <Zt4>.B }, <Pg>/Z, [<Xn|SP>{, #<simm>, MUL VL}]
LD4D { <Zt1>.D, <Zt2>.D, <Zt3>.D, <Zt4>.D }, <Pg>/Z, [<Xn|SP>{, #<simm>, MUL VL}]
LD4H { <Zt1>.H, <Zt2>.H, <Zt3>.H, <Zt4>.H }, <Pg>/Z, [<Xn|SP>{, #<simm>, MUL VL}]
LD4W { <Zt1>.S, <Zt2>.S, <Zt3>.S, <Zt4>.S }, <Pg>/Z, [<Xn|SP>{, #<simm>, MUL VL}]
LDNF1B { <Zt>.B }, <Pg>/Z, [<Xn|SP>{, #<simm>, MUL VL}]
LDNF1B { <Zt>.H }, <Pg>/Z, [<Xn|SP>{, #<simm>, MUL VL}]
LDNF1B { <Zt>.S }, <Pg>/Z, [<Xn|SP>{, #<simm>, MUL VL}]
LDNF1B { <Zt>.D }, <Pg>/Z, [<Xn|SP>{, #<simm>, MUL VL}]
LDNF1D { <Zt>.D }, <Pg>/Z, [<Xn|SP>{, #<simm>, MUL VL}]
LDNF1H { <Zt>.H }, <Pg>/Z, [<Xn|SP>{, #<simm>, MUL VL}]
LDNF1H { <Zt>.S }, <Pg>/Z, [<Xn|SP>{, #<simm>, MUL VL}]
LDNF1H { <Zt>.D }, <Pg>/Z, [<Xn|SP>{, #<simm>, MUL VL}]
LDNF1SB { <Zt>.D }, <Pg>/Z, [<Xn|SP>{, #<simm>, MUL VL}]
LDNF1SB { <Zt>.S }, <Pg>/Z, [<Xn|SP>{, #<simm>, MUL VL}]
LDNF1SB { <Zt>.H }, <Pg>/Z, [<Xn|SP>{, #<simm>, MUL VL}]
LDNF1SH { <Zt>.D }, <Pg>/Z, [<Xn|SP>{, #<simm>, MUL VL}]
LDNF1SH { <Zt>.S }, <Pg>/Z, [<Xn|SP>{, #<simm>, MUL VL}]
LDNF1SW { <Zt>.D }, <Pg>/Z, [<Xn|SP>{, #<simm>, MUL VL}]
LDNF1W { <Zt>.S }, <Pg>/Z, [<Xn|SP>{, #<simm>, MUL VL}]
LDNF1W { <Zt>.D }, <Pg>/Z, [<Xn|SP>{, #<simm>, MUL VL}]
LDNT1B { <Zt>.B }, <Pg>/Z, [<Xn|SP>{, #<simm>, MUL VL}]
LDNT1D { <Zt>.D }, <Pg>/Z, [<Xn|SP>{, #<simm>, MUL VL}]
LDNT1H { <Zt>.H }, <Pg>/Z, [<Xn|SP>{, #<simm>, MUL VL}]
LDNT1W { <Zt>.S }, <Pg>/Z, [<Xn|SP>{, #<simm>, MUL VL}]
ST1B { <Zt>.<T> }, <Pg>, [<Xn|SP>{, #<simm>, MUL VL}]
ST1D { <Zt>.D }, <Pg>, [<Xn|SP>{, #<simm>, MUL VL}]
ST1H { <Zt>.<T> }, <Pg>, [<Xn|SP>{, #<simm>, MUL VL}]
ST1W { <Zt>.<T> }, <Pg>, [<Xn|SP>{, #<simm>, MUL VL}]
ST2B { <Zt1>.B, <Zt2>.B }, <Pg>, [<Xn|SP>{, #<simm>, MUL VL}]
ST2D { <Zt1>.D, <Zt2>.D }, <Pg>, [<Xn|SP>{, #<simm>, MUL VL}]
ST2H { <Zt1>.H, <Zt2>.H }, <Pg>, [<Xn|SP>{, #<simm>, MUL VL}]
ST2W { <Zt1>.S, <Zt2>.S }, <Pg>, [<Xn|SP>{, #<simm>, MUL VL}]
ST3B { <Zt1>.B, <Zt2>.B, <Zt3>.B }, <Pg>, [<Xn|SP>{, #<simm>, MUL VL}]
ST3D { <Zt1>.D, <Zt2>.D, <Zt3>.D }, <Pg>, [<Xn|SP>{, #<simm>, MUL VL}]
ST3H { <Zt1>.H, <Zt2>.H, <Zt3>.H }, <Pg>, [<Xn|SP>{, #<simm>, MUL VL}]
ST3W { <Zt1>.S, <Zt2>.S, <Zt3>.S }, <Pg>, [<Xn|SP>{, #<simm>, MUL VL}]
ST4B { <Zt1>.B, <Zt2>.B, <Zt3>.B, <Zt4>.B }, <Pg>, [<Xn|SP>{, #<simm>, MUL VL}]
ST4D { <Zt1>.D, <Zt2>.D, <Zt3>.D, <Zt4>.D }, <Pg>, [<Xn|SP>{, #<simm>, MUL VL}]
ST4H { <Zt1>.H, <Zt2>.H, <Zt3>.H, <Zt4>.H }, <Pg>, [<Xn|SP>{, #<simm>, MUL VL}]
ST4W { <Zt1>.S, <Zt2>.S, <Zt3>.S, <Zt4>.S }, <Pg>, [<Xn|SP>{, #<simm>, MUL VL}]
STNT1B { <Zt>.B }, <Pg>, [<Xn|SP>{, #<simm>, MUL VL}]
STNT1D { <Zt>.D }, <Pg>, [<Xn|SP>{, #<simm>, MUL VL}]
STNT1H { <Zt>.H }, <Pg>, [<Xn|SP>{, #<simm>, MUL VL}]
STNT1W { <Zt>.S }, <Pg>, [<Xn|SP>{, #<simm>, MUL VL}]
Issue: #3044
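A sketch of how the signed 4-bit field could be turned into a byte displacement. The scaling used here (field value times register count times elements per vector times bytes per memory element) follows my reading of the Arm pseudocode for these forms and should be checked against the specification; the helper names are hypothetical and are not the codec's own.

```c
#include <assert.h>

/* Sign-extend the 4-bit immediate field of the encoding to an int. */
static int
sign_extend4(unsigned field)
{
    return (int)((field & 0xFu) ^ 0x8u) - 8;
}

/* Hypothetical helper: byte displacement for a scalar+immediate
 * LD[1234]/ST[1234] form.
 *   imm4     raw 4-bit field from the encoding
 *   nreg     number of vector registers transferred (1 to 4)
 *   vl_bits  hardware vector length in bits
 *   esize    vector element size in bits (8, 16, 32 or 64)
 *   msize    memory element size in bits (at most esize)
 */
static int
ldst_imm4_to_byte_disp(unsigned imm4, int nreg, int vl_bits, int esize, int msize)
{
    assert(nreg >= 1 && nreg <= 4);
    assert(vl_bits >= 128 && vl_bits % 128 == 0);
    assert(msize >= 8 && msize <= esize && esize <= 64);
    int elements = vl_bits / esize; /* elements per vector */
    int mbytes = msize / 8;         /* bytes accessed per element */
    return sign_extend4(imm4) * nreg * elements * mbytes;
}
```

Under this scaling, a field value of 1 on 256-bit hardware gives 32 bytes for `LD1B { <Zt>.B }` but only 16 bytes for `LD1B { <Zt>.H }`, because the unpacked form reads one byte per halfword element.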
This patch makes sure that the size and element size for predicate and vector registers are always initialised which will avoid potential issues when comparing registers issue: #3044 Change-Id: I28195e22c861ffb8d1d06c6672edb1c170cd291c
drcachesim's tracer.cpp, sample clients memtrace_simple.c and memval_simple.c have checks to avoid handling SVE scatter/gather memory instructions, i.e. use of Z registers in memory address operands. Now that a significant number of scatter/gather instructions have been implemented, these checks can be removed. Issues: #5036, #5365, #3044
…#6431) drcachesim's tracer.cpp, sample clients memtrace_simple.c and memval_simple.c have checks to avoid handling SVE scatter/gather memory instructions, i.e. use of Z registers in memory address operands. Now that a significant number of scatter/gather instructions have been implemented, these checks can be removed. Issues: #5036, #5365, #3044
This patch adds the appropriate macros, tests and codec entries to encode the following variants:
```
LDNT1B { <Zt>.D }, <Pg>/Z, [<Zn>.D{, <Xm>}]
LDNT1D { <Zt>.D }, <Pg>/Z, [<Zn>.D{, <Xm>}]
LDNT1H { <Zt>.D }, <Pg>/Z, [<Zn>.D{, <Xm>}]
LDNT1W { <Zt>.D }, <Pg>/Z, [<Zn>.D{, <Xm>}]
STNT1B { <Zt>.D }, <Pg>, [<Zn>.D{, <Xm>}]
STNT1D { <Zt>.D }, <Pg>, [<Zn>.D{, <Xm>}]
STNT1H { <Zt>.D }, <Pg>, [<Zn>.D{, <Xm>}]
STNT1W { <Zt>.D }, <Pg>, [<Zn>.D{, <Xm>}]
LDNT1B { <Zt>.S }, <Pg>/Z, [<Zn>.S{, <Xm>}]
LDNT1H { <Zt>.S }, <Pg>/Z, [<Zn>.S{, <Xm>}]
LDNT1W { <Zt>.S }, <Pg>/Z, [<Zn>.S{, <Xm>}]
STNT1B { <Zt>.S }, <Pg>, [<Zn>.S{, <Xm>}]
STNT1H { <Zt>.S }, <Pg>, [<Zn>.S{, <Xm>}]
STNT1W { <Zt>.S }, <Pg>, [<Zn>.S{, <Xm>}]
```
issue: #3044 Change-Id: I95dd710f95b797e8e53ae69dfcadd430a04abc47
) This patch adds the appropriate macros, tests and codec entries to encode the following variants:
LDNT1B { <Zt>.D }, <Pg>/Z, [<Zn>.D{, <Xm>}]
LDNT1D { <Zt>.D }, <Pg>/Z, [<Zn>.D{, <Xm>}]
LDNT1H { <Zt>.D }, <Pg>/Z, [<Zn>.D{, <Xm>}]
LDNT1W { <Zt>.D }, <Pg>/Z, [<Zn>.D{, <Xm>}]
STNT1B { <Zt>.D }, <Pg>, [<Zn>.D{, <Xm>}]
STNT1D { <Zt>.D }, <Pg>, [<Zn>.D{, <Xm>}]
STNT1H { <Zt>.D }, <Pg>, [<Zn>.D{, <Xm>}]
STNT1W { <Zt>.D }, <Pg>, [<Zn>.D{, <Xm>}]
LDNT1B { <Zt>.S }, <Pg>/Z, [<Zn>.S{, <Xm>}]
LDNT1H { <Zt>.S }, <Pg>/Z, [<Zn>.S{, <Xm>}]
LDNT1W { <Zt>.S }, <Pg>/Z, [<Zn>.S{, <Xm>}]
STNT1B { <Zt>.S }, <Pg>, [<Zn>.S{, <Xm>}]
STNT1H { <Zt>.S }, <Pg>, [<Zn>.S{, <Xm>}]
STNT1W { <Zt>.S }, <Pg>, [<Zn>.S{, <Xm>}]
Issue: #3044

---------

Co-authored-by: Assad Hashmi <[email protected]>
This patch adds the appropriate macros, tests and codec entries to decode and encode the following instructions:
```
MUL <Zd>.<T>, <Zn>.<T>, <Zm>.<T>
MUL <Zd>.D, <Zn>.D, <Zm>.D[<imm>]
MUL <Zd>.H, <Zn>.H, <Zm>.H[<imm>]
MUL <Zd>.S, <Zn>.S, <Zm>.S[<imm>]
```
Issue: #3044
This patch adds the appropriate macros, tests and codec entries to decode and encode the following instruction: ```SPLICE <Zd>.<T>, <Pv>, { <Zn1>.<T>, <Zn2>.<T> }``` Issue: #3044
This patch adds the appropriate macros, tests and codec entries to decode and encode the following instruction: ```TBL <Zd>.<Ts>, { <Zn1>.<Ts>, <Zn2>.<Ts> }, <Zm>.<Ts>``` Issue: #3044
Armv8-A's SVE is a big extension requiring a few changes to the AArch64 backend. To start with, I think we need new register types for scalable vector (Z) and predicate (P) registers, as well as support for the new SVE encodings.
https://developer.arm.com/docs/ddi0584/latest/arm-architecture-reference-manual-supplement-the-scalable-vector-extension-sve-for-armv8-a
Xref #2626