[Arm64] Implement ASIMD Extract Insert ExtractVector64 ExtractVector128 #35030

echesakov · 2020-04-15T22:07:36Z

This implements the Extract, Insert, ExtractVector64 and ExtractVector128 intrinsics.
This also implements a fallback mechanism for intrinsics that accept an immediate operand, for use when the operand is not a constant.
This renames the NoContainment flag to SupportsContainment on Arm64 (presumably, fewer intrinsics support containment analysis than not, so it makes more sense to have no containment as the default).
This removes the ival column from the hwintrinsiclistarm64.h table and the corresponding field in the HWIntrinsicInfo struct.
The functionality of Insert and Extract for Vector64<double>, Vector64<long> and Vector64<ulong> will be implemented by the CreateScalar() and ToScalar() methods, so I removed those overloads from the API surface.

Fixes #34228 and fixes #24588, contributes to #24794 (ExtractVector64 and ExtractVector128)

I put below some examples of the generated code for a fallback "switch" table.

ExtractVector64(Vector64<byte>, Vector64<byte>, ubyte)

; Assembly listing for method System.Runtime.Intrinsics.Arm.AdvSimd:ExtractVector64(System.Runtime.Intrinsics.Vector64`1[Byte],System.Runtime.Intrinsics.Vector64`1[Byte],ubyte):System.Runtime.Intrinsics.Vector64`1[Byte]
; Emitting BLENDED_CODE for generic ARM64 CPU - Windows
; optimized code
; fp based frame
; partially interruptible
; Final local variable assignments
;
;  V00 arg0         [V00    ] (  3,  3   )   simd8  ->  [fp+0x28]   HFA(double)  do-not-enreg[XS] addr-exposed
;  V01 arg1         [V01    ] (  3,  3   )   simd8  ->  [fp+0x18]   HFA(double)  do-not-enreg[XS] addr-exposed
;  V02 arg2         [V02,T00] (  3,  3   )   ubyte  ->   x0
;# V03 OutArgs      [V03    ] (  1,  1   )  lclBlk ( 0) [sp+0x00]   "OutgoingArgSpace"
;  V04 cse0         [V04,T01] (  3,  3   )     int  ->   x0         "CSE - aggressive"
;
; Lcl frame size = 32

G_M23204_IG01:
        A9BD7BFD          stp     fp, lr, [sp,#-48]!
        910003FD          mov     fp, sp
        FD0017A0          str     d0, [fp,#40]
        FD000FA1          str     d1, [fp,#24]
                                                ;; bbWeight=1    PerfScore 3.50
G_M23204_IG02:
        FD4017A0          ldr     d0, [fp,#40]
        FD400FB0          ldr     d16, [fp,#24]
        53001C00          uxtb    w0, w0
        7100201F          cmp     w0, #8
        540002A2          bhs     G_M23204_IG12
        10000061          adr     x1, [G_M23204_IG03]
        8B000C21          add     x1, x1, x0, LSL #3
        D61F0020          br      x1
                                                ;; bbWeight=1    PerfScore 8.50
G_M23204_IG03:
        2E100000          ext     v0.8b, v0.8b, v16.8b, #0
        1400000E          b       G_M23204_IG11
                                                ;; bbWeight=1    PerfScore 2.00
G_M23204_IG04:
        2E100800          ext     v0.8b, v0.8b, v16.8b, #1
        1400000C          b       G_M23204_IG11
                                                ;; bbWeight=1    PerfScore 2.00
G_M23204_IG05:
        2E101000          ext     v0.8b, v0.8b, v16.8b, #2
        1400000A          b       G_M23204_IG11
                                                ;; bbWeight=1    PerfScore 2.00
G_M23204_IG06:
        2E101800          ext     v0.8b, v0.8b, v16.8b, #3
        14000008          b       G_M23204_IG11
                                                ;; bbWeight=1    PerfScore 2.00
G_M23204_IG07:
        2E102000          ext     v0.8b, v0.8b, v16.8b, #4
        14000006          b       G_M23204_IG11
                                                ;; bbWeight=1    PerfScore 2.00
G_M23204_IG08:
        2E102800          ext     v0.8b, v0.8b, v16.8b, #5
        14000004          b       G_M23204_IG11
                                                ;; bbWeight=1    PerfScore 2.00
G_M23204_IG09:
        2E103000          ext     v0.8b, v0.8b, v16.8b, #6
        14000002          b       G_M23204_IG11
                                                ;; bbWeight=1    PerfScore 2.00
G_M23204_IG10:
        2E103800          ext     v0.8b, v0.8b, v16.8b, #7
                                                ;; bbWeight=1    PerfScore 1.00
G_M23204_IG11:
        A8C37BFD          ldp     fp, lr, [sp],#48
        D65F03C0          ret     lr
                                                ;; bbWeight=1    PerfScore 2.00
G_M23204_IG12:
        97FE5A87          bl      CORINFO_HELP_THROW_ARGUMENTOUTOFRANGEEXCEPTION
        D43E0000          bkpt
                                                ;; bbWeight=0    PerfScore 0.00

; Total bytes of code 124, prolog size 8, PerfScore 41.40, (MethodHash=aa85a55b) for method System.Runtime.Intrinsics.Arm.AdvSimd:ExtractVector64(System.Runtime.Intrinsics.Vector64`1[Byte],System.Runtime.Intrinsics.Vector64`1[Byte],ubyte):System.Runtime.Intrinsics.Vector64`1[Byte]
; ============================================================
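
The dispatch sequence in the listing above (`adr`, `add ... LSL #3`, `br`) relies on every case before the last being exactly two 4-byte instructions (`ext` plus `b`), so the target for case `imm` is the address of the first case plus `imm * 8`. The sketch below models that arithmetic in C++. The function `tryGetCaseTarget` and the constant names are made up for illustration and are not JIT functions.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of the fallback dispatch shown in the listing; not JIT code.
constexpr uint32_t kInstrSize    = 4;                          // every A64 instruction is 4 bytes
constexpr uint32_t kInstrPerCase = 2;                          // ext + b
constexpr uint32_t kCaseSize     = kInstrSize * kInstrPerCase; // 8 bytes, hence LSL #3

// Computes the branch target for `imm`. Returns false when the range check
// (cmp/bhs in the listing) would divert to the throw helper instead.
bool tryGetCaseTarget(uint64_t firstCaseAddr, uint8_t imm, uint32_t caseCount, uint64_t* target)
{
    if (imm >= caseCount)
    {
        return false; // out of range: ArgumentOutOfRangeException path
    }
    *target = firstCaseAddr + static_cast<uint64_t>(imm) * kCaseSize;
    return true;
}
```

The last case can omit its trailing `b` (as IG10 does) without affecting the computation, because only the cases preceding a target need to have the fixed size.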

ExtractVector64(Vector64<float>, Vector64<float>, ubyte)

; Assembly listing for method System.Runtime.Intrinsics.Arm.AdvSimd:ExtractVector64(System.Runtime.Intrinsics.Vector64`1[Single],System.Runtime.Intrinsics.Vector64`1[Single],ubyte):System.Runtime.Intrinsics.Vector64`1[Single]
; Emitting BLENDED_CODE for generic ARM64 CPU - Windows
; optimized code
; fp based frame
; partially interruptible
; Final local variable assignments
;
;  V00 arg0         [V00    ] (  3,  3   )   simd8  ->  [fp+0x28]   HFA(double)  do-not-enreg[XS] addr-exposed
;  V01 arg1         [V01    ] (  3,  3   )   simd8  ->  [fp+0x18]   HFA(double)  do-not-enreg[XS] addr-exposed
;  V02 arg2         [V02,T00] (  3,  3   )   ubyte  ->   x0
;# V03 OutArgs      [V03    ] (  1,  1   )  lclBlk ( 0) [sp+0x00]   "OutgoingArgSpace"
;  V04 cse0         [V04,T01] (  3,  3   )     int  ->   x0         "CSE - aggressive"
;
; Lcl frame size = 32

G_M6420_IG01:
        A9BD7BFD          stp     fp, lr, [sp,#-48]!
        910003FD          mov     fp, sp
        FD0017A0          str     d0, [fp,#40]
        FD000FA1          str     d1, [fp,#24]
                                                ;; bbWeight=1    PerfScore 3.50
G_M6420_IG02:
        FD4017A0          ldr     d0, [fp,#40]
        FD400FB0          ldr     d16, [fp,#24]
        53001C00          uxtb    w0, w0
        7100081F          cmp     w0, #2
        540000E2          bhs     G_M6420_IG06
        35000060          cbnz    w0, G_M6420_IG04
                                                ;; bbWeight=1    PerfScore 7.00
G_M6420_IG03:
        2E100000          ext     v0.8b, v0.8b, v16.8b, #0
        14000002          b       G_M6420_IG05
                                                ;; bbWeight=1    PerfScore 2.00
G_M6420_IG04:
        2E102000          ext     v0.8b, v0.8b, v16.8b, #4
                                                ;; bbWeight=1    PerfScore 1.00
G_M6420_IG05:
        A8C37BFD          ldp     fp, lr, [sp],#48
        D65F03C0          ret     lr
                                                ;; bbWeight=1    PerfScore 2.00
G_M6420_IG06:
        97FE57CD          bl      CORINFO_HELP_THROW_ARGUMENTOUTOFRANGEEXCEPTION
        D43E0000          bkpt
                                                ;; bbWeight=0    PerfScore 0.00

; Total bytes of code 68, prolog size 8, PerfScore 22.30, (MethodHash=821fe6eb) for method System.Runtime.Intrinsics.Arm.AdvSimd:ExtractVector64(System.Runtime.Intrinsics.Vector64`1[Single],System.Runtime.Intrinsics.Vector64`1[Single],ubyte):System.Runtime.Intrinsics.Vector64`1[Single]
; ============================================================

ExtractVector128(Vector128<double>, Vector128<double>, ubyte)

; Assembly listing for method System.Runtime.Intrinsics.Arm.AdvSimd:ExtractVector128(System.Runtime.Intrinsics.Vector128`1[Double],System.Runtime.Intrinsics.Vector128`1[Double],ubyte):System.Runtime.Intrinsics.Vector128`1[Double]
; Emitting BLENDED_CODE for generic ARM64 CPU - Windows
; optimized code
; fp based frame
; partially interruptible
; Final local variable assignments
;
;  V00 arg0         [V00    ] (  3,  3   )  simd16  ->  [fp+0x20]   HFA(simd16)  do-not-enreg[XS] addr-exposed
;  V01 arg1         [V01    ] (  3,  3   )  simd16  ->  [fp+0x10]   HFA(simd16)  do-not-enreg[XS] addr-exposed
;  V02 arg2         [V02,T00] (  3,  3   )   ubyte  ->   x0
;# V03 OutArgs      [V03    ] (  1,  1   )  lclBlk ( 0) [sp+0x00]   "OutgoingArgSpace"
;  V04 cse0         [V04,T01] (  3,  3   )     int  ->   x0         "CSE - aggressive"
;
; Lcl frame size = 32

G_M55355_IG01:
        A9BD7BFD          stp     fp, lr, [sp,#-48]!
        910003FD          mov     fp, sp
        3D800BA0          str     q0, [fp,#32]
        3D8007A1          str     q1, [fp,#16]
                                                ;; bbWeight=1    PerfScore 3.50
G_M55355_IG02:
        3DC00BB0          ldr     q16, [fp,#32]
        3DC007B1          ldr     q17, [fp,#16]
        53001C00          uxtb    w0, w0
        7100081F          cmp     w0, #2
        54000102          bhs     G_M55355_IG07
        35000060          cbnz    w0, G_M55355_IG04
                                                ;; bbWeight=1    PerfScore 7.00
G_M55355_IG03:
        6E110210          ext     v16.16b, v16.16b, v17.16b, #0
        14000002          b       G_M55355_IG05
                                                ;; bbWeight=1    PerfScore 2.00
G_M55355_IG04:
        6E114210          ext     v16.16b, v16.16b, v17.16b, #8
                                                ;; bbWeight=1    PerfScore 1.00
G_M55355_IG05:
        4EB01E00          mov     v0.16b, v16.16b
                                                ;; bbWeight=1    PerfScore 0.50
G_M55355_IG06:
        A8C37BFD          ldp     fp, lr, [sp],#48
        D65F03C0          ret     lr
                                                ;; bbWeight=1    PerfScore 2.00
G_M55355_IG07:
        97FE5F66          bl      CORINFO_HELP_THROW_ARGUMENTOUTOFRANGEEXCEPTION
        D43E0000          bkpt
                                                ;; bbWeight=0    PerfScore 0.00

; Total bytes of code 72, prolog size 8, PerfScore 23.20, (MethodHash=721027c4) for method System.Runtime.Intrinsics.Arm.AdvSimd:ExtractVector128(System.Runtime.Intrinsics.Vector128`1[Double],System.Runtime.Intrinsics.Vector128`1[Double],ubyte):System.Runtime.Intrinsics.Vector128`1[Double]
; ============================================================
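
Comparing the three listings, the immediate encoded in `ext` is a byte offset: the element index scaled by the element size (`#1` per byte lane, `#4` per float lane, `#8` per double lane), and the number of cases equals the lane count. A small C++ sketch of that relationship follows. The helper names are hypothetical and only mirror what the listings show.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helpers; the names are illustrative, not taken from the JIT sources.

// Number of valid element indices, i.e. the value the range check compares against.
constexpr uint32_t laneCount(uint32_t vectorBytes, uint32_t elemBytes)
{
    return vectorBytes / elemBytes;
}

// Byte position passed to `ext` for a given element index.
constexpr uint32_t extByteIndex(uint32_t elemIndex, uint32_t elemBytes)
{
    return elemIndex * elemBytes;
}
```

For example, `laneCount(8, 4)` gives the two cases seen for `Vector64<float>`, and `extByteIndex(1, 4)` gives the `#4` used in the second case.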

tannergooding · 2020-04-16T05:29:26Z

src/coreclr/src/jit/hwintrinsic.h

+    // NoContainment
+    // the intrinsic cannot be handled by comtainment,
+    // all the intrinsic that have explicit memory load/store semantics should have this flag
+    HW_Flag_NoContainment = 0x40,


This is misordered with respect to the rest of the flags.

Yep, I meant to update this before marking this PR as ready for review.

tannergooding · 2020-04-16T05:32:36Z

src/coreclr/src/jit/hwintrinsiccodegenarm64.cpp

+            {
+                HWIntrinsicImmOpHelper helper(this, intrin.op2, node);
+
+                for (helper.EmitAtFirst(); !helper.Done(); helper.EmitAfterCase())


Could I get a brief explanation of why the ARM64 jmp table is so much more involved than the x86 one?

For x86, we just needed a small helper method that took in the intrinsic, registers, and a lambda that emitted the contents of each case statement: https://github.com/dotnet/runtime/blob/master/src/coreclr/src/jit/hwintrinsiccodegenxarch.cpp#L1068-L1120

I wouldn't call this approach more involved - here instead you need a helper class and no lambda at all. This is basically transforming .Select(Action<int> func) to foreach (int imm in Immediates()) { /* do your action on imm */ }
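
The shape described above, a helper driven by a plain `for` loop instead of one that takes a lambda, might look roughly like this. It is a standalone toy, assuming a hypothetical `ImmCaseIterator` class and a string log in place of a real emitter; it is not the `HWIntrinsicImmOpHelper` from the PR.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Toy stand-in for a helper that walks every possible immediate value.
// All names here are hypothetical; a real helper would emit labels and branches.
class ImmCaseIterator
{
public:
    // When the operand is a known constant, only that single value is visited.
    ImmCaseIterator(bool isConst, int constValue, int upperBound)
        : m_cur(isConst ? constValue : 0), m_end(isConst ? constValue : upperBound)
    {
    }

    void Begin() {}                             // a real helper would emit the dispatch here
    bool Done() const { return m_cur > m_end; }
    void CaseEnd() { ++m_cur; }                 // a real helper would emit the jump to the end label
    int  ImmValue() const { return m_cur; }

private:
    int m_cur;
    int m_end;
};

// The per-case logic is written once and serves both constant and non-constant operands.
std::vector<std::string> emitExt(bool isConst, int constValue, int upperBound)
{
    std::vector<std::string> out;
    ImmCaseIterator          helper(isConst, constValue, upperBound);
    for (helper.Begin(); !helper.Done(); helper.CaseEnd())
    {
        out.push_back("ext #" + std::to_string(helper.ImmValue()));
    }
    return out;
}
```

With a constant operand the loop body runs once; with a non-constant operand it runs once per possible immediate, which is where the cases of the switch table would come from.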

Below are my ideas why I did it this way.

First, branching on arm64 could be potentially optimized in many different ways (e.g. due to the fact that all the instructions are fixed size).
Also branching at non zero (when imm can only be 0 or 1) is a special case that doesn't require an additional general-purpose register and I thought it would be nice to generate more optimal code in this case.

Second, having a lambda instead leads to repetitive code when you first need to check if immOp is const then call the lambda with ival. Otherwise (if it's not const), you call the helper. While with this approach you define and use the code generation logic only once - in a loop - hiding all the details behind HWIntrinsicImmOpHelper. Actually, I implemented the approach with template function first but I didn't like how the code looked - especially for AdvSimd_Insert case - since we need different emitter functions depending on the base type.

I don't like the fact that we declare but NOT define template function in codegen.h. IMO, template functions if used should be defined in a header file.

> First, branching on arm64 could be potentially optimized in many different ways (e.g. due to the fact that all the instructions are fixed size).
> Also branching at non zero (when imm can only be 0 or 1) is a special case that doesn't require an additional general-purpose register and I thought it would be nice to generate more optimal code in this case.

For these two bits, I don't think the optimization is that important. This is a fallback case meant for debuggers and reflection invocation.
Likewise on x86, the branching is already "optimal" as every case is exactly the same number of bytes (so it's just a simple baseAddress + index * caseSize computation followed by a jump).

> Second, having a lambda instead leads to repetitive code when you first need to check if immOp is const then call the lambda with ival. Otherwise (if it's not const), you call the helper. While with this approach you define and use the code generation logic only once - in a loop - hiding all the details behind HWIntrinsicImmOpHelper. Actually, I implemented the approach with template function first but I didn't like how the code looked - especially for AdvSimd_Insert case - since we need different emitter functions depending on the base type.

I think this is just a trade-off: do you declare if (const) { lambda } else { emitJumpTable(lambda) } or declare for () { similarLogicToLambda }? You still have to redeclare some logic in all the same places; it's just what is redeclared that differs.
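
For comparison, the lambda-based side of this trade-off could be sketched as below. The functions `emitJumpTable` and `emitWithFallback` and the string log are assumptions made for the example; they do not reproduce the x86 helper linked earlier in this thread.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <vector>

// Hypothetical sketch of the lambda style: the caller supplies the per-case body.
void emitJumpTable(int upperBound, const std::function<void(int)>& emitCase)
{
    for (int imm = 0; imm <= upperBound; ++imm)
    {
        emitCase(imm); // one table entry per possible immediate
    }
}

std::vector<std::string> emitWithFallback(bool isConst, int constValue, int upperBound)
{
    std::vector<std::string> out;
    auto emitCase = [&out](int imm) { out.push_back("ext #" + std::to_string(imm)); };

    if (isConst)
    {
        emitCase(constValue); // constant operand: emit the single instruction
    }
    else
    {
        emitJumpTable(upperBound, emitCase); // non-constant operand: emit every case
    }
    return out;
}
```

Here the per-case body lives in the lambda and the constant/non-constant split is explicit at the call site, whereas the loop style hides that split inside the helper.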

> I don't like the fact that we declare but NOT define template function in codegen.h. IMO, template functions if used should be defined in a header file.

It was just placed in the cpp file since that is the only place it will ever be used from, similar to many other functions that aren't meant to be generally reusable (or aren't yet).

Just to add my two cents - personally, I find lambdas problematic both to understanding and readability of the code, but also to debugging (though the latter will presumably improve over time). I find the approach that Egor has taken here to be quite understandable and readable, though I might make minor changes to the names.

I'm actually the opposite and find the new code more complex, but it's not terribly so and I was mainly interested in why we are differing.
Ideally we'd share these types of constructs as much as possible, rather than having the ARM and x86 code paths drastically differ.

CarolEidt

Some comments, questions and suggestions.

CarolEidt · 2020-04-20T17:46:36Z

src/coreclr/src/jit/hwintrinsic.cpp

@@ -541,7 +537,7 @@ GenTree* Compiler::getArgForHWIntrinsic(var_types argType, CORINFO_CLASS_HANDLE
 //     add a GT_HW_INTRINSIC_CHK node for non-full-range imm-intrinsic, which would throw ArgumentOutOfRangeException
 //     when the imm-argument is not in the valid range
 //
-GenTree* Compiler::addRangeCheckIfNeeded(NamedIntrinsic intrinsic, GenTree* immOp, bool mustExpand)
+GenTree* Compiler::addRangeCheckIfNeeded(NamedIntrinsic intrinsic, GenTree* immOp, bool mustExpand, int immUpperBound)


The new argument should be documented in the header comment.

CarolEidt · 2020-04-20T17:52:38Z

src/coreclr/src/jit/hwintrinsic.cpp

@@ -541,7 +537,7 @@ GenTree* Compiler::getArgForHWIntrinsic(var_types argType, CORINFO_CLASS_HANDLE
 //     add a GT_HW_INTRINSIC_CHK node for non-full-range imm-intrinsic, which would throw ArgumentOutOfRangeException
 //     when the imm-argument is not in the valid range
 //
-GenTree* Compiler::addRangeCheckIfNeeded(NamedIntrinsic intrinsic, GenTree* immOp, bool mustExpand)
+GenTree* Compiler::addRangeCheckIfNeeded(NamedIntrinsic intrinsic, GenTree* immOp, bool mustExpand, int immUpperBound)


The header comment needs to be updated for this additional argument.

CarolEidt · 2020-04-20T17:54:20Z

src/coreclr/src/jit/hwintrinsic.h

@@ -315,6 +335,102 @@ struct HWIntrinsicInfo
    }
 };

+#ifdef TARGET_ARM64
+
+struct HWIntrinsic final


Can you explain why this wrapper struct is needed and how it is used? It doesn't seem necessary, and (to me) just obfuscates the creation logic. In any case, comments are needed to explain what this is for.

The wrapper is used when we want to access the operands of GenTreeHWIntrinsic node.

Otherwise, the code that does lookup:

    op1 = node->gtGetOp1();
    op2 = node->gtGetOp2();

    assert(op1 != nullptr);

    if (op1->OperIsList())
    {
        assert(op2 == nullptr);

        GenTreeArgList* list = op1->AsArgList();
        op1                  = list->Current();
        list                 = list->Rest();
        op2                  = list->Current();
        list                 = list->Rest();
        op3                  = list->Current();

        assert(list->Rest() == nullptr);

        numOperands = 3;
    }
    else if (op2 != nullptr)
    {
        numOperands = 2;
    }
    else
    {
        numOperands = 1;
    }

would need to be repeated in Lower, LSRA, CodeGen and, perhaps, other places.
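
A wrapper of the kind described here would run that lookup once in its constructor and expose the result. The sketch below uses mock types (`Node`, `OperandView`) invented for the example; it assumes three operands are stored as a list and does not reproduce the JIT's `GenTree` classes.

```cpp
#include <cassert>
#include <vector>

// Mock tree node, purely illustrative: up to two direct operands, or a list of three.
struct Node
{
    Node*              op1 = nullptr;
    Node*              op2 = nullptr;
    std::vector<Node*> list; // non-empty only when the node carries an operand list
};

// Hypothetical wrapper that normalizes operand access in one place.
struct OperandView
{
    Node* op1         = nullptr;
    Node* op2         = nullptr;
    Node* op3         = nullptr;
    int   numOperands = 0;

    explicit OperandView(const Node& node)
    {
        if (!node.list.empty())
        {
            assert(node.list.size() == 3);
            op1         = node.list[0];
            op2         = node.list[1];
            op3         = node.list[2];
            numOperands = 3;
        }
        else
        {
            assert(node.op1 != nullptr);
            op1         = node.op1;
            op2         = node.op2;
            numOperands = (op2 != nullptr) ? 2 : 1;
        }
    }
};
```

Each phase that needs the operands would then construct the view from the node rather than repeating the list-walking code.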

I had this wrapper in CodeGen originally but here I decided to extend its use to the other places.

As an alternative, I can place this code directly in GenTreeHWIntrinsic (or even GenTreeJitIntrinsic).

I think it might be cleaner to put it on one of the GenTree nodes.

Okay, I can try this. Would you object to me doing this in a separate PR and leaving the wrapper as is here?

No objection.

CarolEidt · 2020-04-20T19:24:01Z

src/coreclr/src/jit/hwintrinsiccodegenarm64.cpp

+            {
+                HWIntrinsicImmOpHelper helper(this, intrin.op2, node);
+
+                for (helper.EmitAtFirst(); !helper.Done(); helper.EmitAfterCase())


Just to add my two cents - personally, I find lambdas problematic both to understanding and readability of the code, but also to debugging (though the latter will presumably improve over time). I find the approach that Egor has taken here to be quite understandable and readable, though I might make minor changes to the names.

echesakov · 2020-04-21T23:57:00Z

@CarolEidt @tannergooding I believe I addressed all your comments and suggestions (except the one about the wrapper struct - I asked if this could be part of a separate PR). Can you please take a look when you have time?

CarolEidt

LGTM - thanks!

echesakov added 29 commits April 9, 2020 11:05

Put ExtractAndNarrowLow after ExtractAndNarrowHigh in AdvSimd.cs AdvSimd.PlatformNotSupported.cs

14bdaf1

Add Extract in AdvSimd.cs AdvSimd.PlatformNotSupported.cs

dfcf01f

Add Insert in AdvSimd.cs AdvSimd.PlatformNotSupported.cs

74b4868

Add ExtractVector64 in AdvSimd.cs AdvSimd.PlatformNotSupported.cs

803e2f5

Add ExtractVector128 in AdvSimd.cs AdvSimd.PlatformNotSupported.cs

b1852ad

Update System.Runtime.Intrinsics.Experimental.cs

8cec295

Update Helpers.cs Helpers.tt

648ee20

Add ExtractTest.template

6b4fbed

Add ExtractVectorTest.template

9a82b44

Add InsertTest.template

1fcbade

Add Extract in GenerateTests.csx

bf58686

Add ExtractVector64 and ExtractVector128 in GenerateTests.csx

34eccf3

Add Insert in GenerateTests.csx

7effe2a

Put ExtractAndNarrowLow after ExtractAndNarrowHigh in GenerateTests.csx

c5673aa

Update AdvSimd/ AdvSimd.Arm64/

6aaaa41

Add emitter unit tests for INS_ext in codegenarm64.cpp

6953bb2

Implement INS_ext in emitarm64.cpp emitfmtsarm64.h instrsarm64.h

f75ab71

Update hwintrinsiclistarm64.h

233c348

Formatting in instrsarm64.h

8246181

Remove HWIntrinsicInfo::ival on Arm64 in hwintrinsic.cpp hwintrinsic.h hwintrinsiclistarm64.h namedintrinsiclist.h valuenumfuncs.h

9f3c72e

Remove lookupImmUpperBound on Arm64 in hwintrinsic.h hwintrinsicarm64.cpp

36d4e96

Remove HW_Flag_NoContainment and add HW_Flag_SupportsContainment in hwintrinsic.h hwintrinsiclistarm64.h

7d87daa

Remove HWIntrinsic in hwintrinsiccodegenarm64.cpp

23c1b94

Add HWIntrinsic in hwintrinsic.h

0d82dcd

Remove #include-s in hwintrinsiccodegenarm64.cpp

563c3e0

Add analysis for allocation of branchTargetReg in lsraarm64.cpp

6843309

Add ContainmentAnalysis for ASIMD Extract in lowerarmarch.cpp

62ec8a5

Add ContainmentAnalysis for ASIMD ExtractVector64 and ExtractVector128 in lowerarmarch.cpp

e75ed7f

Add ContainmentAnalysis for ASIMD Insert in lowerarmarch.cpp

1cc6804

Dotnet-GitSync-Bot added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Apr 15, 2020

tannergooding reviewed Apr 16, 2020

tannergooding mentioned this pull request Apr 16, 2020

[Arm64] More overloads for *BySelectedScalar methods #33683

Closed

echesakov added 2 commits April 16, 2020 12:02

Re-refactor impHWIntrinsic in hwintrinsic.cpp

c2416a0

Has HWIntrinsicInfo::isInImmRange accept unnecessary arguments on x86 to avoid ifdef-s at places where this function is used in hwintrinsic.h hwintrinsicarm64.cpp hwintrinsicxarch.cpp

a12ed8d

jaredpar mentioned this pull request Apr 17, 2020

Hitting restore timeouts with NuGet #35074

Closed

CarolEidt reviewed Apr 20, 2020

This was referenced Apr 20, 2020

Add fmov arm64 intrinsic in JIT to implement Vector*.CreateScalarUnsafe API #34485

Closed

Implement Vector{Size}<T>.AllBitsSet #33924

Merged

echesakov added 9 commits April 20, 2020 18:04

Remove Extract methods that operate on 64x1_t types in AdvSimd.cs AdvSimd.PlatformNotSupported.cs

2436d51

Remove Insert methods that operate on 64x1_t types in AdvSimd.cs AdvSimd.PlatformNotSupported.cs

2b710cc

Update System.Runtime.Intrinsics.Experimental.cs

19394b0

Update comment for Compiler::getArgForHWIntrinsic in hwintrinsic.cpp

532a9f4

Add comment and rename elemType->baseType in HWIntrinsicInfo::lookupImmUpperBound in hwintrinsicarm64.cpp

347b2c8

Rename EmitAtFirst->EmitBegin and EmitAfterCase->EmitCaseEnd in CodeGen::HWIntrinsicImmOpHelper in hwintrinsiccodegenarm64.cpp

02892e4

Rename BranchAtNonZero->TestImmOpZeroOrOne in CodeGen::HWIntrinsicImmOpHelper in hwintrinsiccodegenarm64.cpp

7fc1c43

Document HWIntrinsicImmOpHelper class and how it should be used in codegen.h and hwintrinsiccodegenarm64.cpp

6ba615f

Make more flags in HWIntrinsicFlag platform-specific and re-order them with update their values in hwintrinsic.h

65ca881

echesakov closed this Apr 21, 2020

echesakov reopened this Apr 21, 2020

echesakov marked this pull request as ready for review April 21, 2020 20:22

Merge branch 'master' into Arm64-ASIMD-Extract-Insert-ExtractVector64-ExtractVector128

b0bcb43

CarolEidt approved these changes Apr 22, 2020

tannergooding approved these changes Apr 22, 2020

echesakov merged commit 32dd7d4 into dotnet:master Apr 22, 2020

echesakov deleted the Arm64-ASIMD-Extract-Insert-ExtractVector64-ExtractVector128 branch April 22, 2020 18:01

tannergooding mentioned this pull request Apr 23, 2020

Update the x86 hwintrinsic list to match the arm64 layout #35364

Merged

echesakov mentioned this pull request Apr 30, 2020

[Arm64] Implement ASIMD widening, narrowing, saturating intrinsics #35612

Merged

ghost locked as resolved and limited conversation to collaborators Dec 9, 2020
