
JIT: Bad codegen with Avx512DQ.VL.Range #106610

Closed
jakobbotsch opened this issue Aug 19, 2024 · 9 comments · Fixed by #106711
Labels: area-CodeGen-coreclr (CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI), in-pr (There is an active PR which will close this issue when it is merged)
Milestone: 9.0.0
@jakobbotsch (Member):

// Generated by Fuzzlyn v2.2 on 2024-08-17 17:40:06
// Run on X86 Windows
// Seed: 1343518557351353159-vectort,vector128,vector256,vector512,x86aes,x86avx,x86avx2,x86avx512bw,x86avx512bwvl,x86avx512cd,x86avx512cdvl,x86avx512dq,x86avx512dqvl,x86avx512f,x86avx512fvl,x86avx512vbmi,x86avx512vbmivl,x86bmi1,x86bmi2,x86fma,x86lzcnt,x86pclmulqdq,x86popcnt,x86sse,x86sse2,x86sse3,x86sse41,x86sse42,x86ssse3,x86x86base
// Reduced from 171.2 KiB to 0.6 KiB in 00:06:37
// Debug: Outputs <4292870144, 0, 0, 0, 0, 0, 0, 0>
// Release: Outputs <0, 0, 0, 0, 0, 0, 0, 0>
using System;
using System.Runtime.CompilerServices;
using System.Numerics;
using System.Runtime.Intrinsics;
using System.Runtime.Intrinsics.X86;

public class C1
{
    public Vector256<float> F5;
    public C1(Vector256<float> f5)
    {
        F5 = f5;
    }
}

public class Program
{
    public static void Main()
    {
        var vr4 = Vector256.Create<float>(0);
        var vr5 = Vector256.CreateScalar(1f);
        var vr6 = Vector256.CreateScalar(-10f);
        var vr7 = Avx.Or(vr5, vr6);
        C1 vr8 = new C1(Avx512DQ.VL.Range(vr4, vr7, 0));
        System.Console.WriteLine(vr8.F5.AsUInt32());
    }
}
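For context on the Debug output above: a quick bit-arithmetic check (Python, illustrative only, outside the repro) shows that folding the `Or` of the two scalar constants yields an sNaN bit pattern, and setting its quiet bit yields exactly the value the Debug run prints.

```python
one = 0x3F800000    # bit pattern of 1.0f
neg10 = 0xC1200000  # bit pattern of -10.0f

folded = one | neg10
print(hex(folded))  # 0xffa00000, an sNaN bit pattern

quieted = folded | 0x00400000  # set the quiet bit -> qNaN
print(quieted)                 # 4292870144, the value the Debug run prints
```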

cc @dotnet/jit-contrib

@dotnet-issue-labeler dotnet-issue-labeler bot added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Aug 19, 2024

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
See info in area-owners.md if you want to be subscribed.

@dotnet-policy-service dotnet-policy-service bot added the untriaged New issue has not been triaged by the area owner label Aug 19, 2024
@JulieLeeMSFT JulieLeeMSFT added this to the 9.0.0 milestone Aug 19, 2024
@JulieLeeMSFT JulieLeeMSFT removed the untriaged New issue has not been triaged by the area owner label Aug 19, 2024
@AndyAyersMS (Member):

Preview 7 is ok

G_M000_IG01:                ;; offset=0x0000
       sub      esp, 32

G_M000_IG02:                ;; offset=0x0003
       vxorps   ymm0, ymm0, ymm0
       vrangeps ymm0, ymm0, ymmword ptr [@RWD00], 0
       vmovups  ymmword ptr [esp], ymm0
       mov      ecx, 0x8B3C8AC
       call     CORINFO_HELP_NEWSFAST
       vmovups  ymm0, ymmword ptr [esp]
       vmovups  ymmword ptr [eax+0x04], ymm0
       mov      ecx, eax
       call     [System.Console:WriteLine(System.Object)]

G_M000_IG03:                ;; offset=0x0033
       vzeroupper
       add      esp, 32
       ret

RWD00   dq      00000000FFA00000h, 0000000000000000h, 0000000000000000h, 0000000000000000h

but recent builds are not

G_M27646_IG01:  ;; offset=0x0000
       sub      esp, 32
                                                ;; size=3 bbWeight=1 PerfScore 0.25
G_M27646_IG02:  ;; offset=0x0003
       vxorps   ymm0, ymm0, ymm0
       vrangeps ymm0, ymm0, ymmword ptr [@RWD00], 0
       vmovups  ymmword ptr [esp], ymm0
       mov      ecx, 0xB6632E0      ; System.Runtime.Intrinsics.Vector256`1[uint]
       call     CORINFO_HELP_NEWSFAST
       vmovups  ymm0, ymmword ptr [esp]
       vmovups  ymmword ptr [eax+0x04], ymm0
       mov      ecx, eax
       call     [System.Console:WriteLine(System.Object)]
                                                ;; size=48 bbWeight=1 PerfScore 16.83
G_M27646_IG03:  ;; offset=0x0033
       vzeroupper
       add      esp, 32
       ret
                                                ;; size=7 bbWeight=1 PerfScore 2.25
RWD00   dq      00000000FFE00000h, 0000000000000000h, 0000000000000000h, 0000000000000000h

Only diff is the value in RWD00.

@AndyAyersMS (Member):

***** BB01 [0000]
STMT00001 ( ??? ... 0x025 )
               [000009] DA---------                         *  STORE_LCL_VAR simd32<System.Runtime.Intrinsics.Vector256`1> V01 loc1         
               [000008] -----------                         \--*  HWINTRINSIC simd32 float Or
               [000003] -----------                            +--*  CNS_VEC   simd32<0x000000003f800000, 0x0000000000000000, 0x0000000000000000, 0x0000000000000000>
               [000005] -----------                            \--*  CNS_VEC   simd32<0x00000000c1200000, 0x0000000000000000, 0x0000000000000000, 0x0000000000000000>

becomes

               [000009] DA---------                         *  STORE_LCL_VAR simd32<System.Runtime.Intrinsics.Vector256`1> V01 loc1         
               [000005] -----+-----                         \--*  CNS_VEC   simd32<0x00000000ffe00000, 0x0000000000000000, 0x0000000000000000, 0x0000000000000000>

so somehow an extra bit gets set in the vector constant (FFE instead of FFA).

@AndyAyersMS (Member) commented Aug 20, 2024:

Looks like this is NaN normalization? 0xFFA00000 is a NaN float, and we normalize it to 0xFFE00000.

@tannergooding (Member):

Not quite NaN normalization. 0xFFA0_0000 is an sNaN (signaling NaN) because the most significant bit of its significand is zero. When the CPU encounters an sNaN it raises the "signal" (throwing an exception if IEEE 754 floating-point exception handling is enabled, which it never is for .NET), and it then suppresses the signal from being raised again by setting the bit that converts the value to a qNaN (quiet NaN).

[0xFFC0_0000, 0xFFFF_FFFF] is -qNaN
[0xFF80_0001, 0xFFBF_FFFF] is -sNaN
0xFF80_0000 is -inf

0x7F80_0000 is +inf
[0x7F80_0001, 0x7FBF_FFFF] is +sNaN
[0x7FC0_0000, 0x7FFF_FFFF] is +qNaN

In general we (RyuJIT) try and avoid any explicit normalization and instead try to explicitly propagate wherever applicable (that is, if a NaN is passed in, we return precisely that NaN back). The only places we explicitly produce canonical NaN is when neither input is a NaN (so -inf + inf produces NaN, this requires a net new NaN to be produced so we choose the canonical NaN of 0xFFC0_0000 on xarch and 0x7FC0_0000 on arm/arm64/loongarch64/riscv64).
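The bit-pattern ranges above can be checked mechanically. A minimal Python classifier (illustrative only, not RyuJIT code) that buckets a raw float32 bit pattern the same way:

```python
def classify_float32(bits):
    """Classify a 32-bit IEEE 754 single by its raw bit pattern."""
    sign = "-" if bits >> 31 else "+"
    exponent = (bits >> 23) & 0xFF
    significand = bits & 0x7FFFFF
    if exponent != 0xFF:
        return "finite"  # includes zeros and subnormals
    if significand == 0:
        return sign + "inf"
    # quiet bit = most significant bit of the significand
    return sign + ("qNaN" if significand & 0x400000 else "sNaN")

print(classify_float32(0xFFA00000))  # -sNaN: the folded constant
print(classify_float32(0xFFE00000))  # -qNaN: what the JIT emitted
```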

@tannergooding (Member):

So this is somewhat expected behavior, but it would still be interesting to find out what is causing the CPU to see it as an sNaN in this particular case. I expect there's some place where it's being seen as a scalar float; we could instead pass it through as a scalar uint32_t to ensure that operations like and, not, or, xor, left shift, and right shift don't exhibit such behavior.

@AndyAyersMS (Member):

The path we take is:

gtFoldExprHWIntrinsic

               [000008] ----------- >>>                     *  HWINTRINSIC simd32 float Or
               [000005] -----+-----                         +--*  CNS_VEC   simd32<0x00000000c1200000, 0x0000000000000000, 0x0000000000000000, 0x0000000000000000>
               [000003] -----+-----                         \--*  CNS_VEC   simd32<0x000000003f800000, 0x0000000000000000, 0x0000000000000000, 0x0000000000000000>

GenTreeVecCon::EvaluateBinaryInPlace
EvaluateBinarySimd<simd32_t>
EvaluateBinarySimd<simd32_t,float>
EvaluateBinaryScalar<float>(genTreeOps oper, float arg0, float arg1)
EvaluateBinaryScalarSpecialized<float>(genTreeOps oper, float arg0, float arg1)

uint32_t resultBits is 0xFFA0_0000

UInt32BitsToSingle produces float 0xFFE0_0000
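The quieting step in that last conversion can be reproduced outside the JIT: round-tripping the raw bits through a scalar float conversion quiets the sNaN. A sketch of the effect (assuming typical x86/arm64 hardware NaN behavior, not the JIT's actual code path):

```python
import struct

bits_in = 0xFFA00000  # the -sNaN produced by folding the Or
# unpack converts float32 -> double; on common hardware this quiets an sNaN
value = struct.unpack('<f', struct.pack('<I', bits_in))[0]
bits_out = struct.unpack('<I', struct.pack('<f', value))[0]
print(hex(bits_out))  # quiet bit is now set
```

On x86 and arm64 this prints 0xffe00000, matching the bad RWD00 constant.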

@tannergooding (Member):

👍, so we'd need to intercept this prior to extracting the scalar value out of the SIMD input (that is, prior to EvaluateBinaryScalar).

I think the simplest place to do that is EvaluateBinarySimd<simd32_t>: basically, check for a bitwise oper in the TYP_FLOAT/TYP_DOUBLE cases and dispatch instead to EvaluateBinarySimd<TSimd, int32_t/int64_t>: https://github.com/dotnet/runtime/blob/main/src/coreclr/jit/simd.h#L1189-L1199

I can get a fix up for this.
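The proposed fix amounts to evaluating bitwise folds over same-width integer lanes so NaN payloads never pass through a scalar float. A rough Python sketch of the idea (hypothetical helper name, not the actual simd.h change):

```python
def fold_or_int_lanes(a_lanes, b_lanes):
    # Dispatch a float32 bitwise Or to uint32 lane arithmetic.
    # The result stays as raw bits, so sNaN payloads are preserved.
    return [(a | b) & 0xFFFFFFFF for a, b in zip(a_lanes, b_lanes)]

lanes = fold_or_int_lanes([0x3F800000] + [0] * 7,   # Vector256 of 1.0f in lane 0
                          [0xC1200000] + [0] * 7)   # Vector256 of -10.0f in lane 0
print(hex(lanes[0]))  # 0xffa00000: the sNaN bit pattern survives the fold
```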

@AndyAyersMS (Member):

Ok, I will reassign this one to you.
