Implement Vector.AddSaturate/SubtractSaturate #107193

lilinus · 2024-08-30T16:31:03Z

Implement #82559

dotnet-issue-labeler · 2024-08-30T16:31:09Z

Note regarding the new-api-needs-documentation label:

This serves as a reminder: when your PR modifies a ref *.cs file and adds or modifies public APIs, please make sure the API implementation in the src *.cs file is documented with triple-slash comments, so the PR reviewers can sign off on that change.

dotnet-policy-service · 2024-08-30T16:31:47Z

Tagging subscribers to this area: @dotnet/area-system-numerics
See info in area-owners.md if you want to be subscribed.

xtqqczze · 2024-09-15T18:52:21Z

Could the existing internal AddSaturate and SubtractSaturate methods be removed?

runtime/src/libraries/System.Private.CoreLib/src/System/Runtime/Intrinsics/Vector128.cs, line 3904 in 4c10eff:

    internal static Vector128<byte> AddSaturate(Vector128<byte> left, Vector128<byte> right)

lilinus · 2024-09-16T08:42:03Z

> Could the existing internal AddSaturate and SubtractSaturate methods be removed?

I removed the existing methods that I could find in this PR, but perhaps there are additional methods I have missed.

tannergooding · 2024-09-16T14:54:21Z

src/libraries/System.Private.CoreLib/src/System/Runtime/Intrinsics/Vector128.cs

+            if (AdvSimd.IsSupported)
+            {
+                if (typeof(T) == typeof(byte))
+                {
+                    return AdvSimd.AddSaturate(left.AsByte(), right.AsByte()).As<byte, T>();
+                }
+                if (typeof(T) == typeof(sbyte))
+                {
+                    return AdvSimd.AddSaturate(left.AsSByte(), right.AsSByte()).As<sbyte, T>();
+                }
+                if (typeof(T) == typeof(short))
+                {
+                    return AdvSimd.AddSaturate(left.AsInt16(), right.AsInt16()).As<short, T>();
+                }
+                if (typeof(T) == typeof(ushort))
+                {
+                    return AdvSimd.AddSaturate(left.AsUInt16(), right.AsUInt16()).As<ushort, T>();
+                }
+                if (typeof(T) == typeof(int))
+                {
+                    return AdvSimd.AddSaturate(left.AsInt32(), right.AsInt32()).As<int, T>();
+                }
+                if (typeof(T) == typeof(uint))
+                {
+                    return AdvSimd.AddSaturate(left.AsUInt32(), right.AsUInt32()).As<uint, T>();
+                }
+                if (typeof(T) == typeof(long))
+                {
+                    return AdvSimd.AddSaturate(left.AsInt64(), right.AsInt64()).As<long, T>();
+                }
+                if (typeof(T) == typeof(ulong))
+                {
+                    return AdvSimd.AddSaturate(left.AsUInt64(), right.AsUInt64()).As<ulong, T>();
+                }
+            }
+
+            if (Sse2.IsSupported)
+            {
+                if (typeof(T) == typeof(byte))
+                {
+                    return Sse2.AddSaturate(left.AsByte(), right.AsByte()).As<byte, T>();
+                }
+                if (typeof(T) == typeof(sbyte))
+                {
+                    return Sse2.AddSaturate(left.AsSByte(), right.AsSByte()).As<sbyte, T>();
+                }
+                if (typeof(T) == typeof(short))
+                {
+                    return Sse2.AddSaturate(left.AsInt16(), right.AsInt16()).As<short, T>();
+                }
+                if (typeof(T) == typeof(ushort))
+                {
+                    return Sse2.AddSaturate(left.AsUInt16(), right.AsUInt16()).As<ushort, T>();
+                }
+            }
+
+            if (PackedSimd.IsSupported)
+            {
+                if (typeof(T) == typeof(byte))
+                {
+                    return PackedSimd.AddSaturate(left.AsByte(), right.AsByte()).As<byte, T>();
+                }
+                if (typeof(T) == typeof(sbyte))
+                {
+                    return PackedSimd.AddSaturate(left.AsSByte(), right.AsSByte()).As<sbyte, T>();
+                }
+                if (typeof(T) == typeof(short))
+                {
+                    return PackedSimd.AddSaturate(left.AsInt16(), right.AsInt16()).As<short, T>();
+                }
+                if (typeof(T) == typeof(ushort))
+                {
+                    return PackedSimd.AddSaturate(left.AsUInt16(), right.AsUInt16()).As<ushort, T>();
+                }
+            }
+
+            if (IsHardwareAccelerated)
+            {
+                return VectorMath.AddSaturate<Vector128<T>, T>(left, right);
+            }
+
+            return Create(
+                Vector64.AddSaturate(left._lower, right._lower),
+                Vector64.AddSaturate(left._upper, right._upper)
+            );


This is not an approach we want to take for most of the xplat APIs, which are considered "perf critical".

Instead, we want them implemented in the JIT so that they don't eat into the inlining budget or run into other issues.

Doing this requires adding an AddSaturate entry to https://github.com/dotnet/runtime/blob/main/src/coreclr/jit/hwintrinsiclistxarch.h and https://github.com/dotnet/runtime/blob/main/src/coreclr/jit/hwintrinsiclistarm64.h for the relevant vector sizes, mostly mirroring the entry for op_Addition.

You'd then add handling for that in https://github.com/dotnet/runtime/blob/main/src/coreclr/jit/hwintrinsicxarch.cpp#L1387-L1402 and https://github.com/dotnet/runtime/blob/main/src/coreclr/jit/hwintrinsicarm64.cpp#L700-L710, again mostly following op_Addition. Since we don't have a general GT_* kind for this, you'd instead use gtNewSimdHWIntrinsicNode(retType, op1, op2, intrinsic, simdBaseJitType, simdSize), where intrinsic is NI_ISA_Name, such as NI_SSE2_AddSaturate.

For int, uint, long, and ulong on x86/x64, you'd need to implement handling as well. Unsigned is simple, as it's effectively just the following: x + y is always greater than or equal to either input unless it overflows, so a sum smaller than x means it wrapped:

    var tmp = x + y;
    return Vector.ConditionalSelect(
        Vector.LessThan(tmp, x),
        MaxValue,
        tmp
    );
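A minimal sketch of the unsigned case using the public Vector128 API, assuming .NET 7 or later for the generic vector operators. The class and method names here are hypothetical and are not taken from the PR; the suggestion in this thread is to do the equivalent expansion in the JIT, so this only illustrates the logic.

```csharp
using System;
using System.Runtime.Intrinsics;

// Hypothetical illustration only; not the PR's implementation.
static class UnsignedSaturateSketch
{
    public static Vector128<uint> AddSaturate(Vector128<uint> x, Vector128<uint> y)
    {
        // Element-wise add; each lane wraps around on overflow.
        Vector128<uint> tmp = x + y;

        // A wrapped sum is smaller than either input, so comparing against x
        // yields AllBitsSet in lanes that overflowed and Zero elsewhere.
        Vector128<uint> overflowed = Vector128.LessThan(tmp, x);

        // Pick MaxValue in overflowed lanes, the plain sum otherwise.
        return Vector128.ConditionalSelect(overflowed, Vector128.Create(uint.MaxValue), tmp);
    }

    public static void Main()
    {
        Vector128<uint> x = Vector128.Create(1u, 0xFFFF_FFFFu, 0x8000_0000u, 10u);
        Vector128<uint> y = Vector128.Create(2u, 1u, 0x8000_0000u, 0u);
        Console.WriteLine(AddSaturate(x, y));
    }
}
```

Lanes 1 and 2 of the sample input overflow and are clamped to uint.MaxValue, while lanes 0 and 3 keep their ordinary sums.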

Signed is a bit trickier, but it basically boils down to the following (there may be a more efficient way, but this is the basics):

    var z = x + y;
    return Vector.ConditionalSelect(
        (((x ^ y) ^ SignMask) & (x ^ z)) >> (sizeof(T) * 8 - 1),
        SignMask ^ (z >> (sizeof(T) * 8 - 1)),
        z
    );

This works because x + y for differing signs cannot overflow, while for same signs it can. In general, given two bools you can detect equality via x ^ y ^ 1 and inequality via x ^ y. Given that we want (signX == signY) && (signX != signZ), that gives us the (x ^ y ^ 1) & (x ^ z) shape used above, with SignMask taking the place of 1 so that the test applies to the sign bit. We then arithmetic right shift to propagate that bit, so we get AllBitsSet (overflow occurred) or Zero (no overflow) per element.

If overflow did occur, then we know that a negative result means it should be MaxValue while a positive result means it should be MinValue. Arithmetic shifting z gives us AllBitsSet (negative) or Zero (positive) on a per-element basis; we just need to xor with the sign mask. This gives us 0xFFFF_FFFF ^ 0x8000_0000 or 0x0000_0000 ^ 0x8000_0000, thus negative results become 0x7FFF_FFFF (MaxValue) and positive results become 0x8000_0000 (MinValue).
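The signed reasoning above can be sketched for int lanes as follows, again assuming .NET 7 or later for the Vector128 operators. The class name, method name, and sample values are hypothetical, and the scalar Math.Clamp loop is only there as a reference to compare against.

```csharp
using System;
using System.Runtime.Intrinsics;

// Hypothetical illustration only; not the PR's implementation.
static class SignedSaturateSketch
{
    private const int SignShift = sizeof(int) * 8 - 1; // 31 for int lanes

    public static Vector128<int> AddSaturate(Vector128<int> x, Vector128<int> y)
    {
        Vector128<int> signMask = Vector128.Create(int.MinValue); // 0x8000_0000 per lane
        Vector128<int> z = x + y;                                  // wraps on overflow

        // Sign bit is set where x and y share a sign and z's sign differs from x's.
        // The arithmetic shift spreads that bit across the lane: AllBitsSet or Zero.
        Vector128<int> overflow = (((x ^ y) ^ signMask) & (x ^ z)) >> SignShift;

        // Negative wrapped result -> MaxValue; positive wrapped result -> MinValue.
        Vector128<int> saturated = signMask ^ (z >> SignShift);

        return Vector128.ConditionalSelect(overflow, saturated, z);
    }

    public static void Main()
    {
        Vector128<int> x = Vector128.Create(int.MaxValue, int.MinValue, 5, -5);
        Vector128<int> y = Vector128.Create(1, -1, -7, 3);
        Vector128<int> result = AddSaturate(x, y);

        // Compare each lane with a widened scalar add clamped to the int range.
        for (int i = 0; i < Vector128<int>.Count; i++)
        {
            long expected = Math.Clamp((long)x.GetElement(i) + y.GetElement(i), int.MinValue, int.MaxValue);
            Console.WriteLine($"{result.GetElement(i)} (expected {expected})");
        }
    }
}
```

Only the first two lanes overflow: int.MaxValue + 1 wraps to a negative value and is replaced by MaxValue, and int.MinValue + (-1) wraps to a positive value and is replaced by MinValue.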

lilinus · 2024-09-17

> This is not an approach we want to take for most of the xplat APIs, which are considered "perf critical".

Understood. Thanks for the clear instructions on how to implement this in the JIT instead 👍.

> For int, uint, long, and ulong on x86/x64, you'd need to implement handling as well.

There is already a "fallback" algorithm in the PR, in the VectorMath class.
Should the substitution also be done in the JIT for the x86/x64 case, or does it suffice to leave those cases as they are? If it is handled in the JIT, should the fallback in VectorMath be kept?

I'll try setting this PR as draft until I have successfully made necessary changes.

am11 · 2024-09-19T06:43:52Z

@lilinus in case you didn't know, there is a patch created by the format leg: https://github.com/dotnet/runtime/actions/runs/10928828860?pr=107193 (under artifacts)

$ cd /path/to/runtime
$ unzip ~/Downloads/format.linux.patch.zip
$ git apply format.patch
$ rm format.patch
# commit and push

tannergooding · 2024-11-08T17:35:27Z

I should be getting to this soon, just working through the backlog of PRs now that I can start focusing on things for .NET 10

lilinus added 2 commits August 30, 2024 18:29

Implement Vector.AddSaturate / SubtractSaturate

3a12fe6

Add tests

ad5f344

dotnet-issue-labeler bot added the area-System.Numerics and new-api-needs-documentation labels Aug 30, 2024

dotnet-policy-service bot added the community-contribution label Aug 30, 2024

lilinus changed the title ~~Add sub saturate~~ Implement.AddSaturate/SubtractSaturate Aug 30, 2024

Cleanup VectorMath

9ee124c

build-analysis bot mentioned this pull request Sep 4, 2024

restarted. Azure DevOps can't recover from restarts. dotnet/dnceng#3879

lilinus changed the title ~~Implement.AddSaturate/SubtractSaturate~~ Implement Vector.AddSaturate/SubtractSaturate Sep 16, 2024

lilinus marked this pull request as ready for review September 16, 2024 08:48

tannergooding reviewed Sep 16, 2024

lilinus and others added 3 commits September 17, 2024 11:26

Optimize add/sub saturate fallback

0e69a94

Implement intrinsics in runtime

6cf7033

Merge branch 'main' into add-sub-saturate

3defc25

build-analysis bot mentioned this pull request Sep 18, 2024

SIGKILL (OOM?) while running LibraryImportGenerator.Tests w/o actionable log messages or artifacts dotnet/dnceng#2496

Fixes to Vector.Add/SubSaturate

e7a637b

lilinus and others added 2 commits September 19, 2024 09:22

Apply format patch

435a7be

Merge branch 'main' into add-sub-saturate

d357672

build-analysis bot mentioned this pull request Sep 19, 2024

ProcessThreadTests.TestStartTimeProperty failure in CI #105526

build-analysis bot mentioned this pull request Sep 5, 2024

System.Runtime.Serialization.Formatters CI failure. #107309

Merge branch 'main' into add-sub-saturate

ec2b8c4

build-analysis bot mentioned this pull request Nov 8, 2024

The Operation will be canceled. The next steps may not contain expected logs. dotnet/dnceng#3008
