Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement Vector.AddSaturate/SubtractSaturate #107193

Open
wants to merge 10 commits into
base: main
Choose a base branch
from

Conversation

lilinus
Copy link
Contributor

@lilinus lilinus commented Aug 30, 2024

Implement #82559

Copy link

Note regarding the new-api-needs-documentation label:

This serves as a reminder for when your PR is modifying a ref *.cs file and adding/modifying public APIs, please make sure the API implementation in the src *.cs file is documented with triple slash comments, so the PR reviewers can sign off that change.

1 similar comment
Copy link

Note regarding the new-api-needs-documentation label:

This serves as a reminder for when your PR is modifying a ref *.cs file and adding/modifying public APIs, please make sure the API implementation in the src *.cs file is documented with triple slash comments, so the PR reviewers can sign off that change.

@dotnet-policy-service dotnet-policy-service bot added the community-contribution Indicates that the PR has been added by a community member label Aug 30, 2024
@lilinus lilinus changed the title Add sub saturate Implement.AddSaturate/SubtractSaturate Aug 30, 2024
Copy link
Contributor

Tagging subscribers to this area: @dotnet/area-system-numerics
See info in area-owners.md if you want to be subscribed.

@xtqqczze
Copy link
Contributor

Could the existing internal AddSaturate and SubtractSaturate methods be removed?

internal static Vector128<byte> AddSaturate(Vector128<byte> left, Vector128<byte> right)

@lilinus lilinus changed the title Implement.AddSaturate/SubtractSaturate Implement Vector.AddSaturate/SubtractSaturate Sep 16, 2024
@lilinus
Copy link
Contributor Author

lilinus commented Sep 16, 2024

Could the existing internal AddSaturate and SubtractSaturate methods be removed?

I removed the existing methods that I could find in this PR, but perhaps there are additional methods I have missed.

@lilinus lilinus marked this pull request as ready for review September 16, 2024 08:48
Comment on lines 95 to 179
if (AdvSimd.IsSupported)
{
if (typeof(T) == typeof(byte))
{
return AdvSimd.AddSaturate(left.AsByte(), right.AsByte()).As<byte, T>();
}
if (typeof(T) == typeof(sbyte))
{
return AdvSimd.AddSaturate(left.AsSByte(), right.AsSByte()).As<sbyte, T>();
}
if (typeof(T) == typeof(short))
{
return AdvSimd.AddSaturate(left.AsInt16(), right.AsInt16()).As<short, T>();
}
if (typeof(T) == typeof(ushort))
{
return AdvSimd.AddSaturate(left.AsUInt16(), right.AsUInt16()).As<ushort, T>();
}
if (typeof(T) == typeof(int))
{
return AdvSimd.AddSaturate(left.AsInt32(), right.AsInt32()).As<int, T>();
}
if (typeof(T) == typeof(uint))
{
return AdvSimd.AddSaturate(left.AsUInt32(), right.AsUInt32()).As<uint, T>();
}
if (typeof(T) == typeof(long))
{
return AdvSimd.AddSaturate(left.AsInt64(), right.AsInt64()).As<long, T>();
}
if (typeof(T) == typeof(ulong))
{
return AdvSimd.AddSaturate(left.AsUInt64(), right.AsUInt64()).As<ulong, T>();
}
}

if (Sse2.IsSupported)
{
if (typeof(T) == typeof(byte))
{
return Sse2.AddSaturate(left.AsByte(), right.AsByte()).As<byte, T>();
}
if (typeof(T) == typeof(sbyte))
{
return Sse2.AddSaturate(left.AsSByte(), right.AsSByte()).As<sbyte, T>();
}
if (typeof(T) == typeof(short))
{
return Sse2.AddSaturate(left.AsInt16(), right.AsInt16()).As<short, T>();
}
if (typeof(T) == typeof(ushort))
{
return Sse2.AddSaturate(left.AsUInt16(), right.AsUInt16()).As<ushort, T>();
}
}

if (PackedSimd.IsSupported)
{
if (typeof(T) == typeof(byte))
{
return PackedSimd.AddSaturate(left.AsByte(), right.AsByte()).As<byte, T>();
}
if (typeof(T) == typeof(sbyte))
{
return PackedSimd.AddSaturate(left.AsSByte(), right.AsSByte()).As<sbyte, T>();
}
if (typeof(T) == typeof(short))
{
return PackedSimd.AddSaturate(left.AsInt16(), right.AsInt16()).As<short, T>();
}
if (typeof(T) == typeof(ushort))
{
return PackedSimd.AddSaturate(left.AsUInt16(), right.AsUInt16()).As<ushort, T>();
}
}

if (IsHardwareAccelerated)
{
return VectorMath.AddSaturate<Vector128<T>, T>(left, right);
}

return Create(
Vector64.AddSaturate(left._lower, right._lower),
Vector64.AddSaturate(left._upper, right._upper)
);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not an approach we want to take for most of the xplat APIs, which are considered "perf critical".

Rather instead we want them to be implemented in the JIT so that they don't eat away at the inlining budget or run into other issues.

Doing this requires adding an AddSaturate entry to https://github.com/dotnet/runtime/blob/main/src/coreclr/jit/hwintrinsiclistxarch.h and https://github.com/dotnet/runtime/blob/main/src/coreclr/jit/hwintrinsiclistarm64.h, for the relevant vector sizes (and mostly mirroring the entry for op_Additition)

You'd then add handling for that in https://github.com/dotnet/runtime/blob/main/src/coreclr/jit/hwintrinsicxarch.cpp#L1387-L1402 and https://github.com/dotnet/runtime/blob/main/src/coreclr/jit/hwintrinsicarm64.cpp#L700-L710, mostly following op_Addition again; but since we don't have a general GT_* kind, you'd instead use gtNewSimdHWIntrinsicNode(retType, op1, op2, intrinsic, simdBaseJitType, simdSize) where intrinsic is NI_ISA_Name, such as NI_SSE2_AddSaturate

For int, uint, long, and ulong on x86/x64, you'd need to implement handling as well. Unsigned is simple as its effectively just the following, as x + y will always be greater than or equal to either input, unless it overflows:

var tmp = x + y;
return Vector.ConditionalSelect(
    Vector.LessThan(tmp, x),
    MaxValue,
    tmp
);

Signed is a bit trickier, but it basically boils down to (there may be a more efficient way, but this is the basics):

var z = x + y;

return Vector.ConditionalSelect(
    (((x ^ y) ^ SignMask) & (x ^ z)) >> (sizeof(T) * 8 - 1),
    SignMask ^ (z >> (sizeof(T) * 8 - 1))),
    z
);

This works because x + y for differing signs cannot overflow; while for same signs it can. In general, given two bool you can detect equality via x ^ y ^ 1 and inequality via x ^ y. Given that we want (signX == signY) && (signX != signZ) that gives us the (x ^ y ^ 1) & (x ^ z) given above to determine if overflow occurred. We then arithmetic right shift to propagate the bit so we get AllBitsSet (overflow occurred) or Zero (no overflow) per-element.

If overflow did occur, then we know that a negative result means it should be MaxValue while a positive result means it should be MinValue. Artihmetic shifting z gives us AllBitsSet (negative) or Zero (positive) on a per-element basis, we just need to xor with the sign mask. This gives us 0xFFFF_FFFF ^ 0x8000_0000 or 0x0000_0000 ^ 0x8000_0000, thus negative results become 0x7FFF_FFFF (MaxValue) and positive results become 0x8000_0000 (MinValue)

Copy link
Contributor Author

@lilinus lilinus Sep 17, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not an approach we want to take for most of the xplat APIs, which are considered "perf critical".

Understood. Thanks for the clear instructions on how to implement this in JIT instead 👍 .

For int, uint, long, and ulong on x86/x64, you'd need to implement handling as well.

There is a "fallback" algorithm in the PR already in VectorMath class.
Should the substitution be done in JIT as well for x86/x64 case too, or does it suffice to leave as it is for those cases? If handled in JIT, should the fallback in VectorMath be kept?

I'll try setting this PR as draft until I have successfully made necessary changes.

@am11
Copy link
Member

am11 commented Sep 19, 2024

@lilinus in case you didn't knew, there is a patch created by the format leg: https://github.com/dotnet/runtime/actions/runs/10928828860?pr=107193 (under artifacts)

$ cd /path/to/runtime
$ unzip ~/Downloads/format.linux.patch.zip
$ git apply format.patch
$ rm format.patch
# commit and push

@tannergooding
Copy link
Member

I should be getting to this soon, just working through the backlog of PRs now that I can start focusing on things for .NET 10

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area-System.Numerics community-contribution Indicates that the PR has been added by a community member new-api-needs-documentation
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants