Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Expose System.Runtime.Intrinsics.X86.Avx512F #73604

Closed
Tracked by #77034 ...
tannergooding opened this issue Aug 9, 2022 · 16 comments
Closed
Tracked by #77034 ...

Expose System.Runtime.Intrinsics.X86.Avx512F #73604

tannergooding opened this issue Aug 9, 2022 · 16 comments
Labels
api-approved API was approved in API review, it can be implemented area-System.Runtime.Intrinsics avx512 Related to the AVX-512 architecture
Milestone

Comments

@tannergooding
Copy link
Member

tannergooding commented Aug 9, 2022

Summary

Today .NET exposes hardware specific intrinsics for most of the x86/x64 optional ISAs. However, we do not currently support AVX-512 despite it having been out for several years.

Important Notes: See #73604 (comment) where a longer explanation of some concepts and details around other ISAs and AVX-512 instruction support is given.

API Proposal

We should expose the Avx512F class as the first part of the AVX512 feature set.

public enum FloatRoundingMode : byte
{
    ToEven = 0x08,                // _MM_FROUND_TO_NEAREST_INT | _MM_FROUND_NO_EXC
    ToNegativeInfinity = 0x09,    // _MM_FROUND_TO_NEG_INF | _MM_FROUND_NO_EXC
    ToPositiveInfinity = 0x0A,    // _MM_FROUND_TO_POS_INF | _MM_FROUND_NO_EXC
    ToZero = 0x0B,                // _MM_FROUND_TO_ZERO | _MM_FROUND_NO_EXC
}

public abstract partial class Avx512F : Avx2
{
    public static new bool IsSupported { get; }

    // SSE-SSE4.2

    public static Vector512<int> Abs(Vector512<int> value);
    public static Vector512<long> Abs(Vector512<long> value);

    public static Vector512<double> Add(Vector512<double> left, Vector512<double> right);
    public static Vector512<double> Add(Vector512<double> left, Vector512<double> right, FloatRoundingMode mode);

    public static Vector512<int> Add(Vector512<int> left, Vector512<int> right);
    public static Vector512<long> Add(Vector512<long> left, Vector512<long> right);

    public static Vector512<float> Add(Vector512<float> left, Vector512<float> right);
    public static Vector512<float> Add(Vector512<float> left, Vector512<float> right, FloatRoundingMode mode);

    public static Vector128<double> AddScalar(Vector128<double> left, Vector128<double> right, FloatRoundingMode mode);
    public static Vector128<float> AddScalar(Vector128<float> left, Vector128<float> right, FloatRoundingMode mode);

    public static Vector512<int> And(Vector512<int> left, Vector512<int> right);
    public static Vector512<long> And(Vector512<long> left, Vector512<long> right);

    public static Vector512<int> AndNot(Vector512<int> left, Vector512<int> right);
    public static Vector512<long> AndNot(Vector512<long> left, Vector512<long> right);

    public static Vector128<float> ConvertScalarToVector128Single(Vector128<float> upper, Vector128<double> value, FloatRoundingMode mode);
    public static Vector128<double> ConvertScalarToVector128Double(Vector128<double> upper, Vector128<float> value, FloatRoundingMode mode);

    public static int ConvertToInt32(Vector128<double> value, FloatRoundingMode mode);
    public static int ConvertToInt32(Vector128<float> value, FloatRoundingMode mode);

    public static Vector512<double> ConvertToVector256Double(Vector256<int> value);

    public static Vector256<int> ConvertToVector256Int32(Vector512<double> value);
    public static Vector256<int> ConvertToVector256Int32(Vector512<double> value, FloatRoundingMode mode);

    public static Vector512<float> ConvertToVector256Single(Vector512<int> value);
    public static Vector512<float> ConvertToVector256Single(Vector512<int> value, FloatRoundingMode mode);

    public static Vector256<float> ConvertToVector256Single(Vector512<double> value);
    public static Vector256<float> ConvertToVector256Single(Vector512<double> value, FloatRoundingMode mode);

    public static Vector512<double> ConvertToVector512Double(Vector256<float> value);

    public static Vector512<int> ConvertToVector512Int32(Vector128<byte> value);
    public static Vector512<int> ConvertToVector512Int32(Vector512<short> value);
    public static Vector512<int> ConvertToVector512Int32(Vector512<sbyte> value);
    public static Vector512<int> ConvertToVector512Int32(Vector128<ushort> value);

    public static Vector512<int> ConvertToVector512Int32(Vector512<float> value);
    public static Vector512<int> ConvertToVector512Int32(Vector512<float> value, FloatRoundingMode mode);

    public static Vector256<int> ConvertToVector256Int32WithTruncation(Vector512<double> value);
    public static Vector512<int> ConvertToVector512Int32WithTruncation(Vector512<float> value);

    public static Vector512<long> ConvertToVector512Int64(Vector128<byte> value);
    public static Vector512<long> ConvertToVector512Int64(Vector512<short> value);
    public static Vector512<long> ConvertToVector512Int64(Vector512<int> value);
    public static Vector512<long> ConvertToVector512Int64(Vector512<sbyte> value);
    public static Vector512<long> ConvertToVector512Int64(Vector256<uint> value);
    public static Vector512<long> ConvertToVector512Int64(Vector256<ushort> value);

    public static Vector128<float> ConvertScalarToVector128Single(Vector128<float> upper, int value, FloatRoundingMode mode);

    public static Vector512<double> Divide(Vector512<double> left, Vector512<double> right);
    public static Vector512<double> Divide(Vector512<double> left, Vector512<double> right, FloatRoundingMode mode);

    public static Vector512<float> Divide(Vector512<float> left, Vector512<float> right);
    public static Vector512<float> Divide(Vector512<float> left, Vector512<float> right, FloatRoundingMode mode);

    public static Vector128<double> DivideScalar(Vector128<double> left, Vector128<double> right, FloatRoundingMode mode);
    public static Vector128<float> DivideScalar(Vector128<float> left, Vector128<float> right, FloatRoundingMode mode);

    public static Vector512<float> DuplicateOddIndexed(Vector512<float> value);
    public static Vector512<float> DuplicateEvenIndexed(Vector512<float> value);

    public static Vector512<byte> LoadVector512(byte* address);
    public static Vector512<double> LoadVector512(double* address);
    public static Vector512<short> LoadVector512(short* address);
    public static Vector512<int> LoadVector512(int* address);
    public static Vector512<long> LoadVector512(long* address);
    public static Vector512<nint> LoadVector512(nint* address);
    public static Vector512<sbyte> LoadVector512(sbyte* address);
    public static Vector512<float> LoadVector512(float* address);
    public static Vector512<ushort> LoadVector512(ushort* address);
    public static Vector512<uint> LoadVector512(uint* address);
    public static Vector512<ulong> LoadVector512(ulong* address);
    public static Vector512<nuint> LoadVector512(nuint* address);

    public static Vector512<byte> LoadAlignedVector512(byte* address);
    public static Vector512<double> LoadAlignedVector512(double* address);
    public static Vector512<short> LoadAlignedVector512(short* address);
    public static Vector512<int> LoadAlignedVector512(int* address);
    public static Vector512<long> LoadAlignedVector512(long* address);
    public static Vector512<nint> LoadAlignedVector512(nint* address);
    public static Vector512<sbyte> LoadAlignedVector512(sbyte* address);
    public static Vector512<float> LoadAlignedVector512(float* address);
    public static Vector512<ushort> LoadAlignedVector512(ushort* address);
    public static Vector512<uint> LoadAlignedVector512(uint* address);
    public static Vector512<ulong> LoadAlignedVector512(ulong* address);
    public static Vector512<nuint> LoadAlignedVector512(nuint* address);

    public static Vector512<byte> LoadAlignedVector512NonTemporal(byte* address);
    public static Vector512<short> LoadAlignedVector512NonTemporal(short* address);
    public static Vector512<int> LoadAlignedVector512NonTemporal(int* address);
    public static Vector512<long> LoadAlignedVector512NonTemporal(long* address);
    public static Vector512<nint> LoadAlignedVector512NonTemporal(nint* address);
    public static Vector512<sbyte> LoadAlignedVector512NonTemporal(sbyte* address);
    public static Vector512<ushort> LoadAlignedVector512NonTemporal(ushort* address);
    public static Vector512<uint> LoadAlignedVector512NonTemporal(uint* address);
    public static Vector512<ulong> LoadAlignedVector512NonTemporal(ulong* address);
    public static Vector512<nuint> LoadAlignedVector512NonTemporal(nuint* address);

    public static Vector512<double> Max(Vector512<double> left, Vector512<double> right);
    public static Vector512<int> Max(Vector512<int> left, Vector512<int> right);
    public static Vector512<long> Max(Vector512<long> left, Vector512<long> right);
    public static Vector512<float> Max(Vector512<float> left, Vector512<float> right);
    public static Vector512<uint> Max(Vector512<uint> left, Vector512<uint> right);
    public static Vector512<ulong> Max(Vector512<ulong> left, Vector512<ulong> right);

    public static Vector512<double> Min(Vector512<double> left, Vector512<double> right);
    public static Vector512<int> Min(Vector512<int> left, Vector512<int> right);
    public static Vector512<long> Min(Vector512<long> left, Vector512<long> right);
    public static Vector512<float> Min(Vector512<float> left, Vector512<float> right);
    public static Vector512<uint> Min(Vector512<uint> left, Vector512<uint> right);
    public static Vector512<ulong> Min(Vector512<ulong> left, Vector512<ulong> right);

    public static Vector512<double> MoveAndDuplicate(Vector512<double> value);

    public static Vector512<double> Multiply(Vector512<double> left, Vector512<double> right);
    public static Vector512<double> Multiply(Vector512<double> left, Vector512<double> right, FloatRoundingMode mode);

    public static Vector512<int> Multiply(Vector512<int> left, Vector512<int> right);
    public static Vector512<uint> Multiply(Vector512<uint> left, Vector512<uint> right);

    public static Vector512<float> Multiply(Vector512<float> left, Vector512<float> right, FloatRoundingMode mode);
    public static Vector512<float> Multiply(Vector512<float> left, Vector512<float> right);

    public static Vector512<int> MultiplyLow(Vector512<int> left, Vector512<int> right);

    public static Vector128<double> MultiplyScalar(Vector128<double> left, Vector128<double> right, FloatRoundingMode mode);
    public static Vector128<float> MultiplyScalar(Vector128<float> left, Vector128<float> right, FloatRoundingMode mode);

    public static Vector512<int> Or(Vector512<int> left, Vector512<int> right);
    public static Vector512<long> Or(Vector512<long> left, Vector512<long> right);

    public static Vector512<int> ShiftLeftLogical(Vector512<int> value, byte count);
    public static Vector512<int> ShiftLeftLogical(Vector512<int> value, Vector128<int> count);
    public static Vector512<long> ShiftLeftLogical(Vector512<long> value, byte count);
    public static Vector512<long> ShiftLeftLogical(Vector512<long> value, Vector128<long> count);

    public static Vector512<int> ShiftRightArithmetic(Vector512<int> value, byte count);
    public static Vector512<int> ShiftRightArithmetic(Vector512<int> value, Vector128<int> count);
    public static Vector512<long> ShiftRightArithmetic(Vector512<long> value, byte count);
    public static Vector512<long> ShiftRightArithmetic(Vector512<long> value, Vector128<long> count);

    public static Vector512<int> ShiftRightLogical(Vector512<int> value, byte count);
    public static Vector512<int> ShiftRightLogical(Vector512<int> value, Vector128<int> count);
    public static Vector512<long> ShiftRightLogical(Vector512<long> value, byte count);
    public static Vector512<long> ShiftRightLogical(Vector512<long> value, Vector128<long> count);

    public static Vector512<double> Shuffle(Vector512<double> left, Vector512<double> right, byte control);
    public static Vector512<float> Shuffle(Vector512<float> left, Vector512<float> right, byte control);

    public static Vector512<int> Shuffle(Vector512<int> value, byte control);

    public static Vector512<double> Sqrt(Vector512<double> value, FloatRoundingMode mode);
    public static Vector512<float> Sqrt(Vector512<float> value, FloatRoundingMode mode);

    public static Vector128<double> SqrtScalar(Vector128<double> upper, Vector128<double> value, FloatRoundingMode mode);
    public static Vector128<float> SqrtScalar(Vector128<float> upper, Vector128<float> value, FloatRoundingMode mode);

    public static void Store(byte* address, Vector512<byte> value);
    public static void Store(double* address, Vector512<double> value);
    public static void Store(short* address, Vector512<short> value);
    public static void Store(int* address, Vector512<int> value);
    public static void Store(long* address, Vector512<long> value);
    public static void Store(nint* address, Vector512<nint> value);
    public static void Store(sbyte* address, Vector512<sbyte> value);
    public static void Store(float* address, Vector512<float> value);
    public static void Store(ushort* address, Vector512<ushort> value);
    public static void Store(uint* address, Vector512<uint> value);
    public static void Store(ulong* address, Vector512<ulong> value);
    public static void Store(nuint* address, Vector512<nuint> value);

    public static void StoreAligned(byte* address, Vector512<byte> value);
    public static void StoreAligned(double* address, Vector512<double> value);
    public static void StoreAligned(short* address, Vector512<short> value);
    public static void StoreAligned(int* address, Vector512<int> value);
    public static void StoreAligned(long* address, Vector512<long> value);
    public static void StoreAligned(nint* address, Vector512<nint> value);
    public static void StoreAligned(sbyte* address, Vector512<sbyte> value);
    public static void StoreAligned(float* address, Vector512<float> value);
    public static void StoreAligned(ushort* address, Vector512<ushort> value);
    public static void StoreAligned(uint* address, Vector512<uint> value);
    public static void StoreAligned(ulong* address, Vector512<ulong> value);
    public static void StoreAligned(nuint* address, Vector512<nuint> value);

    public static void StoreAlignedNonTemporal(byte* address, Vector512<byte> value);
    public static void StoreAlignedNonTemporal(double* address, Vector512<double> value);
    public static void StoreAlignedNonTemporal(short* address, Vector512<short> value);
    public static void StoreAlignedNonTemporal(int* address, Vector512<int> value);
    public static void StoreAlignedNonTemporal(long* address, Vector512<long> value);
    public static void StoreAlignedNonTemporal(nint* address, Vector512<nint> value);
    public static void StoreAlignedNonTemporal(sbyte* address, Vector512<sbyte> value);
    public static void StoreAlignedNonTemporal(float* address, Vector512<float> value);
    public static void StoreAlignedNonTemporal(ushort* address, Vector512<ushort> value);
    public static void StoreAlignedNonTemporal(uint* address, Vector512<uint> value);
    public static void StoreAlignedNonTemporal(ulong* address, Vector512<ulong> value);
    public static void StoreAlignedNonTemporal(nuint* address, Vector512<nuint> value);

    public static Vector512<double> Subtract(Vector512<double> left, Vector512<double> right);
    public static Vector512<double> Subtract(Vector512<double> left, Vector512<double> right, FloatRoundingMode mode);

    public static Vector512<int> Subtract(Vector512<int> left, Vector512<int> right);
    public static Vector512<long> Subtract(Vector512<long> left, Vector512<long> right);

    public static Vector512<float> Subtract(Vector512<float> left, Vector512<float> right);
    public static Vector512<float> Subtract(Vector512<float> left, Vector512<float> right, FloatRoundingMode mode);

    public static Vector128<double> SubtractScalar(Vector128<double> left, Vector128<double> right, FloatRoundingMode mode);
    public static Vector128<float> SubtractScalar(Vector128<float> left, Vector128<float> right, FloatRoundingMode mode);

    public static Vector512<double> UnpackHigh(Vector512<double> left, Vector512<double> right);
    public static Vector512<float> UnpackHigh(Vector512<float> left, Vector512<float> right);

    public static Vector512<double> UnpackLow(Vector512<double> left, Vector512<double> right);
    public static Vector512<float> UnpackLow(Vector512<float> left, Vector512<float> right);

    public static Vector512<int> UnpackHigh(Vector512<int> left, Vector512<int> right);
    public static Vector512<int> UnpackLow(Vector512<int> left, Vector512<int> right);

    public static Vector512<long> UnpackHigh(Vector512<long> left, Vector512<long> right);
    public static Vector512<long> UnpackLow(Vector512<long> left, Vector512<long> right);

    public static Vector512<int> Xor(Vector512<int> left, Vector512<int> right);
    public static Vector512<long> Xor(Vector512<long> left, Vector512<long> right);

    // AVX-AVX2

    public static Vector512<double> BroadcastScalarToVector512(Vector128<double> value);
    public static Vector512<int> BroadcastScalarToVector512(Vector128<int> value);
    public static Vector512<float> BroadcastScalarToVector512(Vector128<float> value);
    public static Vector512<long> BroadcastScalarToVector512(Vector128<long> value);

    public static Vector128<float> ExtractVector128(Vector512<float> value, byte index);
    public static Vector128<int> ExtractVector128(Vector512<int> value, byte index);

    public static Vector256<double> ExtractVector256(Vector512<double> value, byte index);
    public static Vector256<long> ExtractVector256(Vector512<long> value, byte index);

    public static Vector512<int> InsertVector128(Vector512<int> value, Vector128<int> data, byte index);
    public static Vector512<float> InsertVector128(Vector512<float> value, Vector128<float> data, byte index);

    public static Vector512<double> InsertVector256(Vector512<double> value, Vector256<double> data, byte index);
    public static Vector512<long> InsertVector256(Vector512<long> value, Vector256<long> data, byte index);

    public static Vector512<double> Permute2x64(Vector512<double> value, byte control);

    public static Vector512<float> Permute(Vector512<float> value, byte control);
    public static Vector512<double> Permute(Vector512<double> value, byte control);

    public static Vector512<long> Permute4x64(Vector512<long> value, byte control);

    public static Vector512<double> PermuteVar(Vector512<double> value, Vector512<long> control);
    public static Vector512<float> PermuteVar(Vector512<float> value, Vector512<int> control);

    public static Vector512<long> PermuteVar4x64(Vector512<long> value, Vector512<long> control);

    public static Vector512<int> PermuteVar8x32(Vector512<int> value, Vector512<int> control);

    public static Vector512<double> PermuteVar8x64(Vector512<double> value, Vector512<long> control);

    public static Vector512<float> PermuteVar16x32(Vector512<float> value, Vector512<int> control);

    public static Vector512<int> ShiftLeftLogicalVariable(Vector512<int> value, Vector512<int> count);
    public static Vector512<long> ShiftLeftLogicalVariable(Vector512<long> value, Vector512<long> count);

    public static Vector512<int> ShiftRightArithmeticVariable(Vector512<int> value, Vector512<int> count);
    public static Vector512<long> ShiftRightArithmeticVariable(Vector512<long> value, Vector512<long> count);

    public static Vector512<int> ShiftRightLogicalVariable(Vector512<int> value, Vector512<int> count);
    public static Vector512<long> ShiftRightLogicalVariable(Vector512<long> value, Vector512<long> count);

    // AVX512

    public static Vector512<int> AlignRight(Vector512<int> left, Vector512<int> right, byte mask);
    public static Vector512<long> AlignRight(Vector512<long> left, Vector512<long> right, byte mask);

    public static Vector512<double> BroadcastToVector512(Vector256<double> value);
    public static Vector512<int> BroadcastToVector512(Vector128<int> value);
    public static Vector512<long> BroadcastToVector512(Vector256<long> value);
    public static Vector512<float> BroadcastToVector512(Vector128<float> value);

    public static uint ConvertToUInt32(Vector128<double> value);
    public static uint ConvertToUInt32(Vector128<double> value, FloatRoundingMode mode);

    public static uint ConvertToUInt32(Vector128<float> value);
    public static uint ConvertToUInt32(Vector128<float> value, FloatRoundingMode mode);

    public static uint ConvertToUInt32WithTruncation(Vector128<double> value);
    public static uint ConvertToUInt32WithTruncation(Vector128<float> value);

    public static Vector128<byte> ConvertToVector128ByteWithSaturation(Vector512<long> value);
    public static Vector128<byte> ConvertToVector128ByteWithSaturation(Vector512<uint> value);

    public static Vector128<short> ConvertToVector128Int16(Vector512<long> value);
    public static Vector128<short> ConvertToVector128Int16WithSaturation(Vector512<long> value);

    public static Vector128<sbyte> ConvertToVector128SByte(Vector512<int> value);
    public static Vector128<sbyte> ConvertToVector128SByte(Vector512<long> value);
    public static Vector128<sbyte> ConvertToVector128SByteWithSaturation(Vector512<int> value);
    public static Vector128<sbyte> ConvertToVector128SByteWithSaturation(Vector512<long> value);

    public static Vector128<short> ConvertToVector128UInt16WithSaturation(Vector512<long> value);

    public static Vector256<short> ConvertToVector256Int16(Vector512<int> value);
    public static Vector256<short> ConvertToVector256Int16WithSaturation(Vector512<int> value);

    public static Vector256<int> ConvertToVector256Int32(Vector512<long> value);
    public static Vector256<int> ConvertToVector256Int32WithSaturation(Vector512<long> value);

    public static Vector256<ushort> ConvertToVector256UInt16WithSaturation(Vector512<uint> value);

    public static Vector256<uint> ConvertToVector256UInt32(Vector512<double> value);
    public static Vector256<uint> ConvertToVector256UInt32(Vector512<double> value, FloatRoundingMode mode);
    public static Vector256<uint> ConvertToVector256UInt32WithSaturation(Vector512<long> value);
    public static Vector256<uint> ConvertToVector256UInt32WithTruncation(Vector512<double> value);

    public static Vector512<uint> ConvertToVector512UInt32(Vector512<float> value);
    public static Vector512<uint> ConvertToVector512UInt32(Vector512<float> value, FloatRoundingMode mode);
    public static Vector512<uint> ConvertToVector512UInt32WithTruncation(Vector512<float> value);

    public static Vector512<double> ConvertToVector512Double(Vector256<uint> value);
    public static Vector512<float> ConvertToVector512Single(Vector512<uint> value);

    public static Vector128<double> ConvertScalarToVector128Double(Vector128<double> upper, uint value);
    public static Vector128<float> ConvertScalarToVector128Single(Vector128<float> upper, uint value);

    public static Vector512<double> Fixup(Vector512<double> left, Vector512<double> right, Vector512<long> table);
    public static Vector512<float> Fixup(Vector512<float> left, Vector512<float> right, Vector512<int> table);

    public static Vector128<double> FixupScalar(Vector128<double> left, Vector128<double> right, Vector128<long> table);
    public static Vector128<float> FixupScalar(Vector128<float> left, Vector128<float> right, Vector128<int> table);

    public static Vector512<double> GatherVector512(double* baseAddress, Vector256<int> index, byte scale);
    public static Vector512<double> GatherVector512(double* baseAddress, Vector512<long> index, byte scale);

    public static Vector256<int> GatherVector256(int* baseAddress, Vector512<int> index, byte scale);
    public static Vector512<int> GatherVector512(int* baseAddress, Vector512<int> index, byte scale);

    public static Vector512<long> GatherVector512(void* baseAddress, Vector256<int> index, byte scale);
    public static Vector512<long> GatherVector512(void* baseAddress, Vector512<long> index, byte scale);

    public static Vector256<float> GatherVector256(float* baseAddress, Vector512<float> index, byte scale);
    public static Vector512<float> GatherVector512(Vector512<float> index, void* baseAddress, byte scale);

    public static Vector512<double> GetExponent(Vector512<double> value);
    public static Vector512<float> GetExponent(Vector512<float> value);

    public static Vector128<double> GetExponentScalar(Vector128<double> upper, Vector128<double> value);
    public static Vector128<float> GetExponentScalar(Vector128<float> upper, Vector128<float> value);

    public static Vector512<double> GetMantissa(Vector512<double> value, byte interval, byte signControl);
    public static Vector512<double> GetMantissa(Vector512<double> value, byte interval, byte signControl, FloatRoundingMode mode);

    public static Vector512<float> GetMantissa(Vector512<float> value, byte interval, byte signControl);
    public static Vector512<float> GetMantissa(Vector512<float> value, byte interval, byte signControl, FloatRoundingMode mode);

    public static Vector128<double> GetMantissaScalar(Vector128<double> upper, Vector128<double> value, byte interval, byte signControl);
    public static Vector128<double> GetMantissaScalar(Vector128<double> upper, Vector128<double> value, byte interval, byte signControl, FloatRoundingMode mode);

    public static Vector128<float> GetMantissaScalar(Vector128<float> upper, Vector128<float> value, byte interval, byte signControl);
    public static Vector128<float> GetMantissaScalar(Vector128<float> upper, Vector128<float> value, byte interval, byte signControl, FloatRoundingMode mode);

    public static Vector512<double> PermuteVar8x64(Vector512<double> left, Vector512<double> right, Vector512<double> control);
    public static Vector512<long> PermuteVar8x64(Vector512<long> left, Vector512<long> right, Vector512<long> control);

    public static Vector512<int> PermuteVar16x32(Vector512<int> left, Vector512<int> right, Vector512<int> control);
    public static Vector512<float> PermuteVar16x32(Vector512<float> left, Vector512<float> right, Vector512<float> control);

    public static Vector512<int> RotateLeft(Vector512<int> value, byte count);
    public static Vector512<long> RotateLeft(Vector512<long> value, byte count);

    public static Vector512<int> RotateLeftVariable(Vector512<int> value, Vector512<int> count);
    public static Vector512<long> RotateLeftVariable(Vector512<long> value, Vector512<long> count);

    public static Vector512<int> RotateRight(Vector512<int> value, byte count);
    public static Vector512<long> RotateRight(Vector512<long> value, byte count);

    public static Vector512<int> RotateRightVariable(Vector512<int> value, Vector512<int> count);
    public static Vector512<long> RotateRightVariable(Vector512<long> value, Vector512<long> count);

    public static Vector512<double> Reciprocal14(Vector512<double> value);
    public static Vector512<float> Reciprocal14(Vector512<float> value);

    public static Vector128<double> Reciprocal14Scalar(Vector128<double> upper, Vector128<double> value);
    public static Vector128<float> Reciprocal14Scalar(Vector128<float> upper, Vector128<float> value);

    public static Vector512<double> RoundScale(Vector512<double> value, byte scale);
    public static Vector512<float> RoundScale(Vector512<float> value, byte scale);

    public static Vector128<double> RoundScaleScalar(Vector128<double> upper, Vector128<double> value, byte scale);
    public static Vector128<float> RoundScaleScalar(Vector128<float> upper, Vector128<float> value, byte scale);

    public static Vector512<double> ReciprocalSqrt14(Vector512<double> value);
    public static Vector512<float> ReciprocalSqrt14(Vector512<float> value);

    public static Vector128<double> ReciprocalSqrt14Scalar(Vector128<double> upper, Vector128<double> value);
    public static Vector128<float> ReciprocalSqrt14Scalar(Vector128<float> upper, Vector128<float> value);

    public static Vector512<double> Scale(Vector512<double> left, Vector512<double> right);
    public static Vector512<double> Scale(Vector512<double> left, Vector512<double> right, FloatRoundingMode mode);

    public static Vector512<float> Scale(Vector512<float> left, Vector512<float> right);
    public static Vector512<float> Scale(Vector512<float> left, Vector512<float> right, FloatRoundingMode mode);

    public static void Scatter(double* baseAddress, Vector256<int> index, byte scale, Vector512<double> value);
    public static void Scatter(double* baseAddress, Vector512<long> index, byte scale, Vector512<double> value);

    public static void Scatter(int* baseAddress, Vector512<int> index, byte scale, Vector512<int> value);
    public static void Scatter(int* baseAddress, Vector512<long> index, byte scale, Vector256<int> value);

    public static void Scatter(long* baseAddress, Vector256<int> index, byte scale, Vector512<long> value);
    public static void Scatter(long* baseAddress, Vector512<long> index, byte scale, Vector512<long> value);

    public static void Scatter(float* baseAddress, Vector512<int> index, byte scale, Vector512<float> value);
    public static void Scatter(float* baseAddress, Vector512<long> index, byte scale, Vector256<float> value);

    public static Vector512<double> Shuffle(Vector512<double> left, Vector512<double> right, byte control);
    public static Vector512<int> Shuffle(Vector512<int> left, Vector512<int> right, byte control);
    public static Vector512<long> Shuffle(Vector512<long> left, Vector512<long> right, byte control);
    public static Vector512<float> Shuffle(Vector512<float> left, Vector512<float> right, byte control);

    public static Vector512<int> TernaryLogic(Vector512<int> left, Vector512<int> right, byte control);
    public static Vector512<long> TernaryLogic(Vector512<long> left, Vector512<long> right, byte control);

    public new abstract partial class X64
    {
        public static new bool IsSupported { get; }

        // SSE-SSE4.2

        public static long ConvertToInt64(Vector128<double> value, FloatRoundingMode mode);
        public static long ConvertToInt64(Vector128<float> value, FloatRoundingMode mode);

        public static Vector128<double> ConvertScalarToVector128Double(Vector128<double> upper, long value, FloatRoundingMode mode);
        public static Vector128<float> ConvertScalarToVector128Single(Vector128<float> upper, long value, FloatRoundingMode mode);

        // AVX512

        public static ulong ConvertToUInt64(Vector128<double> value);
        public static ulong ConvertToUInt64(Vector128<double> value, FloatRoundingMode mode);
        public static ulong ConvertToUInt64(Vector128<float> value);
        public static ulong ConvertToUInt64(Vector128<float> value, FloatRoundingMode mode);

        public static ulong ConvertToUInt64WithTruncation(Vector128<double> value);
        public static ulong ConvertToUInt64WithTruncation(Vector128<float> value);

        public static Vector128<double> ConvertScalarToVector128Double(Vector128<double> upper, ulong value);
        public static Vector128<double> ConvertScalarToVector128Double(Vector128<double> upper, ulong value, FloatRoundingMode mode);

        public static Vector128<float> ConvertScalarToVector128Single(Vector128<float> upper, ulong value);
        public static Vector128<float> ConvertScalarToVector128Single(Vector128<float> upper, ulong value, FloatRoundingMode mode);
    }
}
@tannergooding tannergooding added api-suggestion Early API idea and discussion, it is NOT ready for implementation area-System.Runtime.Intrinsics labels Aug 9, 2022
@tannergooding tannergooding added this to the 8.0.0 milestone Aug 9, 2022
@tannergooding
Copy link
Member Author

tannergooding commented Aug 9, 2022

Important Notes

AVX512F technically implies AVX2, F16C, and FMA. However, .NET does not currently support multi-inheritance and so the inheritance from F16C and FMA is somewhat "implied". The downside of this is that doing Avx512.* will not show the baseline APIs from Fma. We do not currently expose F16C so there isn't as much impact there.

The API surface below is a subset of the overall AVX512 surface area. It represents the new API surface that doesn't also require KMASK register support. Such functionality will be proposed separately after a corresponding proposal for handling the masking concept also goes up. Additionally, this does not currently cover the AVX512VL ISA which is an extension on top providing access to new AVX-512 instructions/concepts to the Vector128/Vector256 sized registers.

The total surface area (from this proposal, the kmask proposal, and the VL proposal) is approximately 3000 new methods. Most of this is because the kmask proposal adds 2 more overloads per existing method signature. I am currently investigating whether we can slim this down at all (such as by requiring the user to specify Vector128<T>.Zero to cut it down to 1 more overload per method signature).

For the below, FloatRoundingMode could be defaulted as ToEven and reduce the method count a bit. .NET currently doesn't support floating-point exceptions and so 0x08 has the "same" behavior as 0x00. However, we might support it in the future and we'd want to ensure the enum is extensible towards that. So we might want ToEvenNoExceptions = 0x08 instead (or a better name, ToEven = 0x00, ToEvenUnchecked = 0x08 or ToEvenChecked = 0x00, ToEven = 0x08 maybe?)

A few of the methods take a byte sae parameter on the C++ side. This controls whether or not the method will throw (this is effectively "just" _MM_FROUND_RAISE_EXC and _MM_FROUND_NO_EXC from FloatRoundingMode). Given that .NET doesn't support exceptions, these parameters aren't currently exposed. It might be good to expose them anyways and just default them for the same reason as above, that we might support them in the future.

The methods in the Avx512F class are split into 3 groups. The first is overloads to instructions originally introduced in SSE through SSE4.2. The second is overloads to the instructions originally introduced in AVX through AVX2. The final is instructions entirely new to AVX512. The FMA surface area isn't currently included below.

There are potentially some tweaks to a couple of the byte control we could make where we expose an enum instead. Some APIs might due with a little bit more thought on the name.

@ghost
Copy link

ghost commented Aug 9, 2022

Tagging subscribers to this area: @dotnet/area-system-runtime-intrinsics
See info in area-owners.md if you want to be subscribed.

Issue Details

Summary

Today .NET exposes hardware specific intrinsics for most of the x86/x64 optional ISAs. However, we do not currently support AVX-512 despite it having been out for several years.

API Proposal

We should expose the Avx512F class as the first part of the AVX512 feature set.

public enum FloatRoundingMode : byte
{
    ToEven = 0x08,                // _MM_FROUND_TO_NEAREST_INT | _MM_FROUND_NO_EXC
    ToNegativeInfinity = 0x09,    // _MM_FROUND_TO_NEG_INF | _MM_FROUND_NO_EXC
    ToPositiveInfinity = 0x0A,    // _MM_FROUND_TO_POS_INF | _MM_FROUND_NO_EXC
    ToZero = 0x0B,                // _MM_FROUND_TO_ZERO | _MM_FROUND_NO_EXC
}

public abstract partial class Avx512F : Avx2
{
    // SSE-SSE4.2

    public static Vector512<int> Abs(Vector512<int> value);
    public static Vector512<long> Abs(Vector512<long> value);

    public static Vector512<double> Add(Vector512<double> left, Vector512<double> right);
    public static Vector512<double> Add(Vector512<double> left, Vector512<double> right, FloatRoundingMode mode);

    public static Vector512<int> Add(Vector512<int> left, Vector512<int> right);
    public static Vector512<long> Add(Vector512<long> left, Vector512<long> right);

    public static Vector512<float> Add(Vector512<float> left, Vector512<float> right);
    public static Vector512<float> Add(Vector512<float> left, Vector512<float> right, FloatRoundingMode mode);

    public static Vector128<double> AddScalar(Vector128<double> left, Vector128<double> right, FloatRoundingMode mode);
    public static Vector128<float> AddScalar(Vector128<float> left, Vector128<float> right, FloatRoundingMode mode);

    public static Vector512<int> And(Vector512<int> left, Vector512<int> right);
    public static Vector512<long> And(Vector512<long> left, Vector512<long> right);

    public static Vector512<int> AndNot(Vector512<int> left, Vector512<int> right);
    public static Vector512<long> AndNot(Vector512<long> left, Vector512<long> right);

    public static Vector128<float> ConvertScalarToVector128Single(Vector128<float> upper, Vector128<double> value, FloatRoundingMode mode);
    public static Vector128<double> ConvertScalarToVector128Double(Vector128<double> upper, Vector128<float> value, FloatRoundingMode mode);

    public static int ConvertToInt32(Vector128<double>, FloatRoundingMode mode);
    public static int ConvertToInt32(Vector128<float> value, FloatRoundingMode mode);

    public static Vector512<double> ConvertToVector256Double(Vector256<int> value);

    public static Vector256<int> ConvertToVector256Int32(Vector512<double> value);
    public static Vector256<int> ConvertToVector256Int32(Vector512<double> value, FloatRoundingMode mode);

    public static Vector512<float> ConvertToVector256Single(Vector512<int> value);
    public static Vector512<float> ConvertToVector256Single(Vector512<int> value, FloatRoundingMode mode);

    public static Vector256<float> ConvertToVector256Single(Vector512<double> value);
    public static Vector256<float> ConvertToVector256Single(Vector512<double> value, FloatRoundingMode mode);

    public static Vector512<double> ConvertToVector512Double(Vector256<float> value);

    public static Vector512<int> ConvertToVector512Int32(Vector128<byte> value);
    public static Vector512<int> ConvertToVector512Int32(Vector512<short> value);
    public static Vector512<int> ConvertToVector512Int32(Vector512<sbyte> value);
    public static Vector512<int> ConvertToVector512Int32(Vector128<ushort> value);

    public static Vector512<int> ConvertToVector512Int32(Vector512<float> value);
    public static Vector512<int> ConvertToVector512Int32(Vector512<float> value, FloatRoundingMode mode);

    public static Vector256<int> ConvertToVector256Int32WithTruncation(Vector512<double> value);
    public static Vector512<int> ConvertToVector512Int32WithTruncation(Vector512<float> value);

    public static Vector512<long> ConvertToVector512Int64(Vector128<byte> value);
    public static Vector512<long> ConvertToVector512Int64(Vector512<short> value);
    public static Vector512<long> ConvertToVector512Int64(Vector512<int> value);
    public static Vector512<long> ConvertToVector512Int64(Vector512<sbyte> value);
    public static Vector512<long> ConvertToVector512Int64(Vector256<uint> value);
    public static Vector512<long> ConvertToVector512Int64(Vector256<ushort> value);

    public static Vector128<float> ConvertScalarToVector128Single(Vector128<float> upper, int value, FloatRoundingMode mode);

    public static Vector512<double> Divide(Vector512<double> left, Vector512<double> right);
    public static Vector512<double> Divide(Vector512<double> left, Vector512<double> right, FloatRoundingMode mode);

    public static Vector512<float> Divide(Vector512<float> left, Vector512<float> right);
    public static Vector512<float> Divide(Vector512<float> left, Vector512<float> right, FloatRoundingMode mode);

    public static Vector128<double> DivideScalar(Vector128<double> left, Vector128<double> right, FloatRoundingMode mode);
    public static Vector128<float> DivideScalar(Vector128<float> left, Vector128<float> right, FloatRoundingMode mode);

    public static Vector512<float> DuplicateOddIndexed(Vector512<float> value);
    public static Vector512<float> DuplicateEvenIndexed(Vector512<float> value);

    public static Vector512<byte> LoadVector512(byte* address);
    public static Vector512<double> LoadVector512(double* address);
    public static Vector512<short> LoadVector512(short* address);
    public static Vector512<int> LoadVector512(int* address);
    public static Vector512<long> LoadVector512(long* address);
    public static Vector512<nint> LoadVector512(nint* address);
    public static Vector512<sbyte> LoadVector512(sbyte* address);
    public static Vector512<float> LoadVector512(float* address);
    public static Vector512<ushort> LoadVector512(ushort* address);
    public static Vector512<uint> LoadVector512(uint* address);
    public static Vector512<ulong> LoadVector512(ulong* address);
    public static Vector512<nuint> LoadVector512(nuint* address);

    public static Vector512<byte> LoadAlignedVector512(byte* address);
    public static Vector512<double> LoadAlignedVector512(double* address);
    public static Vector512<short> LoadAlignedVector512(short* address);
    public static Vector512<int> LoadAlignedVector512(int* address);
    public static Vector512<long> LoadAlignedVector512(long* address);
    public static Vector512<nint> LoadAlignedVector512(nint* address);
    public static Vector512<sbyte> LoadAlignedVector512(sbyte* address);
    public static Vector512<float> LoadAlignedVector512(float* address);
    public static Vector512<ushort> LoadAlignedVector512(ushort* address);
    public static Vector512<uint> LoadAlignedVector512(uint* address);
    public static Vector512<ulong> LoadAlignedVector512(ulong* address);
    public static Vector512<nuint> LoadAlignedVector512(nuint* address);

    public static Vector512<byte> LoadAlignedVector512NonTemporal(byte* address);
    public static Vector512<short> LoadAlignedVector512NonTemporal(short* address);
    public static Vector512<int> LoadAlignedVector512NonTemporal(int* address);
    public static Vector512<long> LoadAlignedVector512NonTemporal(long* address);
    public static Vector512<nint> LoadAlignedVector512NonTemporal(nint* address);
    public static Vector512<sbyte> LoadAlignedVector512NonTemporal(sbyte* address);
    public static Vector512<ushort> LoadAlignedVector512NonTemporal(ushort* address);
    public static Vector512<uint> LoadAlignedVector512NonTemporal(uint* address);
    public static Vector512<ulong> LoadAlignedVector512NonTemporal(ulong* address);
    public static Vector512<nuint> LoadAlignedVector512NonTemporal(nuint* address);

    public static Vector512<double> Max(Vector512<double> left, Vector512<double> right);
    public static Vector512<int> Max(Vector512<int> left, Vector512<int> right);
    public static Vector512<long> Max(Vector512<long> left, Vector512<long> right);
    public static Vector512<float> Max(Vector512<float> left, Vector512<float> right);
    public static Vector512<uint> Max(Vector512<uint> left, Vector512<uint> right);
    public static Vector512<ulong> Max(Vector512<ulong> left, Vector512<ulong> right);

    public static Vector512<double> Min(Vector512<double> left, Vector512<double> right);
    public static Vector512<int> Min(Vector512<int> left, Vector512<int> right);
    public static Vector512<long> Min(Vector512<long> left, Vector512<long> right);
    public static Vector512<float> Min(Vector512<float> left, Vector512<float> right);
    public static Vector512<uint> Min(Vector512<uint> left, Vector512<uint> right);
    public static Vector512<ulong> Min(Vector512<ulong> left, Vector512<ulong> right);

    public static Vector512<double> MoveAndDuplicate(Vector512<double> value);

    public static Vector512<double> Multiply(Vector512<double> left, Vector512<double> right);
    public static Vector512<double> Multiply(Vector512<double> left, Vector512<double> right, FloatRoundingMode mode;

    public static Vector512<int> Multiply(Vector512<int> left, Vector512<int> right);
    public static Vector512<uint> Multiply(Vector512<uint> left, Vector512<uint> right);

    public static Vector512<float> Multiply(Vector512<float> left, Vector512<float> right, FloatRoundingMode mode);
    public static Vector512<float> Multiply(Vector512<float> left, Vector512<float> right);

    public static Vector512<int> MultiplyLow(Vector512<int> left, Vector512<int> right);

    public static Vector128<double> MultiplyScalar(Vector128<double> left, Vector128<double> right, FloatRoundingMode mode);
    public static Vector128<float> MultiplyScalar(Vector128<float> left, Vector128<float> right, FloatRoundingMode mode);

    public static Vector512<int> Or(Vector512<int> left, Vector512<int> right);
    public static Vector512<long> Or(Vector512<long> left, Vector512<long> right);

    public static Vector512<int> ShiftLeftLogical(Vector512<int> value, byte count);
    public static Vector512<int> ShiftLeftLogical(Vector512<int> value, Vector128<int> count);
    public static Vector512<long> ShiftLeftLogical(Vector512<long> value, byte count);
    public static Vector512<long> ShiftLeftLogical(Vector512<long> value, Vector128<long> count);

    public static Vector512<int> ShiftRightArithmetic(Vector512<int> value, byte count);
    public static Vector512<int> ShiftRightArithmetic(Vector512<int> value, Vector128<int> count);
    public static Vector512<long> ShiftRightArithmetic(Vector512<long> value, byte count);
    public static Vector512<long> ShiftRightArithmetic(Vector512<long> value, Vector128<long> count);

    public static Vector512<int> ShiftRightLogical(Vector512<int> value, byte count);
    public static Vector512<int> ShiftRightLogical(Vector512<int> value, Vector128<int> count);
    public static Vector512<long> ShiftRightLogical(Vector512<long> value, byte count);
    public static Vector512<long> ShiftRightLogical(Vector512<long> value, Vector128<long> count);

    public static Vector512<double> Shuffle(Vector512<double> left, Vector512<double> right, byte control);
    public static Vector512<float> Shuffle(Vector512<float> left, Vector512<float> right, byte control);

    public static Vector512<int> Shuffle(Vector512<int> value, byte control);

    public static Vector512<double> Sqrt(Vector512<double> a, FloatRoundingMode mode);
    public static Vector512<float> Sqrt(Vector512<float> a, FloatRoundingMode mode);

    public static Vector128<double> SqrtScalar(Vector128<double> upper, Vector128<double> value, FloatRoundingMode mode);
    public static Vector128<float> SqrtScalar(Vector128<float> upper, Vector128<float> value, FloatRoundingMode mode);

    public static void Store(byte* address, Vector512<byte> value);
    public static void Store(double* address, Vector512<double> value);
    public static void Store(short* address, Vector512<short> value);
    public static void Store(int* address, Vector512<int> value);
    public static void Store(long* address, Vector512<long> value);
    public static void Store(nint* address, Vector512<nint> value);
    public static void Store(sbyte* address, Vector512<sbyte> value);
    public static void Store(float* address, Vector512<float> value);
    public static void Store(ushort* address, Vector512<ushort> value);
    public static void Store(uint* address, Vector512<uint> value);
    public static void Store(ulong* address, Vector512<ulong> value);
    public static void Store(nuint* address, Vector512<nuint> value);

    public static void StoreAligned(byte* address, Vector512<byte> value);
    public static void StoreAligned(double* address, Vector512<double> value);
    public static void StoreAligned(short* address, Vector512<short> value);
    public static void StoreAligned(int* address, Vector512<int> value);
    public static void StoreAligned(long* address, Vector512<long> value);
    public static void StoreAligned(nint* address, Vector512<nint> value);
    public static void StoreAligned(sbyte* address, Vector512<sbyte> value);
    public static void StoreAligned(float* address, Vector512<float> value);
    public static void StoreAligned(ushort* address, Vector512<ushort> value);
    public static void StoreAligned(uint* address, Vector512<uint> value);
    public static void StoreAligned(ulong* address, Vector512<ulong> value);
    public static void StoreAligned(nuint* address, Vector512<nuint> value);

    public static void StoreAlignedNonTemporal(byte* address, Vector512<byte> value);
    public static void StoreAlignedNonTemporal(double* address, Vector512<double> value);
    public static void StoreAlignedNonTemporal(short* address, Vector512<short> value);
    public static void StoreAlignedNonTemporal(int* address, Vector512<int> value);
    public static void StoreAlignedNonTemporal(long* address, Vector512<long> value);
    public static void StoreAlignedNonTemporal(nint* address, Vector512<nint> value);
    public static void StoreAlignedNonTemporal(sbyte* address, Vector512<sbyte> value);
    public static void StoreAlignedNonTemporal(float* address, Vector512<float> value);
    public static void StoreAlignedNonTemporal(ushort* address, Vector512<ushort> value);
    public static void StoreAlignedNonTemporal(uint* address, Vector512<uint> value);
    public static void StoreAlignedNonTemporal(ulong* address, Vector512<ulong> value);
    public static void StoreAlignedNonTemporal(nuint* address, Vector512<nuint> value);

    public static Vector512<double> Subtract(Vector512<double> left, Vector512<double> right);
    public static Vector512<double> Subtract(Vector512<double> left, Vector512<double> right, FloatRoundingMode mode);

    public static Vector512<int> Subtract(Vector512<int> left, Vector512<int> right);
    public static Vector512<long> Subtract(Vector512<long> left, Vector512<long> right);

    public static Vector512<float> Subtract(Vector512<float> left, Vector512<float> right);
    public static Vector512<float> Subtract(Vector512<float> left, Vector512<float> right, FloatRoundingMode mode);

    public static Vector128<double> SubtractScalar(Vector128<double> left, Vector128<double> right, FloatRoundingMode mode);
    public static Vector128<float> SubtractScalar(Vector128<float> left, Vector128<float> right, FloatRoundingMode mode);

    public static Vector512<double> UnpackHigh(Vector512<double> left, Vector512<double> right);
    public static Vector512<float> UnpackHigh(Vector512<float> left, Vector512<float> right);

    public static Vector512<double> UnpackLow(Vector512<double> left, Vector512<double> right);
    public static Vector512<float> UnpackLow(Vector512<float> left, Vector512<float> right);

    public static Vector512<int> UnpackHigh(Vector512<int> left, Vector512<int> right);
    public static Vector512<int> UnpackLow(Vector512<int> left, Vector512<int> right);

    public static Vector512<long> UnpackHigh(Vector512<long> left, Vector512<long> right);
    public static Vector512<long> UnpackLow(Vector512<long> left, Vector512<long> right);

    public static Vector512<int> Xor(Vector512<int> left, Vector512<int> right);
    public static Vector512<long> Xor(Vector512<long> left, Vector512<long> right);

    // AVX-AVX2

    public static Vector512<double> BroadcastScalarToVector512(Vector128<double> value);
    public static Vector512<int> BroadcastScalarToVector512(Vector128<int> value);
    public static Vector512<float> BroadcastScalarToVector512(Vector128<float> value);
    public static Vector512<long> BroadcastScalarToVector512(Vector128<long> value);

    public static Vector128<float> ExtractVector128(Vector512<float> value, byte index);
    public static Vector128<int> ExtractVector128(Vector512<int> value, byte index);

    public static Vector256<double> ExtractVector256(Vector512<double> value, byte index);
    public static Vector256<long> ExtractVector256(Vector512<long> value, byte index);

    public static Vector512<int> InsertVector128(Vector512<int> a, Vector128<int> data, byte index);
    public static Vector512<float> InsertVector128(Vector512<float> a, Vector128<float> data, byte index);

    public static Vector512<double> InsertVector256(Vector512<double> a, Vector256<double> data, byte index);
    public static Vector512<long> InsertVector256(Vector512<long> a, Vector256<long> data, byte index);

    public static Vector512<double> Permute2x64(Vector512<double> value, byte control);

    public static Vector512<float> Permute(Vector512<float> value, byte control);
    public static Vector512<double> Permute(Vector512<double> value, byte control);

    public static Vector512<long> Permute4x64(Vector512<long> value, byte control);

    public static Vector512<double> PermuteVar(Vector512<double> value, Vector512<long> control);
    public static Vector512<float> PermuteVar(Vector512<float> value, Vector512<int> control);

    public static Vector512<long> PermuteVar4x64(Vector512<long> value, Vector512<long> control);

    public static Vector512<int> PermuteVar8x32(Vector512<int> value, Vector512<int> control);

    public static Vector512<double> PermuteVar8x64(Vector512<double> value, Vector512<long> control);

    public static Vector512<float> PermuteVar16x32(Vector512<float> value, Vector512<int> control);

    public static Vector512<int> ShiftLeftLogicalVariable(Vector512<int> value, Vector512<int> count);
    public static Vector512<long> ShiftLeftLogicalVariable(Vector512<long> value, Vector512<long> count);

    public static Vector512<int> ShiftRightArithmeticVariable(Vector512<int> value, Vector512<int> count);
    public static Vector512<long> ShiftRightArithmeticVariable(Vector512<long> value, Vector512<long> count);

    public static Vector512<int> ShiftRightLogicalVariable(Vector512<int> value, Vector512<int> count);
    public static Vector512<long> ShiftRightLogicalVariable(Vector512<long> value, Vector512<long> count);

    // AVX512

    public static Vector512<int> AlignRight(Vector512<int> a, Vector512<int> b, int count);
    public static Vector512<long> AlignRight(Vector512<long> a, Vector512<long> b, int count);

    public static Vector512<double> BroadcastToVector512(Vector256<double> value);
    public static Vector512<int> BroadcastToVector512(Vector128<int> a);
    public static Vector512<long> BroadcastToVector512(Vector256<long> a);
    public static Vector512<float> BroadcastToVector512(Vector128<float> value);

    public static uint ConvertToUInt32(Vector128<double> value);
    public static uint ConvertToUInt32(Vector128<double> value, FloatRoundingMode mode);

    public static uint ConvertToUInt32(Vector128<float> value);
    public static uint ConvertToUInt32(Vector128<float> value, FloatRoundingMode mode);

    public static uint ConvertToUInt32WithTruncation(Vector128<double> value);
    public static uint ConvertToUInt32WithTruncation(Vector128<float> value);

    public static Vector128<byte> ConvertToVector128ByteWithSaturation(Vector512<long> value);
    public static Vector128<byte> ConvertToVector128ByteWithSaturation(Vector512<uint> value);

    public static Vector128<short> ConvertToVector128Int16(Vector512<long> value);
    public static Vector128<short> ConvertToVector128Int16WithSaturation(Vector512<long> value);

    public static Vector128<sbyte> ConvertToVector128SByte(Vector512<int> value);
    public static Vector128<sbyte> ConvertToVector128SByte(Vector512<long> value);
    public static Vector128<sbyte> ConvertToVector128SByteWithSaturation(Vector512<int> value);
    public static Vector128<sbyte> ConvertToVector128SByteWithSaturation(Vector512<long> value);

    public static Vector128<short> ConvertToVector128UInt16WithSaturation(Vector512<long> value);

    public static Vector256<short> ConvertToVector256Int16(Vector512<int> value);
    public static Vector256<short> ConvertToVector256Int16WithSaturation(Vector512<int> value);

    public static Vector256<int> ConvertToVector256Int32(Vector512<long> value);
    public static Vector256<int> ConvertToVector256Int32WithSaturation(Vector512<long> value);

    public static Vector256<ushort> ConvertToVector256UInt16WithSaturation(Vector512<uint> value);

    public static Vector256<uint> ConvertToVector256UInt32(Vector512<double> value);
    public static Vector256<uint> ConvertToVector256UInt32(Vector512<double> value, FloatRoundingMode mode);
    public static Vector256<uint> ConvertToVector256UInt32WithSaturation(Vector512<long> value);
    public static Vector256<uint> ConvertToVector256UInt32WithTruncation(Vector512<double> value);
    public static Vector256<uint> ConvertToVector256UInt32WithTruncation(Vector512<double> value, int sae);

    public static Vector512<uint> ConvertToVector512UInt32(Vector512<float> value);
    public static Vector512<uint> ConvertToVector512UInt32(Vector512<float> value, FloatRoundingMode mode);
    public static Vector512<uint> ConvertToVector512UInt32WithTruncation(Vector512<float> value);
    public static Vector512<uint> ConvertToVector512UInt32WithTruncation(Vector512<float> value, int sae);

    public static Vector512<double> ConvertToVector512Double(Vector256<uint> value);
    public static Vector512<float> ConvertToVector512Single(Vector512<uint> value);

    public static Vector128<double> ConvertScalarToVector128Double(Vector128<double> upper, uint value);
    public static Vector128<float> ConvertScalarToVector128Single(Vector128<float> upper, uint value);

    public static Vector512<double> Fixup(Vector512<double> left, Vector512<double> right, Vector512<long> table);
    public static Vector512<float> Fixup(Vector512<float> left, Vector512<float> right, Vector512<int> table);

    public static Vector128<double> FixupScalar(Vector128<double> left, Vector128<double> right, Vector128<long> table);
    public static Vector128<float> FixupScalar(Vector128<float> left, Vector128<float> right, Vector128<int> table);

    public static Vector512<double> GatherVector512(double* baseAddress, Vector256<int> index, byte scale);
    public static Vector512<double> GatherVector512(double* baseAddress, Vector512<long> index, byte scale);

    public static Vector256<int> GatherVector256(int* baseAddress, Vector512<int> index, byte scale);
    public static Vector512<int> GatherVector512(int* baseAddress, Vector512<int> index, byte scale);

    public static Vector512<long> GatherVector512(void* baseAddress, Vector256<int> index, byte scale);
    public static Vector512<long> GatherVector512(void* baseAddress, Vector512<long> index, byte scale);

    public static Vector256<float> GatherVector256(float* baseAddress, Vector512<float> index, byte scale);
    public static Vector512<float> GatherVector512(Vector512<float> vdx, void* base, int scale);

    public static Vector512<double> GetExponent(Vector512<double> value);
    public static Vector512<float> GetExponent(Vector512<float> value);

    public static Vector128<double> GetExponentScalar(Vector128<double> upper, Vector128<double> value);
    public static Vector128<float> GetExponentScalar(Vector128<float> upper, Vector128<float> value);

    public static Vector512<double> GetMantissa(Vector512<double> value, byte interval, byte signControl);
    public static Vector512<double> GetMantissa(Vector512<double> value, byte interval, byte signControl, FloatRoundingMode mode);

    public static Vector512<float> GetMantissa(Vector512<float> value, byte interval, byte signControl);
    public static Vector512<float> GetMantissa(Vector512<float> value, byte interval, byte signControl, FloatRoundingMode mode);

    public static Vector128<double> GetMantissaScalar(Vector128<double> upper, Vector128<double> value, byte interval, byte signControl);
    public static Vector128<double> GetMantissaScalar(Vector128<double> upper, Vector128<double> value, byte interval, byte signControl, FloatRoundingMode mode);

    public static Vector128<float> GetMantissaScalar(Vector128<float> upper, Vector128<float> value, byte interval, byte signControl);
    public static Vector128<float> GetMantissaScalar(Vector128<float> upper, Vector128<float> value, byte interval, byte signControl, FloatRoundingMode mode);

    public static Vector512<double> PermuteVar8x64(Vector512<double> left, Vector512<double> right, Vector512<double> control);
    public static Vector512<long> PermuteVar8x64(Vector512<long> left, Vector512<long> right, Vector512<long> control);

    public static Vector512<int> PermuteVar16x32(Vector512<int> left, Vector512<int> right, Vector512<int> control);
    public static Vector512<float> PermuteVar16x32(Vector512<float> left, Vector512<float> right, Vector512<float> control);

    public static Vector512<int> RotateLeft(Vector512<int> value, byte count);
    public static Vector512<long> RotateLeft(Vector512<long> value, byte count);

    public static Vector512<int> RotateLeftVariable(Vector512<int> value, Vector512<int> count);
    public static Vector512<long> RotateLeftVariable(Vector512<long> value, Vector512<long> count);

    public static Vector512<int> RotateRight(Vector512<int> value, byte count);
    public static Vector512<long> RotateRight(Vector512<long> value, byte count);

    public static Vector512<int> RotateRightVariable(Vector512<int> value, Vector512<int> count);
    public static Vector512<long> RotateRightVariable(Vector512<long> value, Vector512<long> count);

    public static Vector512<double> Reciprocal14(Vector512<double> value);
    public static Vector512<float> Reciprocal14(Vector512<float> value);

    public static Vector128<double> Reciprocal14Scalar(Vector128<double> upper, Vector128<double> value);
    public static Vector128<float> Reciprocal14Scalar(Vector128<float> upper, Vector128<float> value);

    public static Vector512<double> RoundScale(Vector512<double> value, byte scale);
    public static Vector512<float> RoundScale(Vector512<float> value, byte scale);

    public static Vector128<double> RoundScaleScalar(Vector128<double> upper, Vector128<double> value, byte scale);
    public static Vector128<float> RoundScaleScalar(Vector128<float> upper, Vector128<float> value, byte scale);

    public static Vector512<double> ReciprocalSqrt14(Vector512<double> value);
    public static Vector512<float> ReciprocalSqrt14(Vector512<float> value);

    public static Vector128<double> ReciprocalSqrt14Scalar(Vector128<double> upper, Vector128<double> value);
    public static Vector128<float> ReciprocalSqrt14Scalar(Vector128<float> upper, Vector128<float> value);

    public static Vector512<double> Scale(Vector512<double> left, Vector512<double> right);
    public static Vector512<double> Scale(Vector512<double> left, Vector512<double> right, FloatRoundingMode mode);

    public static Vector512<float> Scale(Vector512<float> left, Vector512<float> right);
    public static Vector512<float> Scale(Vector512<float> left, Vector512<float> right, FloatRoundingMode mode);

    public static void Scatter(double* baseAddress, Vector256<int> index, byte scale, Vector512<double> value);
    public static void Scatter(double* baseAddress, Vector512<long> index, byte scale, Vector512<double> value);

    public static void Scatter(int* baseAddress, Vector512<int> index, byte scale, Vector512<int> value);
    public static void Scatter(int* baseAddress, Vector512<long> index, byte scale, Vector256<int> value);

    public static void Scatter(long* baseAddress, Vector256<int> index, byte scale, Vector512<long> value);
    public static void Scatter(long* baseAddress, Vector512<long> index, byte scale, Vector512<long> value);

    public static void Scatter(float* baseAddress, Vector512<int> index, byte scale, Vector512<float> value);
    public static void Scatter(float* baseAddress, Vector512<long> index, byte scale, Vector256<float> value);

    public static Vector512<double> Shuffle(Vector512<double> left, Vector512<double> right, byte control);
    public static Vector512<int> Shuffle(Vector512<int> left, Vector512<int> right, byte control);
    public static Vector512<long> Shuffle(Vector512<long> left, Vector512<long> right, byte control);
    public static Vector512<float> Shuffle(Vector512<float> left, Vector512<float> right, byte control);

    public static Vector512<int> TernaryLogic(Vector512<int> left, Vector512<int> right, byte control);
    public static Vector512<long> TernaryLogic(Vector512<long> left, Vector512<long> right, byte control);

    public abstract partial class X64
    {
        // SSE-SSE4.2

        public static long ConvertToInt64(Vector128<double>, FloatRoundingMode mode);
        public static long ConvertToInt64(Vector128<float> value, FloatRoundingMode mode);

        public static Vector128<double> ConvertScalarToVector128Double(Vector128<double> upper, long value, FloatRoundingMode mode);
        public static Vector128<float> ConvertScalarToVector128Single(Vector128<float> upper, long value, FloatRoundingMode mode);

        // AVX512

        public static ulong ConvertToUInt64(Vector128<double> value);
        public static ulong ConvertToUInt64(Vector128<double> value, FloatRoundingMode mode);
        public static ulong ConvertToUInt64(Vector128<float> value);
        public static ulong ConvertToUInt64(Vector128<float> value, FloatRoundingMode mode);

        public static ulong ConvertToUInt64WithTruncation(Vector128<double> value);
        public static ulong ConvertToUInt64WithTruncation(Vector128<float> value);

        public static Vector128<double> ConvertScalarToVector128Double(Vector128<double> upper, ulong value);
        public static Vector128<double> ConvertScalarToVector128Double(Vector128<double> upper, ulong value, FloatRoundingMode mode);

        public static Vector128<float> ConvertScalarToVector128Single(Vector128<float> upper, ulong value);
        public static Vector128<float> ConvertScalarToVector128Single(Vector128<float> upper, ulong value, FloatRoundingMode mode);
    }
}
Author: tannergooding
Assignees: -
Labels:

api-suggestion, area-System.Runtime.Intrinsics

Milestone: 8.0.0

@tannergooding
Copy link
Member Author

Please review and double check the surface area described here. As indicated above, I'm working on the VL surface area with a proposal for how that should be exposed.

Given that VL is a special ISA and is itself an extension to both AVX512F and many other AVX512* ISAs, but only when they are present; it may need special consideration for how it's exposed. The two most "obvious" options are given below.

One option would be to just expose another "top level" class and have it inherit from Avx512F. It would mirror the hierarchy for other ISAs on its end, but wouldn't directly inherit from the "other" base class. For example:

namespace System.Runtime.Intrinsics.X86;

public abstract partial class Avx512F : Avx2 { }
public abstract partial class Avx512BW : Avx512F { }

public abstract partial class Avx512VL : Avx512F { }
public abstract partial class Avx512BW_VL : Avx512VL { }

An alternative option would be to have it be a nested class, much like X64 currently is. A downside is more . to access (although this isn't really "worse" than needing _ or BWVL or anything) and that it could itself contain a nested X64 class. An upside is that given the existing X64 semantics, it matches user expectations on what IsSupported means:

namespace System.Runtime.Intrinsics.X86;

public abstract partial class Avx512F : Avx2
{
    public abstract partial class VL { }
}

public abstract partial class Avx512BW : Avx512F
{
    public abstract partial class VL : Avx512F.VL { }
}

@tannergooding
Copy link
Member Author

CC. @anthonycanino

@tannergooding tannergooding added api-ready-for-review API is ready for review, it is NOT ready for implementation and removed api-suggestion Early API idea and discussion, it is NOT ready for implementation labels Aug 26, 2022
@tannergooding
Copy link
Member Author

VectorMask proposal is here: #74613

I'll work on getting the API surface that utilizes these types up.

@bartonjs
Copy link
Member

bartonjs commented Aug 30, 2022

Video

  • All of the FloatRoundingMode parameters probably want the "use a constant here" attribute.
  • Many of these method groups should have both signed and unsigned integer types. The updated version here will likely miss some, but they're considered approved.
  • We talked about making the FloatRoundingMode a defaulted parameter, which would require adding a FloatRoundingMode value for "do what the control register says", but that didn't feel necessary at this time.
  • "Fixup" is apparently considered one word in .NET, so that is the correct casing.
  • Consider an enum instead of byte for TernaryLogic
namespace System.Runtime.Intrinsics.X86;

public enum FloatRoundingMode : byte
{
    ToEven = 0x08,                // _MM_FROUND_TO_NEAREST_INT | _MM_FROUND_NO_EXC
    ToNegativeInfinity = 0x09,    // _MM_FROUND_TO_NEG_INF | _MM_FROUND_NO_EXC
    ToPositiveInfinity = 0x0A,    // _MM_FROUND_TO_POS_INF | _MM_FROUND_NO_EXC
    ToZero = 0x0B,                // _MM_FROUND_TO_ZERO | _MM_FROUND_NO_EXC
}

public abstract partial class Avx512F : Avx2
{
    public static new bool IsSupported { get; }

    // SSE-SSE4.2

    public static Vector512<int> Abs(Vector512<int> value);
    public static Vector512<long> Abs(Vector512<long> value);

    public static Vector512<double> Add(Vector512<double> left, Vector512<double> right);
    public static Vector512<double> Add(Vector512<double> left, Vector512<double> right, FloatRoundingMode mode);

    public static Vector512<int> Add(Vector512<int> left, Vector512<int> right);
    public static Vector512<uint> Add(Vector512<uint> left, Vector512<uint> right);
    public static Vector512<long> Add(Vector512<long> left, Vector512<long> right);
    public static Vector512<ulong> Add(Vector512<ulong> left, Vector512<ulong> right);

    public static Vector512<float> Add(Vector512<float> left, Vector512<float> right);
    public static Vector512<float> Add(Vector512<float> left, Vector512<float> right, FloatRoundingMode mode);

    public static Vector128<double> AddScalar(Vector128<double> left, Vector128<double> right, FloatRoundingMode mode);
    public static Vector128<float> AddScalar(Vector128<float> left, Vector128<float> right, FloatRoundingMode mode);

    public static Vector512<int> And(Vector512<int> left, Vector512<int> right);
    public static Vector512<uint> And(Vector512<uint> left, Vector512<uint> right);
    public static Vector512<long> And(Vector512<long> left, Vector512<long> right);
    public static Vector512<ulong> And(Vector512<ulong> left, Vector512<ulong> right);

    public static Vector512<int> AndNot(Vector512<int> left, Vector512<int> right);
    public static Vector512<uint> AndNot(Vector512<uint> left, Vector512<uint> right);
    public static Vector512<long> AndNot(Vector512<long> left, Vector512<long> right);
    public static Vector512<ulong> AndNot(Vector512<ulong> left, Vector512<ulong> right);

    public static Vector128<float> ConvertScalarToVector128Single(Vector128<float> upper, Vector128<double> value, FloatRoundingMode mode);
    public static Vector128<double> ConvertScalarToVector128Double(Vector128<double> upper, Vector128<float> value, FloatRoundingMode mode);

    public static int ConvertToInt32(Vector128<double> value, FloatRoundingMode mode);
    public static int ConvertToInt32(Vector128<float> value, FloatRoundingMode mode);

    public static Vector512<double> ConvertToVector256Double(Vector256<int> value);

    public static Vector256<int> ConvertToVector256Int32(Vector512<double> value);
    public static Vector256<int> ConvertToVector256Int32(Vector512<double> value, FloatRoundingMode mode);

    public static Vector512<float> ConvertToVector256Single(Vector512<int> value);
    public static Vector512<float> ConvertToVector256Single(Vector512<int> value, FloatRoundingMode mode);

    public static Vector256<float> ConvertToVector256Single(Vector512<double> value);
    public static Vector256<float> ConvertToVector256Single(Vector512<double> value, FloatRoundingMode mode);

    public static Vector512<double> ConvertToVector512Double(Vector256<float> value);

    public static Vector512<int> ConvertToVector512Int32(Vector128<byte> value);
    public static Vector512<int> ConvertToVector512Int32(Vector512<short> value);
    public static Vector512<int> ConvertToVector512Int32(Vector512<sbyte> value);
    public static Vector512<int> ConvertToVector512Int32(Vector128<ushort> value);

    public static Vector512<int> ConvertToVector512Int32(Vector512<float> value);
    public static Vector512<int> ConvertToVector512Int32(Vector512<float> value, FloatRoundingMode mode);

    public static Vector256<int> ConvertToVector256Int32WithTruncation(Vector512<double> value);
    public static Vector512<int> ConvertToVector512Int32WithTruncation(Vector512<float> value);

    public static Vector512<long> ConvertToVector512Int64(Vector128<byte> value);
    public static Vector512<long> ConvertToVector512Int64(Vector512<short> value);
    public static Vector512<long> ConvertToVector512Int64(Vector512<int> value);
    public static Vector512<long> ConvertToVector512Int64(Vector512<sbyte> value);
    public static Vector512<long> ConvertToVector512Int64(Vector256<uint> value);
    public static Vector512<long> ConvertToVector512Int64(Vector256<ushort> value);

    public static Vector128<float> ConvertScalarToVector128Single(Vector128<float> upper, int value, FloatRoundingMode mode);

    public static Vector512<double> Divide(Vector512<double> left, Vector512<double> right);
    public static Vector512<double> Divide(Vector512<double> left, Vector512<double> right, FloatRoundingMode mode);

    public static Vector512<float> Divide(Vector512<float> left, Vector512<float> right);
    public static Vector512<float> Divide(Vector512<float> left, Vector512<float> right, FloatRoundingMode mode);

    public static Vector128<double> DivideScalar(Vector128<double> left, Vector128<double> right, FloatRoundingMode mode);
    public static Vector128<float> DivideScalar(Vector128<float> left, Vector128<float> right, FloatRoundingMode mode);

    public static Vector512<float> DuplicateOddIndexed(Vector512<float> value);
    public static Vector512<float> DuplicateEvenIndexed(Vector512<float> value);

    public static Vector512<byte> LoadVector512(byte* address);
    public static Vector512<double> LoadVector512(double* address);
    public static Vector512<short> LoadVector512(short* address);
    public static Vector512<int> LoadVector512(int* address);
    public static Vector512<long> LoadVector512(long* address);
    public static Vector512<nint> LoadVector512(nint* address);
    public static Vector512<sbyte> LoadVector512(sbyte* address);
    public static Vector512<float> LoadVector512(float* address);
    public static Vector512<ushort> LoadVector512(ushort* address);
    public static Vector512<uint> LoadVector512(uint* address);
    public static Vector512<ulong> LoadVector512(ulong* address);
    public static Vector512<nuint> LoadVector512(nuint* address);

    public static Vector512<byte> LoadAlignedVector512(byte* address);
    public static Vector512<double> LoadAlignedVector512(double* address);
    public static Vector512<short> LoadAlignedVector512(short* address);
    public static Vector512<int> LoadAlignedVector512(int* address);
    public static Vector512<long> LoadAlignedVector512(long* address);
    public static Vector512<nint> LoadAlignedVector512(nint* address);
    public static Vector512<sbyte> LoadAlignedVector512(sbyte* address);
    public static Vector512<float> LoadAlignedVector512(float* address);
    public static Vector512<ushort> LoadAlignedVector512(ushort* address);
    public static Vector512<uint> LoadAlignedVector512(uint* address);
    public static Vector512<ulong> LoadAlignedVector512(ulong* address);
    public static Vector512<nuint> LoadAlignedVector512(nuint* address);

    public static Vector512<byte> LoadAlignedVector512NonTemporal(byte* address);
    public static Vector512<short> LoadAlignedVector512NonTemporal(short* address);
    public static Vector512<int> LoadAlignedVector512NonTemporal(int* address);
    public static Vector512<long> LoadAlignedVector512NonTemporal(long* address);
    public static Vector512<nint> LoadAlignedVector512NonTemporal(nint* address);
    public static Vector512<sbyte> LoadAlignedVector512NonTemporal(sbyte* address);
    public static Vector512<ushort> LoadAlignedVector512NonTemporal(ushort* address);
    public static Vector512<uint> LoadAlignedVector512NonTemporal(uint* address);
    public static Vector512<ulong> LoadAlignedVector512NonTemporal(ulong* address);
    public static Vector512<nuint> LoadAlignedVector512NonTemporal(nuint* address);

    public static Vector512<double> Max(Vector512<double> left, Vector512<double> right);
    public static Vector512<int> Max(Vector512<int> left, Vector512<int> right);
    public static Vector512<long> Max(Vector512<long> left, Vector512<long> right);
    public static Vector512<float> Max(Vector512<float> left, Vector512<float> right);
    public static Vector512<uint> Max(Vector512<uint> left, Vector512<uint> right);
    public static Vector512<ulong> Max(Vector512<ulong> left, Vector512<ulong> right);

    public static Vector512<double> Min(Vector512<double> left, Vector512<double> right);
    public static Vector512<int> Min(Vector512<int> left, Vector512<int> right);
    public static Vector512<long> Min(Vector512<long> left, Vector512<long> right);
    public static Vector512<float> Min(Vector512<float> left, Vector512<float> right);
    public static Vector512<uint> Min(Vector512<uint> left, Vector512<uint> right);
    public static Vector512<ulong> Min(Vector512<ulong> left, Vector512<ulong> right);

    public static Vector512<double> MoveAndDuplicate(Vector512<double> value);

    public static Vector512<double> Multiply(Vector512<double> left, Vector512<double> right);
    public static Vector512<double> Multiply(Vector512<double> left, Vector512<double> right, FloatRoundingMode mode);

    public static Vector512<int> Multiply(Vector512<int> left, Vector512<int> right);
    public static Vector512<uint> Multiply(Vector512<uint> left, Vector512<uint> right);

    public static Vector512<float> Multiply(Vector512<float> left, Vector512<float> right, FloatRoundingMode mode);
    public static Vector512<float> Multiply(Vector512<float> left, Vector512<float> right);

    public static Vector512<int> MultiplyLow(Vector512<int> left, Vector512<int> right);
    // uint?

    public static Vector128<double> MultiplyScalar(Vector128<double> left, Vector128<double> right, FloatRoundingMode mode);
    public static Vector128<float> MultiplyScalar(Vector128<float> left, Vector128<float> right, FloatRoundingMode mode);

    public static Vector512<int> Or(Vector512<int> left, Vector512<int> right);
    public static Vector512<uint> Or(Vector512<uint> left, Vector512<uint> right);
    public static Vector512<long> Or(Vector512<long> left, Vector512<long> right);
    public static Vector512<ulong> Or(Vector512<ulong> left, Vector512<ulong> right);

    public static Vector512<int> ShiftLeftLogical(Vector512<int> value, byte count);
    public static Vector512<int> ShiftLeftLogical(Vector512<int> value, Vector128<int> count);
    // uint?
    public static Vector512<long> ShiftLeftLogical(Vector512<long> value, byte count);
    public static Vector512<long> ShiftLeftLogical(Vector512<long> value, Vector128<long> count);
    // ulong?

    public static Vector512<int> ShiftRightArithmetic(Vector512<int> value, byte count);
    public static Vector512<int> ShiftRightArithmetic(Vector512<int> value, Vector128<int> count);
    public static Vector512<long> ShiftRightArithmetic(Vector512<long> value, byte count);
    public static Vector512<long> ShiftRightArithmetic(Vector512<long> value, Vector128<long> count);
    // unsigned types?

    public static Vector512<int> ShiftRightLogical(Vector512<int> value, byte count);
    public static Vector512<int> ShiftRightLogical(Vector512<int> value, Vector128<int> count);
    public static Vector512<long> ShiftRightLogical(Vector512<long> value, byte count);
    public static Vector512<long> ShiftRightLogical(Vector512<long> value, Vector128<long> count);
    // unsigned types?

    public static Vector512<double> Shuffle(Vector512<double> left, Vector512<double> right, byte control);
    public static Vector512<float> Shuffle(Vector512<float> left, Vector512<float> right, byte control);

    public static Vector512<int> Shuffle(Vector512<int> value, byte control);
    // uint?

    public static Vector512<double> Sqrt(Vector512<double> value, FloatRoundingMode mode);
    public static Vector512<float> Sqrt(Vector512<float> value, FloatRoundingMode mode);

    public static Vector128<double> SqrtScalar(Vector128<double> upper, Vector128<double> value, FloatRoundingMode mode);
    public static Vector128<float> SqrtScalar(Vector128<float> upper, Vector128<float> value, FloatRoundingMode mode);

    public static void Store(byte* address, Vector512<byte> value);
    public static void Store(double* address, Vector512<double> value);
    public static void Store(short* address, Vector512<short> value);
    public static void Store(int* address, Vector512<int> value);
    public static void Store(long* address, Vector512<long> value);
    public static void Store(nint* address, Vector512<nint> value);
    public static void Store(sbyte* address, Vector512<sbyte> value);
    public static void Store(float* address, Vector512<float> value);
    public static void Store(ushort* address, Vector512<ushort> value);
    public static void Store(uint* address, Vector512<uint> value);
    public static void Store(ulong* address, Vector512<ulong> value);
    public static void Store(nuint* address, Vector512<nuint> value);

    public static void StoreAligned(byte* address, Vector512<byte> value);
    public static void StoreAligned(double* address, Vector512<double> value);
    public static void StoreAligned(short* address, Vector512<short> value);
    public static void StoreAligned(int* address, Vector512<int> value);
    public static void StoreAligned(long* address, Vector512<long> value);
    public static void StoreAligned(nint* address, Vector512<nint> value);
    public static void StoreAligned(sbyte* address, Vector512<sbyte> value);
    public static void StoreAligned(float* address, Vector512<float> value);
    public static void StoreAligned(ushort* address, Vector512<ushort> value);
    public static void StoreAligned(uint* address, Vector512<uint> value);
    public static void StoreAligned(ulong* address, Vector512<ulong> value);
    public static void StoreAligned(nuint* address, Vector512<nuint> value);

    public static void StoreAlignedNonTemporal(byte* address, Vector512<byte> value);
    public static void StoreAlignedNonTemporal(double* address, Vector512<double> value);
    public static void StoreAlignedNonTemporal(short* address, Vector512<short> value);
    public static void StoreAlignedNonTemporal(int* address, Vector512<int> value);
    public static void StoreAlignedNonTemporal(long* address, Vector512<long> value);
    public static void StoreAlignedNonTemporal(nint* address, Vector512<nint> value);
    public static void StoreAlignedNonTemporal(sbyte* address, Vector512<sbyte> value);
    public static void StoreAlignedNonTemporal(float* address, Vector512<float> value);
    public static void StoreAlignedNonTemporal(ushort* address, Vector512<ushort> value);
    public static void StoreAlignedNonTemporal(uint* address, Vector512<uint> value);
    public static void StoreAlignedNonTemporal(ulong* address, Vector512<ulong> value);
    public static void StoreAlignedNonTemporal(nuint* address, Vector512<nuint> value);

    public static Vector512<double> Subtract(Vector512<double> left, Vector512<double> right);
    public static Vector512<double> Subtract(Vector512<double> left, Vector512<double> right, FloatRoundingMode mode);

    public static Vector512<int> Subtract(Vector512<int> left, Vector512<int> right);
    public static Vector512<long> Subtract(Vector512<long> left, Vector512<long> right);
    // unsigned types?

    public static Vector512<float> Subtract(Vector512<float> left, Vector512<float> right);
    public static Vector512<float> Subtract(Vector512<float> left, Vector512<float> right, FloatRoundingMode mode);

    public static Vector128<double> SubtractScalar(Vector128<double> left, Vector128<double> right, FloatRoundingMode mode);
    public static Vector128<float> SubtractScalar(Vector128<float> left, Vector128<float> right, FloatRoundingMode mode);

    public static Vector512<double> UnpackHigh(Vector512<double> left, Vector512<double> right);
    public static Vector512<float> UnpackHigh(Vector512<float> left, Vector512<float> right);

    public static Vector512<double> UnpackLow(Vector512<double> left, Vector512<double> right);
    public static Vector512<float> UnpackLow(Vector512<float> left, Vector512<float> right);

    public static Vector512<int> UnpackHigh(Vector512<int> left, Vector512<int> right);
    public static Vector512<int> UnpackLow(Vector512<int> left, Vector512<int> right);
    // unsigned types?

    public static Vector512<long> UnpackHigh(Vector512<long> left, Vector512<long> right);
    public static Vector512<long> UnpackLow(Vector512<long> left, Vector512<long> right);
    // unsigned types?

    public static Vector512<int> Xor(Vector512<int> left, Vector512<int> right);
    public static Vector512<long> Xor(Vector512<long> left, Vector512<long> right);
    // unsigned types?

    // AVX-AVX2

    public static Vector512<double> BroadcastScalarToVector512(Vector128<double> value);
    public static Vector512<int> BroadcastScalarToVector512(Vector128<int> value);
    public static Vector512<float> BroadcastScalarToVector512(Vector128<float> value);
    public static Vector512<long> BroadcastScalarToVector512(Vector128<long> value);
    // unsigned types?

    public static Vector128<float> ExtractVector128(Vector512<float> value, byte index);
    public static Vector128<int> ExtractVector128(Vector512<int> value, byte index);
    // unsigned types?

    public static Vector256<double> ExtractVector256(Vector512<double> value, byte index);
    public static Vector256<long> ExtractVector256(Vector512<long> value, byte index);
    // unsigned types?

    public static Vector512<int> InsertVector128(Vector512<int> value, Vector128<int> data, byte index);
    public static Vector512<float> InsertVector128(Vector512<float> value, Vector128<float> data, byte index);
    // unsigned types?

    public static Vector512<double> InsertVector256(Vector512<double> value, Vector256<double> data, byte index);
    public static Vector512<long> InsertVector256(Vector512<long> value, Vector256<long> data, byte index);
    // unsigned types?

    public static Vector512<double> Permute2x64(Vector512<double> value, byte control);

    public static Vector512<float> Permute(Vector512<float> value, byte control);
    public static Vector512<double> Permute(Vector512<double> value, byte control);

    public static Vector512<long> Permute4x64(Vector512<long> value, byte control);
    // unsigned types?

    public static Vector512<double> PermuteVar(Vector512<double> value, Vector512<long> control);
    public static Vector512<float> PermuteVar(Vector512<float> value, Vector512<int> control);

    public static Vector512<long> PermuteVar4x64(Vector512<long> value, Vector512<long> control);
    // unsigned types?

    public static Vector512<int> PermuteVar8x32(Vector512<int> value, Vector512<int> control);
    // unsigned types?

    public static Vector512<double> PermuteVar8x64(Vector512<double> value, Vector512<long> control);

    public static Vector512<float> PermuteVar16x32(Vector512<float> value, Vector512<int> control);

    public static Vector512<int> ShiftLeftLogicalVariable(Vector512<int> value, Vector512<int> count);
    public static Vector512<long> ShiftLeftLogicalVariable(Vector512<long> value, Vector512<long> count);
    // unsigned types?

    public static Vector512<int> ShiftRightArithmeticVariable(Vector512<int> value, Vector512<int> count);
    public static Vector512<long> ShiftRightArithmeticVariable(Vector512<long> value, Vector512<long> count);
    // unsigned types?

    public static Vector512<int> ShiftRightLogicalVariable(Vector512<int> value, Vector512<int> count);
    public static Vector512<long> ShiftRightLogicalVariable(Vector512<long> value, Vector512<long> count);
    // unsigned types?

    // AVX512

    public static Vector512<int> AlignRight(Vector512<int> left, Vector512<int> right, byte mask);
    public static Vector512<long> AlignRight(Vector512<long> left, Vector512<long> right, byte mask);
    // unsigned types?

    public static Vector512<double> BroadcastToVector512(Vector256<double> value);
    public static Vector512<int> BroadcastToVector512(Vector128<int> value);
    public static Vector512<long> BroadcastToVector512(Vector256<long> value);
    public static Vector512<float> BroadcastToVector512(Vector128<float> value);
    // unsigned types?

    public static uint ConvertToUInt32(Vector128<double> value);
    public static uint ConvertToUInt32(Vector128<double> value, FloatRoundingMode mode);

    public static uint ConvertToUInt32(Vector128<float> value);
    public static uint ConvertToUInt32(Vector128<float> value, FloatRoundingMode mode);

    public static uint ConvertToUInt32WithTruncation(Vector128<double> value);
    public static uint ConvertToUInt32WithTruncation(Vector128<float> value);

    public static Vector128<byte> ConvertToVector128ByteWithSaturation(Vector512<long> value);
    public static Vector128<byte> ConvertToVector128ByteWithSaturation(Vector512<uint> value);

    public static Vector128<short> ConvertToVector128Int16(Vector512<long> value);
    public static Vector128<short> ConvertToVector128Int16WithSaturation(Vector512<long> value);

    public static Vector128<sbyte> ConvertToVector128SByte(Vector512<int> value);
    public static Vector128<sbyte> ConvertToVector128SByte(Vector512<long> value);
    public static Vector128<sbyte> ConvertToVector128SByteWithSaturation(Vector512<int> value);
    public static Vector128<sbyte> ConvertToVector128SByteWithSaturation(Vector512<long> value);

    public static Vector128<short> ConvertToVector128UInt16WithSaturation(Vector512<long> value);

    public static Vector256<short> ConvertToVector256Int16(Vector512<int> value);
    public static Vector256<short> ConvertToVector256Int16WithSaturation(Vector512<int> value);

    public static Vector256<int> ConvertToVector256Int32(Vector512<long> value);
    public static Vector256<int> ConvertToVector256Int32WithSaturation(Vector512<long> value);

    public static Vector256<ushort> ConvertToVector256UInt16WithSaturation(Vector512<uint> value);

    public static Vector256<uint> ConvertToVector256UInt32(Vector512<double> value);
    public static Vector256<uint> ConvertToVector256UInt32(Vector512<double> value, FloatRoundingMode mode);
    public static Vector256<uint> ConvertToVector256UInt32WithSaturation(Vector512<long> value);
    public static Vector256<uint> ConvertToVector256UInt32WithTruncation(Vector512<double> value);

    public static Vector512<uint> ConvertToVector512UInt32(Vector512<float> value);
    public static Vector512<uint> ConvertToVector512UInt32(Vector512<float> value, FloatRoundingMode mode);
    public static Vector512<uint> ConvertToVector512UInt32WithTruncation(Vector512<float> value);

    public static Vector512<double> ConvertToVector512Double(Vector256<uint> value);
    public static Vector512<float> ConvertToVector512Single(Vector512<uint> value);

    public static Vector128<double> ConvertScalarToVector128Double(Vector128<double> upper, uint value);
    public static Vector128<float> ConvertScalarToVector128Single(Vector128<float> upper, uint value);

    public static Vector512<double> Fixup(Vector512<double> left, Vector512<double> right, Vector512<long> table);
    public static Vector512<float> Fixup(Vector512<float> left, Vector512<float> right, Vector512<int> table);

    public static Vector128<double> FixupScalar(Vector128<double> left, Vector128<double> right, Vector128<long> table);
    public static Vector128<float> FixupScalar(Vector128<float> left, Vector128<float> right, Vector128<int> table);

    public static Vector512<double> GatherVector512(double* baseAddress, Vector256<int> index, byte scale);
    public static Vector512<double> GatherVector512(double* baseAddress, Vector512<long> index, byte scale);

    public static Vector256<int> GatherVector256(int* baseAddress, Vector512<int> index, byte scale);
    public static Vector512<int> GatherVector512(int* baseAddress, Vector512<int> index, byte scale);

    public static Vector512<long> GatherVector512(void* baseAddress, Vector256<int> index, byte scale);
    public static Vector512<long> GatherVector512(void* baseAddress, Vector512<long> index, byte scale);

    public static Vector256<float> GatherVector256(float* baseAddress, Vector512<float> index, byte scale);
    public static Vector512<float> GatherVector512(void* baseAddress, Vector512<float> index, byte scale);

    public static Vector512<double> GetExponent(Vector512<double> value);
    public static Vector512<float> GetExponent(Vector512<float> value);

    public static Vector128<double> GetExponentScalar(Vector128<double> upper, Vector128<double> value);
    public static Vector128<float> GetExponentScalar(Vector128<float> upper, Vector128<float> value);

    public static Vector512<double> GetMantissa(Vector512<double> value, byte interval, byte signControl);
    public static Vector512<double> GetMantissa(Vector512<double> value, byte interval, byte signControl, FloatRoundingMode mode);

    public static Vector512<float> GetMantissa(Vector512<float> value, byte interval, byte signControl);
    public static Vector512<float> GetMantissa(Vector512<float> value, byte interval, byte signControl, FloatRoundingMode mode);

    public static Vector128<double> GetMantissaScalar(Vector128<double> upper, Vector128<double> value, byte interval, byte signControl);
    public static Vector128<double> GetMantissaScalar(Vector128<double> upper, Vector128<double> value, byte interval, byte signControl, FloatRoundingMode mode);

    public static Vector128<float> GetMantissaScalar(Vector128<float> upper, Vector128<float> value, byte interval, byte signControl);
    public static Vector128<float> GetMantissaScalar(Vector128<float> upper, Vector128<float> value, byte interval, byte signControl, FloatRoundingMode mode);

    public static Vector512<double> PermuteVar8x64(Vector512<double> left, Vector512<double> right, Vector512<double> control);
    public static Vector512<long> PermuteVar8x64(Vector512<long> left, Vector512<long> right, Vector512<long> control);

    public static Vector512<int> PermuteVar16x32(Vector512<int> left, Vector512<int> right, Vector512<int> control);
    public static Vector512<float> PermuteVar16x32(Vector512<float> left, Vector512<float> right, Vector512<float> control);

    public static Vector512<int> RotateLeft(Vector512<int> value, byte count);
    public static Vector512<long> RotateLeft(Vector512<long> value, byte count);

    public static Vector512<int> RotateLeftVariable(Vector512<int> value, Vector512<int> count);
    public static Vector512<long> RotateLeftVariable(Vector512<long> value, Vector512<long> count);

    public static Vector512<int> RotateRight(Vector512<int> value, byte count);
    public static Vector512<long> RotateRight(Vector512<long> value, byte count);

    public static Vector512<int> RotateRightVariable(Vector512<int> value, Vector512<int> count);
    public static Vector512<long> RotateRightVariable(Vector512<long> value, Vector512<long> count);

    public static Vector512<double> Reciprocal14(Vector512<double> value);
    public static Vector512<float> Reciprocal14(Vector512<float> value);

    public static Vector128<double> Reciprocal14Scalar(Vector128<double> upper, Vector128<double> value);
    public static Vector128<float> Reciprocal14Scalar(Vector128<float> upper, Vector128<float> value);

    public static Vector512<double> RoundScale(Vector512<double> value, byte scale);
    public static Vector512<float> RoundScale(Vector512<float> value, byte scale);

    public static Vector128<double> RoundScaleScalar(Vector128<double> upper, Vector128<double> value, byte scale);
    public static Vector128<float> RoundScaleScalar(Vector128<float> upper, Vector128<float> value, byte scale);

    public static Vector512<double> ReciprocalSqrt14(Vector512<double> value);
    public static Vector512<float> ReciprocalSqrt14(Vector512<float> value);

    public static Vector128<double> ReciprocalSqrt14Scalar(Vector128<double> upper, Vector128<double> value);
    public static Vector128<float> ReciprocalSqrt14Scalar(Vector128<float> upper, Vector128<float> value);

    public static Vector512<double> Scale(Vector512<double> left, Vector512<double> right);
    public static Vector512<double> Scale(Vector512<double> left, Vector512<double> right, FloatRoundingMode mode);

    public static Vector512<float> Scale(Vector512<float> left, Vector512<float> right);
    public static Vector512<float> Scale(Vector512<float> left, Vector512<float> right, FloatRoundingMode mode);

    public static void Scatter(double* baseAddress, Vector256<int> index, byte scale, Vector512<double> value);
    public static void Scatter(double* baseAddress, Vector512<long> index, byte scale, Vector512<double> value);

    public static void Scatter(int* baseAddress, Vector512<int> index, byte scale, Vector512<int> value);
    public static void Scatter(int* baseAddress, Vector512<long> index, byte scale, Vector256<int> value);

    public static void Scatter(long* baseAddress, Vector256<int> index, byte scale, Vector512<long> value);
    public static void Scatter(long* baseAddress, Vector512<long> index, byte scale, Vector512<long> value);

    public static void Scatter(float* baseAddress, Vector512<int> index, byte scale, Vector512<float> value);
    public static void Scatter(float* baseAddress, Vector512<long> index, byte scale, Vector256<float> value);

    public static Vector512<double> Shuffle(Vector512<double> left, Vector512<double> right, byte control);
    public static Vector512<int> Shuffle(Vector512<int> left, Vector512<int> right, byte control);
    public static Vector512<long> Shuffle(Vector512<long> left, Vector512<long> right, byte control);
    public static Vector512<float> Shuffle(Vector512<float> left, Vector512<float> right, byte control);
    // unsigned types?

    public static Vector512<int> TernaryLogic(Vector512<int> left, Vector512<int> right, byte control);
    public static Vector512<long> TernaryLogic(Vector512<long> left, Vector512<long> right, byte control);
    // unsigned types?

    public new abstract partial class X64
    {
        public static new bool IsSupported { get; }

        // SSE-SSE4.2

        public static long ConvertToInt64(Vector128<double> value, FloatRoundingMode mode);
        public static long ConvertToInt64(Vector128<float> value, FloatRoundingMode mode);

        public static Vector128<double> ConvertScalarToVector128Double(Vector128<double> upper, long value, FloatRoundingMode mode);
        public static Vector128<float> ConvertScalarToVector128Single(Vector128<float> upper, long value, FloatRoundingMode mode);

        // AVX512

        public static ulong ConvertToUInt64(Vector128<double> value);
        public static ulong ConvertToUInt64(Vector128<double> value, FloatRoundingMode mode);
        public static ulong ConvertToUInt64(Vector128<float> value);
        public static ulong ConvertToUInt64(Vector128<float> value, FloatRoundingMode mode);

        public static ulong ConvertToUInt64WithTruncation(Vector128<double> value);
        public static ulong ConvertToUInt64WithTruncation(Vector128<float> value);

        public static Vector128<double> ConvertScalarToVector128Double(Vector128<double> upper, ulong value);
        public static Vector128<double> ConvertScalarToVector128Double(Vector128<double> upper, ulong value, FloatRoundingMode mode);

        public static Vector128<float> ConvertScalarToVector128Single(Vector128<float> upper, ulong value);
        public static Vector128<float> ConvertScalarToVector128Single(Vector128<float> upper, ulong value, FloatRoundingMode mode);
    }
}

@crsawyer
Copy link

Parameters for TernaryLogic appear incorrect. Intrinsic takes 3 operands plus an immediate.
Suggest changing:

    public static Vector512<int> TernaryLogic(Vector512<int> left, Vector512<int> right, byte control);
    public static Vector512<long> TernaryLogic(Vector512<long> left, Vector512<long> right, byte control);

To:

    public static Vector512<int> TernaryLogic(Vector512<int> a, Vector512<int> b, Vector512<int> c, byte control);
    public static Vector512<long> TernaryLogic(Vector512<long> a, Vector512<long> b, Vector512<long> c, byte control);

This would also affect similar areas of #74813

@tannergooding
Copy link
Member Author

Definitely possible that there are a couple APIs that are slightly incorrect. We'll ensure they're correct in the actual implementation.

@tannergooding
Copy link
Member Author

Most of this was successfully implemented for .NET 8. What didn't land is essentially just the "embedded rounding control" overloads (gather/scatter were also not done, but were superseded by #87097):

namespace System.Runtime.Intrinsics.X86;

public enum FloatRoundingMode : byte
{
    ToEven = 0x08,                // _MM_FROUND_TO_NEAREST_INT | _MM_FROUND_NO_EXC
    ToNegativeInfinity = 0x09,    // _MM_FROUND_TO_NEG_INF | _MM_FROUND_NO_EXC
    ToPositiveInfinity = 0x0A,    // _MM_FROUND_TO_POS_INF | _MM_FROUND_NO_EXC
    ToZero = 0x0B,                // _MM_FROUND_TO_ZERO | _MM_FROUND_NO_EXC
}

public partial class Avx512F
{
    public static Vector512<double> Add(Vector512<double> left, Vector512<double> right, FloatRoundingMode mode);
    public static Vector512<float> Add(Vector512<float> left, Vector512<float> right, FloatRoundingMode mode);

    public static Vector128<double> AddScalar(Vector128<double> left, Vector128<double> right, FloatRoundingMode mode);
    public static Vector128<float> AddScalar(Vector128<float> left, Vector128<float> right, FloatRoundingMode mode);

    public static Vector128<float> ConvertScalarToVector128Single(Vector128<float> upper, Vector128<double> value, FloatRoundingMode mode);
    public static Vector128<double> ConvertScalarToVector128Double(Vector128<double> upper, Vector128<float> value, FloatRoundingMode mode);

    public static int ConvertToInt32(Vector128<double> value, FloatRoundingMode mode);
    public static int ConvertToInt32(Vector128<float> value, FloatRoundingMode mode);

    public static Vector256<int> ConvertToVector256Int32(Vector512<double> value, FloatRoundingMode mode);
    public static Vector512<float> ConvertToVector256Single(Vector512<int> value, FloatRoundingMode mode);
    public static Vector256<float> ConvertToVector256Single(Vector512<double> value, FloatRoundingMode mode);

    public static Vector512<double> ConvertToVector512Double(Vector256<float> value);
    public static Vector512<int> ConvertToVector512Int32(Vector512<float> value, FloatRoundingMode mode);

    public static Vector128<float> ConvertScalarToVector128Single(Vector128<float> upper, int value, FloatRoundingMode mode);

    public static Vector512<double> Divide(Vector512<double> left, Vector512<double> right, FloatRoundingMode mode);
    public static Vector512<float> Divide(Vector512<float> left, Vector512<float> right, FloatRoundingMode mode);

    public static Vector128<double> DivideScalar(Vector128<double> left, Vector128<double> right, FloatRoundingMode mode);
    public static Vector128<float> DivideScalar(Vector128<float> left, Vector128<float> right, FloatRoundingMode mode);


    public static Vector512<double> Multiply(Vector512<double> left, Vector512<double> right, FloatRoundingMode mode);
    public static Vector512<float> Multiply(Vector512<float> left, Vector512<float> right, FloatRoundingMode mode);

    public static Vector128<double> MultiplyScalar(Vector128<double> left, Vector128<double> right, FloatRoundingMode mode);
    public static Vector128<float> MultiplyScalar(Vector128<float> left, Vector128<float> right, FloatRoundingMode mode);

    public static Vector512<double> Sqrt(Vector512<double> value, FloatRoundingMode mode);
    public static Vector512<float> Sqrt(Vector512<float> value, FloatRoundingMode mode);

    public static Vector128<double> SqrtScalar(Vector128<double> upper, Vector128<double> value, FloatRoundingMode mode);
    public static Vector128<float> SqrtScalar(Vector128<float> upper, Vector128<float> value, FloatRoundingMode mode);

    public static Vector512<double> Subtract(Vector512<double> left, Vector512<double> right, FloatRoundingMode mode);
    public static Vector512<float> Subtract(Vector512<float> left, Vector512<float> right, FloatRoundingMode mode);

    public static Vector128<double> SubtractScalar(Vector128<double> left, Vector128<double> right, FloatRoundingMode mode);
    public static Vector128<float> SubtractScalar(Vector128<float> left, Vector128<float> right, FloatRoundingMode mode);

    // AVX512

    public static uint ConvertToUInt32(Vector128<double> value, FloatRoundingMode mode);
    public static uint ConvertToUInt32(Vector128<float> value, FloatRoundingMode mode);

    public static Vector256<uint> ConvertToVector256UInt32(Vector512<double> value, FloatRoundingMode mode);
    public static Vector512<uint> ConvertToVector512UInt32(Vector512<float> value, FloatRoundingMode mode);

    public static Vector512<double> GatherVector512(double* baseAddress, Vector256<int> index, byte scale);
    public static Vector512<double> GatherVector512(double* baseAddress, Vector512<long> index, byte scale);

    public static Vector256<int> GatherVector256(int* baseAddress, Vector512<int> index, byte scale);
    public static Vector512<int> GatherVector512(int* baseAddress, Vector512<int> index, byte scale);

    public static Vector512<long> GatherVector512(void* baseAddress, Vector256<int> index, byte scale);
    public static Vector512<long> GatherVector512(void* baseAddress, Vector512<long> index, byte scale);

    public static Vector256<float> GatherVector256(float* baseAddress, Vector512<float> index, byte scale);
    public static Vector512<float> GatherVector512(void* baseAddress, Vector512<float> index, byte scale);

    public static Vector512<double> GetMantissa(Vector512<double> value, byte interval, byte signControl, FloatRoundingMode mode);
    public static Vector512<float> GetMantissa(Vector512<float> value, byte interval, byte signControl, FloatRoundingMode mode);

    public static Vector128<double> GetMantissaScalar(Vector128<double> upper, Vector128<double> value, byte interval, byte signControl);
    public static Vector128<double> GetMantissaScalar(Vector128<double> upper, Vector128<double> value, byte interval, byte signControl, FloatRoundingMode mode);

    public static Vector128<float> GetMantissaScalar(Vector128<float> upper, Vector128<float> value, byte interval, byte signControl);
    public static Vector128<float> GetMantissaScalar(Vector128<float> upper, Vector128<float> value, byte interval, byte signControl, FloatRoundingMode mode);


    public static Vector512<double> Scale(Vector512<double> left, Vector512<double> right, FloatRoundingMode mode);
    public static Vector512<float> Scale(Vector512<float> left, Vector512<float> right, FloatRoundingMode mode);

    public static void Scatter(double* baseAddress, Vector256<int> index, byte scale, Vector512<double> value);
    public static void Scatter(double* baseAddress, Vector512<long> index, byte scale, Vector512<double> value);

    public static void Scatter(int* baseAddress, Vector512<int> index, byte scale, Vector512<int> value);
    public static void Scatter(int* baseAddress, Vector512<long> index, byte scale, Vector256<int> value);

    public static void Scatter(long* baseAddress, Vector256<int> index, byte scale, Vector512<long> value);
    public static void Scatter(long* baseAddress, Vector512<long> index, byte scale, Vector512<long> value);

    public static void Scatter(float* baseAddress, Vector512<int> index, byte scale, Vector512<float> value);
    public static void Scatter(float* baseAddress, Vector512<long> index, byte scale, Vector256<float> value);

    public partial class X64
    {
        public static long ConvertToInt64(Vector128<double> value, FloatRoundingMode mode);
        public static long ConvertToInt64(Vector128<float> value, FloatRoundingMode mode);

        public static Vector128<double> ConvertScalarToVector128Double(Vector128<double> upper, long value, FloatRoundingMode mode);
        public static Vector128<float> ConvertScalarToVector128Single(Vector128<float> upper, long value, FloatRoundingMode mode);

        // AVX512

        public static ulong ConvertToUInt64(Vector128<double> value, FloatRoundingMode mode);
        public static ulong ConvertToUInt64(Vector128<float> value, FloatRoundingMode mode);

        public static Vector128<double> ConvertScalarToVector128Double(Vector128<double> upper, ulong value, FloatRoundingMode mode);
        public static Vector128<float> ConvertScalarToVector128Single(Vector128<float> upper, ulong value, FloatRoundingMode mode);
    }
}

@sayurin
Copy link

sayurin commented Nov 5, 2023

Why Avx512F.ConvertToVector512Int64 has no overload version of pointer? Avx2.ConvertToVector256Int64 has byte* version, it generates VPMOVZXBQ ymm, m32.

@saucecontrol
Copy link
Member

saucecontrol commented Nov 5, 2023

Avx512F.ConvertToVector512Int64(Vector128.Load(byte*)) should emit vpmovzxbq zmm, m64, however it appears it does not.

image

Edit: This one definitely should have a pointer overload, for the reasons given in #28868

@sayurin
Copy link

sayurin commented Nov 6, 2023

Yes, I think Avx512F.ConvertToVector512Int64(byte*) is equals to

var xmm0 = Vector128.CreateScalar(*(ulong*)ptr).AsByte();
var zmm0 = Avx512F.ConvertToVector512Int64(xmm0);

But, JIT cannot combine this code into VPMOVZXBQ zmm0, qword ptr [addr].

@tannergooding
Copy link
Member Author

But, JIT cannot combine this code into VPMOVZXBQ zmm0, qword ptr [addr].

It could, it just doesn't today.

Edit: This one definitely should have a pointer overload, for the reasons given in #28868

Yes, most APIs that read less than 128-bits but support taking a vector register or memory operand need pointer overloads.

@Neme12
Copy link

Neme12 commented Mar 4, 2024

Why add AVX512 support now that Intel killed it? It's no longer enabled in new processors.

@tannergooding
Copy link
Member Author

Its not been killed and it is still available in new processors.

It was removed in their latest mobile and regular desktop SKUs, most notably when you have both Power and Efficiency cores. It remains in their Server SKUs and would be possible to expose in any Desktop SKU that only has Power cores.

Additionally, it is available in AMD Zen4 and is functionally part of AVX10, the new converged ISA, under which EVEX, masking, and the new instructions are available in 128-bit and 256-bit forms without the CPU being required to also provide 512-bit support, so providing the support still laid the groundwork for other processors and the general future of AVX support.

@tannergooding
Copy link
Member Author

The remaining work except for Gather/Scatter was completed in #97415

Gather/Scatter was removed with #86168 due to having an incorrect signature and is now tracked by #87097 instead

@github-actions github-actions bot locked and limited conversation to collaborators Jul 12, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
api-approved API was approved in API review, it can be implemented area-System.Runtime.Intrinsics avx512 Related to the AVX-512 architecture
Projects
None yet
Development

No branches or pull requests

7 participants