Implement StoreVectorNxM for Arm64 #94129

Merged
merged 6 commits into from
Nov 9, 2023


@SwapnilGaikwad (Contributor)

Contributes towards #84510.

// ST3 (multiple structures)
public static unsafe void StoreVector128x3(byte*   address, (Vector128<byte>   Value1, Vector128<byte>   Value2, Vector128<byte>   Value3) value);
public static unsafe void StoreVector128x3(sbyte*  address, (Vector128<sbyte>  Value1, Vector128<sbyte>  Value2, Vector128<sbyte>  Value3) value);
public static unsafe void StoreVector128x3(short*  address, (Vector128<short>  Value1, Vector128<short>  Value2, Vector128<short>  Value3) value);
public static unsafe void StoreVector128x3(ushort* address, (Vector128<ushort> Value1, Vector128<ushort> Value2, Vector128<ushort> Value3) value);
public static unsafe void StoreVector128x3(int*    address, (Vector128<int>    Value1, Vector128<int>    Value2, Vector128<int>    Value3) value);
public static unsafe void StoreVector128x3(uint*   address, (Vector128<uint>   Value1, Vector128<uint>   Value2, Vector128<uint>   Value3) value);
public static unsafe void StoreVector128x3(long*   address, (Vector128<long>   Value1, Vector128<long>   Value2, Vector128<long>   Value3) value);
public static unsafe void StoreVector128x3(ulong*  address, (Vector128<ulong>  Value1, Vector128<ulong>  Value2, Vector128<ulong>  Value3) value);
public static unsafe void StoreVector128x3(float*  address, (Vector128<float>  Value1, Vector128<float>  Value2, Vector128<float>  Value3) value);
public static unsafe void StoreVector128x3(double* address, (Vector128<double> Value1, Vector128<double> Value2, Vector128<double> Value3) value);

public static unsafe void StoreVector64x3(byte*   address, (Vector64<byte>   Value1, Vector64<byte>   Value2, Vector64<byte>   Value3) value);
public static unsafe void StoreVector64x3(sbyte*  address, (Vector64<sbyte>  Value1, Vector64<sbyte>  Value2, Vector64<sbyte>  Value3) value);
public static unsafe void StoreVector64x3(short*  address, (Vector64<short>  Value1, Vector64<short>  Value2, Vector64<short>  Value3) value);
public static unsafe void StoreVector64x3(ushort* address, (Vector64<ushort> Value1, Vector64<ushort> Value2, Vector64<ushort> Value3) value);
public static unsafe void StoreVector64x3(int*    address, (Vector64<int>    Value1, Vector64<int>    Value2, Vector64<int>    Value3) value);
public static unsafe void StoreVector64x3(uint*   address, (Vector64<uint>   Value1, Vector64<uint>   Value2, Vector64<uint>   Value3) value);
public static unsafe void StoreVector64x3(float*  address, (Vector64<float>  Value1, Vector64<float>  Value2, Vector64<float>  Value3) value);

// ST4 (multiple structures)
public static unsafe void StoreVector128x4(byte*   address, (Vector128<byte>   Value1, Vector128<byte>   Value2, Vector128<byte>   Value3, Vector128<byte>   Value4) value);
public static unsafe void StoreVector128x4(sbyte*  address, (Vector128<sbyte>  Value1, Vector128<sbyte>  Value2, Vector128<sbyte>  Value3, Vector128<sbyte>  Value4) value);
public static unsafe void StoreVector128x4(short*  address, (Vector128<short>  Value1, Vector128<short>  Value2, Vector128<short>  Value3, Vector128<short>  Value4) value);
public static unsafe void StoreVector128x4(ushort* address, (Vector128<ushort> Value1, Vector128<ushort> Value2, Vector128<ushort> Value3, Vector128<ushort> Value4) value);
public static unsafe void StoreVector128x4(int*    address, (Vector128<int>    Value1, Vector128<int>    Value2, Vector128<int>    Value3, Vector128<int>    Value4) value);
public static unsafe void StoreVector128x4(uint*   address, (Vector128<uint>   Value1, Vector128<uint>   Value2, Vector128<uint>   Value3, Vector128<uint>   Value4) value);
public static unsafe void StoreVector128x4(long*   address, (Vector128<long>   Value1, Vector128<long>   Value2, Vector128<long>   Value3, Vector128<long>   Value4) value);
public static unsafe void StoreVector128x4(ulong*  address, (Vector128<ulong>  Value1, Vector128<ulong>  Value2, Vector128<ulong>  Value3, Vector128<ulong>  Value4) value);
public static unsafe void StoreVector128x4(float*  address, (Vector128<float>  Value1, Vector128<float>  Value2, Vector128<float>  Value3, Vector128<float>  Value4) value);
public static unsafe void StoreVector128x4(double* address, (Vector128<double> Value1, Vector128<double> Value2, Vector128<double> Value3, Vector128<double> Value4) value);

public static unsafe void StoreVector64x4(byte*   address, (Vector64<byte>   Value1, Vector64<byte>   Value2, Vector64<byte>   Value3, Vector64<byte>   Value4) value);
public static unsafe void StoreVector64x4(sbyte*  address, (Vector64<sbyte>  Value1, Vector64<sbyte>  Value2, Vector64<sbyte>  Value3, Vector64<sbyte>  Value4) value);
public static unsafe void StoreVector64x4(short*  address, (Vector64<short>  Value1, Vector64<short>  Value2, Vector64<short>  Value3, Vector64<short>  Value4) value);
public static unsafe void StoreVector64x4(ushort* address, (Vector64<ushort> Value1, Vector64<ushort> Value2, Vector64<ushort> Value3, Vector64<ushort> Value4) value);
public static unsafe void StoreVector64x4(int*    address, (Vector64<int>    Value1, Vector64<int>    Value2, Vector64<int>    Value3, Vector64<int>    Value4) value);
public static unsafe void StoreVector64x4(uint*   address, (Vector64<uint>   Value1, Vector64<uint>   Value2, Vector64<uint>   Value3, Vector64<uint>   Value4) value);
public static unsafe void StoreVector64x4(float*  address, (Vector64<float>  Value1, Vector64<float>  Value2, Vector64<float>  Value3, Vector64<float>  Value4) value);
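For readers new to ST3/ST4, here is a minimal scalar sketch of the memory layout these overloads are expected to produce: lane i of ValueJ is written to address[i * M + j], where M is the number of vectors (3 or 4). The class and method names (InterleavedStoreModel, StoreInterleaved) are hypothetical and not part of System.Runtime.Intrinsics, and 16-element byte arrays stand in for Vector128<byte>. Note also that the Vector64 overloads omit long, ulong and double, which is consistent with ST2/ST3/ST4 (multiple structures) having no 1D arrangement.

```csharp
using System;

// Hypothetical scalar reference model, NOT the AdvSimd API.
// It mimics the memory layout an ST2/ST3/ST4 store produces for byte lanes.
public static class InterleavedStoreModel
{
    public const int Lanes = 16; // Vector128<byte>.Count

    // Writes lane i of vector j to dest[i * m + j], where m = vectors.Length.
    public static void StoreInterleaved(Span<byte> dest, params byte[][] vectors)
    {
        int m = vectors.Length;
        if (m < 2 || m > 4)
            throw new ArgumentException("ST2/ST3/ST4 take 2 to 4 registers.");
        if (dest.Length < Lanes * m)
            throw new ArgumentException("Destination is too small.");
        foreach (byte[] v in vectors)
            if (v.Length != Lanes)
                throw new ArgumentException("Each vector must have 16 lanes.");

        // Element i of every vector is stored before element i + 1 of any vector.
        for (int i = 0; i < Lanes; i++)
            for (int j = 0; j < m; j++)
                dest[i * m + j] = vectors[j][i];
    }
}
```

For example, three planar 16-byte R, G and B channels passed as Value1..Value3 come out as 48 bytes of packed RGB, which is the kind of layout StoreVector128x3 is meant to produce in a single instruction.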

@ghost added the community-contribution label on Oct 28, 2023
@dotnet-issue-labeler

Note regarding the new-api-needs-documentation label:

This is a reminder: when your PR modifies a ref *.cs file and adds or modifies public APIs, please make sure the API implementation in the src *.cs file is documented with triple-slash comments so that PR reviewers can sign off on the change.

@ghost commented Oct 28, 2023

Tagging subscribers to this area: @dotnet/area-system-runtime-intrinsics
See info in area-owners.md if you want to be subscribed.

Issue Details

Author: SwapnilGaikwad
Assignees: -
Labels: area-System.Runtime.Intrinsics, new-api-needs-documentation, community-contribution
Milestone: -

@SwapnilGaikwad (Contributor, Author)

Hi @kunalspathak, would you prefer an overloaded name for the interleaved multi-structure stores, as we did for StoreSelectedScalar? If so, what should we call these methods? Store or StoreVector may not convey the interleaving nature of the underlying store instructions. 🤔

@kunalspathak (Member)

> for the interleaved multi-structure stores

Sorry, could you confirm which APIs you are asking about?

@SwapnilGaikwad (Contributor, Author)

> > for the interleaved multi-structure stores
>
> sorry, but could you confirm which APIs are you asking about?

The StoreVectorNx2, StoreVectorNx3 and StoreVectorNx4 APIs, which emit the ST2, ST3 and ST4 instructions respectively.
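To make the distinction behind this question concrete, here is a hedged scalar sketch of the two store layouts under discussion: an ST(M)-style interleaved store versus an ST1-style store of M registers written back to back. StoreLayouts, Interleaved and Consecutive are hypothetical names, and plain arrays stand in for vector registers.

```csharp
using System;

// Hypothetical helpers contrasting the two memory layouts; NOT AdvSimd APIs.
public static class StoreLayouts
{
    // Validates that all "registers" have the same lane count and returns it.
    static int CommonLength<T>(T[][] vectors)
    {
        if (vectors.Length == 0)
            throw new ArgumentException("At least one vector is required.");
        int n = vectors[0].Length;
        foreach (T[] v in vectors)
            if (v.Length != n)
                throw new ArgumentException("All vectors must have the same lane count.");
        return n;
    }

    // ST2/ST3/ST4-style: element i of every vector is written before element i + 1.
    public static T[] Interleaved<T>(params T[][] vectors)
    {
        int m = vectors.Length, n = CommonLength(vectors);
        var dest = new T[m * n];
        for (int i = 0; i < n; i++)
            for (int j = 0; j < m; j++)
                dest[i * m + j] = vectors[j][i];
        return dest;
    }

    // ST1-with-M-registers-style: whole vectors are written one after another.
    public static T[] Consecutive<T>(params T[][] vectors)
    {
        int m = vectors.Length, n = CommonLength(vectors);
        var dest = new T[m * n];
        for (int j = 0; j < m; j++)
            Array.Copy(vectors[j], 0, dest, j * n, n);
        return dest;
    }
}
```

With three 4-lane vectors {1, 2, 3, 4}, {5, 6, 7, 8} and {9, 10, 11, 12}, Interleaved yields 1, 5, 9, 2, 6, 10, 3, 7, 11, 4, 8, 12, while Consecutive yields 1 through 12 in order. That difference is why a plain Store or StoreVector name may not signal the interleaving.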

@SwapnilGaikwad (Contributor, Author) commented Oct 30, 2023

Hi @kunalspathak,

Could you please confirm if my understanding is correct?

  • LoadVector(N)x(M) is mapped to LD(M); e.g., LoadVector128x3 maps to LD3.

  • StoreVector(N)x(M) should map to ST(M); e.g., StoreVector128x3 maps to ST3.

  • StoreVector(N)x(M)AndZip should map to ST1 with M registers.

If this is correct, aren't these names a little confusing?
LD2/3/4 de-interleave (unzip) the elements while loading them into the destination registers, whereas LD1 loads the given 2/3/4 vectors from consecutive memory.
Wouldn't it therefore make more sense for LoadVector(N)x(M) to emit LD1 with M registers and, conversely, for LoadVector(N)x(M)AndZip to emit LD(M)?
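The load side of this question can be sketched the same way. Assuming plain arrays stand in for vector registers, Deinterleave below models what LD2/LD3/LD4 do (memory element k goes to lane k / M of register k % M), and Consecutive models LD1 filling M registers from contiguous memory. Both names are hypothetical; under the proposal in this comment, LoadVector(N)x(M) would behave like Consecutive and LoadVector(N)x(M)AndZip like Deinterleave.

```csharp
using System;

// Hypothetical scalar models of the two load flavours; arrays stand in for
// vector registers. These are NOT System.Runtime.Intrinsics APIs.
public static class LoadLayouts
{
    // Returns the lane count per register, checking that src splits evenly.
    static int LanesPerRegister(int length, int m)
    {
        if (m < 1 || length % m != 0)
            throw new ArgumentException("Length must be a multiple of a positive m.");
        return length / m;
    }

    // LD2/LD3/LD4-style: memory element k goes to lane k / m of register k % m.
    public static T[][] Deinterleave<T>(T[] src, int m)
    {
        int n = LanesPerRegister(src.Length, m);
        var regs = new T[m][];
        for (int j = 0; j < m; j++)
            regs[j] = new T[n];
        for (int k = 0; k < src.Length; k++)
            regs[k % m][k / m] = src[k];
        return regs;
    }

    // LD1-with-M-registers-style: register j receives the contiguous chunk starting at j * n.
    public static T[][] Consecutive<T>(T[] src, int m)
    {
        int n = LanesPerRegister(src.Length, m);
        var regs = new T[m][];
        for (int j = 0; j < m; j++)
            regs[j] = src.AsSpan(j * n, n).ToArray();
        return regs;
    }
}
```

Applied to the memory 1, 5, 9, 2, 6, 10, 3, 7, 11, 4, 8, 12 with m = 3, Deinterleave returns {1, 2, 3, 4}, {5, 6, 7, 8} and {9, 10, 11, 12}, undoing an ST3-style store, whereas Consecutive simply splits the memory into three contiguous 4-element chunks.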

@kunalspathak (Member) left a comment

The changes look good; some minor fixes are needed.

@ghost added the needs-author-action label on Oct 31, 2023
@kunalspathak (Member)

> Thus, wouldn't it make more sense for LoadVector(N)x(M) to emit LD1 with M input values and vice-versa for LoadVector(N)x(M)AndZip?

Looking carefully, yes, you are right. I swapped the two. I will send a PR to fix it. Thanks for spotting it.

@SwapnilGaikwad (Contributor, Author)

> > Thus, wouldn't it make more sense for LoadVector(N)x(M) to emit LD1 with M input values and vice-versa for LoadVector(N)x(M)AndZip?
>
> Looking carefully, yes, you are right. I swapped the two. I will send a PR to fix it. Thanks for spotting it.

Cool, I'll update this PR accordingly 👍

@ghost removed the needs-author-action label on Nov 1, 2023
@SwapnilGaikwad (Contributor, Author)

Once #93223 is merged, I'll rebase/merge this PR and then mark it ready for review.

@SwapnilGaikwad (Contributor, Author)

> After merging #93223, I'll rebase/merge this PR and then mark it ready for review.

As #93223 is taking longer, I'll mark this PR ready for review. If this PR can progress quickly, we can merge it, and I'll rebase/merge these changes wherever necessary.

@kunalspathak (Member) left a comment

LGTM. One request: please update the documentation for the LoadVector equivalents as well.

@kunalspathak (Member) left a comment

LGTM

@kunalspathak merged commit e138ff1 into dotnet:main on Nov 9, 2023
192 of 195 checks passed
@SwapnilGaikwad deleted the github-st3-st4 branch on November 9, 2023 at 20:07
@github-actions (bot) locked and limited the conversation to collaborators on Dec 10, 2023