AArch64: Implement arraycmp evaluator #6904

Merged (1 commit) on Mar 2, 2023

Conversation

Akira1Saitoh
Contributor

This commit implements the arraycmp evaluator for AArch64.
It implements two variants. One returns the length of the identical data from the beginning of the arrays.
The other returns 2/0/1 when the first array is greater than/equal to/less than the second array.
The main loop reads a 16-byte chunk from both arrays in a single iteration. It uses the ldp instruction to read 16 bytes into two 64-bit registers, then compares the corresponding 64-bit registers to find a mismatch. If a mismatch is found in a 16-byte chunk, the variant returning the length uses the clz instruction to locate the bit position of the first mismatch. If no mismatch is found in the 16-byte chunks and data still remains to be compared (fewer than 16 bytes),
a secondary loop, which reads a single byte in each iteration, is executed.
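
The control flow described above can be illustrated with a small Python model. This is only a sketch under stated assumptions, not the code generated by the evaluator: the function names `arraycmp_len` and `arraycmp_order` are hypothetical, the arrays are modelled as `bytes`, each 8-byte word is read as a little-endian integer, and the lowest set bit of the XOR stands in for the clz-based search (how the real code arranges the bytes before clz is not stated here). The ordering variant assumes an unsigned byte comparison at the first mismatch.

```python
def arraycmp_len(a: bytes, b: bytes, length: int) -> int:
    """Return the number of leading bytes that are identical in a and b."""
    i = 0
    # Main loop: one 16-byte chunk per iteration, handled as two 64-bit
    # words (standing in for the two registers filled by a single ldp).
    while length - i >= 16:
        for off in (0, 8):
            x = int.from_bytes(a[i + off:i + off + 8], "little")
            y = int.from_bytes(b[i + off:i + off + 8], "little")
            diff = x ^ y
            if diff:
                # Assumption: with little-endian words, the lowest set bit
                # of the XOR lies in the first mismatching byte in memory.
                first_bit = (diff & -diff).bit_length() - 1
                return i + off + first_bit // 8
        i += 16
    # Secondary loop: fewer than 16 bytes remain, compare one byte at a time.
    while i < length:
        if a[i] != b[i]:
            return i
        i += 1
    return length


def arraycmp_order(a: bytes, b: bytes, length: int) -> int:
    """Return 0 if equal, 2 if a is greater, 1 if a is less (unsigned bytes)."""
    n = arraycmp_len(a, b, length)
    if n == length:
        return 0
    return 2 if a[n] > b[n] else 1
```

For example, two 40-byte arrays that differ only at index 21 take two passes through the main loop, and the mismatch is located inside the first word of the second chunk.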

Akira1Saitoh force-pushed the aarch64ArraycmpLDP branch 6 times, most recently from a64ae5c to 80ec4b1 (February 28, 2023 00:56)
Akira1Saitoh marked this pull request as ready for review (March 2, 2023 02:59)

Signed-off-by: Akira Saitoh <[email protected]>
@knn-k
Contributor

knn-k commented Mar 2, 2023

Jenkins build aarch64,amac

@knn-k
Contributor

knn-k commented Mar 2, 2023

See Issue #6516 for the socket test failure on macOS.

2 participants