
Use _mm_set1_epi{32,64x} to init mask in x86-64 [cz]asum #4330

Merged
merged 1 commit into from
Nov 20, 2023

Conversation

bartoldeman
Contributor

This applies to the Skylake kernels and uses the same method as [sd]asum. _mm_set1_epi64x was commented out for zasum, but it has the advantage of avoiding possible undefined behaviour: the previous approach relied on an uninitialized variable, and NVHPC and icx optimized that code out. The new code works fine with those compilers.

For GCC 12.3 the generated code is identical: no matter which method you use, the compiler folds the mask into a compile-time constant. There is no performance benefit to using _mm_cmpeq_epi8, since the corresponding instruction (VPCMPEQB) isn't actually generated!
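
As a rough illustration of the mask initialization described above (not the actual kernel code), the sketch below builds the absolute-value masks with _mm_set1_epi32 and _mm_set1_epi64x and applies them with a bitwise AND. The helper names abs_sum4_f32 and abs_sum2_f64 are hypothetical, the helpers only handle a single SSE register's worth of data, and the cast intrinsics are used here in place of the C-style casts seen in the kernel source.

```c
#include <emmintrin.h>  /* SSE2 intrinsics, part of the x86-64 baseline */

/* Hypothetical helper: sum of |x[i]| over 4 floats. */
static float abs_sum4_f32(const float *x)
{
    /* Every bit set except the sign bit of each 32-bit lane.
       The mask is fully initialized, so no undefined behaviour. */
    const __m128 abs_mask = _mm_castsi128_ps(_mm_set1_epi32(0x7fffffff));

    /* Clearing the sign bit yields the absolute value. */
    __m128 v = _mm_and_ps(_mm_loadu_ps(x), abs_mask);

    float out[4];
    _mm_storeu_ps(out, v);
    return out[0] + out[1] + out[2] + out[3];
}

/* Hypothetical helper: sum of |x[i]| over 2 doubles. */
static double abs_sum2_f64(const double *x)
{
    /* Same idea with 64-bit lanes. */
    const __m128d abs_mask =
        _mm_castsi128_pd(_mm_set1_epi64x(0x7fffffffffffffffLL));

    __m128d v = _mm_and_pd(_mm_loadu_pd(x), abs_mask);

    double out[2];
    _mm_storeu_pd(out, v);
    return out[0] + out[1];
}
```

Calling abs_sum4_f32 on an array of four floats, or abs_sum2_f64 on two doubles, returns the sum of their magnitudes. Because the mask comes from a constant argument, a compiler is free to materialize it as a compile-time constant, which matches the GCC 12.3 observation above.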

@bartoldeman
Contributor Author

@xiegengxin If you are still around, do you remember why you commented out `// abs_mask1 = (__m128d)_mm_set1_epi64x(0x7fffffffffffffff);`?

@martin-frbg martin-frbg added this to the 0.3.26 milestone Nov 19, 2023
@martin-frbg
Collaborator

Eek, thanks. Learn something new every bug...

@martin-frbg martin-frbg merged commit 2ea65ba into OpenMathLib:develop Nov 20, 2023
62 of 64 checks passed