<bit>: popcount() utilizes cnt instruction on arm64 #2127

Merged: 16 commits, Sep 11, 2021


fsb4000
Contributor

@fsb4000 fsb4000 commented Aug 15, 2021

Fixes #1924

I didn't test it.

godbolt: https://godbolt.org/z/zj3ojEeaM
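
For readers without the godbolt link at hand, here is a minimal sketch of the idea, not the code merged in this PR. It assumes the ACLE-style NEON intrinsics `vcreate_u8`, `vcnt_u8` and `vaddv_u8` are available from the toolchain's NEON header (`arm64_neon.h` on MSVC, `arm_neon.h` on GCC/Clang). The names `popcount_sketch`, `popcount_fallback`, `sketch_self_check` and `SKETCH_HAS_NEON_CNT` are hypothetical and exist only for this illustration. On non-ARM64 targets it falls back to a portable bit-twiddling count.

```cpp
#include <cassert>
#include <cstdint>

// Assumption: these header names and target macros select the NEON path;
// the sketch has not been verified against every toolchain.
#if defined(_M_ARM64) || defined(__aarch64__)
#if defined(_MSC_VER) && !defined(__clang__)
#include <arm64_neon.h> // MSVC's NEON header for ARM64
#else
#include <arm_neon.h> // GCC/Clang NEON header
#endif
#define SKETCH_HAS_NEON_CNT 1
#else
#define SKETCH_HAS_NEON_CNT 0
#endif

// Portable fallback: sums bits in parallel within 2-, 4- and 8-bit fields,
// then adds the eight byte counts with a single multiply.
inline int popcount_fallback(std::uint64_t v) noexcept {
    v = v - ((v >> 1) & 0x5555555555555555ull);
    v = (v & 0x3333333333333333ull) + ((v >> 2) & 0x3333333333333333ull);
    v = (v + (v >> 4)) & 0x0F0F0F0F0F0F0F0Full;
    return static_cast<int>((v * 0x0101010101010101ull) >> 56);
}

// Hypothetical helper: counts set bits with CNT + ADDV where available.
inline int popcount_sketch(std::uint64_t v) noexcept {
#if SKETCH_HAS_NEON_CNT
    const uint8x8_t bytes  = vcreate_u8(v);  // place the 64-bit value in a vector register
    const uint8x8_t counts = vcnt_u8(bytes); // CNT: popcount of each of the 8 bytes
    return vaddv_u8(counts);                 // ADDV: horizontal sum of the 8 byte counts
#else
    return popcount_fallback(v);
#endif
}

// Small usage check comparing the two paths on a few values.
inline void sketch_self_check() {
    assert(popcount_sketch(0) == 0);
    assert(popcount_sketch(0xF0F0ull) == popcount_fallback(0xF0F0ull));
    assert(popcount_sketch(~0ull) == 64);
}
```

The sum of eight per-byte counts is at most 64, so the 8-bit result of the horizontal add cannot overflow.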

@fsb4000 fsb4000 requested a review from a team as a code owner August 15, 2021 19:58
@fsb4000 fsb4000 changed the title from "utilize cnt instruction on arm64" to "<bit>: popcount() utilizes cnt instruction on arm64" Aug 15, 2021
Co-authored-by: Alex Guteniev <[email protected]>

@StephanTLavavej StephanTLavavej added the ARM64 (Related to the ARM64 architecture) and performance (Must go faster) labels Aug 16, 2021
@StephanTLavavej StephanTLavavej removed their assignment Sep 8, 2021
@StephanTLavavej
Member

@fsb4000 Thanks! I've pushed a merge with main (no conflicts), a comment cleanup, and a preprocessor change to avoid an empty controlled statement (which restores/extends the original logic).

Also FYI @barcharcraz @AlexGuteniev.

@StephanTLavavej StephanTLavavej self-assigned this Sep 10, 2021
@StephanTLavavej
Member

I'm mirroring this to the MSVC-internal repo now. Please notify me if any further changes are pushed.

@StephanTLavavej StephanTLavavej merged commit 5b0fb2e into microsoft:main Sep 11, 2021
@StephanTLavavej
Member

Thanks for improving ARM64 codegen! 🦾 🚀 ✔️

@fsb4000 fsb4000 deleted the fix1924 branch September 11, 2021 03:47
PeterJohnson added a commit to wpilibsuite/opencv that referenced this pull request Aug 29, 2023
MSVC on arm64 doesn't have a __popcnt intrinsic.
Use NEON instructions instead (core implementation from
microsoft/STL#2127).
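
A downstream project that wants one popcount helper across compilers might dispatch at compile time roughly as follows. This is a sketch under assumptions, not the code from the referenced commit: `population_count`, `population_count_self_check` and `POPCNT_BACKEND` are hypothetical names, the NEON branch assumes the ACLE-style intrinsics are exposed by `arm64_neon.h`, and the `__popcnt` branch assumes the target x86 CPU supports the POPCNT instruction.

```cpp
#include <cassert>
#include <cstdint>

// Pick a backend per compiler and architecture (hypothetical macro values).
#if defined(_MSC_VER) && !defined(__clang__)
#if defined(_M_ARM64)
#include <arm64_neon.h>
#define POPCNT_BACKEND 1 // MSVC on ARM64: no __popcnt, use NEON
#elif defined(_M_IX86) || defined(_M_X64)
#include <intrin.h>
#define POPCNT_BACKEND 2 // MSVC on x86/x64: __popcnt intrinsic
#else
#define POPCNT_BACKEND 0 // unknown MSVC target: portable loop
#endif
#elif defined(__GNUC__) || defined(__clang__)
#define POPCNT_BACKEND 3 // GCC/Clang: compiler builtin
#else
#define POPCNT_BACKEND 0 // anything else: portable loop
#endif

inline std::uint32_t population_count(std::uint32_t val) noexcept {
#if POPCNT_BACKEND == 1
    // Zero-extend into a 64-bit vector, count bits per byte, sum the bytes.
    const uint8x8_t counts = vcnt_u8(vcreate_u8(val));
    return vaddv_u8(counts);
#elif POPCNT_BACKEND == 2
    return __popcnt(val);
#elif POPCNT_BACKEND == 3
    return static_cast<std::uint32_t>(__builtin_popcount(val));
#else
    // Portable fallback: clear the lowest set bit until none remain.
    std::uint32_t n = 0;
    for (; val != 0; val &= val - 1) {
        ++n;
    }
    return n;
#endif
}

// Small usage check.
inline void population_count_self_check() {
    assert(population_count(0u) == 0u);
    assert(population_count(0x80000001u) == 2u);
    assert(population_count(0xFFFFFFFFu) == 32u);
}
```

Keeping the selection in one macro means each call site stays free of `#if` blocks, and a new backend only touches this helper.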
PeterJohnson added a commit to PeterJohnson/openjpeg that referenced this pull request Sep 7, 2023
Use NEON instructions for ARM64 (implementation based on microsoft/STL#2127).

Godbolt output here: https://godbolt.org/z/q7GPTqT14
rouault pushed a commit to uclouvain/openjpeg that referenced this pull request Dec 9, 2023
asmorkalov pushed a commit to opencv/opencv that referenced this pull request Dec 10, 2023
ht_dec.c: Improve MSVC arm64 popcount performance #24205

Use NEON instructions for ARM64 (implementation based on microsoft/STL#2127, which is Apache licensed).

Godbolt output here: https://godbolt.org/z/q7GPTqT14
Related patch to openjpeg: uclouvain/openjpeg#1479

### Pull Request Readiness Checklist

- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
thewoz pushed a commit to thewoz/opencv that referenced this pull request May 29, 2024
ht_dec.c: Improve MSVC arm64 popcount performance opencv#24205

Labels
ARM64 (Related to the ARM64 architecture), performance (Must go faster)
Development

Successfully merging this pull request may close these issues.

<bit>: popcount() does not utilize cnt instruction on arm64
4 participants