Popcnt vectorization #198
# get_filename_component(_cpu_id "[HKEY_LOCAL_MACHINE\\Hardware\\Description\\System\\CentralProcessor\\0;Identifier]" NAME CACHE)
elseif(CMAKE_SYSTEM_NAME STREQUAL "Darwin")
# handle MacOs
execute_process(COMMAND sysctl -n machdep.cpu.features
Hi Diego, thanks for your contribution. I'm just testing the code on a Mac equipped with a CPU (i7-4850HQ) that supports AVX2. Surprisingly, the command sysctl -n machdep.cpu.features does not list AVX2 as a feature, but just:
FPU VME DE PSE TSC MSR PAE MCE CX8 APIC SEP MTRR PGE MCA CMOV PAT PSE36 CLFSH DS ACPI MMX FXSR SSE SSE2 SS HTT TM PBE SSE3 PCLMULQDQ DTES64 MON DSCPL VMX SMX EST TM2 SSSE3 FMA CX16 TPR PDCM SSE4.1 SSE4.2 x2APIC MOVBE POPCNT AES PCID XSAVE OSXSAVE SEGLIM64 TSCTMR AVX1.0 RDRAND F16C
However, AVX2 is listed in the output of sysctl -n machdep.cpu:
13 2147483656 GenuineIntel Intel(R) Core(TM) i7-4850HQ CPU @ 2.30GHz 6 70 4 0 1 3219913727 2147154943 12219 739248384 33 263777 0 FPU VME DE PSE TSC MSR PAE MCE CX8 APIC SEP MTRR PGE MCA CMOV PAT PSE36 CLFSH DS ACPI MMX FXSR SSE SSE2 SS HTT TM PBE SSE3 PCLMULQDQ DTES64 MON DSCPL VMX SMX EST TM2 SSSE3 FMA CX16 TPR PDCM SSE4.1 SSE4.2 x2APIC MOVBE POPCNT AES PCID XSAVE OSXSAVE SEGLIM64 TSCTMR AVX1.0 RDRAND F16C SMEP ENFSTRG RDWRFSGS TSC_THREAD_OFFSET BMI1 HLE AVX2 BMI2 INVPCID RTM SYSCALL XD 1GBPAGE EM64T LAHF RDTSCP TSCI 16 8 15 5 64 64 3 270624 1 1 1 2 1 1 1 1 0 1 7 832 832 0 3 4 48 7 0 3 48 64 8 256 8 64 64 1024 39 48 4 8
So maybe just match on the latter output?
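A minimal sketch of that suggestion, assuming the detection stays in CMake: it queries sysctl -n machdep.cpu instead of machdep.cpu.features and looks for AVX2 as a whole token. The variable names _cpu_info, _avx2_pos and HAVE_AVX2 are hypothetical and not taken from the patch.

```cmake
# Sketch only: variable names here are illustrative, not those used in the PR.
if(CMAKE_SYSTEM_NAME STREQUAL "Darwin")
  # Query the whole machdep.cpu subtree; on the i7-4850HQ above, AVX2 shows up
  # here but not in machdep.cpu.features.
  execute_process(COMMAND sysctl -n machdep.cpu
                  OUTPUT_VARIABLE _cpu_info
                  ERROR_QUIET
                  OUTPUT_STRIP_TRAILING_WHITESPACE)

  # Collapse all whitespace (including newlines) to single spaces and pad both
  # ends, so that the feature can be matched as a complete token.
  string(REGEX REPLACE "[ \t\r\n]+" " " _cpu_info " ${_cpu_info} ")

  # string(FIND) stores -1 when the substring is absent.
  string(FIND "${_cpu_info}" " AVX2 " _avx2_pos)
  if(_avx2_pos GREATER -1)
    set(HAVE_AVX2 TRUE)
  else()
    set(HAVE_AVX2 FALSE)
  endif()

  message(STATUS "AVX2 reported by sysctl: ${HAVE_AVX2}")
endif()
```

Matching on a space-delimited token avoids accidental hits inside longer feature names; whether the rest of the build should consume HAVE_AVX2 or another variable depends on how the patch wires up its compiler flags.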
Sounds reasonable :) I have only run my code on Linux machines, so I did not run into this problem.
It would be good to specialize bitvector_interleaved to use these operations if blocksize % 256 == 0. That should result in a nice speed improvement.
Can one of the admins verify this patch?
Hi Simon,
Same as in the other pull request: I added the missing masks.
Best regards,
Diego