There are options like `std::arch::is_x86_feature_detected!` which can detect at runtime which instruction sets are available. Unfortunately, the check cannot be done inside every function call due to the cost of feature detection on hot paths.
One potential way around this is to do feature detection once during initialization, and use a tagged pointer to store the features detected. As any SIMD-supporting platform is at least 32 bits wide, a pointer to a backing allocation that is at least 4-byte aligned always has two zero bits at the bottom. If the default block size is increased to somewhere between 8 and 64 bytes, and the allocation is aligned to the block size, the number of tag bits increases accordingly (3 bits at 8 bytes, up to 6 bits at 64 bytes). An example mapping for x86 may include:
00 - Default, none detected.
01 - SSE2 detected
10 - SSE4.1 detected
11 - AVX detected
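As a rough illustration of the mapping above, a one-time detection step could look like the following. This is only a sketch: the `SimdTag` enum and `detect_simd_tag` function are hypothetical names, not part of any existing crate, and the code assumes the highest detected level implies the lower ones.

```rust
/// Hypothetical 2-bit feature tag mirroring the example mapping.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
#[repr(usize)]
pub enum SimdTag {
    None = 0b00,
    Sse2 = 0b01,
    Sse41 = 0b10,
    Avx = 0b11,
}

/// Picks the highest supported level. Intended to be called once,
/// at initialization, rather than on every operation.
pub fn detect_simd_tag() -> SimdTag {
    #[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
    {
        if std::is_x86_feature_detected!("avx") {
            return SimdTag::Avx;
        }
        if std::is_x86_feature_detected!("sse4.1") {
            return SimdTag::Sse41;
        }
        if std::is_x86_feature_detected!("sse2") {
            return SimdTag::Sse2;
        }
    }
    // Non-x86 targets, or x86 CPUs without any of the above.
    SimdTag::None
}
```

The result always fits in two bits, so it can be stored in the low bits of a suitably aligned pointer.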
The tag bits can then be masked off on access in a branchless way. This should have a slight negative performance impact on point queries (contains, insert, etc.), but allows the most performant instructions to be used without explicitly compiling for a particular target feature set.
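A minimal sketch of the tagging and masking, assuming a backing store of `u32` blocks (4-byte aligned, so two low bits are free). `TaggedPtr` and `TAG_MASK` are hypothetical names used for illustration only, and the pointer-to-integer casts are a simplification.

```rust
/// Hypothetical tagged pointer: the low 2 bits hold a feature tag.
#[derive(Clone, Copy)]
pub struct TaggedPtr(usize);

const TAG_MASK: usize = 0b11;

impl TaggedPtr {
    /// Packs `tag` into the low bits of `ptr`.
    /// Assumes `ptr` is at least 4-byte aligned and `tag` fits in 2 bits.
    pub fn new(ptr: *mut u32, tag: usize) -> Self {
        let addr = ptr as usize;
        debug_assert_eq!(addr & TAG_MASK, 0, "pointer must be 4-byte aligned");
        debug_assert!(tag <= TAG_MASK, "tag must fit in 2 bits");
        TaggedPtr(addr | tag)
    }

    /// Recovers the real pointer with a single AND, no branch.
    pub fn ptr(self) -> *mut u32 {
        (self.0 & !TAG_MASK) as *mut u32
    }

    /// Returns the stored feature tag.
    pub fn tag(self) -> usize {
        self.0 & TAG_MASK
    }
}
```

Every access pays for one extra AND to clear the tag, which is the small cost to point queries mentioned above. Bulk operations can read `tag()` once and dispatch to the matching SIMD implementation.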