This PR lowers the requirement for 256-bit wide vectors on x86/x86_64 platforms from AVX2 to AVX. #86 mistakenly assumed that none of the required operations were available before AVX2, when in reality the operations working on `__m256d` have been available since AVX. The code doesn't really use floating-point arithmetic on the type, so the two can be treated the same.

Performance-wise, benchmarks showed no difference between AVX and AVX2, other than some incidental speedups in non-set/batch operations.
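To illustrate the point about `__m256d`, here is a minimal sketch of a 256-bit bitwise AND that needs only AVX. The names `and_block` and `and_block_avx` are hypothetical and not taken from this PR. The sketch also uses runtime feature detection so it stays self-contained, which may differ from how the crate selects its implementation.

```rust
#[cfg(target_arch = "x86")]
use std::arch::x86::{_mm256_and_pd, _mm256_loadu_pd, _mm256_storeu_pd};
#[cfg(target_arch = "x86_64")]
use std::arch::x86_64::{_mm256_and_pd, _mm256_loadu_pd, _mm256_storeu_pd};

/// Hypothetical helper: bitwise AND of two 256-bit blocks.
/// Uses AVX when the CPU supports it, otherwise a scalar fallback.
pub fn and_block(a: [u64; 4], b: [u64; 4]) -> [u64; 4] {
    #[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
    {
        if is_x86_feature_detected!("avx") {
            // SAFETY: AVX support was verified at runtime just above.
            return unsafe { and_block_avx(a, b) };
        }
    }
    // Portable fallback for CPUs or targets without AVX.
    [a[0] & b[0], a[1] & b[1], a[2] & b[2], a[3] & b[3]]
}

/// AVX-only path. The data is loaded as `__m256d`, but no floating-point
/// arithmetic is performed: `_mm256_and_pd` is a pure bitwise AND, so
/// every bit pattern is preserved exactly.
#[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
#[target_feature(enable = "avx")]
unsafe fn and_block_avx(a: [u64; 4], b: [u64; 4]) -> [u64; 4] {
    let mut out = [0u64; 4];
    unsafe {
        // Unaligned loads: the arrays are only guaranteed 8-byte alignment.
        let va = _mm256_loadu_pd(a.as_ptr() as *const f64);
        let vb = _mm256_loadu_pd(b.as_ptr() as *const f64);
        let vr = _mm256_and_pd(va, vb);
        _mm256_storeu_pd(out.as_mut_ptr() as *mut f64, vr);
    }
    out
}
```

The equivalent operation on `__m256i` (`_mm256_and_si256`) does require AVX2, which is why treating the block as `__m256d` lets the same logic run on AVX-only hardware.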
Also took the opportunity to clean up the block directory, deduplicate repeated code, and tidy up some of the `cfg` attributes. The new compilation configurations have been added to CI as well.