Minor: Document SIMD rationale and tips #6554
base: master
Conversation
> ### Usage of SIMD / Auto vectorization
>
> This crate does not use SIMD intrinsics (e.g. [`std::simd`]) directly, but
> instead relies on LLVM's auto-vectorization.
"... on the compiler's ..." ?
(in fact, vectorization could be applied on Rust MIR level, before LLVM?)
I'll confess it is a while since I dug into rustc, but I would have thought MIR to be too high level to effectively perform auto-vectorisation, which is extremely ISA specific; the best it could do would be to use LLVM's vector types, but general heuristics for doing this would be hard.
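The approach the doc describes (plain scalar loops left to the optimizer rather than explicit intrinsics) can be sketched roughly as below. This is an illustrative example, not code from this crate; the function name `add_scalar` is made up.

```rust
/// A sketch of the style this guidance favors: a simple "vertical" loop
/// over a slice, written with no explicit SIMD. With optimizations enabled
/// (and a suitable `target-cpu`), LLVM's auto-vectorizer typically turns
/// loops of this shape into SIMD instructions on its own.
pub fn add_scalar(values: &[i32], delta: i32, out: &mut Vec<i32>) {
    out.clear();
    // Iterator-based loops like this map cleanly onto vector loads/stores.
    out.extend(values.iter().map(|v| v.wrapping_add(delta)));
}
```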
> SIMD intrinsics are difficult to maintain and can be difficult to reason about.
> The auto-vectorizer in LLVM is quite good and often produces better code than
> hand-written manual uses of SIMD. In fact, this crate used to to have a fair
stuttered "to"
> amount of manual SIMD, and over time we've removed it as the auto-vectorized
> code was faster.
was -> turned out ?
> LLVM is relatively good at vectorizing vertical operations provided:
>
> 1. No conditionals within the loop body
> 2. Not too much inlining , as the vectorizer gives up if the code is too complex
extra whitespace before ,
> 3. No bitwise horizontal reductions or masking
is "bitwise horizontal reductions" an obvious term?
It is a class of SIMD operations, I think if people don't know to what this refers, they probably aren't the audience for this
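For readers unfamiliar with the term, a sketch of what point 3 refers to: folding every element into a single value with a bitwise operation. The function name `or_all` is illustrative only, and whether a given reduction vectorizes depends on the exact loop shape and target.

```rust
/// An example of a bitwise horizontal reduction: OR-ing all elements of a
/// slice together into one value. Per point 3 above, loops shaped like
/// this are among the patterns the auto-vectorizer handles poorly.
pub fn or_all(values: &[u32]) -> u32 {
    values.iter().fold(0u32, |acc, v| acc | v)
}
```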
> 4. You've enabled SIMD instructions in the target ISA (e.g. `target-cpu` `RUSTFLAGS` flag)
Prefer passive voice. "SIMD instructions are enabled in the target ISA"
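Point 1 of the list above can be illustrated with a sketch: two ways to sum the even elements of a slice, one with a data-dependent branch in the loop body and one with the condition rewritten as branch-free arithmetic, a shape the vectorizer handles much more readily. The function names are hypothetical, not APIs from this crate.

```rust
/// Sums the even elements using a data-dependent branch in the loop body,
/// which tends to block auto-vectorization (point 1 above).
pub fn sum_even_branchy(values: &[u32]) -> u32 {
    let mut sum = 0u32;
    for &v in values {
        if v % 2 == 0 {
            sum = sum.wrapping_add(v);
        }
    }
    sum
}

/// The same computation with the condition expressed as a multiply by
/// 0 or 1, leaving a branch-free loop body.
pub fn sum_even_branchless(values: &[u32]) -> u32 {
    values
        .iter()
        // `1 - (v & 1)` is 1 for even values and 0 for odd ones.
        .map(|&v| v.wrapping_mul(1u32.wrapping_sub(v & 1)))
        .fold(0u32, |acc, v| acc.wrapping_add(v))
}
```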
> support many SIMD instructions. See the Performance Tips section at the
> end of <https://crates.io/crates/arrow>
>
> To ensure your code is fully vectorized, we recommend getting familiar with
your code -> the code
> tools like <https://rust.godbolt.org/> (again being sure to set `RUSTFLAGS`) and
"again being sure to set `RUSTFLAGS`" -> "requires to set `RUSTFLAGS` properly"
> only once you've exhausted that avenue think of reaching for manual SIMD.
> Generally the hard part is getting the algorithm structured in such a way that
> it can be vectorized, regardless of what goes and generates those instructions.
maybe: "it can be vectorized, regardless of what goes and generates those instructions." -> "it can be vectorized, regardless of what generates those instructions."
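The godbolt workflow above can be sketched as follows: a minimal function to paste into <https://rust.godbolt.org/>, with the relevant compiler flags noted in comments. The flag values are examples; pick the ISA features appropriate for your target.

```rust
// Compile with `-O -C target-cpu=native` (or an explicit feature set such
// as `-C target-feature=+avx2`); without such flags the compiler targets a
// conservative baseline ISA and the assembly may show no SIMD at all.
// Then look for packed instructions (e.g. `vpaddd` on x86_64 with AVX2)
// in the output.
#[no_mangle] // keep the symbol visible in the assembly listing
pub fn sum(values: &[i32]) -> i32 {
    values.iter().fold(0i32, |acc, v| acc.wrapping_add(*v))
}
```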
Which issue does this PR close?
Closes #.
Rationale for this change
@tustvold wrote up some great tips / rationale on apache/datafusion#12821 (comment) that I thought would be good to add in the docs of this repo
What changes are included in this PR?
Add documentation on the rationale for not using manual SIMD, as well as tips/tricks to get the code to properly vectorize.
Are there any user-facing changes?
Just docs