Minor: Document SIMD rationale and tips #6554

Open
wants to merge 1 commit into
base: master
from

Conversation

alamb
Contributor

@alamb alamb commented Oct 13, 2024

Which issue does this PR close?

Closes #.

Rationale for this change

@tustvold wrote up some great tips / rationale on apache/datafusion#12821 (comment) that I thought would be good to add in the docs of this repo

What changes are included in this PR?

Add documentation on the rationale for not using manual SIMD, as well as tips/tricks to get the code to properly vectorize.

Are there any user-facing changes?

Just docs

@alamb alamb added the documentation Improvements or additions to documentation label Oct 13, 2024
@github-actions github-actions bot added the arrow Changes to the arrow crate label Oct 13, 2024
### Usage of SIMD / Auto vectorization

This crate does not use SIMD intrinsics (e.g. [`std::simd`]) directly, but
instead relies on LLVM's auto-vectorization.
Member

"... on the compiler's ..." ?

(in fact, vectorization could be applied on Rust MIR level, before LLVM?)

Contributor

@tustvold tustvold Oct 13, 2024

I'll confess it has been a while since I dug into rustc, but I would have thought MIR to be too high-level to perform auto-vectorisation effectively, since that is extremely ISA-specific. The best it could do would be to use LLVM's vector types, but general heuristics for doing this would be hard.


SIMD intrinsics are difficult to maintain and can be difficult to reason about.
The auto-vectorizer in LLVM is quite good and often produces better code than
hand-written manual uses of SIMD. In fact, this crate used to to have a fair
Member

stuttered "to"

amount of manual SIMD, and over time we've removed it as the auto-vectorized
code was faster.
Member

was -> turned out ?

LLVM is relatively good at vectorizing vertical operations provided:

1. No conditionals within the loop body
2. Not too much inlining , as the vectorizer gives up if the code is too complex
Member

extra whitespace before ,

3. No bitwise horizontal reductions or masking
Member

is "bitwise horizontal reductions" an obvious term?

Contributor

It is a class of SIMD operations. I think if people don't know what this refers to, they probably aren't the audience for this.

4. You've enabled SIMD instructions in the target ISA (e.g. `target-cpu` `RUSTFLAGS` flag)
Member

Prefer passive voice. "SIMD instructions are enabled in the target ISA"
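
For readers following the thread, the first condition in the list above can be illustrated with a minimal sketch. The function names here are hypothetical and are not taken from the arrow crate. Both loops apply the same operation to every element ("vertical" operations) and contain no early exits, which is the shape an auto-vectorizer can usually handle when the target supports SIMD:

```rust
/// Element-wise addition of two slices into `out`.
/// Hypothetical helper for illustration only, not part of the arrow crate.
pub fn add_slices(a: &[f32], b: &[f32], out: &mut [f32]) {
    // Zipping the iterators avoids indexing, so there are no per-element
    // bounds checks (which would be conditionals inside the loop body).
    for ((o, x), y) in out.iter_mut().zip(a).zip(b) {
        *o = *x + *y;
    }
}

/// Replaces negative values with zero without an explicit `if` in the loop.
/// Hypothetical helper for illustration only.
pub fn clamp_negative_to_zero(values: &mut [i32]) {
    for v in values.iter_mut() {
        // `max` expresses the conditional as a select between two values,
        // which compilers can typically lower to a branch-free instruction.
        *v = (*v).max(0);
    }
}
```

Whether either loop is actually vectorized depends on the compiler version, optimization level, and target features, so the generated assembly is the only reliable check.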

support many SIMD instructions. See the Performance Tips section at the
end of <https://crates.io/crates/arrow>
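
One way to confirm which SIMD features the compiler was allowed to use for a given build is the standard `cfg!(target_feature = "...")` macro. The sketch below is an illustration only: the function name is hypothetical and the list of features checked is an arbitrary selection.

```rust
/// Returns the names of a few SIMD target features that were enabled at
/// compile time. Hypothetical helper; the feature list is an arbitrary sample.
pub fn enabled_simd_features() -> Vec<&'static str> {
    let mut features = Vec::new();
    // `cfg!` is evaluated at compile time, so this reflects the flags the
    // crate was built with, not what the CPU running the binary supports.
    if cfg!(target_feature = "sse2") {
        features.push("sse2");
    }
    if cfg!(target_feature = "avx") {
        features.push("avx");
    }
    if cfg!(target_feature = "avx2") {
        features.push("avx2");
    }
    if cfg!(target_feature = "neon") {
        features.push("neon");
    }
    features
}
```

Building with, for example, `RUSTFLAGS="-C target-cpu=native"` would normally change the returned list on a machine whose CPU supports more than the default baseline.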

To ensure your code is fully vectorized, we recommend getting familiar with
Member

your code -> the code

tools like <https://rust.godbolt.org/> (again being sure to set `RUSTFLAGS`) and
Member

> again being sure to set RUSTFLAGS

requires setting `RUSTFLAGS` properly

only once you've exhausted that avenue think of reaching for manual SIMD.
Generally the hard part is getting the algorithm structured in such a way that
it can be vectorized, regardless of what goes and generates those instructions.
Member

maybe

Suggested change
- it can be vectorized, regardless of what goes and generates those instructions.
+ it can be vectorized, regardless of what generates those instructions.
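
As an illustration of the point about structuring the algorithm, here is a minimal sketch of a floating-point sum. It is not code from the arrow crate. A plain sequential `f32` sum generally cannot be vectorized, because floating-point addition is not associative and the compiler must preserve the order of operations. Splitting the work into independent per-lane accumulators makes the reordering explicit in the source. The function name and the lane count of 8 are assumptions chosen for the example.

```rust
/// Sums a slice using several independent accumulators.
/// Hypothetical helper; LANES = 8 is an arbitrary choice for illustration.
pub fn sum_lanes(values: &[f32]) -> f32 {
    const LANES: usize = 8;
    let mut acc = [0.0f32; LANES];

    let chunks = values.chunks_exact(LANES);
    // Elements left over when the length is not a multiple of LANES.
    let remainder = chunks.remainder();

    for chunk in chunks {
        // Each lane accumulates independently, so this inner loop is a
        // vertical operation over fixed-size arrays.
        for i in 0..LANES {
            acc[i] += chunk[i];
        }
    }

    // Horizontal reduction happens once, outside the hot loop.
    let mut total: f32 = acc.iter().sum();
    for v in remainder {
        total += *v;
    }
    total
}
```

Because the additions happen in a different order than in a sequential sum, the result can differ in the last bits for general inputs. That trade-off is part of the algorithm design and applies whether the instructions come from the auto-vectorizer or from manual SIMD.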

Labels
arrow Changes to the arrow crate documentation Improvements or additions to documentation
3 participants