Minor: Document SIMD rationale and tips #6554
base: master
Conversation
> ### Usage of SIMD / Auto vectorization
>
> This crate does not use SIMD intrinsics (e.g. [`std::simd`]) directly, but
> instead relies on LLVM's auto-vectorization.
"... on the compiler's ..." ?
(in fact, vectorization could be applied on Rust MIR level, before LLVM?)
I'll confess it is a while since I dug into rustc, but I would have thought MIR to be too high level to effectively perform auto-vectorisation, which is extremely ISA specific; the best it could do would be to use LLVM's vector types, but general heuristics for doing this would be hard.
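The approach the doc describes (plain scalar loops left to the optimizer rather than explicit intrinsics) can be sketched roughly as below. This is an illustrative example, not code from this crate; the function name `add_scalar` is made up.

```rust
/// A sketch of the style this guidance favors: a simple "vertical" loop
/// over a slice, written with no explicit SIMD. With optimizations enabled
/// (and a suitable `target-cpu`), LLVM's auto-vectorizer typically turns
/// loops of this shape into SIMD instructions on its own.
pub fn add_scalar(values: &[i32], delta: i32, out: &mut Vec<i32>) {
    out.clear();
    // Iterator-based loops like this map cleanly onto vector loads/stores.
    out.extend(values.iter().map(|v| v.wrapping_add(delta)));
}
```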
> SIMD intrinsics are difficult to maintain and can be difficult to reason about.
> The auto-vectorizer in LLVM is quite good and often produces better code than
> hand-written manual uses of SIMD. In fact, this crate used to to have a fair
stuttered "to"
> amount of manual SIMD, and over time we've removed it as the auto-vectorized
> code was faster.
was -> turned out ?
> LLVM is relatively good at vectorizing vertical operations provided:
>
> 1. No conditionals within the loop body
> 2. Not too much inlining , as the vectorizer gives up if the code is too complex
extra whitespace before ,
> 3. No bitwise horizontal reductions or masking
is "bitwise horizontal reductions" an obvious term?
It is a class of SIMD operations, I think if people don't know to what this refers, they probably aren't the audience for this
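For readers unfamiliar with the term, a sketch of what point 3 refers to: folding every element into a single value with a bitwise operation. The function name `or_all` is illustrative only, and whether a given reduction vectorizes depends on the exact loop shape and target.

```rust
/// An example of a bitwise horizontal reduction: OR-ing all elements of a
/// slice together into one value. Per point 3 above, loops shaped like
/// this are among the patterns the auto-vectorizer handles poorly.
pub fn or_all(values: &[u32]) -> u32 {
    values.iter().fold(0u32, |acc, v| acc | v)
}
```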
> 4. You've enabled SIMD instructions in the target ISA (e.g. `target-cpu` `RUSTFLAGS` flag)
Prefer passive voice. "SIMD instructions are enabled in the target ISA"
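Point 1 of the list above can be illustrated with a sketch: two ways to sum the even elements of a slice, one with a data-dependent branch in the loop body and one with the condition rewritten as branch-free arithmetic, a shape the vectorizer handles much more readily. The function names are hypothetical, not APIs from this crate.

```rust
/// Sums the even elements using a data-dependent branch in the loop body,
/// which tends to block auto-vectorization (point 1 above).
pub fn sum_even_branchy(values: &[u32]) -> u32 {
    let mut sum = 0u32;
    for &v in values {
        if v % 2 == 0 {
            sum = sum.wrapping_add(v);
        }
    }
    sum
}

/// The same computation with the condition expressed as a multiply by
/// 0 or 1, leaving a branch-free loop body.
pub fn sum_even_branchless(values: &[u32]) -> u32 {
    values
        .iter()
        // `1 - (v & 1)` is 1 for even values and 0 for odd ones.
        .map(|&v| v.wrapping_mul(1u32.wrapping_sub(v & 1)))
        .fold(0u32, |acc, v| acc.wrapping_add(v))
}
```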
> support many SIMD instructions. See the Performance Tips section at the
> end of <https://crates.io/crates/arrow>
>
> To ensure your code is fully vectorized, we recommend getting familiar with
your code -> the code
> tools like <https://rust.godbolt.org/> (again being sure to set `RUSTFLAGS`) and
"again being sure to set `RUSTFLAGS`" -> "requires to set `RUSTFLAGS` properly"
> only once you've exhausted that avenue think of reaching for manual SIMD.
> Generally the hard part is getting the algorithm structured in such a way that
> it can be vectorized, regardless of what goes and generates those instructions.
maybe: "it can be vectorized, regardless of what goes and generates those instructions." -> "it can be vectorized, regardless of what generates those instructions."
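The godbolt workflow above can be sketched as follows: a minimal function to paste into <https://rust.godbolt.org/>, with the relevant compiler flags noted in comments. The flag values are examples; pick the ISA features appropriate for your target.

```rust
// Compile with `-O -C target-cpu=native` (or an explicit feature set such
// as `-C target-feature=+avx2`); without such flags the compiler targets a
// conservative baseline ISA and the assembly may show no SIMD at all.
// Then look for packed instructions (e.g. `vpaddd` on x86_64 with AVX2)
// in the output.
#[no_mangle] // keep the symbol visible in the assembly listing
pub fn sum(values: &[i32]) -> i32 {
    values.iter().fold(0i32, |acc, v| acc.wrapping_add(*v))
}
```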
Which issue does this PR close?
Closes #.
Rationale for this change
@tustvold wrote up some great tips / rationale on apache/datafusion#12821 (comment) that I thought would be good to add in the docs of this repo
What changes are included in this PR?
Add documentation on the rationale for not using manual SIMD, as well as tips/tricks to get the code to properly vectorize.
Are there any user-facing changes?
Just docs