Allow simd_bitmask to return byte arrays #88868
Conversation
(rust-highfive has picked a reviewer for you, use r? to override)
// Convert the integer to a byte array
let ptr = bx.alloca(bx.type_ix(expected_bytes * 8), Align::ONE);
bx.store(ze, ptr, Align::ONE);
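For context, a load presumably follows this store to reinterpret the stack slot as a byte array. A rough sketch of that shape follows; it is a fragment of the intrinsic lowering rather than standalone code, and the exact builder calls are my assumption, not necessarily the ones this PR ends up with:

```rust
// Sketch: reinterpret the slot holding the zero-extended mask integer `ze`
// as `[u8; expected_bytes]` and load that array back out.
let array_ty = bx.type_array(bx.type_i8(), expected_bytes);
let ptr = bx.pointercast(ptr, bx.type_ptr_to(array_ty));
return Ok(bx.load(array_ty, ptr, Align::ONE));
```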
For anyone more familiar with this than me: is there any reason this alignment or the one below must be greater than a byte?
Well, when these are handled by SSE or AVX512 instructions, they are normal integers of 16 to 64 bits. Even though we are saying we want to interpret this as a byte array, we should probably use the natural word alignment of the processor: that way, `kmovq mem, reg` (reg to mem) naturally hits the alignment to pick it up in a GPR correctly, and if we for some reason are executing AVX512 features on a 32-bit processor, we are still correctly aligned for that (since the value would then go into two GPRs).
I don't think anything is preventing LLVM from using a greater alignment if it's more efficient, but the actual alignment is just 1, right? Regular `u8` arrays are definitely used with larger alignments in most situations, but their alignment is still 1.
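A quick host-side illustration of that point (unrelated to the PR's codegen, just the type-level fact about byte arrays):

```rust
fn main() {
    // The required alignment of a byte array is 1, whatever its length.
    assert_eq!(std::mem::align_of::<[u8; 8]>(), 1);
    // A same-sized integer usually requires more (8 on typical 64-bit targets).
    println!("align_of::<u64>() = {}", std::mem::align_of::<u64>());
}
```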
As long as you're coherent about this (I believe `PlaceRef` tracks it for you, so it may be worth trying to use it more), you can increase the alignment of the alloca and then pass that alignment to the store/load, whether manually, by relying on the `PlaceRef` API, or by letting LLVM infer it. I'm not sure how useful it is here, but it may make LLVM more likely to generate the aligned versions of instructions that have both aligned and unaligned variants (assuming LLVM doesn't rewrite this cast away).
Also, if you're using allocas, please consider calling `place.storage_{live,dead}()` before/after all of the memory operations on the alloca. Again, I'm unsure whether this is significant in this context (e.g. a small wrapper that gets inlined will do the right thing itself during LLVM inlining), but the default without those calls (i.e. without `llvm.lifetime.{start,end}`) is that every alloca adds to the stack frame size for the entire function.
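For readers unfamiliar with that API, here is a hypothetical sketch of the suggested pattern. The names follow rustc_codegen_ssa as the comment above describes them, but the exact signatures and the layout lookup are assumptions on my part, not code from this PR:

```rust
// Hypothetical sketch: let PlaceRef track the alloca and its alignment, and
// bracket its uses with llvm.lifetime.{start,end} via storage_{live,dead}.
let layout = bx.layout_of(byte_array_ty);   // assumed layout of [u8; expected_bytes]
let place = PlaceRef::alloca(bx, layout);   // alloca created with the layout's alignment
place.storage_live(bx);                     // emits llvm.lifetime.start
bx.store(ze, place.llval, place.align);     // store the widened mask integer
let bytes = bx.load(bx.backend_type(layout), place.llval, place.align);
place.storage_dead(bx);                     // emits llvm.lifetime.end
```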
Hmmm. At least hypothetically this should in fact be "a small wrapper that gets inlined". I don't think either of us is familiar with the PlaceRef API tho'. But that shouldn't be a blocking concern.
Force-pushed from 090336f to 18691ee
Forgot about

let r = simd_select_bitmask(0b0101000000001100u16, a, b);
assert_eq!(r.0, e);

let r = simd_select_bitmask([0b1100u8, 0b01010000u8], a, b);
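As a quick host-side sanity check (no intrinsics involved; the constants are taken from the test lines above), the two-byte mask in the new test is the little-endian encoding of the u16 mask in the old one:

```rust
fn main() {
    let as_int: u16 = 0b0101_0000_0000_1100;
    let as_bytes: [u8; 2] = [0b1100, 0b0101_0000];
    // Holds on any host, because to_le_bytes fixes the byte order explicitly.
    assert_eq!(as_int.to_le_bytes(), as_bytes);
}
```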
won't this break on big-endian?
I'm not exactly sure how LLVM handles the bitcast. We've actually seen the opposite issue (bit order, not byte order) on BE: https://github.com/rust-lang/portable-simd/blob/master/crates/core_simd/src/masks/full_masks.rs#L118-L125
bitcasts in llvm are supposed to be semantically equivalent to a memcpy iirc
The ‘bitcast’ instruction converts value to type ty2. It is always a no-op cast because no bits change with this conversion. The conversion is done as if the value had been stored to memory and read back as type ty2.
try running tests on power64 (not power64le), it'll likely break
Inlined from conversation, relevant snippets from @programmerjake:
iirc the issue is that architectures have both a bit-level endianness and a byte-level endianness, both of which can independently vary
install qemu-user and use that to run tests†
† for ppc64
I am not so sure that bit-endianness should be a problem, however. Per the LLVM LangRef on the vector type:
In general vector elements are laid out in memory in the same way as array types. Such an analogy works fine as long as the vector elements are byte sized. However, when the elements of the vector aren't byte sized it gets a bit more complicated. One way to describe the layout is by describing what happens when a vector such as <N x iM> is bitcasted to an integer type with N*M bits, and then following the rules for storing such an integer to memory.
I think the greater problem might be that it is not clear the i1 to integer bitcast was ever well-formed.
That said, the LLVM LangRef on bitcasts also expounds on vectors:
There is a caveat for bitcasts involving vector types in relation to endianess. For example bitcast <2 x i8> to i16 puts element zero of the vector in the least significant bits of the i16 for little-endian while element zero ends up in the most significant bits for big-endian.
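To make that caveat concrete, here is a small host-side illustration (not from the PR) of how "store the mask integer, read it back as bytes" depends on the target's byte order:

```rust
fn main() {
    // The 16-lane mask used in the tests above.
    let mask: u16 = 0b0101_0000_0000_1100;
    // Native order: [0b0000_1100, 0b0101_0000] on little-endian targets,
    // but [0b0101_0000, 0b0000_1100] on big-endian ones.
    let native = mask.to_ne_bytes();
    // Explicit little-endian order is the same everywhere.
    let little = mask.to_le_bytes();
    println!("native = {:#010b} {:#010b}", native[0], native[1]);
    println!("little = {:#010b} {:#010b}", little[0], little[1]);
}
```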
Hmm, I found a comment in a test that completely confirms @programmerjake's remarks, actually:
// ignore-endian-big behavior of simd_select_bitmask is endian-specific
🙃
I just copied this to the test. We can work out the endianness (and add tests if necessary) in another issue.
odd.
☔ The latest upstream changes (presumably #89629) made this pull request unmergeable. Please resolve the merge conflicts.
triage: back to author for merge conflicts
Force-pushed from 683fa8e to 569c51d
@rustbot label -S-waiting-on-author +S-waiting-on-review
Hm.
somewhere in either the associated tests or the codegen infrastructure itself.
I gave it a spin on
📌 Commit fe8ae57 has been approved by
…askrgr Rollup of 5 pull requests

Successful merges:

- rust-lang#88447 (Use computed visibility in rustdoc)
- rust-lang#88868 (Allow simd_bitmask to return byte arrays)
- rust-lang#90727 (Remove potential useless data for search index)
- rust-lang#90742 (Use AddAssign impl)
- rust-lang#90758 (Fix collections entry API documentation.)

Failed merges:

r? `@ghost`
`@rustbot` modify labels: rollup
cc @rust-lang/project-portable-simd @workingjubilee