Allow simd_bitmask to return byte arrays #88868
Conversation
(rust-highfive has picked a reviewer for you, use r? to override)
// Convert the integer to a byte array
let ptr = bx.alloca(bx.type_ix(expected_bytes * 8), Align::ONE);
bx.store(ze, ptr, Align::ONE);
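For context, a load presumably follows this store to reinterpret the stack slot as a byte array. A rough sketch of that shape follows; it is a fragment of the intrinsic lowering rather than standalone code, and the exact builder calls are my assumption, not necessarily the ones this PR ends up with:

```rust
// Sketch: reinterpret the slot holding the zero-extended mask integer `ze`
// as `[u8; expected_bytes]` and load that array back out.
let array_ty = bx.type_array(bx.type_i8(), expected_bytes);
let ptr = bx.pointercast(ptr, bx.type_ptr_to(array_ty));
return Ok(bx.load(array_ty, ptr, Align::ONE));
```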
For anyone more familiar with this than me: is there any reason this alignment or the one below must be greater than a byte?
Well, when these are handled by SSE or AVX512 instructions, they are normal integers of 16 to 64 bits. Even though we are saying we want to interpret this as a byte array, we should probably use the natural word alignment of the processor: that way, `kmovq mem, reg` (reg to mem) naturally hits the alignment to pick it up in a GPR correctly, and if we for some reason are executing AVX512 features on a 32-bit processor, we are still correctly aligned for that (since the value would then go into two GPRs).
I don't think anything is preventing LLVM from using a greater alignment if it's more efficient, but the actual alignment is just 1, right? Regular `u8` arrays are definitely used with larger alignments in most situations, but their alignment is still 1.
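A quick host-side illustration of that point (unrelated to the PR's codegen, just the type-level fact about byte arrays):

```rust
fn main() {
    // The required alignment of a byte array is 1, whatever its length.
    assert_eq!(std::mem::align_of::<[u8; 8]>(), 1);
    // A same-sized integer usually requires more (8 on typical 64-bit targets).
    println!("align_of::<u64>() = {}", std::mem::align_of::<u64>());
}
```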
As long as you're coherent about this (I believe `PlaceRef` tracks it for you, so it may be worth trying to use it more), you can increase the alignment of the alloca and then pass that alignment to the store/load, whether manually, by relying on the `PlaceRef` API, or by letting LLVM infer it. I'm not sure how useful it is here, but it may make LLVM more likely to generate the aligned versions of instructions that have both aligned and unaligned variants (assuming LLVM doesn't rewrite this cast away).
Also, if you're using allocas, please consider calling `place.storage_{live,dead}()` before/after all of the memory operations on the alloca. Again, I'm unsure whether this is significant in this context (e.g. a small wrapper that gets inlined will do the right thing itself during LLVM inlining), but the default without those calls (i.e. without `llvm.lifetime.{start,end}`) is that every alloca adds to the stack frame size for the entire function.
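For readers unfamiliar with that API, here is a hypothetical sketch of the suggested pattern. The names follow rustc_codegen_ssa as the comment above describes them, but the exact signatures and the layout lookup are assumptions on my part, not code from this PR:

```rust
// Hypothetical sketch: let PlaceRef track the alloca and its alignment, and
// bracket its uses with llvm.lifetime.{start,end} via storage_{live,dead}.
let layout = bx.layout_of(byte_array_ty);   // assumed layout of [u8; expected_bytes]
let place = PlaceRef::alloca(bx, layout);   // alloca created with the layout's alignment
place.storage_live(bx);                     // emits llvm.lifetime.start
bx.store(ze, place.llval, place.align);     // store the widened mask integer
let bytes = bx.load(bx.backend_type(layout), place.llval, place.align);
place.storage_dead(bx);                     // emits llvm.lifetime.end
```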
Hmmm. At least hypothetically this should in fact be "a small wrapper that gets inlined". I don't think either of us is familiar with the PlaceRef API tho'. But that shouldn't be a blocking concern.
Force-pushed from 090336f to 18691ee
Forgot about

let r = simd_select_bitmask(0b0101000000001100u16, a, b);
assert_eq!(r.0, e);

let r = simd_select_bitmask([0b1100u8, 0b01010000u8], a, b);
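As a quick host-side sanity check (no intrinsics involved; the constants are taken from the test lines above), the two-byte mask in the new test is the little-endian encoding of the u16 mask in the old one:

```rust
fn main() {
    let as_int: u16 = 0b0101_0000_0000_1100;
    let as_bytes: [u8; 2] = [0b1100, 0b0101_0000];
    // Holds on any host, because to_le_bytes fixes the byte order explicitly.
    assert_eq!(as_int.to_le_bytes(), as_bytes);
}
```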
won't this break on big-endian?
I'm not exactly sure how LLVM handles the bitcast. We've actually seen the opposite issue (bit order, not byte order) on BE: https://github.com/rust-lang/portable-simd/blob/master/crates/core_simd/src/masks/full_masks.rs#L118-L125
bitcasts in llvm are supposed to be semantically equivalent to a memcpy iirc
The ‘bitcast’ instruction converts value to type ty2. It is always a no-op cast because no bits change with this conversion. The conversion is done as if the value had been stored to memory and read back as type ty2.
try running tests on power64 (not power64le), it'll likely break
Inlined from conversation, relevant snippets from @programmerjake:
iirc the issue is that architectures have both a bit-level endianness and a byte-level endianness, both of which can independently vary
install qemu-user and use that to run tests†
† for ppc64
I am not so sure that bit-endianness should be a problem, however. Per the LLVM LangRef on the vector type:
In general vector elements are laid out in memory in the same way as array types. Such an analogy works fine as long as the vector elements are byte sized. However, when the elements of the vector aren't byte sized it gets a bit more complicated. One way to describe the layout is by describing what happens when a vector such as <N x iM> is bitcasted to an integer type with N*M bits, and then following the rules for storing such an integer to memory.
I think the greater problem might be that it is not clear the i1 to integer bitcast was ever well-formed.
That said, the LLVM LangRef on bitcasts also expounds on vectors:
There is a caveat for bitcasts involving vector types in relation to endianess. For example bitcast <2 x i8> to i16 puts element zero of the vector in the least significant bits of the i16 for little-endian while element zero ends up in the most significant bits for big-endian.
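To make that caveat concrete, here is a small host-side illustration (not from the PR) of how "store the mask integer, read it back as bytes" depends on the target's byte order:

```rust
fn main() {
    // The 16-lane mask used in the tests above.
    let mask: u16 = 0b0101_0000_0000_1100;
    // Native order: [0b0000_1100, 0b0101_0000] on little-endian targets,
    // but [0b0101_0000, 0b0000_1100] on big-endian ones.
    let native = mask.to_ne_bytes();
    // Explicit little-endian order is the same everywhere.
    let little = mask.to_le_bytes();
    println!("native = {:#010b} {:#010b}", native[0], native[1]);
    println!("little = {:#010b} {:#010b}", little[0], little[1]);
}
```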
Hmm, I found a comment in a test that completely confirms @programmerjake's remarks, actually:
// ignore-endian-big behavior of simd_select_bitmask is endian-specific
🙃
I just copied this to the test. We can work out the endianness (and add tests if necessary) in another issue.
odd.
☔ The latest upstream changes (presumably #89629) made this pull request unmergeable. Please resolve the merge conflicts.
triage: back to author for merge conflicts
Force-pushed from 683fa8e to 569c51d
@rustbot label -S-waiting-on-author +S-waiting-on-review
Hm.
somewhere in either the associated tests or the codegen infrastructure itself.
I gave it a spin on
📌 Commit fe8ae57 has been approved by
…askrgr Rollup of 5 pull requests

Successful merges:

- rust-lang#88447 (Use computed visibility in rustdoc)
- rust-lang#88868 (Allow simd_bitmask to return byte arrays)
- rust-lang#90727 (Remove potential useless data for search index)
- rust-lang#90742 (Use AddAssign impl)
- rust-lang#90758 (Fix collections entry API documentation.)

Failed merges:

r? `@ghost`
`@rustbot` modify labels: rollup
cc @rust-lang/project-portable-simd @workingjubilee