Layout of repr(C) unions has padding #156
Comments
I think you meant to ping @comex here. :-) (Not comes.) |
I do not understand why this is a consequence. Your summary (and thanks for the writeup!) says that "Rust and GCC pass the bottom 32-bits". So it is possible to implement this in a way that preserves all bits. Correct? Note: I am not concerned about bits getting preserved when Rust calls a C function and then C passes data back to Rust. At that point, C rules apply. But I am concerned about the case where Rust code calls an |
So that's a good question. Consider this:
#[repr(C)] union U { x: (u8, u32) }
extern "C" {
    fn foo() -> U;
}
If the OTOH, if we were to somehow make Rust to always copy padding bytes for We can't tell whether the code at the other side is Rust or C, so if we want to be able to treat a I tend to think that an |
My reading of rust-lang/rust#60405 (comment) is that it is impossible to achieve "raw bit copy" for |
@comex mentions there that Rust->Rust could be extended to support this. |
I said that it would be possible for some ABIs but not all. |
@RalfJung If you have 8 bytes or more of padding you might not have where to put them. When only some bytes of an |
enjoy what you're doing guys and have a super great time while that. comes |
In MiniRust I am now handling this by making unions "chunked": a union is not simply a bag of bytes as large as its size, there are multiple chunks with that size with possible gaps between them. The data within the chunks is preserved verbatim, data between the chunks is lost as padding. Actually computing the chunks is left to the frontend. I hope we can make it so that |
Apologies for the poorly organized long post. I think @comex misread the RISC-V calling convention spec (which appears to have moved) slightly, but I don't think it really changes much:
So while I think it means there's no obligation to sign-extend when an integer followed by padding occupies an entire register, it is also extremely clear that padding is not preserved. The ARM ABI, on the other hand, appears to me to require aggregates to simply be copied, which might imply preservation of padding, but description of the way that arguments are obtained actually clarifies this (section 8.2):
So this means that the value is copied, meaning padding can take an unspecified value. Unlike what Apple claims, the caller does seem to be responsible for sign- or zero-extension in the latest platform ABI, at least for fundamental integral types. But specifically not for aggregate types. So I think the same logic applies and while the caller is not obligated to pass any particular padding values, it is not obligated to preserve them either. But if we look at the C standard, it's actually even worse, I think. See section 6.2.6.1
and section 6.2.7.1:
Thus we arrive at the classic "you can't use a union to cast" problem: the actual value space of a C union, in the most pedantic and hostile interpretation for the programmer, is a true sum type, and copying a union copies the value, not the object representation, which can clobber any bytes that are padding for that particular value. So the example of Combined with the particular language in the ARM ABI especially which clearly talks about passing the value, rather than the object representation, I think that a maximally defensive definition of (C++, interestingly, is much more clear about the implicit sum type nature of a union, but also requires padding to be preserved if I read it correctly.) tl;dr I'm not convinced that any ABI requires padding bytes to take specific values. But I think that, in C, which bytes of a union are padding are based on the last value properly stored into the union, which is not represented in memory at all. |
Actually, I am apparently wrong about C++, according to a note on this page:
|
I don't think those parts of the C standard are relevant for ABI discussions. ABIs are unavoidably defined on the assembly level. So I don't think that the actual ABI of a union is ever any worse than "copy this range of bytes but possibly leave some gaps where data gets lost". I have recently adjusted MiniRust to support such "chunked" unions, and I think this is good enough for the ABIs out there. Or do you know of a counterexample? |
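The "copy this range of bytes but possibly leave some gaps" model can be sketched in a few lines of Rust. This is only an illustration, not MiniRust's code: the helper `copy_chunks` and the chunk ranges are hypothetical, and the ranges `0..1` and `4..8` assume a union whose only field lays out a `u8` at offset 0 and a `u32` at offset 4.

```rust
use std::ops::Range;

/// Hypothetical helper (not part of MiniRust): copies only the bytes that lie
/// inside `chunks` from `src` to `dst`. Bytes of `dst` outside every chunk are
/// left as they were, so whatever `src` held in the gaps is lost.
fn copy_chunks(src: &[u8], dst: &mut [u8], chunks: &[Range<usize>]) {
    assert_eq!(src.len(), dst.len(), "source and destination must have the same size");
    for chunk in chunks {
        // Each chunk is preserved verbatim.
        dst[chunk.clone()].copy_from_slice(&src[chunk.clone()]);
    }
}
```

Calling `copy_chunks(&src, &mut dst, &[0..1, 4..8])` on two 8-byte buffers transfers bytes 0 and 4 through 7, while bytes 1 through 3 of `dst` never see the source's values. A real ABI may put anything in those gap bytes; leaving them untouched is just one permitted outcome.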
This is true if that code is written in C. But I don't see how that has any consequence on Rust code calling Rust code via the C ABI -- and that is the case we are concerned about here.
Yeah I could believe that. Lucky enough I think it has no relevance for Rust. |
I think, in the end, I agree with everything you said here. |
I have tried to document everything I believe we have consensus on. I've left some things open that I possibly could have closed, but because this PR is very big, I would like to focus on getting it in as quickly as possible and worrying about whatever's left afterwards. I strongly encourage others to submit follow up PRs to close out the other open issues. Closes rust-lang#156. Closes rust-lang#298. Closes rust-lang#352.
This isn't intended to create controversy, but document where discussion has settled. Please feel free to open more PRs to clear up additional items. Closes rust-lang#156.
It's not entirely clear if this question is just asking for documentation or for a resolution to the Rust level issue about what the semantics of |
In MiniRust this issue is resolved by saying that the value of a union consists of several "chunks" of raw memory, that will have As far as I know, this is sufficient to encode the behavior of these I think all that remains to be done here is somehow documenting this as our consensus solution to the problem. But of course this interacts with other questions around the value representation of unions, of which there are a lot. In particular, do we make union bytes "noundef" if they are "noundef" in each variant? It might be worth opening a new issue that specifically tracks these questions. |
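One way a frontend might compute such chunks is sketched below. This is an assumption for illustration, not how MiniRust or rustc actually does it: the function `union_chunks` and its input format (per-field lists of byte ranges holding non-padding data) are hypothetical. A byte belongs to a chunk if it is data in at least one field; a byte that is padding in every field, or lies past the end of every field, falls into a gap.

```rust
use std::ops::Range;

/// Hypothetical helper: `fields[k]` lists the byte ranges of field `k` that
/// hold non-padding data. Returns the maximal runs of bytes that are data in
/// at least one field, i.e. the "chunks" of a union of `size` bytes.
fn union_chunks(size: usize, fields: &[Vec<Range<usize>>]) -> Vec<Range<usize>> {
    // Mark every byte that is data in some field.
    let mut is_data = vec![false; size];
    for field in fields {
        for range in field {
            for i in range.clone() {
                is_data[i] = true;
            }
        }
    }

    // Collect maximal runs of marked bytes.
    let mut chunks = Vec::new();
    let mut start: Option<usize> = None;
    for (i, &data) in is_data.iter().enumerate() {
        match (data, start) {
            (true, None) => start = Some(i),
            (false, Some(s)) => {
                chunks.push(s..i);
                start = None;
            }
            _ => {}
        }
    }
    if let Some(s) = start {
        chunks.push(s..size);
    }
    chunks
}
```

For a single field with data at `0..1` and `4..8`, the result is two chunks with a three-byte gap between them. Adding a second field with data at `0..2` widens the first chunk to `0..2`, because byte 1 is no longer padding in every field.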
Closing in favor of #438. |
In this thread and the comments that follow, the following new information was discovered.
repr(C) unions have padding bits of the form:
- union_size - largest_field_size trailing bits are padding bits.
- If bit i of all union fields is a padding bit, bit i of the union is a padding bit.
The content of padding bits of repr(C) unions is always uninitialized. That is, they are not required to be preserved on copy / move / pass by value, etc. The implementation of the call ABI can exploit this knowledge.
For example, Rust, clang, and GCC all implement the SysV64 ABI, and when passing a #[repr(C)] union U { x: (u8, u32) } around by value, @eddyb mentioned that Rust and GCC pass the bottom 32 bits (where the u8 is stored) while clang passes the bottom 8 bits. Both implementations are allowed. @comex also mentioned that some ABIs impose further requirements; for example, the RISC-V ELF ABI "appears to require callers to zero- or sign-extend arguments in registers in a particular way. In other words, it requires the upper bits (which correspond to padding bytes) to have a specific value, and the callee can assume that they do have that value". That would be incompatible with allowing users to use the padding bits.
That is, repr(C) unions are not and cannot be just "bags of bits" where one could write to any bit, and that bit value would need to be preserved on copy / move / pass-by-value.
We should document this for repr(C) unions in the Unsafe Code Guidelines, so I'm re-opening this issue until that is resolved.
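The layout behind the example union can be inspected with a small sketch. Two assumptions apply: a `#[repr(C)]` struct named `Pair` stands in for the tuple `(u8, u32)`, because Rust does not specify tuple layout, and the numbers mentioned afterwards hold on common targets such as x86-64 where `u32` has 4-byte alignment. The names `Pair`, `layout`, `offset_of_b`, and `padding_bytes` are made up for this illustration.

```rust
use std::mem::{align_of, size_of};
use std::ops::Range;

// Assumption: stand-in for the tuple (u8, u32) with a guaranteed C layout.
#[repr(C)]
#[derive(Clone, Copy)]
struct Pair {
    a: u8,
    b: u32,
}

#[repr(C)]
#[derive(Clone, Copy)]
union U {
    x: Pair,
}

/// Size and alignment of the union, in bytes.
fn layout() -> (usize, usize) {
    (size_of::<U>(), align_of::<U>())
}

/// Byte offset of `Pair::b`, computed from addresses of a local value.
fn offset_of_b() -> usize {
    let p = Pair { a: 0, b: 0 };
    let base = &p as *const Pair as usize;
    let field = &p.b as *const u32 as usize;
    field - base
}

/// Bytes of `U` that are padding in its only field: everything between the
/// end of `a` and the start of `b`.
fn padding_bytes() -> Range<usize> {
    size_of::<u8>()..offset_of_b()
}
```

On x86-64, `layout()` gives a size of 8 and an alignment of 4, and `padding_bytes()` gives `1..4`. Those three bytes are the ones a by-value pass or copy is free to drop, which is why writing to them and expecting the value back is not something code can rely on.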