From 8c381202bfc577b7e95de8e33e9aa5cbbe2977c8 Mon Sep 17 00:00:00 2001 From: Alexis Hunt Date: Fri, 7 Oct 2022 22:33:22 -0400 Subject: [PATCH] Attempt to document the current state of union. I have tried to document everything I believe we have consensus on. I've left some things open that I possibly could have closed, but because this PR is very big, I would like to focus on getting it in as quickly as possible and worrying about whatever's left afterwards. I strongly encourage others to submit follow-up PRs to close out the other open issues. Closes #156. Closes #298. Closes #352. --- active_discussion/unions.md | 28 +++- reference/src/glossary.md | 35 ++++- reference/src/layout/unions.md | 99 ++++++++++++-- reference/src/validity/unions.md | 227 ++++++++++++++++++++++++++++++- 4 files changed, 364 insertions(+), 25 deletions(-) diff --git a/active_discussion/unions.md b/active_discussion/unions.md index 0863526f..c89f4c77 100644 --- a/active_discussion/unions.md +++ b/active_discussion/unions.md @@ -1,3 +1,29 @@ # Unions -TBD +## Outstanding questions + +* Is `#[repr(Rust)]` the bag-o-bytes union repr, or do we want to propose a new repr? + * *Discussion:* [#73: Validity of unions][#73] +* The following questions are all implicitly answered if `#[repr(Rust)]` is the bag-o-bytes repr, but remain open if not: + * Do `#[repr(Rust)]` unions guarantee all fields at offset 0? + * *Discussion*: [#353: Offsets of union fields][#353] + * Do `#[repr(Rust)]` unions have internal padding? + * *Discussion*: [#354: Do #[repr(Rust)] enums have internal padding?][#354] +* Do `#[repr(transparent)]` unions ever have niches? 
+ * *Discussion*: [#364: What is the value model/validity invariant for transparent unions?][#364] + +## Closed discussion issues: + +* [#13: Representation of unions][#13] +* [#156: Layout of repr(C) unions has padding][#156] +* [#298: Is `repr(transparent)` completely transparent within `repr(Rust)` types?][#298] +* [#352: What is the safety invariant, if any, for unions?][#352] + +[#13]: https://github.com/rust-lang/unsafe-code-guidelines/issues/13 +[#156]: https://github.com/rust-lang/unsafe-code-guidelines/issues/156 +[#298]: https://github.com/rust-lang/unsafe-code-guidelines/issues/298 +[#352]: https://github.com/rust-lang/unsafe-code-guidelines/issues/352 +[#353]: https://github.com/rust-lang/unsafe-code-guidelines/issues/353 +[#354]: https://github.com/rust-lang/unsafe-code-guidelines/issues/354 +[#364]: https://github.com/rust-lang/unsafe-code-guidelines/issues/364 +[#73]: https://github.com/rust-lang/unsafe-code-guidelines/issues/73 \ No newline at end of file diff --git a/reference/src/glossary.md b/reference/src/glossary.md index 2e70d7bf..9fdd43a7 100644 --- a/reference/src/glossary.md +++ b/reference/src/glossary.md @@ -190,8 +190,9 @@ guarantee that `Option<&mut T>` has the same size as `&mut T`. While all niches are invalid bit-patterns, not all invalid bit-patterns are niches. For example, the "all bits uninitialized" is an invalid bit-pattern for -`&mut T`, but this bit-pattern cannot be used by layout optimizations, and is not a -niche. +`&mut T`, but this bit-pattern cannot be used by layout optimizations, and is not a niche. + +It is a surprisingly common misconception that niches can occur in [padding] bytes. They cannot: A niche representation must be invalid for `T`. But a padding byte must be irrelevant to the value of `T`. It follows that if you take a niche representation of `T`, and change any of the padding bytes to any other values, then the result must still be a niche representation of `T`. 
If a niche were contained entirely in padding, that would mean that `T` was entirely niches and, consequently, uninhabited. #### Zero-sized type / ZST @@ -207,6 +208,8 @@ requirement of 2. *Padding* (of a type `T`) refers to the space that the compiler leaves between fields of a struct or enum variant to satisfy alignment requirements, and before/after variants of a union or enum to make all variants equally sized. +Padding for a type is either [interior padding], which is part of one or more fields, or [exterior padding], which is before, between, or after the fields. + Padding can be though of as `[Pad; N]` for some hypothetical type `Pad` (of size 1) with the following properties: * `Pad` is valid for any byte, i.e., it has the same validity invariant as `MaybeUninit`. * Copying `Pad` ignores the source byte, and writes *any* value to the target byte. Or, equivalently (in terms of Abstract Machine behavior), copying `Pad` marks the target byte as uninitialized. @@ -217,8 +220,26 @@ for all values `v` and lists of bytes `b` such that `v` and `b` are related at ` changing `b` at index `i` to any other byte yields a `b'` such `v` and `b'` are related (`Vrel_T(v, b')`). In other words, the byte at index `i` is entirely ignored by `Vrel_T` (the value relation for `T`), and two lists of bytes that only differ in padding bytes relate to the same value(s), if any. -This definition works fine for product types (structs, tuples, arrays, ...). -The desired notion of "padding byte" for enums and unions is still unclear. +This definition works fine for product types (structs, tuples, arrays, ...) and for unions. The desired notion of "padding byte" for enums is still unclear. + +#### Padding (exterior) +[exterior padding]: #exterior-padding + +Exterior padding bytes are [padding] bytes that are not part of one or more fields. They are exactly the padding bytes that are not [interior padding], and therefore must be before, between, or after the fields of the type. 
Padding that comes after all fields is called [tail padding]. + +#### Padding (interior) +[interior padding]: #interior-padding + +Interior padding bytes are [padding] bytes that are part of one or more fields of a type. + +We can say that a field `f: F` *contains* the byte at index `i` in the type `T` if the layout of `T` places `f` at offset `j` and we have `j <= i < j + size_of::<F>()`. Then a padding byte is interior padding if and only if there exists a field `f` that contains it. + +It follows that, provided `T` is not an enum, for any such `f`, the byte at index `i - j` in `F` is a padding byte of `F`. This is because all values of `f` give rise to distinct values of `T`. + +#### Padding (tail) +[tail padding]: #tail-padding + +Tail padding is [exterior padding] that comes after all fields of a type. #### Place @@ -254,8 +275,8 @@ The relation should be functional for a fixed list of bytes (i.e., every list of It is partial in both directions: not all values have a representation (e.g. the mathematical integer `300` has no representation at type `u8`), and not all lists of bytes correspond to a value of a specific type (e.g. lists of the wrong size correspond to no value, and the list consisting of the single byte `0x10` corresponds to no value of type `bool`). For a fixed value, there can be many representations (e.g., when considering type `#[repr(C)] Pair(u8, u16)`, the second byte is a [padding byte][padding] so changing it does not affect the value represented by a list of bytes). -See the [value domain][value-domain] for an example how values and representation relations can be made more precise. +See the [MiniRust page on values][minirust-values] for an example of how values and representation relations can be made more precise. 
[stacked-borrows]: https://github.com/rust-lang/unsafe-code-guidelines/blob/master/wip/stacked-borrows.md -[value-domain]: https://github.com/rust-lang/unsafe-code-guidelines/tree/master/wip/value-domain.md -[place-value-expr]: https://doc.rust-lang.org/reference/expressions.html#place-expressions-and-value-expressions +[minirust-values]: https://github.com/RalfJung/minirust/blob/master/lang/values.md +[place-value-expr]: https://doc.rust-lang.org/reference/expressions.html#place-expressions-and-value-expressions \ No newline at end of file diff --git a/reference/src/layout/unions.md b/reference/src/layout/unions.md index b9f018b4..57ec7627 100644 --- a/reference/src/layout/unions.md +++ b/reference/src/layout/unions.md @@ -1,10 +1,10 @@ # Layout of unions -**Disclaimer:** This chapter represents the consensus from issue -[#13]. The statements in here are not (yet) "guaranteed" -not to change until an RFC ratifies them. +**Disclaimer**: This chapter is a work-in-progress. What's contained here +represents the consensus from [various issues][union discussion]. The statements in here are not (yet) +"guaranteed" not to change until an RFC ratifies them. -[#13]: https://github.com/rust-rfcs/unsafe-code-guidelines/issues/13 +[union discussion]: https://github.com/rust-lang/unsafe-code-guidelines/blob/master/active_discussion/unions.md ### Layout of individual union fields @@ -29,8 +29,17 @@ largest field, and the offset of each union field within its variant. How these are picked depends on certain constraints like, for example, the alignment requirements of the fields, the `#[repr]` attribute of the `union`, etc. -[padding]: ../glossary.md#padding -[layout]: ../glossary.md#layout +Unions may contain both [exterior][exterior padding] and [interior padding]. 
In the diagram below, exterior padding is marked by `EXT`, interior padding by `INT`, and bytes that are padding bytes for a particular field but not padding for the union as a whole are marked `NON`: + +```text +[ EXT [ field0_0_ty | INT | field0_1_ty | INT ] EXT ] +[ EXT [ field1_0_ty | INT | NON NON NON | INT ] EXT ] +[ EXT | NON NON NON | INT [ field2_0_ty ] INT | EXT ] +``` + +It is necessarily the case that any byte that is a non-padding byte for any field is also a non-padding byte for the union. It is, in general, **unspecified** whether the converse is true. Specific reprs may specify whether or not bytes are padding bytes. + +Padding bytes in unions have subtle implications; see the union [value model]. ### Unions with default layout ("`repr(Rust)`") @@ -40,6 +49,10 @@ layout of Rust unions is, _in general_, **unspecified**. That is, there are no _general_ guarantees about the offset of the fields, whether all fields have the same offset, what the call ABI of the union is, etc. +**Major footgun:** The layout of `#[repr(Rust)]` unions allows for the [interior padding footgun] to also exist with `#[repr(Rust)]`, and this behaviour *is* extant in rustc as of this writing. It is [**TBD**][#354] whether it will be removed. + +[interior padding footgun]: #interior-padding-footgun +
Rationale As of this writing, we want to keep the option of using non-zero offsets open @@ -107,11 +120,11 @@ the layout of `U1` is **unspecified** because: * `Zst2` is not a [1-ZST], and * `SomeOtherStruct` has an unspecified layout and could contain padding bytes. -### C-compatible layout ("repr C") +### C-compatible layout (`#[repr(C)]`) The layout of `repr(C)` unions follows the C layout scheme. Per sections [6.5.8.5] and [6.7.2.1.16] of the C11 specification, this means that the offset -of every field is 0. Unsafe code can cast a pointer to the union to a field type +of every field is 0, and the alignment of the type is the widest alignment of its fields. Unsafe code can cast a pointer to the union to a field type to obtain a pointer to any field, and vice versa. [6.5.8.5]: http://port70.net/~nsz/c/c11/n1570.html#6.5.8p5 @@ -119,11 +132,11 @@ to obtain a pointer to any field, and vice versa. #### Padding -Since all fields are at offset 0, `repr(C)` unions do not have padding before +Since all fields are at offset 0, `repr(C)` unions do not have [padding] before their fields. They can, however, have padding in each union variant *after* the field, to make all variants have the same size. -Moreover, the entire union can have trailing padding, to make sure the size is a +Moreover, the entire union can have tail padding, to make sure the size is a multiple of the alignment: ```rust @@ -138,9 +151,25 @@ assert_eq!(size_of::(), 2); # } ``` -> **Note**: Fields are overlapped instead of laid out sequentially, so -> unlike structs there is no "between the fields" that could be filled -> with padding. 
+#### Interior Padding Footgun + +**Major footgun:** On some platform ABIs, such as the obscure ARM64, C unions may also have [interior padding] *within* fields, where a byte is padding in every variant: + +```rust +#[repr(C)] +union U { + x: (u8, u16), // [u8, 1*pad, u16] + y: (u8, u8), // [u8, 1*pad, u8, 1*pad] +} +let u = unsafe { mem::zeroed::<U>() }; // resulting bytes: [0, uninit (!!), 0, 0] +let buf: &[u8] = unsafe { slice::from_raw_parts(transmute(&u), 4) }; // UB! +``` + +This is undefined behaviour, which is surprising because the union appears to be fully initialized and therefore ought to be castable to a slice. + +However, because byte 1 is a padding byte in both variants, it can be a padding byte in the union type as well. Fortunately, this counterintuitive behaviour is limited to obscure platforms like amd64. + +**C/C++ compatibility hazard:** This footgun exists for compatibility with the C/C++ platform ABI, and it is not well-known in C/C++ communities. So whenever dealing with a union that might have internal padding, you should assume that C/C++ code may be handing you a loaded footgun. #### Zero-sized fields @@ -172,4 +201,48 @@ translation of that code into Rust will not produce a compatible result. Refer to the [struct chapter](structs-and-tuples.md#c-compatible-layout-repr-c) for further details. +
Rationale + +Look. It wasn't our idea. + +We could try to limit the blast radius to `extern "C"` functions, but really, that's just sawing off the end of the footgun. + +
+ +### Transparent layout (`#[repr(transparent)]`) + +`#[repr(transparent)]` is currently unstable for unions, but [RFC 2645] documents most of its semantics. Notably, it causes unions to be passed using the same ABI as the non-1-ZST field. + +**Major footgun:** Matching the interior ABI means that all padding bytes of the non-1-ZST field will also be padding bytes of the union, so the [interior padding footgun] exists with `#[repr(transparent)]` unions. + +**Note:** If `U` is a transparent union wrapping a `T`, `U` may not inherit `T`'s niches, and therefore `Option<U>` and `Option<T>`, for instance, will not necessarily have the same layout or even the same size. + +This is because, if `U` contains any zero-sized fields in addition to the `T` field, the [value model] forces `U` to support uninitialized bytes, and that in turn prevents `T`'s niches from being present in `U`. Currently, `U` also supports uninitialized bytes if it does not contain any additional fields, but it is [**TBD**][#364] whether single-field transparent unions might support niches. + +[RFC 2645]: https://github.com/rust-lang/rfcs/blob/master/text/2645-transparent-unions.md + +### Bag-o-bytes layout (Raw-repr) + +There are applications where it is desirable that unions behave simply as a buffer of abstract bytes, with no constraints on validity and no interior padding bytes that can [get surprisingly reset to uninit][interior padding footgun]. + +Thus, we propose that Rust support a repr, tentatively called the Raw-repr, that gives these semantics to unions. The Raw-repr may be `#[repr(Rust)]` or it may be a new repr, say `#[repr(Raw)]`. The Raw-repr will have the following properties: + +* All fields are laid out at offset 0. +* The alignment of the union is the greatest alignment among fields. +* The only padding bytes are tail padding bytes, if any. + +
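The first two properties in the list above can already be observed on a `#[repr(C)]` union, so the following sketch uses one as a stand-in. The type `Bag` and the helper `field_offsets` are hypothetical names introduced only for illustration, and it is an assumption that the proposed Raw-repr would behave the same way, since that repr does not exist yet. The third property (no interior padding) is not something `#[repr(C)]` promises, so it is not checked here.

```rust
use std::mem::{align_of, size_of};
use std::ptr::addr_of;

// Hypothetical stand-in: `#[repr(C)]` is used because the proposed Raw-repr does not exist.
#[repr(C)]
union Bag {
    byte: u8,
    word: u32,
    blob: [u8; 5],
}

/// Returns the byte offset of each field from the start of the union.
fn field_offsets(bag: &Bag) -> [usize; 3] {
    let base = bag as *const Bag as usize;
    // `addr_of!` only computes addresses and never reads the fields.
    // The `unsafe` block is kept because naming a union field has historically required it.
    unsafe {
        [
            addr_of!(bag.byte) as usize - base,
            addr_of!(bag.word) as usize - base,
            addr_of!(bag.blob) as usize - base,
        ]
    }
}

fn main() {
    let bag = Bag { word: 0 };
    // All fields are laid out at offset 0.
    assert_eq!(field_offsets(&bag), [0, 0, 0]);
    // The alignment of the union is the greatest alignment among its fields.
    assert_eq!(align_of::<Bag>(), align_of::<u32>());
    // The size covers the largest field and is a multiple of the alignment (tail padding).
    assert!(size_of::<Bag>() >= size_of::<[u8; 5]>());
    assert_eq!(size_of::<Bag>() % align_of::<Bag>(), 0);
}
```

The size is checked against the alignment rather than a hard-coded number so that the sketch does not depend on a particular target's `u32` alignment.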
Rationale + +We need at least one repr without the [interior padding footgun]. This layout is extremely constrained, so it would generally be against the philosophy of `#[repr(Rust)]` to impose these constraints on the default layout instead of introducing a new one. However, without such constraints, `#[repr(Rust)]` is just a giant, largely useless footgun, which is a rationale to simply constrain it and leave any potential relaxations, e.g. for safe transmutes and niches, to other reprs. + +
+ +[#354]: https://github.com/rust-lang/unsafe-code-guidelines/issues/354 +[#364]: https://github.com/rust-lang/unsafe-code-guidelines/issues/364 [1-ZST]: ../glossary.md#zero-sized-type--zst +[exterior padding]: ../glossary.md#exterior-padding +[interior padding]: ../glossary.md#interior-padding +[layout]: ../glossary.md#layout +[padding]: ../glossary.md#padding +[union values]: ../validity/unions.md#values +[value model]: ../glossary.md#value-model \ No newline at end of file diff --git a/reference/src/validity/unions.md b/reference/src/validity/unions.md index 86c95478..301391de 100644 --- a/reference/src/validity/unions.md +++ b/reference/src/validity/unions.md @@ -1,13 +1,232 @@ # Validity of unions **Disclaimer**: This chapter is a work-in-progress. What's contained here -represents the consensus from issue [#73]. The statements in here are not (yet) +represents the consensus from [various issues][union discussion]. The statements in here are not (yet) "guaranteed" not to change until an RFC ratifies them. -## Validity of unions with zero-sized fields +[union discussion]: https://github.com/rust-lang/unsafe-code-guidelines/blob/master/active_discussion/unions.md -A union containing a zero-sized field can contain any bit pattern. An example of such -an union is [`MaybeUninit`]. +**Note**: For ease of reading the examples, the hypothetical type `Padded<T>` is used, which behaves identically to `T` except that writing to it clobbers any subsequent padding. This could actually be accomplished using an overaligned newtype struct, but that would make the examples more verbose for no gain in clarity. Additionally, the layout of union fields is shown in some comments; in these comments, the notation `n*pad` means "`n` padding bytes". +## Value model + +The possible values of unions are not defined in terms of the values of the fields, but rather, a union's possible values are lists of bytes. 
The [representation relation] is trivial in both directions, except for [padding bytes][padding byte] which are uninitialized in all values. + +
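As a small illustration of "values are lists of bytes", the sketch below writes a union through one field and reads the same bytes back through another. The type `Word` and the function `round_trip` are hypothetical names, and `#[repr(C)]` is chosen only so that both fields are guaranteed to sit at offset 0; neither field has padding, so every byte survives the round trip.

```rust
// Hypothetical example type; `#[repr(C)]` pins both fields to offset 0.
#[repr(C)]
union Word {
    int: u32,
    bytes: [u8; 4],
}

/// Writes `int` and reads the bytes back through `bytes`, then does the reverse.
fn round_trip(value: u32) -> ([u8; 4], u32) {
    let as_int = Word { int: value };
    // Sound: all four bytes were initialized by the write to `int`,
    // and every initialized byte pattern is a valid `[u8; 4]`.
    let bytes = unsafe { as_int.bytes };

    let as_bytes = Word { bytes };
    // Sound: every initialized 4-byte pattern is a valid `u32`.
    let back = unsafe { as_bytes.int };
    (bytes, back)
}

fn main() {
    let (bytes, back) = round_trip(0xdead_beef);
    // The union stored exactly the native-endian bytes of the integer.
    assert_eq!(bytes, 0xdead_beef_u32.to_ne_bytes());
    assert_eq!(back, 0xdead_beef);
}
```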
Rationale + +The following examples must be supported, and therefore impose constraints on union value behaviour. The simplest solution, by far, is to treat the value representation of unions as merely being lists of bytes. While we do not discuss every possible angle here, it should quickly become clear from just these two examples that any other solution is significantly more complicated. Please trust us that every other alternative that sounds even half-reasonable has been examined and has some critical flaw or another. + +
+ +**We cannot require all fields, or even at least one field, to be valid at all times.** + +```rust +union Padding { + left: (Padded<u8>, u16), // [[u8, 1*pad], u16] + right: (u16, Padded<u8>), // [u16, [u8, 1*pad]] +} +let mut p = Padding{left: (0, 0)}; // resulting bytes: [0, uninit, 0, 0] +p.right.1 = 1; // resulting bytes: [0, uninit, 1, uninit] +fn f(_: Padding) {} +f(p); +``` + +By the end of this example, the resulting union has no valid fields, because every field contains uninit, non-padding bytes. Therefore no field is even fully initialized, despite the fact that `p` was fully initialized, as witnessed by the fact that the compiler allows it to be moved into `f`. This is all stable, Safe Rust, so these semantics being sound is a hard constraint. + +**Unions must preserve provenance.** + +```rust +union Provenance<'a> { + raw: *const u32, + reference: &'a u32, +} +let x: u32 = 0; +let u = Provenance { raw: &x }; +let y = unsafe{ *u.reference }; +``` + +We must be able to carry provenance between the `raw` and `reference` fields in order for the read via `u.reference` to be valid. While this uses Unsafe Rust, this code is "obviously" sound. Therefore the union must be able to maintain provenance between the two pointers, even should the pointers be nested deeply within structs. + +[representation relation]: ../glossary.md#representation-relation +[padding byte]: ../glossary.md#padding-byte + +### Niches + +## Valid values for #[repr(C)] and Raw-repr unions + +`#[repr(C)]` and [Raw-repr] unions can take on any byte value. + +
Rationale + +The purpose of the Raw-repr is to provide these semantics, which are easy to reason about. Furthermore, C programmers are used to being able to treat unions like bags of bytes, more or less, and Rust programmers are similarly used to the same with `#[repr(C)]` unions. Therefore, they should both accept any arbitrary byte pattern. + +
+ +## Possible niche values + +The presence of padding bytes, and writes to individual fields in general, make niches hard to come by in unions. A niche representation of a union would have to not only be invalid for every single one of its fields, but also impossible to construct in Safe Rust with any combination of writes to any of its fields. + +For reprs other than `#[repr(C)]` and the [Raw-repr], it is consequently [**TBD**][#73] whether values not constructible from Safe Rust are valid. The following example assumes that `#[repr(Rust)]` is not the Raw-repr: + +```rust +#[repr(Rust)] +union U { + a: (u16, u16), + b: u32, +} +let _ = unsafe { MaybeUninit::<U>::uninit().assume_init() }; // Unsound: assumes that U can be uninit +fn get_b(u: U) -> u32 { + unsafe { u.b } // Unsound: assumes that U cannot be uninit +} +let u: U; +get_b(u); // Compile error: u is not initialized. +``` + +Because all bytes of `U` must be initialized for the value to be valid, and this is enforced by the compiler's initialization checks, it might be tempting to assume that `U`'s bytes must always be defined, but this is not a valid assumption. It is equally invalid, however, to assume that `U`'s bytes can be undefined. + +
Rationale + +We have not yet reached consensus on whether or not we wish to leave the door open for the possibility that unions with safe field access, or `#[repr(transparent)]` unions with no ZSTs, contain niches: + +```rust +#[repr(transparent)] +union U { b: bool } +assert_eq!(size_of::<Option<U>>(), 1); // Requires a niche, which in turn requires that U must be initialized to be valid. +``` + +We are **not** describing this case as unspecified, but instead as TBD. "Constructible with Safe Rust" is a poorly defined and very complex invariant, which falls short of the UCG's goals of easily checked, easy to understand (such as they are) semantics, and therefore we are not comfortable leaving the language in this state on an indefinite basis. + +The main saving grace here is that `#[repr(Rust)]` unions are presently nearly impossible to use correctly anyways, because they do not even guarantee fields at offset 0. + +
+ +## Validity of sometimes-padding bytes + +We can say that a byte is *sometimes padding* for a union `U` if there is *some* inhabited field `f` such that the byte is either padding for `f` or not a part of `f`. + +In that case, the byte will be uninitialized in the value `U{f: /* some value */ }`. By the [monotonicity property], therefore, all sometimes-padding bytes can contain any byte value, be it undefined or any bit pattern with any provenance. Likewise, if multiple bytes are padding for the same field, then they can take on any possible combination of byte values between them. + +It follows that a union containing an inhabited zero-sized field can contain any bit pattern whatsoever, because all bytes are sometimes-padding bytes. An example of such +a union is [`MaybeUninit<T>`], which is a union of `T` and `()`. + +As per the previous section, however, just because a byte is a sometimes-padding byte does not mean it can always safely be set to uninitialized (or any other value), if this can produce a value not reachable from Safe Rust. + +For instance, the following is presently unsound (assuming that `#[repr(Rust)]` is not the [Raw-repr]), even assuming that all fields are placed at offset 0: + +```rust +#[repr(Rust)] +union U { + b: bool, // [bool, 1*pad] + u: u16, // [u16] +} +let mut u = U{u: u16::from_ne_bytes([0xff, 0])}; // resulting bytes: [0xff, 0] +unsafe { (&mut u.b as *mut bool as *mut MaybeUninit<u8>).offset(1).write(MaybeUninit::uninit()) }; // resulting bytes: [0xff, uninit] +``` + +This value is impossible to reach in Safe Rust: the only way to write uninit to the padding is to write to the boolean field. Writing to the integer field must initialize + +## Safety invariants of unions + +Unions currently provide *no* safety invariants of any kind. 
Without a documented safety invariant for a union type, code cannot make any assumptions about a union passed in from untrusted code, other than that it has a valid value, and it cannot pass a union value to untrusted code unless it could do so in purely Safe Rust. + +In particular, regardless of the union's repr, it is not safe to assume that a union's field can be safely accessed, even if it seems "obviously" safe. + +```rust +// Crate a +pub union U { + pub i: i32 +} +// Crate b +pub fn get_i(u: a::U) -> i32 { + // Safe: u.i cannot be uninit in Safe Rust. + unsafe { u.i } // UNSOUND! +} +``` + +Making this field access safe would require an additional safety invariant that can be understood by the compiler. The UCG WG does not oppose such a safety invariant, but believes it should be opt-in, and an RFC for such a feature is beyond our remit. + +
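For contrast with the unsound `get_i` above, here is a minimal sketch of how a crate might supply a documented safety invariant itself, using ordinary privacy rather than any compiler-understood feature. The module `a`, the methods `new` and `get`, and the wording of the invariant are all hypothetical; this is one possible pattern, not something the guidelines prescribe.

```rust
mod a {
    /// Safety invariant (documented by this hypothetical crate):
    /// the field `i` is always initialized.
    pub union U {
        // Private: outside this module, Safe Rust can neither construct a `U` nor write to `i`.
        i: i32,
    }

    impl U {
        /// The only safe constructor; it upholds the invariant by initializing `i`.
        pub fn new(i: i32) -> Self {
            U { i }
        }

        /// Reads `i`, relying on the documented invariant above
        /// and not on anything the language guarantees for unions.
        pub fn get(&self) -> i32 {
            unsafe { self.i }
        }
    }
}

fn main() {
    let u = a::U::new(7);
    assert_eq!(u.get(), 7);
}
```

Unsafe code elsewhere that conjures an uninitialized `a::U` would be breaking the documented invariant, so the blame would fall on that code rather than on `get`.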
Rationale + +At first blush, it may appear that the crate `b` is entitled to assume that it is being called from Safe Rust, or from unsafe Rust following the rules of Safe Rust. It then seems to follow that `u.i` must always be initialized, since the only safe way to create a value of type `U` is to initialize it with a value for `i`. + +One might analogize this to the corresponding code with a struct: + +```rust +// Crate a +pub struct S { + pub i: i32 +} +// Crate b +pub fn get_i(s: a::S) -> i32 { + // Safe: s.i cannot be uninit in Safe Rust. + unsafe { s.i } // Sound. +} +``` + +This struct code, however, is absolutely sound, even in the absence of a safety invariant documented by `S`, because of `S`'s validity invariant: for `S` to be valid, all its fields must be valid, and therefore `i` must be initialized. If it weren't, the definition of `get_i` wouldn't be the problem: the caller would be committing UB by passing an uninitialized `S`. Consequently, the `unsafe` block is redundant. + +But for the union type `U`, its validity invariant is not transitive to its fields. `u.i` has no guarantee of validity for `U` to be valid. + +Okay, so what about a field with a safety invariant that is stricter than the validity invariant? + +```rust +// Crate a +pub struct S<'a> { + pub s: &'a str +} +// Crate b +pub fn get_s(s: a::S<'_>) -> String { + // Safe: s.s must be UTF-8 in Safe Rust. + unsafe { String::from_utf8_unchecked(s.s.as_bytes().to_owned()) } // Sound. +} +``` + +Now we are relying on a safety invariant separate from the validity invariant: that `str` must be UTF-8. So isn't this like our union example, where we're relying on the safety invariant that `i32` can't be uninit? No, because union fields are unsafe. 
+ +Consider the following three types: + +```rust +use std::str; +static INVALID_UTF8: [u8; 1] = [0xff]; +pub struct Sound<'a> { + s: &'a str +} +pub struct Unsound<'a> { + pub s: &'a str +} +pub union AlsoSound<'a> { + pub s: &'a str +} +impl<'a> Sound<'a> { + pub fn new() -> Self { + Self { s: unsafe { str::from_utf8_unchecked(&INVALID_UTF8) } } + } +} +impl<'a> Unsound<'a> { + pub fn new() -> Self { + Self { s: unsafe { str::from_utf8_unchecked(&INVALID_UTF8) } } + } +} +impl<'a> AlsoSound<'a> { + pub fn new() -> Self { + Self { s: unsafe { str::from_utf8_unchecked(&INVALID_UTF8) } } + } +} +``` + +One struct type is `Sound`, the other is `Unsound`, and the only difference between the two is that `Unsound`'s field is `pub`. This lets us get to the heart of how safety invariants work for fields: untrusted Safe Rust cannot be allowed to get its hands on a `str` with invalid UTF-8. If it could do that, it could pass it off to arbitrary unsafe Rust that *does* assume that the `str` is valid UTF-8, such as the `str::chars()` method. And that will cause UB. + +Thus, every struct implicitly has a safety invariant that all of its `pub` fields are safe. + +The union `AlsoSound` is identical to `Unsound` except for being a union, but it is sound. The reason it doesn't break the rules is that union field access is unsafe. Safe Rust can call `Unsound::new().s.chars()`, but neither `Sound::new().s.chars()` nor `AlsoSound::new().s.chars()`. + +It follows that unions have no safety invariants on their fields, even `pub` fields, except for those that are explicitly documented. + +
+ +[Raw-repr]: ../layout/unions.md#raw-repr [#73]: https://github.com/rust-lang/unsafe-code-guidelines/issues/73 [`MaybeUninit`]: https://doc.rust-lang.org/std/mem/union.MaybeUninit.html +[monotonicity property]: https://github.com/RalfJung/minirust/blob/master/lang/values.md#generic-properties