Untagged unions (tracking issue for RFC 1444) #32836

nikomatsakis · 2016-04-08T20:42:25Z

Unresolved questions:

Does assigning directly to a union field trigger a drop of the previous contents?
When moving out of one field of a union, are the others considered invalidated? (1, 2, 3, 4)
- Answered by Union initialization and Drop rfcs#2514.
Under what conditions can you implement Copy for a union? For example, what if some variants are of non-Copy type? All variants?
What interaction is there between unions and enum layout optimizations? (Unions interacting with Enum layout optimization #36394)

Open issues of high import:

Matching on uninhabited unsafe places (union fields, raw pointer dereferences, etc.) allowed in safe code. #47412 -- MIR-based unsafety checker sometimes accepts unsafe accesses to union fields in presence of uninhabited fields


sfackler · 2016-04-08T22:27:48Z

I may have missed it in the discussion on the RFC, but am I correct in thinking that destructors of union variants are never run? Would the destructor for the Box::new(1) run in this example?

union Foo {
    f: i32,
    g: Box<i32>,
}

let mut f = Foo { g: Box::new(1) };
f.g = Box::new(2);

solson · 2016-04-08T22:30:28Z

@sfackler My current understanding is that f.g = Box::new(2) will run the destructor but f = Foo { g: Box::new(2) } would not. That is, assigning to a Box<i32> lvalue will cause a drop like always, but assigning to a Foo lvalue will not.

sfackler · 2016-04-08T22:32:47Z

So an assignment to a variant is like an assertion that the field was previously "valid"?

solson · 2016-04-08T22:35:44Z

@sfackler For Drop types, yeah, that's my understanding. If they weren't previously valid you need to use the Foo constructor form or ptr::write. From a quick grep, it doesn't seem like the RFC is explicit about this detail, though. I see it as an instantiation of the general rule that writing to a Drop lvalue causes a destructor call.
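For illustration, the general rule mentioned above (a plain assignment drops the old value, ptr::write does not) can be sketched outside of unions. The Noisy type and the assignment_vs_write function are hypothetical names made up for this sketch; they do not come from the RFC.

```rust
use std::cell::Cell;
use std::ptr;
use std::rc::Rc;

/// Hypothetical helper type: bumps a shared counter each time it is dropped.
struct Noisy(Rc<Cell<u32>>);

impl Drop for Noisy {
    fn drop(&mut self) {
        self.0.set(self.0.get() + 1);
    }
}

/// Returns the drop count seen after a plain assignment and after a `ptr::write`.
#[allow(unused_assignments)]
fn assignment_vs_write() -> (u32, u32) {
    let drops = Rc::new(Cell::new(0));
    let mut slot = Noisy(drops.clone());

    // Plain assignment: the value previously stored in `slot` is dropped first.
    slot = Noisy(drops.clone());
    let after_assign = drops.get();

    // `ptr::write` overwrites the place without dropping; the old value is leaked.
    unsafe { ptr::write(&mut slot, Noisy(drops.clone())) };
    let after_write = drops.get();

    drop(slot);
    (after_assign, after_write)
}

fn main() {
    let (after_assign, after_write) = assignment_vs_write();
    println!("drops after assignment: {after_assign}, after ptr::write: {after_write}");
}
```

The counter goes up on the assignment but not on the ptr::write, which is why ptr::write (or a whole-value constructor) is the tool for a place whose old contents must not be dropped.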

ohAitch · 2016-04-08T22:38:40Z

Should a &mut union with Drop variants be a lint?

On Friday, 8 April 2016, Scott Olson wrote:

@sfackler For Drop types, yeah, that's my understanding. If they weren't previously valid you need to use the Foo constructor form or ptr::write. From a quick grep, it doesn't seem like the RFC is explicit about this detail, though.


joshtriplett · 2016-04-08T23:51:27Z

On April 8, 2016 3:36:22 PM PDT, Scott Olson wrote:

@sfackler For Drop types, yeah, that's my understanding. If they
weren't previously valid you need to use the Foo constructor form or
ptr::write. From a quick grep, it doesn't seem like the RFC is
explicit about this detail, though.

I should have covered that case explicitly. I think both behaviors are defensible, but I think it'd be far less surprising to never implicitly drop a field. The RFC already recommends a lint for union fields with types that implement Drop. I don't think assigning to a field implies that field was previously valid.

sfackler · 2016-04-08T23:52:37Z

Yeah, that approach seems a bit less dangerous to me as well.

solson · 2016-04-08T23:56:34Z

Not dropping when assigning to a union field would make f.g = Box::new(2) act differently from let p = &mut f.g; *p = Box::new(2), because you can't make the latter case not drop. I think my approach is less surprising.

It's not a new problem, either; unsafe programmers already have to deal with other situations where foo = bar is UB if foo is uninitialized and Drop.
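As a hedged sketch of how this question looks under the later design (RFC 2514, referenced at the top of this issue): a field with a destructor is wrapped in ManuallyDrop, and then neither form of assignment drops the old contents, so the two spellings agree. Noisy and field_vs_reference are hypothetical names used only for this sketch, and the code reflects my understanding of current stable behavior rather than anything specified in this thread.

```rust
use std::cell::Cell;
use std::mem::ManuallyDrop;
use std::rc::Rc;

/// Hypothetical helper type: bumps a shared counter each time it is dropped.
struct Noisy(Rc<Cell<u32>>);

impl Drop for Noisy {
    fn drop(&mut self) {
        self.0.set(self.0.get() + 1);
    }
}

#[allow(dead_code)]
union Foo {
    f: i32,
    g: ManuallyDrop<Noisy>,
}

/// Returns the drop count after a field assignment, after an assignment
/// through a reference, and after an explicit manual drop.
fn field_vs_reference() -> (u32, u32, u32) {
    let drops = Rc::new(Cell::new(0));
    let mut u = Foo { g: ManuallyDrop::new(Noisy(drops.clone())) };

    // Direct field assignment: the old contents are not dropped.
    u.g = ManuallyDrop::new(Noisy(drops.clone()));
    let after_field = drops.get();

    // Assignment through a reference to the field: also no drop,
    // because `ManuallyDrop` itself has no destructor to run.
    unsafe {
        let p = &mut u.g;
        *p = ManuallyDrop::new(Noisy(drops.clone()));
    }
    let after_ref = drops.get();

    // Cleaning up is the caller's job; only now does the counter move.
    unsafe { ManuallyDrop::drop(&mut u.g) };
    (after_field, after_ref, drops.get())
}

fn main() {
    println!("{:?}", field_vs_reference());
}
```

The two overwritten values are leaked on purpose here; real code would have to decide which field is live and drop it explicitly.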

joshtriplett · 2016-04-08T23:59:37Z

I personally don't plan to use Drop types with unions at all. So I'll defer entirely to people who have worked with analogous unsafe code on the semantics of doing so.

retep998 · 2016-04-09T00:01:42Z

I also don't intend to use Drop types in unions so either way doesn't matter to me as long as it is consistent.

ohAitch · 2016-04-09T00:05:22Z

I don't intend to use mutable references to unions, and probably
just "weirdly-tagged" ones with Into

On Friday, 8 April 2016, Peter Atashian wrote:

I also don't intend to use Drop types in unions so either way doesn't
matter to me as long as it is consistent.


nikomatsakis · 2016-04-12T19:56:59Z

Seems like this is a good issue to raise up as an unresolved question. I'm not sure yet which approach I prefer.

joshtriplett · 2016-04-12T20:38:54Z

@nikomatsakis As much as I find it awkward for assigning to a union field of a type with Drop to require previous validity of that field, the reference case @tsion mentioned seems almost unavoidable. I think this might just be a gotcha associated with code that intentionally disables the lint for putting a type with Drop in a union. (And a short explanation of it should be in the explanatory text for that lint.)

solson · 2016-04-12T21:02:40Z

And I'd like to reiterate that unsafe programmers must already generally know that a = b means drop_in_place(&mut a); ptr::write(&mut a, b) to write safe code. Not dropping union fields would be one more exception to learn, not one less.

(NB: the drop doesn't happen when a is statically known to already be uninitialized, like let a; a = b;.)

But I support having a default warning against Drop variants in unions that people have to #[allow(..)] since this is a fairly non-obvious detail.

nikomatsakis · 2016-04-12T23:02:45Z

@tsion this is not true for a = b and maybe only sometimes true for a.x = b but it is certainly true for *a = b. This uncertainty is what made me hesitant about it. For example, this compiles:

fn main() {
  let mut x: (i32, i32);
  x.0 = 2;
  x.1 = 3;
}

(though trying to print x later fails, but I consider that a bug)

solson · 2016-04-12T23:09:58Z

@nikomatsakis That example is new to me. I guess I would have considered it a bug that that example compiles, given my previous experience.

But I'm not sure I see the relevance of that example. Why is what I said not true for a = b and only sometimes for a.x = b?

Say, if x.0 had a type with a destructor, surely that destructor is called:

fn main() {
    let mut x: (Box<i32>, i32);
    x.0 = Box::new(2); // x.0 statically known to be uninit, destructor not called
    x.0 = Box::new(3); // x.0 destructor is called before writing new value
}

arielb1 · 2016-04-14T11:11:01Z

Maybe just lint against that kind of write?

nikomatsakis · 2016-04-16T15:02:46Z

My point is only that = does not always run the destructor; it
uses some knowledge about whether the target is known to be
initialized.

On Tue, Apr 12, 2016 at 04:10:39PM -0700, Scott Olson wrote:

@nikomatsakis That example is new to me. I guess I would have considered it a bug that that example compiles, given my previous experience.

But I'm not sure I see the relevance of that example. Why is what I said not true for a = b and only sometimes for 'a.x = b'?

Say, if x.0 had a type with a destructor, surely that destructor is called:
fn main() {
    let mut x: (Box<i32>, i32);
    x.0 = Box::new(2); // x.0 statically known to be uninit, destructor not called
    x.0 = Box::new(3); // x.0 destructor is called
}

arielb1 · 2016-04-16T15:17:45Z

@nikomatsakis

It runs the destructor if the drop flag is set.

But I think that kind of write is confusing anyway, so why not just forbid it? You can always do *(&mut u.var) = val.

solson · 2016-04-16T23:16:58Z

My point is only that = does not always run the destructor; it uses some knowledge about whether the target is known to be initialized.

@nikomatsakis I already mentioned that:

(NB: the drop doesn't happen when a is statically known to already be uninitialized, like let a; a = b;.)

But I didn't account for dynamic checking of drop flags, so this is definitely more complicated than I considered.
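A small sketch of the case discussed above, where the compiler cannot know statically whether the target of an assignment is initialized and has to track it at run time. Noisy and drops_on_overwrite are hypothetical names for this illustration only.

```rust
use std::cell::Cell;
use std::rc::Rc;

/// Hypothetical helper type: bumps a shared counter each time it is dropped.
struct Noisy(Rc<Cell<u32>>);

impl Drop for Noisy {
    fn drop(&mut self) {
        self.0.set(self.0.get() + 1);
    }
}

/// Returns how many destructors the final assignment ran.
#[allow(unused_assignments)]
fn drops_on_overwrite(init_first: bool) -> u32 {
    let drops = Rc::new(Cell::new(0));
    let mut slot: Noisy;
    if init_first {
        slot = Noisy(drops.clone());
    }

    // Whether an old value exists here depends on the branch above, so the
    // decision to run its destructor is made at run time.
    slot = Noisy(drops.clone());
    let seen = drops.get();

    drop(slot);
    seen
}

fn main() {
    println!("initialized first: {}", drops_on_overwrite(true));
    println!("left uninitialized: {}", drops_on_overwrite(false));
}
```

The same assignment statement runs one destructor in the first call and none in the second, which is the "= does not always run the destructor" point made earlier in the thread.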

arielb1 · 2016-04-17T18:40:24Z

@tsion

Drop flags are only semi-dynamic - now that zeroing drop is gone, they are a part of codegen. I say we forbid that kind of write because it causes more confusion than good.

ghost · 2016-04-27T16:35:16Z

Should Drop types even be allowed in unions? If I'm understanding things correctly, the main reason to have unions in Rust is to interface with C code that has unions, and C doesn't even have destructors. For all other purposes, it seems that it's better to just use an enum in Rust code.

Amanieu · 2016-04-27T16:36:58Z

There is a valid use case for using a union to implement a NoDrop type which inhibits drop.

joshtriplett · 2016-04-27T16:56:22Z

As well as invoking such code manually via drop_in_place or similar.
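A rough sketch of the NoDrop idea, written in the form that I believe compiles on current stable Rust, where a field with a destructor has to be wrapped in ManuallyDrop. The NoDrop union, the Noisy type and the run function are hypothetical names for this illustration; ManuallyDrop::drop stands in for the drop_in_place call mentioned above.

```rust
use std::cell::Cell;
use std::mem::ManuallyDrop;
use std::rc::Rc;

/// Hypothetical helper type: bumps a shared counter each time it is dropped.
struct Noisy(Rc<Cell<u32>>);

impl Drop for Noisy {
    fn drop(&mut self) {
        self.0.set(self.0.get() + 1);
    }
}

/// Hypothetical wrapper: going out of scope never runs `T`'s destructor.
union NoDrop<T> {
    value: ManuallyDrop<T>,
}

/// Returns the number of destructor runs observed once the wrapper is gone.
fn run(drop_manually: bool) -> u32 {
    let drops = Rc::new(Cell::new(0));
    {
        let mut guarded = NoDrop { value: ManuallyDrop::new(Noisy(drops.clone())) };
        if drop_manually {
            // SAFETY: `value` is the only field, it was initialized above,
            // and it is not used again after this call.
            unsafe { ManuallyDrop::drop(&mut guarded.value) };
        }
        // `guarded` goes out of scope here without dropping its contents.
    }
    drops.get()
}

fn main() {
    println!("manual drop: {}, no manual drop: {}", run(true), run(false));
}
```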

RumataEstor · 2016-06-21T14:29:14Z

To me, dropping a field value while writing to it is definitely wrong, because which variant was previously stored is unknown.

Would it be possible to prohibit field setters but require full union replacement? In this case if the union implements Drop full union drop would be called for the value replaced as expected.

joshtriplett · 2016-06-22T01:49:08Z

I don't think it makes sense to prohibit field setters; most uses of unions should have no problem using those, and fields without a Drop implementation will likely remain the common case. Unions with fields that implement Drop will produce a warning by default, making it even less likely to hit this case accidentally.

petrochenkov · 2018-07-29T01:30:43Z

@joshtriplett

primary use cases of unions

It's not obvious to me at all why this is the primary use case.
It may be true for repr(C) unions if you assume that all uses of unions for tagged unions / "Rust enum emulation" in FFI assume extensibility (which is not true), but from what I've seen, uses of repr(Rust) unions (drop control, initialization control, transmutes) do not expect "unexpected variants" suddenly appearing in them.

joshtriplett · 2018-07-29T02:12:28Z

@petrochenkov I didn't say "break the primary use case", I said "break primary use cases". FFI is one of the primary use cases of unions.

scottmcm · 2018-07-30T05:41:50Z

and take the union (heh ;) ) of all of those sets

There's certainly an attractive obviousness to a statement that "the possible values of a union are the union of the possible values of all its possible variants"...

RalfJung · 2018-07-30T07:02:12Z

True. However, that's not the proposal -- we all agree that the following should be legal:

union F {
  x: (u8, bool),
  y: (bool, u8),
}
fn foo() -> F {
  let mut f = F { x: (5, false) };
  unsafe { f.y.1 = 17; }
  f
}

Actually I think it is a bug that this even requires unsafe.

So, the union has to be taken bytewise, at least.
Also, I don't think "attractive obviousness" on its own is a sufficiently good reason. Any invariant we decide on is a significant burden for unsafe code authors, we should have concrete advantages that we get in turn.
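To make the bytewise point concrete, here is a sketch that differs from the example above in two ways, both of which are assumptions added for illustration: the tuples are replaced by repr(C) structs so that field offsets are defined, and a raw byte-array field is added so the result can be read back without claiming that either original variant is valid. All type and function names are hypothetical.

```rust
#[repr(C)]
#[derive(Clone, Copy)]
#[allow(dead_code)]
struct ByteThenFlag {
    byte: u8,
    flag: bool,
}

#[repr(C)]
#[derive(Clone, Copy)]
#[allow(dead_code)]
struct FlagThenByte {
    flag: bool,
    byte: u8,
}

#[repr(C)]
#[allow(dead_code)]
union F {
    x: ByteThenFlag,
    y: FlagThenByte,
    raw: [u8; 2],
}

/// Initializes through `x`, overwrites one byte through `y`, and reads the
/// bytes back through `raw`.
fn mixed_bytes() -> [u8; 2] {
    let mut f = F { x: ByteThenFlag { byte: 5, flag: false } };
    unsafe {
        // Only the second byte is touched; the first still holds 5.
        f.y.byte = 17;
        // Reading as plain bytes avoids asserting a valid `bool` anywhere.
        f.raw
    }
}

fn main() {
    println!("{:?}", mixed_bytes());
}
```

After the write, the contents are a valid `x` in no sense (the flag byte is 17) and a valid `y` in no sense (the flag byte is 5), yet each byte came from a legal write to some field.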

petrochenkov · 2018-07-30T10:00:08Z

@RalfJung

Actually I think it is a bug that this even requires unsafe.

I don't know about the new MIR-based unsafety-checker implementation, but in the old HIR-based one it was certainly a checker limitation/simplification - only expressions of the form expr1.field = expr2 were analyzed for possible "field assignment" unsafety opt-out, everything else was conservatively treated as generic "field access" that's unsafe for unions.

petrochenkov · 2018-08-05T20:08:17Z

Answering the comment in #52786 (comment):

So the idea is that the compiler still doesn't know anything about Wrap<T>'s contract and can't e.g. do layout optimizations. Ok, this position is understood.
This means that internally, inside Wrap's module, the implementation of Wrap<T> can, for example, temporarily write "unexpected values" into it, as long as it doesn't leak them to users, and the compiler will be okay with them.

I'm not sure, though, how exactly the part of Wrap's contract about the absence of unexpected values is related to field privacy.

First of all, regardless of fields being private or public, unexpected values cannot be written directly through those fields. You need something like a raw pointer, or code on the other side of FFI to do it, and it can be done without any field access, just by having a pointer to the whole union. So we need to approach this from some other direction than access to a field being restricted.

As I interpret your comment, the approach is to say that a private field (in a union or a struct, doesn't matter) implies an arbitrary invariant unknown to the user, so any operations changing that field (directly or through wild pointers, doesn't matter) result in UB because they can potentially break that unspecified invariant.

This means that if a union has a single private field, then its implementer (but not compiler) can assume that no third party will write an unexpected value into that union.
That's a "default union documentation clause" for the user in some sense:
- (Default) If a union has a private field you can't write garbage into it.
- Otherwise, you can write garbage into a union unless its docs explicitly prohibit it.

If some union wants to prohibit unexpected values while still providing pub access to its expected fields (e.g. when those fields have no invariants of their own), then it can still do it through documentation; that's why the "unless" in the second clause is necessary.

@RalfJung
Does this describe your position accurately?

How are scenarios like this treated?

mod m {
    union MyPrivateUnion { /* private fields */ }
    extern {
        fn my_private_ffi_function() -> MyPrivateUnion; // Can return garbage (?)
    }
}

RalfJung · 2018-08-06T14:30:36Z

As I interpret your comment, the approach is to say that a private field (in a union or a struct, doesn't matter) implies an arbitrary invariant unknown to the user, so any operations changing that field (directly or through wild pointers, doesn't matter) result in UB because they can potentially break that unspecified invariant.

No, that is not what I meant.

There are multiple invariants. I do not know how many we will need, but there will be at least two (and I don't have great names for them):

The "Layout-level invariant" (or "syntactic invariant") of a type is completely defined by the syntactic shape of the type. These are things like "&mut T is non-NULL and aligned", "bool is 0 or 1", "! cannot exist". On this level, *mut T is the same as usize -- both allow any value (or maybe any initialized value, but that distinction is for another discussion). We are, eventually, going to have a document spelling out these invariants for all types, by structural recursion: The layout-level invariant of a struct is that all its fields have their invariant maintained, etc. Visibility does not play a role here.

Violating the layout-level invariant is instantaneous UB. This is a statement we can make because we have defined this invariant in very simple terms, and we make it part of the definition of the language itself. We can then exploit this UB (and we already do), e.g. to perform enum layout optimizations.

The "Custom type-level invariant" (or "semantic invariant") of a type is picked by whoever implements the type. The compiler cannot know this invariant as we do not have a language to express it, and the same goes for the language definition. We cannot make violating this invariant UB, as we cannot even say what that invariant is! The fact that it is even possible to have custom invariants is a feature of any useful type system: Abstraction. I wrote more about this in a past blog post.
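As a small illustration of how the layout-level invariant feeds enum layout optimizations: because a reference, a Box, or a NonZeroU32 can never be all zeroes, Option can use that bit pattern for None and needs no separate tag. The option_is_free helper is a hypothetical name introduced for this sketch.

```rust
use std::mem::size_of;
use std::num::NonZeroU32;

/// True when wrapping `T` in `Option` costs no extra space.
fn option_is_free<T>() -> bool {
    size_of::<Option<T>>() == size_of::<T>()
}

fn main() {
    // These types exclude the all-zero bit pattern, so `None` can live there.
    println!("&u8:        {}", option_is_free::<&u8>());
    println!("Box<u8>:    {}", option_is_free::<Box<u8>>());
    println!("NonZeroU32: {}", option_is_free::<NonZeroU32>());
    // A raw pointer may be null, so every bit pattern is already taken.
    println!("*mut u8:    {}", option_is_free::<*mut u8>());
}
```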

The connection between the custom, semantic invariant and UB is that we declare that unsafe code may rely on its semantic invariants being preserved by foreign code. That makes it incorrect to just go ahead and put random stuff into a Vec's size field. Note that I said incorrect (I sometimes use the term unsound) -- but not undefined behavior! Another example to demonstrate this difference (really, the same example) is the discussion about aliasing rules for &mut ZST. Creating a dangling well-aligned non-null &mut ZST is never immediate UB, but it is still incorrect/unsound because one may write unsafe code which relies on this not to happen.

It would be nice to align these two concepts, but I do not think it is practical. First of all, for some types (function pointers, dyn traits), the definition of the custom, semantic invariant actually uses the definition of UB in the language. This definition would be circular if we wanted to say that it is UB to ever violate the custom, semantic invariant. Secondly, I'd prefer if the definition of our language, and whether a certain execution trace exhibits UB, was a decidable property. Semantic, custom invariants are frequently not decidable.

I'm not sure, though, how exactly the part of Wrap's contract about the absence of unexpected values is related to field privacy.

Essentially, when a type chooses its custom invariant, it has to make sure that anything that safe code can do preserves the invariant. After all, the promise is that just using this type's safe API can never lead to UB. This applies to both structs and unions. One of the things safe code can do is access public fields, which is where this connection comes from.

For example, a public field of a struct cannot have a custom invariant that is different from the custom invariant of the field type: After all, any safe user could write arbitrary data into that field, or read from the field and expect "good" data. A struct where all fields are public can be safely constructed, placing further restrictions on the field.

A union with a public field... well that's somewhat interesting. Reading union fields is unsafe anyway, so nothing changes there. Writing union fields is safe, so a union with a public field has to be able to handle arbitrary data which satisfies that field's type's custom invariant being put into the field. I doubt this will be very useful...

So, to recap, when you choose a custom invariant, it is your responsibility to make sure that foreign safe code cannot break this invariant (and you have tools like private fields to help you achieve this). It is the responsibility of foreign unsafe code to not violate your invariant when that code does something safe code could not do.
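A minimal sketch of a custom invariant protected by field privacy, under the assumption of a made-up module: the table module, the Slot type and the VALUES constant are hypothetical and exist only to illustrate the recap above.

```rust
mod table {
    const VALUES: [u8; 4] = [10, 20, 30, 40];

    /// Custom invariant, chosen by this module and unknown to the compiler:
    /// the stored index is always less than `VALUES.len()`.
    pub struct Slot(usize);

    impl Slot {
        /// The only way for foreign safe code to obtain a `Slot`;
        /// it establishes the invariant.
        pub fn new(index: usize) -> Option<Slot> {
            if index < VALUES.len() {
                Some(Slot(index))
            } else {
                None
            }
        }

        /// Unsafe code inside the module relies on the invariant.
        pub fn get(&self) -> u8 {
            // SAFETY: the field is private and `new` checked the bound, so
            // safe code outside this module cannot have stored a bad index.
            unsafe { *VALUES.get_unchecked(self.0) }
        }
    }
}

fn main() {
    let slot = table::Slot::new(2).expect("index 2 is in bounds");
    println!("{}", slot.get());
    println!("{}", table::Slot::new(4).is_none());
}
```

If the field were public, any safe caller could write an out-of-range index and the unchecked read would no longer be justified, which is the connection between privacy and the custom invariant described above.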

This means that internally, inside of Wrap's module, implementation of Wrap module can, for example, temporarily write "unexpected values" into it, if it doesn't leak them to users, and compiler will be okay with them.

Correct. (panic-safety is a concern here but you are probably aware). This is just like, in Vec, I can safely do

let sz = self.size;
self.size = 1337;
self.size = sz;

and there is no UB.

mod m {
    union MyPrivateUnion { /* private fields */ }
    extern {
        fn my_private_ffi_function() -> MyPrivateUnion; // Can return garbage (?)
    }
}

In terms of the syntactic layout invariant, my_private_ffi_function can do anything (assuming the function call ABI and signature matches). In terms of the semantic custom invariant, that's not visible in the code -- whoever wrote this module had an invariant in mind, they should document it next to their union definition and then make sure that the FFI function returns a value which satisfies the invariant.

RalfJung · 2018-08-22T17:20:21Z

I finally wrote that blog post about whether and when &mut T must be initialized, and the two kinds of invariants I mentioned above.

SimonSapin · 2019-03-10T19:47:25Z

Is there anything left to track here that’s not already covered by #55149, or should we close?

Nemo157 · 2019-05-13T08:44:55Z

E0658 still points here:

error[E0658]: unions with non-Copy fields are unstable (see issue #32836)

Avi-D-coder · 2019-09-16T00:23:15Z

This currently plays terribly with atomics, since they do not implement Copy. Does anyone know a workaround?

SimonSapin · 2019-09-16T00:28:42Z

When #55149 is implemented, you’ll be able to use ManuallyDrop<AtomicFoo> in a union. Until then, the only work-around is to use Nightly (or not use union and find some alternative).
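A hedged sketch of the ManuallyDrop form described above, using AtomicU32 as a stand-in for AtomicFoo. The Word union and the bump function are hypothetical names; the sketch relies on the documented guarantee that AtomicU32 has the same size and bit validity as u32, and it is single-threaded on purpose.

```rust
use std::mem::ManuallyDrop;
use std::sync::atomic::{AtomicU32, Ordering};

/// Hypothetical union viewing the same four bytes as a plain or atomic integer.
#[repr(C)]
union Word {
    plain: u32,
    atomic: ManuallyDrop<AtomicU32>,
}

/// Initializes through `plain`, increments through `atomic`, and reads the
/// result back through `plain`.
fn bump(start: u32) -> u32 {
    let word = Word { plain: start };
    unsafe {
        // SAFETY: `AtomicU32` has the same size and bit validity as `u32`,
        // so the bytes written through `plain` are a valid `AtomicU32`.
        word.atomic.fetch_add(1, Ordering::Relaxed);
        // SAFETY: no other thread has access to `word`, so a plain read of
        // the same bytes does not race with the atomic update.
        word.plain
    }
}

fn main() {
    println!("{}", bump(41));
}
```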

RalfJung · 2019-09-16T07:35:02Z

With that implemented, you shouldn't even need ManuallyDrop; after all rustc knows that Atomic* does not implement Drop.

Centril · 2019-10-21T20:10:29Z

Assigning myself to switch the tracking issue to the new one.

@petrochenkov

…plett unions: test move behavior of non-Copy fields This test ensures the behaviors suggested by @petrochenkov [here](rust-lang#32836 (comment)).

nikomatsakis added B-RFC-approved Blocker: Approved by a merged RFC but not yet implemented. T-lang Relevant to the language team, which will review and decide on the PR/issue. B-unstable Blocker: Implemented in the nightly compiler and unstable. labels Apr 8, 2016

nikomatsakis mentioned this issue Apr 8, 2016

unions rust-lang/rfcs#1444

Merged

mbrubeck mentioned this issue Sep 25, 2018

[meta] Wishlist for smallvec 1.0 servo/rust-smallvec#73

Closed

12 tasks

RalfJung mentioned this issue Oct 5, 2018

Representation of unions rust-lang/unsafe-code-guidelines#13

Closed

SimonSapin mentioned this issue Oct 17, 2018

Tracking issue for RFC 2514, "Union initialization and Drop" #55149

Closed

7 tasks

SimonSapin mentioned this issue Nov 25, 2018

Use a union to avoid UB with uninitialized &mut T ratel-rust/toolshed#6

Merged

jacobrosenthal mentioned this issue Dec 14, 2018

When to go 1.0 and what edition should 1.0 target? em32-rs/efm32hg-pac#6

Open

carols10cents mentioned this issue Jun 14, 2019

Tracking issue for union improvements rust-lang/edition-guide#173

Closed

Centril self-assigned this Oct 21, 2019

Centril mentioned this issue Oct 24, 2019

Adjust the tracking issue for untagged_unions. #65747

Merged

Centril added the F-untagged_unions `#![feature(untagged_unions)]` label Oct 24, 2019

Centril mentioned this issue Oct 24, 2019

Under what conditions can you implement Copy for a union? #65748

Closed

bors closed this as completed in 8b9661b Oct 25, 2019

mbrubeck mentioned this issue Nov 19, 2019

[meta] Wishlist for smallvec 2.0 servo/rust-smallvec#183

Open

11 tasks

royaltm mentioned this issue Feb 1, 2020

untagged unions are in rust stable now, perhaps it's time to adapt servo/rust-smallvec#201

Closed

RalfJung mentioned this issue Aug 15, 2020

unions: test move behavior of non-Copy fields #75559

Merged
