Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

UnsafePinned: allow aliasing of pinned mutable references #3467

Merged
merged 35 commits into from
Jun 18, 2024

Conversation

RalfJung
Copy link
Member

@RalfJung RalfJung commented Aug 1, 2023

Add a type UnsafePinned that indicates to the compiler that this field is "pinned" and there might be pointers elsewhere that point to the same memory.
This means, in particular, that &mut UnsafePinned is not necessarily a unique pointer, and thus the compiler cannot just make aliasing assumptions.
However, &mut UnsafePinned can still be mem::swaped, so this is not a free ticket for arbitrary aliasing of mutable references.
You need to use mechanisms such as Pin to ensure that mutable references cannot be used in incorrect ways by clients.

This type is then used in generator lowering, finally fixing #63818.

Thanks to @rust-lang/opsem for a bunch of useful feedback!

Rendered

Tracking:

@ehuss ehuss added the T-lang Relevant to the language team, which will review and decide on the RFC. label Aug 1, 2023
@RalfJung

This comment was marked as off-topic.

text/0000-unsafe-aliased.md Outdated Show resolved Hide resolved
text/0000-unsafe-aliased.md Outdated Show resolved Hide resolved
text/0000-unsafe-aliased.md Outdated Show resolved Hide resolved
@comex
Copy link

comex commented Aug 1, 2023

Strongly in favor. But bikeshed time:

unlike types like MaybeUninit or MaybeDangling that have an effect when wrapped around the pointer.

This wording initially threw me for a loop, given that it's much more common to see &mut MaybeUninit<T> than MaybeUninit<&mut T>. Then I remembered that when you want MaybeUninit to affect pointer validity requirements specifically, it has to be wrapped around a pointer. And in fact from a pure opsem perspective, &mut MaybeUninit<T> behaves the same as &mut T (right?). But I'd argue that line of reasoning sees the world through overly opsem-tinted glasses. From the perspective of intended or safe usage, &mut MaybeUninit<T> very much implies weaker validity rules for the pointee than &mut T; it's just that those rules aren't enforced until you dereference them at their respective types.

That said, I still think UnsafeAliased is a good name.

@RalfJung
Copy link
Member Author

RalfJung commented Aug 1, 2023

&mut MaybeUninit very much implies weaker validity rules for the pointee than &mut T

Hm, yeah... however the aliasing and dereferenceable rules are still in full effect on &mut MaybeUninit<T> and that's what this RFC is call about.

I tried to clarify this in the RFC text.

@clarfonthey
Copy link
Contributor

So, I definitely feel like UnsafeAliased isn't that great of a name, but for the primary reason that I think that a good unsafe API would have a safe counterpart, which here would just be called Aliased.

And when I think over and over again about what that kind of API would look like, I keep coming back to the same conclusion: isn't that just RefCell? I mean, obviously, they're not the same. But they feel the same in the sense that RefCell is accomplishing the same thing. Here, our data is inside the cell and the pointer is an Option<Ref<T>>. Obviously, it doesn't work this way, but this line of reasoning makes it feel weird to simply refer to the value as "aliased," even though that's the name of the optimisation (or rather, lack thereof) that gets applied.

Like, the confusion, I think, is the fact that "aliased" implies that the issue is someone else having a reference to the value, when the real issue is that the value itself cannot be referenced safely. So, in the first example given, the issue isn't that we've got the reference ptr_to_data, but rather the fact that data thinks it owns its value when it's lent its data to someone else.

I tried a bit to sketch this safe Aliased API, but I couldn't wrap my head around the details. Although the terminology that made sense to me was to consider it a "Slot," since the value could be "moved" out of its slot and has to be "moved" back into it before the slot can be used again. Internally, this would just be accomplished via noalias, but a term like UnsafeSlot to me says something along the lines of, I can take the value out of the slot (get a pointer to it), but in order to use the value again, I must put it back (relinquish borrow). This will solve some of the confusion around UB with multiple aliases (you can't have multiple aliases, it's just that any actions done around the value can't be assumed to be of something other than the value itself, since one of them might be "putting it back in").

I had a rough time properly formulating this since, although noalias is a simple optimisation, the actual way it breaks the borrow mechanics are hard to describe. I probably did a terrible job, but maybe it helps improve the way this is explained.

@RalfJung
Copy link
Member Author

RalfJung commented Aug 2, 2023

So, I definitely feel like UnsafeAliased isn't that great of a name, but for the primary reason that I think that a good unsafe API would have a safe counterpart, which here would just be called Aliased.

A safe counterpart would be some Pin-based construction that allows self-referential pointers. That's why UnsafePinned is a proposed alternative. But I don't think that's necessarily a good name since there might be uses of this that do not involve pinning.

I also think UnsafeCell is not a great name, since it says basically nothing about what is actually happening. (That's also why I don't like "slot" at all. "slot" and "cell" aren't even obviously referring to different concepts; UnsafeSlot would be just as good a name for our interior mutability primitive as UnsafeCell.) UnsafeSharedMutable would be a lot better. So then maybe this one here should be UnsafeMutableAliased?

Like, the confusion, I think, is the fact that "aliased" implies that the issue is someone else having a reference to the value, when the real issue is that the value itself cannot be referenced safely.

I don't understand the distinction you are making between a reference existing to the value vs the value being referenced.

the issue isn't that we've got the reference ptr_to_data, but rather the fact that data thinks it owns its value when it's lent its data to someone else

So... UnsafeMaybeBorrowed?

Co-authored-by: Jacob Lifshay <[email protected]>
Copy link

@CAD97 CAD97 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(top level comment is just naming bikeshed)

The closest to a safe equivalent to UnsafeAliased<T> would be Pin<Box<T>>. There cannot be a full safe equivalent, though; to a rough estimation, the purpose of UnsafeAliased is to make &mut UnsafeAliased<T> act like &UnsafeCell<T> (w.r.t. the borrow opsem).

This lens does offer an alternative avenue for deriving a name — focusing on that w.r.t. aliasing, UnsafeAliased behaves as-if it introduces an indirection. Except that this still isn't quite accurate, since adding a (raw) pointer indirection also enables shared mutability but UnsafeAliased doesn't.

Given that UnsafeAliased is to &mut T as UnsafeCell is to &T, I don't actually dislike UnsafeMut as a name any more than I consider UnsafeCell an imperfect1 name. UnsafeCell does have "this is a cell type but unchecked" as an analogy driving its name, and UnsafeAliased doesn't have any such "but unchecked" analogy to directly lean on, but it can still pull the analogy to cell types' impact on &Cell and apply that to &mut Cell.

Arguing this naming direction becomes easier if UnsafeMutCell also relaxes &, i.e. is a standard cell type (is !Freeze) in addition. Usage which can be made sound without UnsafeCell semantics are quite rare2, but they do exist, and pending decisions around cell preciseness, the memory regions with the separate relaxations may not always subset nicely.

The other potential name which I'm partial to is UnsafePinCell. This leans more into naming based on usage instead of effect, as well as leaning on PhantomPinned (which could be potentially defined as UnsafePinCell<()>). This becomes more interesting of a naming direction if a) we commit to infectious cell semantics and b) provide a PhantomMut as UnsafeCell<()> / Freeze optout.

Footnotes

  1. UnsafeCell is an imperfect name because it's not just an unchecked Cell. std nonexhaustively lists all of Cell, RefCell, OnceCell, Mutex, RwLock, OnceLock, and the atomic types as cell types, and UnsafeCell is more accurately described as unchecked any one of those.

  2. Shared cell semantics are needed because getting &S from Pin<&mut S>, and this attempts to retag S's bytes as ref-borrowed, which would invalidate any previously derived mut-borrowed provenance, IIUC. As such, any usage without UnsafeCell would need to only dynamically extend the static shared-provenance and not mut-provenance.

# Reference-level explanation
[reference-level-explanation]: #reference-level-explanation

API sketch:
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should UnsafeAliased be !Unpin? This (negative) impl listed isn't in the sketch, but it must be for the earlier guide-level line about Pin<&mut S> soundness to be accurate.

By analogy to how the other autotraits are treated, I argue that yes, it should be not-Unpin. The user can always implement Unpin if desired.

text/0000-unsafe-aliased.md Outdated Show resolved Hide resolved

- We have a `Unique` auto trait similar to `Freeze` that is implemented if the type does not contain any by-val `UnsafeAliased`.
- `noalias` on mutable references is only emitted for `Unique` types. (This replaces the current hack where it is only emitted for `Unpin` types.)
- Miri will do `SharedReadWrite` retagging inside `UnsafeAliased` similar to what it does inside `UnsafeCell` already. (This replaces the current `Unpin`-based hack.)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's clear that &UnsafeAliased<T> carries a Shared tag, and that mutating through it is UB (assuming no UnsafeCell is present). What's not clear is how retagging impacts extant sibling write-capable provenance.

This very much gets into some undecided specifics to specify precisely, but some specifics are needed. Namely, if I do

let mut s = pin!(S::new());
s.as_mut().get_data(); // init ptr
black_box(s.as_ref()); // retag &S
s.as_mut().set_data(42); // write through ptr

is this guaranteed to be permitted? If it is, then IIUC, &UnsafeAliased<T> can protect the bytes such that writing to them would be UB, but must not retag them in such a way as to completely invalidate sibling write-capable provenance.

Copy link
Member Author

@RalfJung RalfJung Aug 3, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No idea where existing siblings matter in your example.

In SB, each as_mut creates a new SRW child of s. Each as_ref creates a new SRO child. Both of these actions preserve existing SRW and SRO children of s.

In TB, as_mut will just return the original provenance -- no retagging for !Unique mutable references. as_ref will create a read-only child that lives as long as there is no write.

Your example doesn't even exploit any of that though. It would be allowed even if creating a new SRW or SRO child first killed all previously created children, since you are creating new pointer with calls to as_mut/as_ref all the time. Did you mean to keep the first as_mut pointer around and use it again after the as_ref?

@clarfonthey
Copy link
Contributor

clarfonthey commented Aug 3, 2023

A safe counterpart would be some Pin-based construction that allows self-referential pointers. That's why UnsafePinned is a proposed alternative. But I don't think that's necessarily a good name since there might be uses of this that do not involve pinning.

I mean, even such construction would be incomplete, TBQH. In the second example given, for example, data is accessed without first "dropping" the borrowed pointer, and this would be required if there were an actual mutable reference to the data. The mutability is precisely what requires this extra construct; self-referentiality on its own isn't enough.

I also think UnsafeCell is not a great name, since it says basically nothing about what is actually happening. (That's also why I don't like "slot" at all. "slot" and "cell" aren't even obviously referring to different concepts; UnsafeSlot would be just as good a name for our interior mutability primitive as UnsafeCell.) UnsafeSharedMutable would be a lot better. So then maybe this one here should be UnsafeMutableAliased?

I sympathise with the concept of "cell" being a horrible name, although what I like about it is that it has actually two names: Cell, and "interior mutability." While Cell isn't the safe counterpart for UnsafeCell, it's the simplest one, and without Copy semantics you need to branch into more complicated abstractions like RefCell, Mutex, etc.

In terms of the "second" name here, what we're offering is effectively an alternative to &own references or &move references via an unsafe API. What we're doing is effectively an owning reference, since simple self-referentiality can be accomplished without the need for these unsafe-aliasing features.

I don't understand the distinction you are making between a reference existing to the value vs the value being referenced.

In terms of ownership, ptr_to_data has exclusive control over the value whereas data does not. So, if someone accesses the value via data, they're technically violating those ownership guarantees, since ptr_to_data is in control. So, effectively, obtaining a mutable reference to data has to come with the unsafe assertion that it actually does own its own data, and no mutable references to the data exist elsewhere. This also comes with a compile-time indication that operations on the newly-created reference can't be reordered with respect to other mutations, since any one of them could be the one that actually owned the data beforehand.

So... UnsafeMaybeBorrowed?

One other term I thought of was "disowned" since data is effectively no longer in control of its own ownership. This also closer resembles the other versions of this feature which were proposed, like references which actually move values.

@RalfJung
Copy link
Member Author

RalfJung commented Aug 3, 2023

Usage which can be made sound without UnsafeCell semantics are quite rare2

That's wrong. I am not aware of a single usage that needs UnsafeCell semantics. The current implementation of the Unpin hack does not have UnsafeCell semantics, and Miri is perfectly happy with self-referential generators and async fn and all sorts of other pinning shenanigans.

@VitWW
Copy link

VitWW commented Aug 3, 2023

UnsafeAliased<T> is not a good naming.
AlreadyBorrowed<T> / ConsiderBorrowed<T> / AssumeBorrowed<T> is much better name

@RalfJung
Copy link
Member Author

RalfJung commented Aug 4, 2023

@clarfonthey

I mean, even such construction would be incomplete, TBQH. In the second example given, for example, data is accessed without first "dropping" the borrowed pointer, and this would be required if there were an actual mutable reference to the data.

ptr_to_data is a raw pointer, not a reference, so it doesn't have to be "dropped". Using both it and the field back-and-forth is intended to be legal.

And even if it was a reference, this example would be fine -- the lifetime of the reference ends after the write(42). Generators (and by extension async fn) will generate code that does exactly that. The example matches the following generator:

let mut data = 0;
let ptr_to_data = &mut data;
yield;
*ptr_to_data = 42;
println!("{}", data);
return;

The mutability is precisely what requires this extra construct; self-referentiality on its own isn't enough.

This is trivially true in the sense that if everything is read-only then there are no aliasing violations. Self-referential webs of shared references can be created in safe code.

But for better or worse, we need self-referential data with mutable references, e.g. to support the Future::poll API.

Maybe you want to say that UnsafeAliased is an incomplete name since "aliased" is fine, we allow aliasing all the time via shared references? I agree. It should be UnsafeAliasedDespiteMutableReference. That's a bit too long though...

I sympathise with the concept of "cell" being a horrible name, although what I like about it is that it has actually two names: Cell, and "interior mutability." While Cell isn't the safe counterpart for UnsafeCell, it's the simplest one, and without Copy semantics you need to branch into more complicated abstractions like RefCell, Mutex, etc.

In terms of the "second" name here, what we're offering is effectively an alternative to #998 or #1646 via an unsafe API. What we're doing is effectively an owning reference, since simple self-referentiality can be accomplished without the need for these unsafe-aliasing features.

Insofar as "interior mutability" is a (terrible) name for "mutability behind shared references", the corresponding thing for UnsafeAliased would be "aliasing / alternative incoming pointers despite mutable references". I don't think there is any relation to &own/&move.

In terms of ownership, ptr_to_data has exclusive control over the value whereas data does not. So, if someone accesses the value via data, they're technically violating those ownership guarantees, since ptr_to_data is in control. So, effectively, obtaining a mutable reference to data has to come with the unsafe assertion that it actually does own its own data, and no mutable references to the data exist elsewhere. This also comes with a compile-time indication that operations on the newly-created reference can't be reordered with respect to other mutations, since any one of them could be the one that actually owned the data beforehand.

Nobody is creating a mutable reference to data though. People are creating mutable references to S, of which data is a field, and we somehow have to indicate that this should not be considered as also making the usual "mutable reference promises" for data.

One other term I thought of was "disowned" since data is effectively no longer in control of its own ownership. This also closer resembles the other versions of this feature which were proposed, like references which actually move values.

It would be MaybeDisowned though -- some of the time the field does have all the ownership. In fact when you have a regular &mut S you still get all the ownership for all the fields -- otherwise mem::swap would be unsound.

You can't really go with any kind of ownership story for this RFC since the ownership of &mut T is fully determined already. That's why self-referential futures need pinning -- they need to get away from that mutable reference and its strong ownership requirements.

@RalfJung
Copy link
Member Author

RalfJung commented Aug 4, 2023

The other potential name which I'm partial to is UnsafePinCell. This leans more into naming based on usage instead of effect, as well as leaning on PhantomPinned (which could be potentially defined as UnsafePinCell<()>). This becomes more interesting of a naming direction if a) we commit to infectious cell semantics and b) provide a PhantomMut as UnsafeCell<()> / Freeze optout.

I think we should avoid "cell", but I like UnsafePinned and it is an alternative mentioned in the RFC. If we impl !Unpin for UnsafePinned, we could define PhantomPinned as UnsafePinCell and that would be backwards compatible. We could add a new unsafe trait that detects UnsafePinned the way Freeze detects UnsafeCell (called Unique in the RFC but I hope there are better names), and switch codegen over to using that to opt-out of noalias for &mut, and that would be backwards compatible except for code that does impl !Unpin (which is unstable).

Without infectious cell semantics, switching Miri over to Unique would not be backwards compatible, but that's okay.

So yeah I really like the idea of re-defining PhantomPinned as UnsafePinned<()> (no matter the name we end up picking).

@ahicks92
Copy link

ahicks92 commented Aug 5, 2023

SO something I don't like about this RFC is that it requires tracking the provenance of a reference through all code you control, in case that reference somehow aliases.

My guess is that it's only a matter of time until someone uses this wrong internally, but in a important crate like arc_swap (as an example--they don't need it I think), and that relies on the current optimization behavior of the compiler at whatever time this happens. If you consider one person writing unsafe code in one sitting it's possible to get this right, but if given to a team or maintained over a long period of time, I feel like this asymptotically approaches very hard to spot problems.

Unfortunately I absolutely don't have a better suggestion here. I'd offer one if I could. The drawbacks section does acknowledge this, but the core crate concern makes me flinch a bit. I don't want to be the one who finds miscompilation because of something that's a 5-times dependency and we decided to upgrade Rust. I'm not sure how we'd even track that back to the problem. Unfortunately something like this is probably necessary. I just don't like what appears to me to be a major incident waiting to happen. Imagine if this took out tokio.

@Darksonn
Copy link
Contributor

Darksonn commented Aug 6, 2023

Imagine if this took out tokio.

It's interesting that you mention Tokio. I'm one of the maintainers of Tokio. Tokio does have unsafe code that relies on the so-called Unpin hack to write custom self-referential futures (well, mostly futures with intrusive linked lists). In fact, I suspect that Tokio is the single largest user of the Unpin hack on crates.io.

We justify this because it lets us eliminate a bunch of allocations, and I have been monitoring the state of these issues for a very long time, and I have always been ready to replace the Unpin hack with a proper solution once one is stabilized. For example, see this gist that I wrote back in Jan 2021 advocating for pretty much the solution proposed here.

@tmandry tmandry added the T-opsem Relevant to the operational semantics team, which will review and decide on the RFC. label May 29, 2024
@tmandry
Copy link
Member

tmandry commented May 29, 2024

@rfcbot fcp merge

We discussed this RFC in today's lang team meeting (notes) and I think we should move forward with the RFC.

Teaching

One concern I had is around how to teach proper use of UnsafePinned. In particular, it would still be UB to create a mutable reference accessible by unknown code to any type transitively containing UnsafePinned, if that reference is aliased. Pointers to any type which has aliasing references (including self-references) should instead all be wrapped in Pin.

This is somewhat different from UnsafeCell, which neatly contains all the unsafety directly in its API by only handing out raw pointers, and requires a user to directly use unsafe to do anything which might violate its invariants. That said, I think this weirdness is inherent in the Pin API itself, and we are just giving users tools they need to express the patterns it is designed to enable.

@RalfJung used a framing of "capabilities" that I thought was insightful. Generic code which has &T for some unknown, unbounded T is not capable of doing anything with that reference (even dereferencing it would require Copy). On the other hand, &mut T does come with inherent capabilities, like the ability to call std::mem::swap, which assumes its arguments do not alias. The need to restrict such capabilities subsequently results in the restrictions around handing out mutable references.

Naming

I raised an eyebrow at the name UnsafePinned, having been used to seeing UnsafeAliased before. However, this section of the RFC convinced me:

An earlier proposal suggested to call the type UnsafeAliased, since the type is not inherently tied to pinning.
However, it is not possible to write sound wrappers around UnsafeAliased such that we can have aliasing &mut MyType. One has to use pinning for that: Pin<&mut MyType>.
Because of that, the RFC proposes that we suggestively put pinning into the name of the type, so that people don't confuse it with a general mechanism for aliasing mutable references.
It is more like the core primitive behind pinning, where whenever a type is pinned that is caused by an UnsafePinned field somewhere inside it.

For instance, it may be tempting to use an UnsafeAliased type to mark a single field in some struct as "separately aliased", and then a Mutex<Struct> would acquire ownership of the entire struct except for that field.
However, due to mem::swap, that would not be sound.
One cannot hand out an &mut to such aliased memory as part of a safe-to-use abstraction -- except by using pinning.

The fact that the use of Pin goes hand-in-hand with the use of UnsafePinned, in both directions – to the point where we would implement the existing PhantomPinned in terms of it and potentially even deprecate that type – convinces me that this is the right name. I also think it improves the story around teaching to have them paired.

That said, @joshtriplett also raised concerns about the name and I believe he would like to have it included as an open question in the RFC.

@rfcbot
Copy link
Collaborator

rfcbot commented May 29, 2024

Team member @tmandry has proposed to merge this. The next step is review by the rest of the tagged team members:

No concerns currently listed.

Once a majority of reviewers approve (and at most 2 approvals are outstanding), this will enter its final comment period. If you spot a major issue that hasn't been raised at any point in this process, please speak up!

cc @rust-lang/lang-advisors: FCP proposed for lang, please feel free to register concerns.
See this document for info about what commands tagged team members can give me.

@rfcbot rfcbot added proposed-final-comment-period Currently awaiting signoff of all team members in order to enter the final comment period. disposition-merge This RFC is in PFCP or FCP with a disposition to merge it. labels May 29, 2024
@joshtriplett
Copy link
Member

For clarity, no objection to the introduction of this in nightly. I'd like to add an unresolved question about naming that we resolve before stabilization, but that's not a blocker for experimentation in nightly.

@RalfJung
Copy link
Member Author

I added the name to the Unresolved questions.

@nikomatsakis
Copy link
Contributor

@rfcbot reviewed

@nikomatsakis
Copy link
Contributor

The name UnsafePinned doesn't really "speak to me" -- it feels like the name could better convey that it is the target of raw pointers. Perhaps UnsafePinPointee or something, I'm not sure. But since the name is unresolved I'm happy to go forward for now.

@rfcbot rfcbot added final-comment-period Will be merged/postponed/closed in ~10 calendar days unless new substational objections are raised. and removed proposed-final-comment-period Currently awaiting signoff of all team members in order to enter the final comment period. labels Jun 6, 2024
@rfcbot
Copy link
Collaborator

rfcbot commented Jun 6, 2024

🔔 This is now entering its final comment period, as per the review above. 🔔

@rfcbot rfcbot added finished-final-comment-period The final comment period is finished for this RFC. to-announce and removed final-comment-period Will be merged/postponed/closed in ~10 calendar days unless new substational objections are raised. labels Jun 16, 2024
@rfcbot
Copy link
Collaborator

rfcbot commented Jun 16, 2024

The final comment period, with a disposition to merge, as per the review above, is now complete.

As the automated representative of the governance process, I would like to thank the author for their work and everyone else who contributed.

This will be merged soon.

@traviscross traviscross removed the I-lang-nominated Indicates that an issue has been nominated for prioritizing at the next lang team meeting. label Jun 18, 2024
@traviscross traviscross merged commit 784b515 into rust-lang:master Jun 18, 2024
@traviscross
Copy link
Contributor

The teams have accepted this RFC, and we've now merged it.

Thanks to @RalfJung for pushing this high quality work forward, and thanks to those who reviewed this work and provided helpful feedback.

For further updates, follow the tracking issue:

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
disposition-merge This RFC is in PFCP or FCP with a disposition to merge it. finished-final-comment-period The final comment period is finished for this RFC. T-lang Relevant to the language team, which will review and decide on the RFC. T-opsem Relevant to the operational semantics team, which will review and decide on the RFC. to-announce
Projects
None yet
Development

Successfully merging this pull request may close these issues.