Skip to content

Commit

Permalink
UnsafeAliased: port text from hackmd
Browse files Browse the repository at this point in the history
  • Loading branch information
RalfJung committed Aug 1, 2023
1 parent ed4c592 commit 018a865
Showing 1 changed file with 337 additions and 0 deletions.
337 changes: 337 additions & 0 deletions text/0000-unsafe-aliased.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,337 @@
# `unsafe_aliased`

- Feature Name: `unsafe_aliased`
- Start Date: 2022-11-05
- RFC PR: [rust-lang/rfcs#0000](https://github.com/rust-lang/rfcs/pull/0000)
- Rust Issue: [rust-lang/rust#0000](https://github.com/rust-lang/rust/issues/0000)

# Summary
[summary]: #summary

Add a type `UnsafeAliased` that acts on `&mut` in a similar way to how `UnsafeCell` acts on `&`: it opts-out of the aliasing guarantees.
However, `&mut UnsafeAliased` can still be `mem::swap`ed, so this is not a free ticket for arbitrary aliasing of mutable references.
Abstractions built on top of `UnsafeAliased` must ensure proper encapsulation when handing such aliases references to clients (e.g. via pinning).

This type is then used in generator lowering, finally fixing [#63818](https://github.com/rust-lang/rust/issues/63818).

# Motivation
[motivation]: #motivation

Let's say you want to write a type with a self-referential pointer:

```rust
#![feature(pin_macro)]
#![feature(negative_impls)]
use std::ptr;
use std::pin::{pin, Pin};

pub struct S {
data: i32,
ptr_to_data: *mut i32,
}

impl !Unpin for S {}

impl S {
pub fn new() -> Self {
S { data: 42, ptr_to_data: ptr::null_mut() }
}

pub fn get_data(self: Pin<&mut Self>) -> i32 {
// SAFETY: We're not moving anything.
let this = unsafe { Pin::get_unchecked_mut(self) };
if this.ptr_to_data.is_null() {
this.ptr_to_data = ptr::addr_of_mut!(this.data);
}
// SAFETY: if the pointer is non-null, then we are pinned and it points to the `data` field.
unsafe { this.ptr_to_data.read() }
}

pub fn set_data(self: Pin<&mut Self>, data: i32) {
// SAFETY: We're not moving anything.
let this = unsafe { Pin::get_unchecked_mut(self) };
if this.ptr_to_data.is_null() {
this.ptr_to_data = ptr::addr_of_mut!(this.data);
}
// SAFETY: if the pointer is non-null, then we are pinned and it points to the `data` field.
unsafe { this.ptr_to_data.write(data) }
}
}

fn main() {
let mut s = pin!(S::new());
s.as_mut().set_data(42);
println!("{}", s.as_mut().get_data());
}
```

This kind of code is implicitly generated by rustc all the time when an `async fn` has a local variable of reference type that is live across a yield point.
The problem is that this code has UB under our aliasing rules: the `&mut S` inside the `self` argument of `get_data` aliases with `ptr_to_data`!

This simple code only has UB under Stacked Borrows but not under the LLVM aliasing rules; more complex variants of this -- still in the realm of what `async fn` generates -- also have UB under the LLVM aliasing rules.

<details>

<summary>A more complex variant</summary>

```rust
#![feature(pin_macro)]
#![feature(negative_impls)]
use std::ptr;
use std::pin::{pin, Pin};
use std::task::Poll;

pub struct S {
state: i32,
data: i32,
ptr_to_data: *mut i32,
}

impl !Unpin for S {}

impl S {
pub fn new() -> Self {
S { state: 0, data: 42, ptr_to_data: ptr::null_mut() }
}

fn poll(self: Pin<&mut Self>) -> Poll<()> {
// SAFETY: We're not moving anything.
let this = unsafe { Pin::get_unchecked_mut(self) };
match this.state {
0 => {
// The first time, set up the pointer.
this.ptr_to_data = ptr::addr_of_mut!(this.data);
// Now yield.
this.state += 1;
Poll::Pending
}
1 => {
// After coming back from the yield, write to the pointer.
unsafe { this.ptr_to_data.write(42) };
// And read our local variable `data`.
// THIS IS UB! `this` is derived from the `noalias` pointer
// `self` but we did a write to `this.data` in the previous
// line when writing to `ptr_to_data`. The compiler is allowed
// to reorder this and the previous line and then the output
// would change.
println!("{}", this.data);
// Done!
Poll::Ready(())
}
_ => unreachable!(),
}
}
}

fn main() {
let mut s = pin!(S::new());
while let Poll::Pending = s.as_mut().poll() {}
}
```

</details>

<br>

Beyond self-referential types, a similar problem also comes up with intrusive linked lists: the nodes of such a list often live on the stack frames of the functions participating in the list, but also have incoming pointers from other list elements.
When a function takes a mutable reference to its stack-allocated node, that will alias the pointers from the neighboring elements.
`Pin` is used to ensure that the list elements don't just move elsewhere (which would invalidate those incoming pointers), but there still is the problem that an `&mut Node` is actually not a unique pointer due to these aliases -- so we need a way for the to opt-out of the aliasing rules.

<br>

The goal of this RFC is to offer a way of writing such self-referential types and intrusive collections without UB.
We don't want to change the rules for mutable references in general (that would also affect all the code that doesn't do anything self-referential), instad we want to be able to tell the compiler that this code is doing funky aliasing and that should be taken into account for optimizations.

# Guide-level explanation
[guide-level-explanation]: #guide-level-explanation

To write this code in a UB-free way, wrap the fields that are *targets* of self-referential pointers in an `UnsafeAliased`:

```rust
pub struct S {
data: UnsafeAliased<i32>, // <!---- here
ptr_to_data: *mut i32,
}
```

This lets the compiler know that mutable references to `data` might still have aliases, and so optimizations cannot assume that no aliases exist. That's entirely analogous to how `UnsafeCell` lets the compiler know that *shared* references to this field might undergo mutation.

However, what is *not* analogous is that `&mut S`, when handed to safe code *you do not control*, must still be unique pointers!
This is because of methods like `mem::swap` that can still assume that their two arguments are non-overlapping.
(`mem::swap` on two `&mut UnsafeAliased<i32>` may soundly assume that they do not alias.)
In other words, the safety invariant of `&mut S` still requires full ownership of the entire memory range `S` is stored at; for the duration that a function holds on to the borrow, nobody else may read and write this memory.
But again, this is a *safety invariant*; it only applies to safe code you do not control. You can write your own code handling `&mut S` and as long as that code is careful not to make use of this memory in the wrong way, potential aliasing is fine.
You can also hand out a `Pin<&mut S>` to external safe code; the `Pin` wrapper imposes exactly the restrictions needed to ensure that this remains sound (e.g., it prevents `mem::swap`).


# Reference-level explanation
[reference-level-explanation]: #reference-level-explanation

API sketch:

```rust
/// The type `UnsafeAliased<T>` lets unsafe code violate
/// the rule that `&mut UnsafeAliased<T>` may never alias anything else.
///
/// However, it is still very risky to have two `&mut UnsafeAliased<T>` that alias
/// each other. For instance, passing those two references to `mem::swap`
/// would cause UB. The general advice is to still have a single mutable
/// reference, and then a bunch of raw pointers that alias it.
/// If you want to pass such a reference to external safe code,
/// you need to use techniques such as pinning (`Pin<&mut TypeWithUnsafeAliased>`)
/// which ensure that both the mutable reference and its aliases remain valid.
///
/// Further note that this does *not* lift the requirement that shared references
/// must be read-only! Use `UnsafeCell` for that.
#[lang = "unsafe_cell_mut"]
#[repr(transparent)]
struct UnsafeAliased<T: ?Sized> {
value: T,
}

impl<T: ?Sized> !Send for UnsafeAliased<T> {}

impl<T> UnsafeCell<T> {
/// Constructs a new instance of `UnsafeCell` which will wrap the specified
/// value.
pub fn new(value: T) -> UnsafeAliased<T> {
UnsafeAliased { value }
}

pub fn into_inner(self) -> T {
self.value
}

/// Get read-write access to the contents of an `UnsafeAliased`.
pub fn get_mut(&mut self) -> *mut T {
self as *mut _ as *mut T
}

/// Get read-only access to the contents of a shared `UnsafeAliased`.
/// Note that `&UnsafeAliased<T>` is read-only if `&T` is read-only.
/// This means that if there is mutation of the `T`, future reads from the
/// `*const T` returned here are UB!
///
/// ```rust
/// unsafe {
/// let mut x = UnsafeAliased::new(0);
/// let ref1 = &mut *addr_of_mut!(x);
/// let ref2 = &mut *addr_of_mut!(x);
/// let ptr = ref1.get(); // read-only pointer, assumes immutability
/// ref2.get_mut.write(1);
/// ptr.read(); // UB!
/// }
/// ```
pub fn get(&self) -> *const T {
self as *const _ as *const T
}

pub fn raw_get_mut(this: *mut Self) -> *mut T {
this as *mut T
}

pub fn raw_get(this: *const Self) -> *const T {
this as *const T
}
}
```

The comment about aliasing `&mut` being "risky" refers to the fact that their safety invariant still asserts exclusive ownership.
This implies that `duplicate` in the following example, while not causing immediate UB, is still unsound:

```rust
pub struct S {
data: UnsafeAliased<i32>,
}

impl S {
fn new(x: i32) -> Self {
S { data: UnsafeAliased::new(x) }
}

fn duplicate<'a>(s: &'a mut S) -> (&'a mut S, &'a mut S) {
let s1 = unsafe { (&s).transmute_copy() };
let s2 = s;
(s1, s2)
}
}
```

The unsoundness is easily demonstrated by using safe code to cause UB:

```rust
let mut s = S::new(42);
let (s1, s2) = s.duplicate(); // no UB
mem::swap(s1, s2); // UB
```

We could even soundly make `get_mut` return an `&mut T`, given that the safety invariant is not affected by `UnsafeAliased`.
But that would probably not be useful and only cause confusion.

Reference diff:

```diff
* Breaking the [pointer aliasing rules]. `&mut T` and `&T` follow LLVM’s scoped
- [noalias] model, except if the `&T` contains an [`UnsafeCell<U>`].
+ [noalias] model, except if the `&T` contains an [`UnsafeCell<U>`] or
+ the `&mut T` contains an [`UnsafeAliased<U>`].
```

Async generator lowering changes:
- Fields that represent local variables whose address is taken across a yield point must be wrapped in `UnsafeAliased`.

Codegen and Miri changes:

- We have a `Unique` auto trait similar to `Freeze` that is implemented if the type does not contain any by-val `UnsafeAliased`.
- `noalias` on mutable references is only emitted for `Unique` types. (This replaces the current hack where it is only emitted for `Unpin` types.)
- Miri will do `SharedReadWrite` retagging inside `UnsafeAliased` similar to what it does inside `UnsafeCell` already. (This replaces the current `Unpin`-based hack.)

### Comparison with some other types that affect aliasing

- `UnsafeCell`: disables aliasing (and affects but does not fully disable dereferenceable) behind shared refs, i.e. `&UnsafeCell<T>` is special. `UnsafeCell<&T>` (by-val, fully owned) is not special at all and basically like `&T`; `&mut UnsafeCell<T>` is also not special.
- `UnsafeAliased`: disables aliasing (and affects but does not fully disable dereferenceable) behind mutable refs, i.e. `&mut UnsafeAliased<T>` is special. `UnsafeAliased<&mut T>` (by-val, fully owned) is not special at all and basically like `&mut T`; `&UnsafeAliased<T>` is also not special.
- [`MaybeDangling<T>`](https://github.com/rust-lang/rfcs/pull/3336): disables aliasing and dereferencable *of all references (and boxes) directly inside it*, i.e. `MaybeDanling<&[mut] T>` is special. `&[mut] MaybeDangling<T>` is not special at all and basically like `&[mut] T`.

# Drawbacks
[drawbacks]: #drawbacks

It's yet another wrapper type adjusting our aliasing rules and very easy to mix up with `UnsafeCell` or [`MaybeDangling`](https://github.com/rust-lang/rfcs/pull/3336).
Furthermore, it is an extremely subtle wrapper type, as the `duplicate` example shows.
`Unique` is a very strange trait, since it does not come with much of a safety requirement -- unlike `Freeze`, which quite clearly expresses some statement about the invariant of a type when shared, it is much less clear what the corresponding statement would be for `Unique`. (More work on RustBelt is needed to determine the details here.)

# Rationale and alternatives
[rationale-and-alternatives]: #rationale-and-alternatives

The proposal in this RFC matches [what was discussed with the lang team a long time ago](https://github.com/rust-lang/rust/issues/63818#issuecomment-526579977).

However, of course one could imagine alternatives:

- Keep the status quo. The current sitaution is that we only make aliasing requirements on mutable references if the type they point to is `Unpin`. This is unsatisfying: `Unpin` was never meant to have this job. A consequence is that a stray `impl Unpin` on a `Wrapper<T>`-style type can [lead to subtle miscompilations](https://internals.rust-lang.org/t/surprising-soundness-trouble-around-pollfn/17484/15) since it re-adds aliasing requirements for the inner `T`. Contrast this with the `UnsafeCell` situation, where it is not possible for (stable) code to just `impl Freeze for T` in the wrong way -- `UnsafeCell` is *always* recognized by the compiler.

On the other hand, `UnsafeAliased` is rather quirky in its behavior and having two marker traits (`Unique` and `Unpin`) might be too confusing, so sticking with `Unpin` might not be too bad in comparison.

If we do that, however, it seems preferrable to transition `Unpin` to an unsafe trait. There *is* a clear statement about the types' invariants associated with `Unpin`, so an `impl Unpin` already comes with a proof obligation. It just happens to be the case that in a module without unsafe, one can always arrange all the pieces such that the proof obligation is satisifed. This is mostly a coincidence and related to the fact that we don't have safe field projections on `Pin`. That said, solving this also requires solving the trouble around `Drop` and `Pin`, where effectively an `impl Drop` does an implicit `get_mut_unchecked`, i.e., implicitly assumes the type is `Unpin`.
- `UnsafeAliased` could affect aliasing guarantees *both* on mutable and shared references. This would avoid the currently rather subtle situation that arises when one of many aliasing `&mut UnsafeAliased<T>` is cast or coerced to `&UnsafeAliased<T>`: that is a read-only shared reference and all aliases must stop writing! It would make this type strictly more 'powerful' than `UnsafeCell` in the sense that replacing `UnsafeCell` by `UnsafeAliased` would always be correct. (Under the RFC as written, `UnsafeCell` and `UnsafeAliased` can be nested to remove aliasing requirements from both shared and mutable references.)

If we don't do this, we could consider removing `get` since since it seems too much like a foot-gun. But that makes shared references to `UnsafeAliased` fairly pointless. Shared references to generators/futures are basically useless so it is unclear what the potential use-cases here are.
- Instead of introducing a new type, we could say that `UnsafeCell` affects *both* shared and mutable references. That would lose some optimization potential on types like `&mut Cell<T>`, but would avoid the footgun of coercing an `&mut UnsafeAliased<T>` to `&UnsafeAliased<T>`. That said, so far the author is not aware of Miri detecting code that would run into this footgun (and Miri is [able to do detect such issues](https://play.rust-lang.org/?version=stable&mode=debug&edition=2021&gist=aab417b535f7dbd266fbfe470ea208c7)).


# Prior art
[prior-art]: #prior-art

This is somewhat like `UnsafeCell`, but for mutable instead of shared references.

# Unresolved questions
[unresolved-questions]: #unresolved-questions

- How do we transition code that relies on `Unpin` opting-out of aliasing guarantees, to the new type? Futures and generators just need a compiler patch, but there is probably other code that needs adjusting (e.g., Rust-for-Linux uses pinning to handle all sorts of self-referntial things in the Linux Kernel). Note that all such code is explicitly unsupported right now; the `Unpin` loophole has always explicitly been declared as temporary, unstable, and not something that we promise will actually work.
- The name of the type needs to be bikeshed. `UnsafeAliased` might be too close to `UnsafeCell`, but that is a deliberate choice to indicate that this type has an effect when it appears in the *pointee*, unlike types like `MaybeUninit` or [`MaybeDangling`](https://github.com/rust-lang/rfcs/pull/3336) that have an effect when wrapped around the *pointer*. Like `UnsafeCell`, the aliasing allowed here is "interior". Other possible names: `UnsafeSelfReferential`, `UnsafePinned`, ...
- Relatedly, in which module should this type live?
- Should this type `derive(Copy)`? `UnsafeCell` does not, which is unfortunate because it now means some people might use `T: Copy` as indication that there is no `UnsafeCell` in `T`.
- `Unpin` [also affects the `dereferenceable` attribute](https://github.com/rust-lang/rust/pull/106180), so the same would happen for this type. Is that something we want to guarantee, or do we hope to get back `dereferenceable` when better semantics for it materialize on the LLVM side?

# Future possibilities
[future-possibilities]: #future-possibilities

- Similar to how we might want the ability to project from `&UnsafeCell<Struct>` to `&UnsafeCell<Field>`, the same kind of projection could also be interesting for `UnsafeAliased`.

0 comments on commit 018a865

Please sign in to comment.