RFC: Add `freeze` intrinsic and related library functions (rust-lang/rfcs#3605)

- Feature Name: `freeze_uninit`
- Start Date: 2024-02-13
- RFC PR: [rust-lang/rfcs#0000](https://github.com/rust-lang/rfcs/pull/0000)
- Rust Issue: [rust-lang/rust#0000](https://github.com/rust-lang/rust/issues/0000)

# Summary
[summary]: #summary

Defines a language primitive for freezing uninitialized bytes, and library functions that use this language primitive.

# Motivation
[motivation]: #motivation

In Rust, it is generally undefined behaviour to read uninitialized memory. Most types disallow uninitialized bytes as part of their validity invariant.
In most cases this is not a problem, because a temporarily held (or externally checked) uninitialized value can be stored as `MaybeUninit<T>`. In some rare cases, however, it can be useful or desirable to handle uninitialized data as a "don't care" value while still performing typed operations on it, such as arithmetic.
Freeze allows these limited Rust programs to convert uninitialized data into useless-but-initialized bytes.

Examples of uses:
1. The major use for freeze is reading the padding bytes of structs. This is needed, for example, by a [generic wrapper around standard atomic types](https://docs.rs/atomic/latest/atomic/struct.Atomic.html).
2. SIMD code using masks can load a large value by freezing the bytes, doing lanewise arithmetic operations, then doing a masked store of the initialized elements. With additional primitives not specified here, this can allow efficient partial-load operations that prevent logical reads from going out of bounds (such a primitive could be defined to yield uninit for the out-of-bounds lanes, which could then be frozen).
3. Low-level libraries, such as the software floating-point implementations used by compilers, where uninit is considered a valid "don't care" value for the provided operations.
   * Along the same lines, a possible fast floating-point operation set could yield uninit on invalid results (such as NaN or infinity), store them as `MaybeUninit`, and freeze them upon return as `f32`/`f64`.
   * Note that such operations require compiler support, and they are *not* defined by this RFC.

# Guide-level explanation
[guide-level-explanation]: #guide-level-explanation

Two new library interfaces are defined for unsafe code users:

```rust
// in module `core::ptr`
pub unsafe fn read_freeze<T>(ptr: *const T) -> T;

impl<T> *const T {
    pub unsafe fn read_freeze(self) -> T;
}
impl<T> *mut T {
    pub unsafe fn read_freeze(self) -> T;
}

// in module `core::mem`
impl<T> MaybeUninit<T> {
    pub fn freeze(self) -> Self;
}
```

**Review discussion (on the proposed signatures):**

> What's the safety contract of this?

> I write it later in the guide-level explanation: same as `read`.

> Yeah, to me it felt obvious the unsafe here was just because it's a pointer deref, but since the ultimate implementation will have to document that anyway, it's probably worth writing it here explicitly.

> This should have a method for …

**Review discussion (on the shape of the API):**

> Poking around the library, the obvious parallel to this functionality is … Another maybe-controversial decision is that I think these methods should require … Maybe there could be some sort of "atomic freeze" that accepts an … So, to that end, it feels more like the API should look like this instead:

```rust
// in core::ptr
pub fn freeze<T>(ptr: *mut T);
pub unsafe fn read_freeze<T>(ptr: *mut T) -> T {
    freeze(ptr);
    read(ptr)
}

impl<T> *mut T {
    pub fn freeze(self);
    pub unsafe fn read_freeze(self) -> T {
        self.freeze();
        self.read()
    }
}

impl<T> MaybeUninit<T> {
    pub fn freeze(&mut self);
    // and maybe:
    pub fn frozen() -> MaybeUninit<T> {
        let mut uninit = MaybeUninit::uninit();
        uninit.freeze();
        uninit
    }
}
```

> And maybe there could also be helpers for arrays, similar to how …

> I was confused about this as well, but reading the rest of the RFC I came to the conclusion that this is supposed to be a read-only operation which does nothing to the original memory location. It returns a frozen version of the read bytes, but the original memory is left untouched.

> No, an in-place freeze is literally a write to the target memory, with all the consequences that follow from that. (Nothing "conceptual" about it: there are cases where you need to actually issue a write instruction, fault on read-only pages, acquire exclusive access to the cache line, the whole nine yards.)

> Right, although I guess the main controversy here is that it kind of forgets the reason why … For example, take this simple use case: you want to read out a struct in full, including padding bytes, which are normally uninitialized. Say we freeze-read that struct, copy it over a buffer of bytes, then write that out. This is unsound: when you read the struct as a value, the padding is still uninitialized. What you actually want to do is cast the struct to bytes first, then freeze-read that into your buffer. But even that is still wrong: you need to cast to a slice of maybe-uninit bytes (since the padding is uninitialized), freeze-read that into a slice of init bytes, and write that.

> Using by-value instead of by-mut-ref freeze avoids a whole bunch of issues related to "what if we just elided the read?" on weird memory (this has been discussed at length on #t-opsem). It also makes freeze unusable on read-only memory. The API footgun concern is sound, but the by-value …

> Since I am more of a practitioner, it's easier for me to discuss concrete examples based on potential use cases. No matter how robust your model is, if it does something surprising or unexpected for people who practice the language (be it in terms of performance or generated instructions), arguably it's a bad model: you either need to better communicate what the model does, or change the model itself to better satisfy user expectations. One last example from me:

```rust
pub fn f() {
    unsafe {
        let mut buf: [u8; 1024] = MaybeUninit::<[u8; 1024]>::uninit().freeze().assume_init();
        extern_read(&mut buf);
    }
}
```

> How much stack space would this function use when compiled with optimizations enabled? ~1024 bytes? ~2048 bytes? More? What exact writes would be generated?

> That example should be trivially optimizable to use 1 KiB of stack. No writes should be necessary, since 1 KiB is less than a page size; if it were larger than a page, stack probes would already be injected to probe the additional stack pages, irrespective of the use of freeze.

> I wouldn't be so sure. Such a buffer can easily cross page boundaries, and even if it does not, you don't know whether any writes were generated for that page. So instead of the trivial no-op programmers expect for this snippet, we get quite non-trivial behaviour.

> That's what happens when programmers are wrong: they expect things to be simple that aren't simple (see pointer provenance as another example). I don't blame them; optimizing compilers and system tricks like MADV_FREE make things complicated in subtle, hard-to-predict ways.

> Then you may need additional probes at the page boundary, if the contract between the compiler and the platform does not already require that stack reads give deterministic results. I don't know if such a contract is defined, but a reasonable option is to simply say that such a threading library is just wrong (in the same way that thread stacks are required to be properly aligned).

> The behaviour would be exactly the same, because behaviour is a property of the abstract machine. What instructions are emitted by the compiler for the intrinsic could be completely different in different circumstances, though; that's what compiler optimizations do.

> The behaviour is trivial; the emitted code is not. Freeze is not a no-op in the abstract machine. Whether it is a no-op in practice depends on optimizations and contracts, in the same way that whether a copy is elided depends on optimizations. If the contract says that stack memory reads are deterministic, then this can indeed compile to a no-op; if not, the compiler needs to emit a write to that memory to ensure it becomes initialized.

`read_freeze` is an operation that loads a number of bytes from (potentially uninitialized) memory, replacing every byte in the source that is uninitialized with an arbitrary-but-initialized value.
The function has similar safety constraints to `core::ptr::read`: the source pointer must be *dereferenceable* and well aligned, and must point to a valid value of type `T` (after replacing uninitialized bytes).
The same-named methods on the raw pointer types are identical to the free function and are provided for convenience.
Only the bytes of the return value are frozen; the bytes behind the pointer argument are unmodified.
**The `read_freeze` operation does not freeze any padding bytes of `T` (if any are present); those are set to uninit after the read as usual.**
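
To make the padding rule concrete, here is a minimal sketch of use case 1 (reading a struct's storage, padding included), assuming the `read_freeze` signature proposed above; the `Pair` type and `pair_bits` helper are illustrative only:

```rust
use core::ptr;

#[repr(C)]
#[derive(Clone, Copy)]
struct Pair {
    a: u16,
    b: u8,
    // one trailing padding byte at offset 3 keeps `Pair` at 4 bytes
}

// Hypothetical helper: obtain all 4 bytes of a `Pair`'s storage, padding
// included, as initialized bytes (e.g. for a generic atomic wrapper's
// compare-exchange).
fn pair_bits(p: &Pair) -> u32 {
    // `[u8; 4]` has no padding of its own, so every byte of the returned
    // array is frozen. Reading the storage as `Pair` instead would leave
    // the padding byte uninit again, per the rule above.
    let bytes: [u8; 4] = unsafe { ptr::read_freeze(p as *const Pair as *const [u8; 4]) };
    u32::from_ne_bytes(bytes)
}
```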

`MaybeUninit::freeze` is a safe, by-value version of `read_freeze`. It takes a `MaybeUninit` and yields an initialized value, which is either exactly `self` if it is already initialized, or some arbitrary value if it is not.

> **Review note:** the term "initialized" here refers to the notion of an Abstract Byte in memory being initialized; this is not to be confused with the notion of …

The result of `core::ptr::read_freeze` or `MaybeUninit::freeze` is not guaranteed to be valid for `T` if `T` has a more complex validity invariant than an array of `u8` (for example, `char`, `bool`, or a reference type).
However, the result is guaranteed to be valid for an integer or floating-point type, or an aggregate (struct, union, tuple, or array) containing (recursively) only those types.
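
As a concrete illustration of this rule (a sketch assuming the proposed `MaybeUninit::freeze`):

```rust
use core::mem::MaybeUninit;

fn freeze_validity_examples() {
    // Every initialized bit pattern is a valid `u32`, so asserting validity
    // after freezing is sound (the value itself is arbitrary).
    let _garbage: u32 = unsafe { MaybeUninit::<u32>::uninit().freeze().assume_init() };

    // This, by contrast, would be undefined behaviour: a frozen byte is not
    // guaranteed to be `0x00` or `0x01`, so it need not be a valid `bool`.
    // let _bad: bool = unsafe { MaybeUninit::<bool>::uninit().freeze().assume_init() };
}
```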

Note that frozen bytes are arbitrary, not random.
Rust code must not rely on frozen uninitialized bytes (or unfrozen uninitialized bytes) as a source of entropy, only as values that it does not care about and whose garbage-but-consistent computational results it can accept.

For example, the following function:
```rust
pub fn gen_random() -> i32 {
    unsafe { MaybeUninit::uninit().freeze().assume_init() }
}
```

can validly be optimized to:
```rust
pub fn gen_random() -> i32 {
    4 // Compiler chose a constant by fair dice roll
}
```

(See also [XKCD 221](https://xkcd.com/221/).)

Note that the value `4` was chosen for expository purposes only; the optimizer could validly substitute any other constant, or not perform the optimization at all.

## Relationship to `read_volatile`

`read_volatile` and `read_freeze` are unrelated operations (except insofar as they both `read` from a memory location).
`read_volatile` performs an observable side effect (which compilers are not allowed to remove), but otherwise acts (mostly) the same as `read`; in particular, `read_volatile` does not freeze the bytes it reads.
In contrast, `read_freeze` is not a side effect (and thus can be freely optimized by the compiler to anything equivalent).

It is possible that `read_volatile` may in the future carry a guarantee of freezing (non-padding) bytes, but this RFC does not provide that guarantee.
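
Where both the non-removable access and initialized bytes are wanted, the two operations compose; a minimal sketch assuming the proposed `MaybeUninit::freeze`:

```rust
use core::mem::MaybeUninit;
use core::ptr;

/// Performs a volatile read (the side effect the compiler must preserve) and
/// then freezes the result, so every byte of the returned value is initialized.
///
/// Safety: `ptr` must be valid and suitably aligned for a read of `u32`.
unsafe fn read_volatile_frozen(ptr: *const u32) -> u32 {
    let raw: MaybeUninit<u32> = ptr::read_volatile(ptr as *const MaybeUninit<u32>);
    raw.freeze().assume_init()
}
```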

**Review discussion (does `read_volatile` freeze?):**

> Freezing is still useful for the reasons you said: volatile reads have side effects that are undesired in most cases where freezing is useful. However, I don't think it's right to say that volatile reads don't freeze the data they read. In fact, I would argue that volatile reads always freeze the data they read.

> As it currently is, this is not the case. You can observe the difference with Miri or MSan.

> Wait, so what exactly does that mean for embedded contexts and the DMA case? Has it just always been UB to volatile-read from these addresses, and only adding freeze makes it not UB?

> Presumably anything outside the AM that's accessing memory outside the AM isn't setting things to uninit.

> This Compiler Explorer link shows that LLVM also doesn't think that … Note that it's discarding the value obtained via …

> Looks like GCC does the same thing, although ICC and MSVC don't. I find that very unfortunate. What makes volatile volatile is that it pierces through Abstract Machine semantics and is defined in terms of assembly-level semantics instead. It has to be defined that way; otherwise it wouldn't make sense to say things like "can't be optimized" or "must be a single load/store instruction". And assembly-level semantics don't have uninitialized bytes. I'm not saying it's impossible to define volatile in a way where it actually can have uninitialized bytes, justifying the behaviour observed here, but it makes the definition more complicated for little benefit. I am going to file a bug report against LLVM to see what they think about this.

> I think we've tried in the past to make LLVM consider volatile loads as having an implicit freeze, and they wouldn't have it.

> Hang on a second. ICC and MSVC do both the read and the write, and Clang tosses the write, but GCC tosses the read as well. That's worse than "the same thing as Clang"; it's a straight-up violation of the popular understanding of volatile in C. You should also be filing a bug report against GCC.

> Perhaps it could be mentioned that adding "freezing" semantics to …

> This sentence alone made me realise just how much I had been misunderstanding the RFC, so I think it's worth including here at least.

# Reference-level explanation
[reference-level-explanation]: #reference-level-explanation

## Uninit Bytes

Each byte in memory can be either initialized or uninitialized. When a byte is uninitialized, it is not considered valid as part of any scalar type, the discriminant of an enum type, or any pointer type (including reference types).

Uninitialized memory is a distinct state from initialized memory, and the distinction is used by the language primitive defined below, and thus by the library functions.
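
For contrast, materializing an uninitialized byte as an integer is already undefined behaviour today (Miri flags it as a use of uninitialized data); under this RFC, inserting a freeze between the two steps would make it defined:

```rust
use core::mem::MaybeUninit;

fn ub_today() -> u8 {
    let x = MaybeUninit::<u8>::uninit();
    // UB: the byte is uninitialized, and `u8` requires an initialized byte.
    // With this RFC, `x.freeze().assume_init()` would instead yield an
    // arbitrary-but-initialized byte, with no undefined behaviour.
    unsafe { x.assume_init() }
}
```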

## Language Primitive

The guide-level explanation defines the public APIs for this RFC. To implement these public APIs, we define a new language primitive, in the form of an intrinsic function, that performs the operation defined.

For the remainder of this RFC, the `read_freeze` function is chosen as the defined primitive; however, the choice is believed to be arbitrary, and a valid implementation could implement either function as the compiler intrinsic.
An example implementation of the `read_freeze` function using `MaybeUninit::freeze` as the language primitive is provided later.

```rust
// in module core::intrinsics
pub unsafe extern "rust-intrinsic" fn read_freeze<T>(ptr: *const T) -> T;
```

(The above prototype and definition are for *exposition only* and are not intended to be exposed directly in a stable form.)

The `read_freeze` intrinsic does a typed copy from `ptr` as `T`, but every non-padding byte read from `ptr` that is uninit is replaced by a non-deterministic initialized byte (the uninit byte is "frozen"). If the byte is init, it is copied as-is.
If `T` contains any pointers, no valid provenance is attached to bytes that are "frozen" (and thus a pointer frozen entirely from uninit bytes has no provenance at all). Unsafe code cannot dereference such pointers or perform inbounds offsets (`core::ptr::offset`) on them, except as otherwise considered valid for pointers with no provenance or dangling provenance.
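
A sketch of the provenance rule, assuming the proposed `MaybeUninit::freeze` and assuming that frozen (initialized) bytes form a valid raw-pointer value; the `frozen_pointer` function is illustrative only:

```rust
use core::mem::MaybeUninit;

fn frozen_pointer() {
    // The frozen bytes are initialized, so this `assume_init` yields a raw
    // pointer value, but one that carries no provenance.
    let p: *const u8 = unsafe { MaybeUninit::<*const u8>::uninit().freeze().assume_init() };

    // Address-level operations valid for dangling pointers remain allowed:
    let _addr = p as usize;

    // Dereferencing or in-bounds offsetting would be undefined behaviour:
    // unsafe { p.read() };     // UB
    // unsafe { p.offset(1) };  // UB
}
```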

Other than the validity property, `read_freeze` has the same preconditions as the `read_by_copy` operation (used to implement `core::ptr::read`), or a read from a place obtained by dereferencing `ptr`.
In particular, `ptr` must be aligned for `T`, non-null, dereferenceable for `size_of::<T>()` bytes, and the bytes must not be covered by a mutable reference that conflicts with `ptr`.
The validity invariant of `T` is enforced after the bytes read by the intrinsic are frozen.

The intrinsic is not proposed to be `const`; however, there is not believed to be a fundamental reason why it could not be made `const` in the future if there are sufficient use cases. This RFC leaves that to a separate stabilization step regardless.

Any initialized byte value chosen nondeterministically may be chosen arbitrarily by the implementation (in particular, the implementation is permitted to deliberately pick a value that will violate the validity invariant of `T` or otherwise cause undefined behaviour, if such a choice is available).

## Library Functions

### `core::ptr::read_freeze`
```rust
// in module core::ptr
pub unsafe fn read_freeze<T>(ptr: *const T) -> T {
    core::intrinsics::read_freeze(ptr)
}
```

The `core::ptr::read_freeze` function directly invokes the `read_freeze` intrinsic and returns the result.

The majority of the operation is detailed [above](#language-primitive), so the definition is not repeated here.

The `read_freeze` functions on `*const T` and `*mut T` are method versions of the free function and may be defined in the same manner.

### `MaybeUninit::<T>::freeze`

```rust
// in module core::mem
impl<T> MaybeUninit<T> {
    pub fn freeze(self) -> Self {
        unsafe { core::ptr::read_freeze(core::ptr::addr_of!(self)) }
    }
}
```

`MaybeUninit::freeze` is a safe version of the `read_freeze` function, defined in terms of the language intrinsic. It is safe because the value is kept as `MaybeUninit<T>`, which is trivially valid.
To use the result, Rust code would need to unsafely call `MaybeUninit::assume_init` and assert that the frozen bytes are valid for `T`.

The remaining behaviour of the function is equivalent to the `read_freeze` intrinsic.

### Example Alternative Implementation

If an implementation decides to define `MaybeUninit::freeze` (or an equivalent) as the compiler intrinsic, it is possible to implement `core::ptr::read_freeze` as follows:
```rust
pub unsafe fn read_freeze<T>(ptr: *const T) -> T {
    ptr.cast::<MaybeUninit<T>>()
        .read()
        .freeze()
        .assume_init()
}
```

# Drawbacks
[drawbacks]: #drawbacks

The main drawbacks that have been identified so far:
* It is potentially [considered desirable](https://rust-lang.zulipchat.com/#narrow/stream/136281-t-opsem/topic/Arguments.20for.20freeze/near/377333420) to maintain the property that sound (i.e. correct) code cannot meaningfully read uninitialized memory.
  * It is generally noted that safe/unsafe is not a security or privilege boundary, and it is fully possible to write unsound code (either deliberately or inadvertently) that performs such a read. If the use of uninitialized memory is within the threat model of a library that, for example, handles cryptographic secrets, that library should take additional steps to sanitize memory that contains those secrets.
  * Undefined behaviour does not prevent malicious code from accessing any memory it physically can.
  * That said, making such code UB could still be useful, as it makes it unambiguously a bug to expose the contents of uninitialized memory to safe code, which can avoid accidental information leaks. If this RFC is accepted, we should find ways to make it clear that even if doing so would be technically *sound*, this is still not something that Rust libraries are "supposed" to do, and it must always be explicitly called out in the documentation.

**Review discussion (on drawbacks):**

> Another drawback that should be discussed is that this will lead to Valgrind errors in sound Rust code. Ideally there is some plan for how to enable Valgrind to still detect UB due to use of uninit memory (it would be bad to lose this capability) while accepting code that correctly uses freeze.

> I'm pretty sure Valgrind already has false positives in sound Rust code because it doesn't understand …

> Do you have an example of that?

# Rationale and alternatives
[rationale-and-alternatives]: #rationale-and-alternatives

**Review discussion (a zero-initializing alternative):**

> An alternative being considered by Project Safe Transmute is something like:

```rust
/// Initializes possibly-uninitialized bytes in `v` to `0`.
///
/// # Safety
///
/// Unsafe code may depend on uninitialized bytes in `v` being
/// initialized to `0`. Note, however, that the initialization
/// of padding bytes is not preserved on move.
// TODO: Restrict `v` to non-dyn pointers.
fn initialize<T: ?Sized>(v: &mut T) {}
```

> This alternative mitigates some of the listed drawbacks of this RFC's proposal: Rust would retain the property that sound code does not read uninitialized memory. The performance cost is linear in the number of padding bytes in a type; since that is less than or equal to the total number of bytes, it may be more performant than a move, and for types with no padding it is free.

> It would also require compiler support to guarantee better-than-O(`size_of_val`) performance, even for arbitrary initialization instead of zero-initialization.

> The compiler does know where padding bytes are for arbitrary types. That's the entire shtick of Project Safe Transmute, and it's also part of how Miri is able to validate transmutations at runtime.

> There's a catch here: for enums, we need to read the discriminant to find out where the padding bytes are. So this can only be done for data where the discriminant is valid.

> I think that's unimplementable, because sometimes LLVM doesn't know whether bytes are uninitialized (e.g. after calling an external function), so it can't know which bytes to zero.

> In order for this to work, either the compiler would have to know and preserve which bytes are uninit (contradicting the refinement of an uninitialized byte to any initialized value, which is not implementable), or it would have to set every byte to 0.

> It's a sufficiently viable and compelling alternative that it deserves some consideration in the Alternatives section of this RFC.

* The intrinsic could be defined as a by-value operation used by `MaybeUninit::freeze`, instead of the read operation used by `core::ptr::read_freeze`.
  * As noted above, the two intrinsic choices are believed to be functionally identical, so the choice between them is essentially arbitrary.
* Either one of the two functions could be provided on its own.
  * Both functions are provided for maximum flexibility, and can be defined in terms of each other. The author does not believe there is a significant drawback to providing both functions instead of just one.
* An in-place, mutable `freeze` could be offered, e.g. `MaybeUninit::freeze(&mut self)`.
  * While this function would seem to be a simple body that LLVM could replace with a runtime no-op, in reality it is possible for freshly allocated [virtual memory](https://man7.org/linux/man-pages/man2/madvise.2.html#MADV_FREE) that has not yet been written to to exhibit the properties of uninitialized memory (rather than uninitializedness being a purely abstract phenomenon the compiler tracks that disappears at runtime). Thus, such an operation would require a potentially expensive in-place copy. Until an optimized version is available, we should avoid defining the in-place version and require users to spell it explicitly as `*self = core::mem::replace(self, MaybeUninit::uninit()).freeze()` (see the sketch after this list).

**Review discussion (on an in-place freeze):**

> If we add this, the in-place version can also be implemented in a library crate that uses inline asm to touch/write each page at least once, ensuring that it is not MADV_FREE. Without the primitive freeze there is no corresponding sequence of AM operations that such asm is equivalent to, but with it, the asm is equivalent to freezing each byte by hand.

> Could you elaborate on that a bit? What inline assembly are you thinking of? From your comment it feels like you're proposing both the use of asm and the new operation.

> Inline assembly that does this can be written today, but because the freeze operation does not exist in the abstract machine, that inline asm has... maybe not quite UB, but it performs something the AM does not recognize as a thing that can be done. However, if the freeze operation is added to the AM, that inline asm can be shown to have fully defined behaviour in terms of the freeze operation. In other words, yes, the asm and the new stdlib functions would probably both be used (for different use cases: freezing a large chunk of memory in place vs. freeze-loading a single value), but semantically both would be defined in terms of the new AM operation. An x86_64 asm example (not the most optimized, but it gets the point across; provided as-is with no warranty):

```rust
#[inline(never)]
pub fn freeze_in_place(mem: &mut [MaybeUninit<u8>]) -> &mut [u8] {
    let len: usize = mem.len();
    if len > 0 {
        let ptr: *mut MaybeUninit<u8> = mem.as_mut_ptr();
        // touch the first byte of the slice, then
        // touch the first byte in each subsequent page that
        // contains a byte of the slice.
        unsafe {
            core::arch::asm!(
                "2:",
                "mov {tmp:l}, BYTE PTR [{ptr}]",
                "mov BYTE PTR [{ptr}], {tmp:l}",
                "mov {tmpptr}, {ptr}",
                "and {tmpptr}, -4096", // addr of first byte of this page
                "add {tmpptr}, 4096",  // addr of first byte of next page
                "mov {tmplen}, {tmpptr}",
                "sub {tmplen}, {ptr}",
                // tmplen is the number of bytes we semantically froze
                // with the above `mov BYTE PTR ...`.
                // if this is >= the remaining length of the slice, we're done
                "cmp {tmplen}, {len}",
                "jae 3f",
                // otherwise, keep going
                "sub {len}, {tmplen}",
                "add {ptr}, {tmplen}",
                "jmp 2b",
                "3:",
                ptr = inout(reg) ptr => _,
                len = inout(reg) len => _,
                tmpptr = out(reg) _,
                tmplen = out(reg) _,
                tmp = out(reg) _,
            );
        }
    }
    // Safety: either the slice is empty, or we touched every page containing the slice
    unsafe {
        &mut *(mem as *mut [_] as *mut [u8])
    }
}
```

> I don't think I totally understand the performance considerations here. What is an "in place" copy? Why is it more expensive than a move?

> It's not more expensive than a move (except that LLVM has more freedom to elide the move), but a move looks more like it's "doing something", whereas a user might assume that an in-place freeze is free.

> Just rewording how I interpreted this, to verify I'm understanding correctly: since virtual memory may be uninitialised but still copy-on-write (for example, from a forked process), there are cases where a copy would still happen even if it's not explicit in the in-place signature, and so it's probably easier to just treat it as a read anyway? This does feel like a case of optimising for the common case even if it's not always the best, and it might be worth elaborating more.

> I mean, I personally think that "user isn't capable of reading documentation" is a compelling case for advocating for a particular API.

> Putting on my Project Safe Transmute hat, I'd like to advocate for alternatively or additionally providing an in-place intrinsic. That said, the safety considerations of implementing a general, in-place …

  * If intermediate representations are cooperative, it may be beneficial to provide the in-place operation in the future, as it could perform only the writes required to ensure the backing memory is in a stable state (such as one write every 4096 bytes).
* `MaybeUninit<Int>` (and maybe `MaybeUninit<Float>`) could have arithmetic defined such that `uninit (op) x` or `x (op) uninit` is `uninit`.
  * While this can partially solve the second use case (by following it through `Simd`) and the third use case, it does not help the first use case.
  * This RFC does not preclude these operations from being defined in the future, and may even make them more useful.
* `MaybeUninit::freeze` could instead be `pub unsafe fn freeze(self) -> T`, like `assume_init`.
  * While this would allow `mu.freeze().assume_init()` to be written in fewer lines of code, keeping the value as `MaybeUninit` may make some code possible using careful offsetting and `MaybeUninit::as_ptr()`/`MaybeUninit::as_mut_ptr()`.
  * Most uses of `MaybeUninit::freeze` would require immediately calling `.assume_init()` on the result; however, this is a potential footgun if `T` has any invalid initialized values, or is a user-defined type with a complex safety invariant. It is hoped that the extra verbosity, on top of helpful documentation, will allow intermediate, advanced, or expert unsafe-code users making use of `MaybeUninit::freeze` to recognize this potential footgun and to carefully validate the types involved.
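
For reference, the explicit in-place spelling mentioned above can be packaged as a helper; a sketch assuming the proposed by-value `MaybeUninit::freeze` (the `freeze_in_place` name is illustrative):

```rust
use core::mem::{self, MaybeUninit};

// Freezes a slot in place by round-tripping through the by-value operation.
// Note that this performs a move in and out of the slot, which is exactly
// the cost an in-place primitive would try to avoid.
fn freeze_in_place<T>(slot: &mut MaybeUninit<T>) {
    let taken = mem::replace(slot, MaybeUninit::uninit());
    *slot = taken.freeze();
}
```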

# Prior art
[prior-art]: #prior-art

[LLVM](https://llvm.org/docs/LangRef.html#freeze-instruction) supports freeze by-value.

* GCC may support an equivalent operation via the `SAVE_EXPR` GIMPLE code.
  * See <https://rust-lang.zulipchat.com/#narrow/stream/136281-t-opsem/topic/GCC.20and.20freeze>

# Unresolved questions
[unresolved-questions]: #unresolved-questions

* Which of the library functions should receive the direct language intrinsic, between `ptr::read_freeze` and `MaybeUninit::freeze`.

  > **Review note:** That was accidental (only … ).

  > **Review note:** Agreed. I don't want any more directly exposed intrinsics.

* Should the `ptr::read_freeze` and `MaybeUninit::freeze` functions be `const`?

# Future possibilities
[future-possibilities]: #future-possibilities

* With project-portable-simd, the `Simd` type could support `Simd::load_partial<T, const N: usize>(x: *const T) -> Simd<[MaybeUninit<T>; N]>` (signature to be bikeshed), which could then be frozen lanewise into `Simd<[T; N]>`. With proper design (which is not the subject of this RFC), this could allow optimized loads at allocation boundaries by allowing operations that may physically perform an out-of-bounds read but logically return uninit for the out-of-bounds portion. This could be used to write optimized implementations of `memchr`, `strchr`, or `strlen`, or even to optimize UTF-8 encoding and processing.

  **Review discussion (on the SIMD possibility):**

  > This relies on a hypothetical new LLVM operation that does a load where out-of-bounds memory is …

  > Or an appropriate masked/partial load that doesn't logically perform the problematic reads at the LLVM level.

  > The text does not indicate a way for portable-simd to have the information to produce such a mask.

  > Suggestion: use an expository signature of

  ```rust
  Simd::read_select(
      source: *const [T],
      enable: Mask<isize, N>,
  ) -> Simd<MaybeUninit<T>, N>
  ```

  > This is deliberately imitating the signature of …

* `project-safe-transmute` could, in the future, offer some form of `MaybeUninit::assume_init_freeze` that statically checks that all initialized values are valid for `T`.
  * Until such a thing becomes an option, this functionality could be provided by an external crate, such as the [bytemuck crate](https://lib.rs/bytemuck).

> **Review note:** This seems to rely on us being able to map LLVM `poison` to Rust "uninit", which is currently not possible. Same for the next bullet point.
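
As a closing illustration of the crate-level route mentioned above, here is a sketch (not part of this RFC) assuming the proposed `MaybeUninit::freeze` together with bytemuck's existing `AnyBitPattern` marker trait; the `freeze_init` function is hypothetical:

```rust
use bytemuck::AnyBitPattern;
use core::mem::MaybeUninit;

/// Freezes the value and asserts validity. `AnyBitPattern` certifies that
/// every initialized bit pattern is a valid `T`, which is exactly the
/// condition `assume_init` needs after a freeze.
fn freeze_init<T: AnyBitPattern>(value: MaybeUninit<T>) -> T {
    unsafe { value.freeze().assume_init() }
}
```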