RFC: Interfacing LLVM's "atomic volatile" in Rust #212

HadrienG2 · 2019-10-16T18:42:56Z

Basically my proposal to move forward on volatile support in Rust, by interfacing LLVM's atomic volatile which I consider to be a better match for volatile's intended "defer to hardware memory model" semantics than today's ptr::[read|write]_volatile in every single respect.

Emerges from the #152 discussion, and a step toward resolving it, but will not fully resolve that problem because as far as I know, that's impossible without LLVM involvement.

Having this RFC reviewed here is intended as a way to weed out obvious problems in a small and focused group before going for the broader audience of a pre-RFC on IRLO.

Rendered

gnzlbg

This reads really nicely. Good job!

rfcs/0000-atomic-volatile.md

Emphasize why load/store tearing is bad. Co-Authored-By: gnzlbg <[email protected]>

Typographical fix Co-Authored-By: gnzlbg <[email protected]>

Amanieu · 2019-10-17T16:46:02Z

In the case of shared memory, there is no reason why volatile accesses should be limited to primitive types. In fact, it is very common to want to read/write an entire struct in the shared memory. With this API you would have to read/write each field of the struct individually, which makes the code more complicated for no benefit.

Shared memory tends to have two components:

Control signals, which acts as a sort of "lock" and indicate if a process is currently reading/writing to a portion of the shared memory. This is generally done using atomic types such as AtomicUsize. Note that there is no need for volatile atomics here, plain atomics are fine.
Data, access to which is governed by the control signals. If the memory is shared between trusted processes then we just use plain loads and stores since we "know" that the other process isn't writing to data that we may be concurrently reading (or vice versa). However if the processes are untrusted then we have to use volatile accesses in case the other process is concurrently writing to data that we are accessing (whether reading or writing). Volatile writes allow us to "fire and forget": it's the other process's problem if it corrupts the data we are sending it. Volatile reads allow us to read a "snapshot" of the data in shared memory, which we can then validate before processing it.

At no point do we ever need volatile atomics. In fact, volatile atomics would only be useful for theoretical MMIO which must be interacted with atomic instructions instead of normal loads and stores, but AFAIK no such hardware exists.

Also, memory orderings don't really make sense in the context of MMIO. It is sometimes necessary to ensure ordering of writes to MMIO, but the fence instructions emitted by atomics will only ensure synchronization with other CPU cores. A specialized set of fence instructions are needed to ensure synchronization with other memory-mapped hardware (on ARM this is the difference between DMB ISH and DMB SY).

Amanieu · 2019-10-17T17:10:41Z

rfcs/0000-atomic-volatile.md

+
+1. Concurrent use of volatile memory operations on a given memory location is
+   considered to be Undefined Behavior, and may therefore result in unintended
+   compilation output if detected by the optimizer.


I disagree with this definition of volatile accesses: data races with volatile operations do not create data races (assuming both sides of the concurrent access are volatile), it just produces garbage data (i.e. whatever happened to be in that memory location at the time). This is what is meant by volatiles acting just like a load or store instruction.

I'll accept that my interpretation is controversial, but this is how compilers implement volatile because it matches how all of the existing C/C++ code uses volatile.

This is not a definition, it just documents that we currently do not guarantee the behavior of programs where this happens, which is true, because our read_volatile/write_volatile functions are implemented on top of LLVM, and LLVM does not provide this guarantee.

I think everybody agrees that everything would be much better if LLVM were to guarantee this, but we opened a documentation bug and it doesn't look like LLVM will be guaranteeing this any time soon: https://bugs.llvm.org/show_bug.cgi?id=42435

In fact, LLVM actively warn that they consider concurrent volatile accesses to be data races in their atomics documentation:

The rule is essentially that all memory accessed with basic loads and stores by multiple threads should be protected by a lock or other synchronization; otherwise, you are likely to run into undefined behavior. If your frontend is for a “safe” language like Java, use Unordered to load and store any shared variable. Note that NotAtomic volatile loads and stores are not properly atomic; do not try to use them as a substitute. (Per the C/C++ standards, volatile does provide some limited guarantees around asynchronous signals, but atomics are generally a better solution.)

I'll accept that my interpretation is controversial, but this is how compilers implement volatile because it matches how all of the existing C/C++ code uses volatile.

We tried to convince LLVM to make this their official docs, and they refused. So, while what you say is true for current LLVM, this may change any time and LLVM does not seem willing to commit to a proper guarantee here. Until LLVM changes their stanza, our hands are bound.

rfcs/0000-atomic-volatile.md

asajeffrey · 2019-10-17T17:42:05Z

cc @elast0ny

HadrienG2 · 2019-10-17T18:57:38Z

In the case of shared memory, there is no reason why volatile accesses should be limited to primitive types. In fact, it is very common to want to read/write an entire struct in the shared memory. With this API you would have to read/write each field of the struct individually, which makes the code more complicated for no benefit.

This is definitely a case where the MMIO and shared memory use cases are at odds with each other. In principle, the shared memory use case can be handled on top of primitives for MMIO (guaranteed absence of tearing) by building a memcpy operation on top of those. But in practice, I could accept this as a motivation for keeping ptr::[read|write]_volatile around.

Shared memory tends to have two components:

Control signals, which acts as a sort of "lock" and indicate if a process is currently reading/writing to a portion of the shared memory. This is generally done using atomic types such as AtomicUsize. Note that there is no need for volatile atomics here, plain atomics are fine.

I believe that this assertion is not generally true.

The single most central assumption of the Rust Abstract Machine, which almost every single compiler optimization relies on for correctness, is that the contents of memory can only change when the program allows it to.

Atomics are not exempt from this rule. They simply ask rustc to consider the possibility of interaction between threads instead of restricting the analysis to the current thread. But even with atomics, it should be okay for rustc to optimize this program...

use std::sync::atomic::{AtomicBool, Ordering};

static x: AtomicBool = AtomicBool::new(true);

fn main() {
    x.store(true, Ordering::Relaxed);
    println!("{}", x.load(Ordering::Relaxed));
}

...into this one...

use std::sync::atomic::AtomicBool;

static x: AtomicBool = AtomicBool::new(true);

fn main() {
    println!("{}", true);
}

...and indeed, rustc already performs this sort of optimizations today when presented with a very simple program like the one above. For sure, today, the optimizations will go away as soon as your program becomes just a tad bit more complex. But there's no fundamental reason why that is the case, it's just a limitation of the optimizer's current program analysis capabilities.

Future-proofing against this is why we need atomic volatile operations in shared memory scenarios. To tell the compiler that its knowledge of memory accesses in a certain region, and of their visibility to the outside world, is not complete, even if there's no obvious side-effect going on, and that a conservative stance towards optimization must be adopted. Because what rustc's optimizer does on toy programs today, it may do on non-toy programs in the future...

Data, access to which is governed by the control signals. If the memory is shared between trusted processes then we just use plain loads and stores since we "know" that the other process isn't writing to data that we may be concurrently reading (or vice versa). However if the processes are untrusted then we have to use volatile accesses in case the other process is concurrently writing to data that we are accessing (whether reading or writing). Volatile writes allow us to "fire and forget": it's the other process's problem if it corrupts the data we are sending it. Volatile reads allow us to read a "snapshot" of the data in shared memory, which we can then validate before processing it.

Using plain loads and stores for interprocess communication is subtly incorrect for the same reason that using non-volatile atomics is:

Because it works under an assumption of omniscient knowledge of memory accesses, the compiler is allowed to optimize out repeated reads from the same region of memory, under the assumption that the current process is the only entity in the world that can write to said memory.
Similarly, the compiler is allowed to optimize out repeated writes into the same region of memory, under the assumption that only the last of said writes may be externally observed.

Unless you get lucky with the aforementioned poor ability of rustc and LLVM to optimize atomics-based code, these effects are likely to break shared memory communication based on plain loads and stores. The underlying issue is that the compiler's model of an isolated address space is incorrect in the case of shared memory, and must be amended by disabling the kind of optimization described above through use of volatile.

As @RalfJung pointed out in the context of seqlocks, using volatile memory accesses in an attempt to turn data races into well-defined behavior is a hack which isn't guaranteed to continue working in the future by either the current Rust definition of undefined behavior or LLVM's current implementation of non-atomic volatile. Which is why LLVM has atomic volatile, and I think Rust should have it too.

At no point do we ever need volatile atomics. In fact, volatile atomics would only be useful for theoretical MMIO which must be interacted with atomic instructions instead of normal loads and stores, but AFAIK no such hardware exists.

As explained by this RFC, volatile atomics bring two very useful properties with respect to current volatile operations even if the only thing that one uses them for are Relaxed loads and stores:

They have well-defined data race semantics, just like hardware loads and stores. This property is something which everyone expects of volatile, but which non-atomic volatile doesn't actually guarantee. It is more relevant to the shared memory use case than it is to the MMIO use case.
They are guaranteed not to be translated into multiple hardware instructions. Trying to do a volatile load which is too large for hardware is a compilation error. This latter property is very useful for catching MMIO errors at compile time.

Also, memory orderings don't really make sense in the context of MMIO. It is sometimes necessary to ensure ordering of writes to MMIO, but the fence instructions emitted by atomics will only ensure synchronization with other CPU cores. A specialized set of fence instructions are needed to ensure synchronization with other memory-mapped hardware (on ARM this is the difference between DMB ISH and DMB SY).

Indeed, orderings stronger than Relaxed are not generally enough for MMIO. As you correctly point out, different synchronization mechanisms are required by some CPU architectures when communicating with memory-mapped hardware. Where the stronger orderings become useful is in the shared memory use case, where as previously explained using non-volatile atomic operations are not strong enough, and at risk of breaking in the future as rustc's optimizer becomes smarter about whole-program analysis.

RalfJung · 2019-10-17T19:30:29Z

This is definitely a case where the MMIO and shared memory use cases are at odds with each other. In principle, the shared memory use case can be handled on top of primitives for MMIO (guaranteed absence of tearing) by building a memcpy operation on top of those. But in practice, I could accept this as a motivation for keeping ptr::[read|write]_volatile around.

IMO ptr::[read|write]_volatile should be implemented via "elementwise volatile copy", similar to rust-lang/compiler-builtins#311.

This avoids relying on LLVM to do the tearing for us, meaning we can quite clearly document what is happening.

As @RalfJung pointed out in the context of seqlocks, using volatile memory accesses in an attempt to turn data races into well-defined behavior is a hack which isn't guaranteed to continue working in the future by either the current Rust definition of undefined behavior or LLVM's current implementation of non-atomic volatile. Which is why LLVM has atomic volatile, and I think Rust should have it too.

I don't agree with the "which is why" part here. seqlocks shouldn't use any volatile, period -- that's what I said above, so I don't know how that got construed into an argument in support of volatile. ;)

Atomic volatile are useful for shared-memory concurrency, to answer questions like "what happens on a data race". They are more well-behaved that non-atomic volatile because their behavior is better specified. I'd love if we could get rid of non-atomic volatile as that would side-step a lot of torny questions, but alas, some hardware disagrees... but I feel it would be fine for such hardware to have to jump through hoops and not use the normal convenient APIs.

HadrienG2 · 2019-10-17T20:25:21Z

As @RalfJung pointed out in the context of seqlocks, using volatile memory accesses in an attempt to turn data races into well-defined behavior is a hack which isn't guaranteed to continue working in the future by either the current Rust definition of undefined behavior or LLVM's current implementation of non-atomic volatile. Which is why LLVM has atomic volatile, and I think Rust should have it too.

I don't agree with the "which is why" part here. seqlocks shouldn't use any volatile, period -- that's what I said above, so I don't know how that got construed into an argument in support of volatile. ;)

Bad wording, I suspect. This was intended to respond to @Amanieu's repeated assertions that using volatile to handle data races is okay because it would just return garbage torn data.

Volatile and atomic are orthogonal. You need volatile when your memory accesses are visible to someone else (or conversely you are seeing someone else's memory accesses), and you need atomics when one thread is racing with another. If you are racing with a foreign process in shared memory, you need both. Ergo, atomic volatile is useful.

I'd love if we could get rid of non-atomic volatile as that would side-step a lot of torny questions, but alas, some hardware disagrees... but I feel it would be fine for such hardware to have to jump through hoops and not use the normal convenient APIs.

You may like the section where I mention that we could actually support more hardware with atomic volatile if we expanded our model of atomic operations beyond global cache coherence (aka Relaxed).

Amanieu · 2019-10-17T23:01:20Z

The single most central assumption of the Rust Abstract Machine, which almost every single compiler optimization relies on for correctness, is that the contents of memory can only change when the program allows it to.

This doesn't sound right. If that were true then any memory that interacts with a system call would need to be accessed with volatile, because anything that happens in the kernel is outside the Rust Abstract Machine. Consider the case of a buffer being filled from a file: the compiler clearly cannot assume that the contents of the buffer remain the same across read calls, even though there is nothing in the program modifying the contents of that buffer.

But even with atomics, it should be okay for rustc to optimize this program...

rustc can optimize this program because it can prove that the address of x is never escapes the current program. This would not be the case if the AtomicBool came from a shared memory region that was allocated with a system call. In that situation the compiler cannot assume that it has exclusive access to the memory region and therefore cannot optimize the store away.

Because it works under an assumption of omniscient knowledge of memory accesses, the compiler is allowed to optimize out repeated reads from the same region of memory, under the assumption that the current process is the only entity in the world that can write to said memory.

Similarly, the compiler is allowed to optimize out repeated writes into the same region of memory, under the assumption that only the last of said writes may be externally observed.

Again, you are misunderstanding the reason the compiler is allowed to perform these optimizations. This is because there is no synchronization operation (i.e. an atomic operation or a fence) between the loads/stores, so it is allowed to assume the memory contents have not changed. However any shared data structure (whether intra-process or in shared memory) will use atomic operations on the control signals to control access to data, which prevents merging the loads/stores across synchronization operations.

comex · 2019-11-23T03:01:00Z

Now, I will readily admit that I do not know enough about compiler optimizers to precisely evaluate whether the above wish is merely ridiculous or downright insane.

Not impossible, but likely very unreliable in practice, at least in the general case where you have some arbitrary data structures containing Arcs somewhere in them. It would only take one function call the compiler couldn't track (e.g. to a trait object), where an Arc field is transitively reachable from any parameter, to kill the optimization – for all objects which are ever assigned to that field, or which the compiler can't prove aren't (so in practice, probably all objects of the same type).

In the future, it might be possible to relax this constraint with suitable annotations on function signatures, either manual or (preferably) automatically inserted by the compiler during analysis.
This will never work in the presence of annotation-free opaque FFI, like syscalls, where the compiler must assume the worst... unless we can somehow bound what FFI is allowed to do without requesting special permission, much like LLVM bounds what volatile is allowed to do even if it's supposed to be a "defer to hardware semantics and all bets are off" optimization barrier.

First, I don't know why you'd be passing an Arc, or a pointer to a structure containing one, to a system call in the first place.

Second, I don't see why you'd want the default for FFI functions to be more conservative than the default for non-FFI functions. In your scenario, if you had a Rust function where:

a parameter was an Arc, or a reference to a structure containing one, etc.;
the compiler (for whatever reason) couldn't automatically insert a "single-threaded only" annotation, e.g. due to trait object dynamism; and
the user didn't manually add such an annotation;

the compiler would have to assume that it was multithreaded. The same should apply to FFI functions with pointer parameters. If anything, FFI code is more likely to do "weird things" than Rust code, not less.

There could of course be unsafe annotations for FFI functions to make assumptions about how they behave: not just single-threadedness, but stronger constraints like "it won't keep using this pointer after it's done executing". But they shouldn't be the default.

HadrienG2 · 2019-11-23T09:30:53Z

Not impossible, but likely very unreliable in practice, at least in the general case where you have some arbitrary data structures containing Arcs somewhere in them. It would only take one function call the compiler couldn't track (e.g. to a trait object), where an Arc field is transitively reachable from any parameter, to kill the optimization – for all objects which are ever assigned to that field, or which the compiler can't prove aren't (so in practice, probably all objects of the same type).

Indeed, making this work reliably on trait objects would probably require either of...

Manual annotations which forbid implementations of a trait to do certain things.
Semi-automated proof that all implementations of a trait are visible and don't do these things (would probably require some sort of language-level sealed trait support).
Compiler proof that only certain implementations of a trait with the right properties may be used as a trait object in this particular part of the code (seems close to the devirtualization problem in C++, which doesn't bode well for practical applicability).

I guess of those three, a compiler understanding of sealed traits would be option that's closest to being actually feasible in the medium term... and it definitely wouldn't resolve the general problem, but only solve the problem for sealed traits.

First, I don't know why you'd be passing an Arc, or a pointer to a structure containing one, to a system call in the first place.

For Arc I'm not sure if I can think of a good example, but for the others I could totally see myself dumping a struct with an AtomicXyz or Mutex member to some kind of I/O interface...

There could of course be unsafe annotations for FFI functions to make assumptions about how they behave: not just single-threadedness, but stronger constraints like "it won't keep using this pointer after it's done executing". But they shouldn't be the default.

Fair enough, that's also what I remembered the #215 conclusion was. If we accept it, then we go back to the "we don't really have a use for atomic volatile beyond atomic loads and stores" starting point above. To answer your underlying question about why I find this conclusion somewhat unsatisfying (but probably not so unsatisfying as to feel like relitigating it too much)...

Second, I don't see why you'd want the default for FFI functions to be more conservative than the default for non-FFI functions.

FFI is special in how opaque it is. With Rust code, as it's currently compiled at least, the compiler has a lot of freedom to look through the code of the crate that is being compiled and its dependencies and to infer many useful properties from it, whereas with FFI we only get to do that in very specific compilation scenarios (cross-language LTO) and at a very low level (LLVM only).

This means that in the FFI case, how we set the default interface contract is very important, because often it's the only interface contract we're going to get. Unlike in Rust, where the compiler can potentially infer a lot more than what the developer told him in the code.

You could reasonably argue, however, that writing FFI bindings is already so much work that it wouldn't hurt too much to ask performance-conscious people to add "may not do X or Y" manual annotations to the FFI bindings which are used in the hot parts of their code.

(Note also that there is some argument in there for being careful if we ever try to stabilize some form of Rust ABI for e.g. dynamic linking, as it will have mostly the same problems as C FFI if done in the most straightforward way of mapping functions to ELF/PE symbols without any extra metadata.)

By the way, if I didn't manage to convince you that i'm insane yet...

There could of course be unsafe annotations for FFI functions to make assumptions about how they behave: not just single-threadedness, but stronger constraints like "it won't keep using this pointer after it's done executing". But they shouldn't be the default.

Would this allow things like excluding memory which could not have possibly been touched by any other thread from Acquire and Release barriers, occasionally to the point of eliding those barriers altogether if they can't possibly have an observable effect on program behavior? 😋

That intuitively seems even more far-fetched than eliding atomics in single-threaded code, but if doable, it could be a way to resolve the longstanding Great Atomics Ordering expert debate by making the "safe choice" of using SeqCst no more costly than the "explicit/fast choice" of using the simplest Ordering that fits the application's constraints.

rfcs/0000-atomic-volatile.md

HadrienG2 · 2019-12-08T16:47:38Z

Alright, I'm finally done giving this a good read and updating it with respect to the outcomes of the recent discussion with @RalfJung . In the end, the magnitude of the changes made this pretty much a full rewrite, which together with shortage of my volunteer time is why it took so much time.

Things that I'm still wondering about:

Is VolatileBool actually a good idea? (see above) => No, made it a future extension
Are the proposed atomic ordering semantics weak enough to map well to GPUs and VMs?
How to deal with hardware memory accesses that are race-free in the "no UB" sense but can tear, like SIMD loads and stores?.
Are VolatileXyz wrappers the best API for this, or should it be extension traits for pointer types?
Where will the ongoing dereferenceable discussions take this proposal's API?

To answer the latter question, I could use the opinion of people who implemented compiler backends for those kind of targets. Any suggestion of a good place to ask the question?

RalfJung · 2019-12-26T15:30:22Z

To answer the latter question, I could use the opinion of people who implemented compiler backends for those kind of targets. Any suggestion of a good place to ask the question?

I'd start on IRLO.

comex · 2019-12-27T08:04:36Z

(Replying to @RalfJung's comment which GitHub is now hiding under a "show resolved" button...)

I could also totally imagine an MMIO situation where a concurrent data structure (say, a concurrent ring buffer) is shared between the CPU and an external device, and atomic MMIO is needed for coordination between the two. Or is that not needed because CPUs treat MMIO specially and the caching effects (that usually mandate synchronization and fences even on the CPU level) don't show up there?

Overly long answer:

Yeah, it’s common to have a sequence like:

Write some data to a buffer in RAM
Write to an MMIO register to tell a device to start the operation
Device starts reading from that buffer using DMA

You often do need some kind of synchronization between steps 1 and 2, but it’s dependent on the hardware and how you’ve configured it. It might be any of:

Nothing at all (x86, maybe, sometimes)
The same type of memory barrier used for inter-CPU synchronization
A stronger type of barrier (e.g. ARM’s Inner Shareable vs. Outer Shareable distinction)
Flushing the cache (on systems where DMA ignores the cache and accesses the underlying RAM directly)

Traditionally the synchronization is done as a separate operation; prepping for DMA is usually considered a more heavyweight activity that doesn’t need to be optimized as tightly as atomics. But, for example, ARMv8 has a dedicated store-release instruction, and AFAIK it can provide the necessary guarantees for this, depending on how the memory is configured. In other words, it’s a use case where ‘volatile atomic release store’ would have meaningful semantics, and where you can’t get the same instruction sequence a different way (as opposed to architectures where the assembly for store-release is just a barrier followed by a normal store). It’s just… not really necessary and not usually done. But it’s at least plausible that some future system would have a stronger demand for it.

HadrienG2 · 2020-01-04T08:48:52Z

I opened an IRLO thread to discuss the remaining issue of applicability of unordered atomics to super-weak memory models (language virtual machines, GPUs...)

HadrienG2 · 2020-01-04T21:13:48Z

New concern from the GPU discussion, but which is also relevant on x86 IIRC: some hardware architectures define SIMD loads and stores to be element-wise atomic, not atomic as a whole. If I'm aiming for total deprecation of ptr::volatile_[read|write] as the end goal, I need to figure out how to handle this in the proposed formalism. Maybe I should take some inspiration from LLVM's element-wise atomic memcpy here...

Diggsey · 2020-01-05T00:22:55Z

some hardware architectures define SIMD loads and stores to be element-wise atomic, not atomic as a whole.

This is what we were discussing here too: #209

HadrienG2 · 2020-01-05T15:35:20Z

Ah, yes, thank you for reminding me about this topic and linking to it!

Reading through it again gives me the impression that volatile SIMD loads and stores are a rather tricky beast, because...

They are not, and cannot be, available in atomic form at the backend level (it wouldn't make sense, since they can tear), only in non-atomic volatile form.
The most frequently discussed fix for exposing their element-wise atomic nature at the language level, which is to handle them as element-wise unordered atomic memcpys and expect LLVM to optimize the memcpy into a SIMD load, does not work for volatile accesses.
- At the LLVM level, the @llvm.memcpy.element.unordered.atomic intrinsic does not have an isvolatile parameter like the base @llvm.memcpy intrinsic.
- Even if it did have a volatile variant, optimization of e.g. *const u8 element-wise atomic memcpys into SIMD loads is the opposite of everything that volatile stands for.

As far as I can tell, our options here are to...

Keep non-atomic volatile read/write operations around and use them for SIMD types, since we don't have anything better available. Hopefully they'll keep compiling into something sensible even in multi-threaded environments, as they do today even if no written contract mandates that.
Implement volatile SIMD load/stores in terms of a loop of other atomic volatile operations, knowing that the generated code will be much slower than a hardware SIMD load or store operation and that it goes against the user desire of generating the right load/store instruction.
Somehow integrate the hardware notion of "well-defined data race that can tear" into the language memory model, as an alternative to atomics for well-defined synchronization, and use that for SIMD volatile loads & stores.
Not support volatile SIMD loads and stores at all.

HadrienG2 · 2020-01-06T20:45:49Z

So, I've been having a very interesting chat with @mcy, who had some reservations about the currently proposed API. Here's my biased and incomplete summary.

First of all, the concerns:

At a conceptual level, this API feels wrong because it features volatile objects (if only as an abstract concept that can only be manipulated behind a pointer) rather than volatile accesses only (which are generally agreed to be the best mental model for volatile).
At a practical level, this API feels unpleasant because it doesn't deal very well with the classic MMIO scenario of having a whole *mut [u8] memory-mapped I/O range full of memory-mapped registers with a hardware-specific layout.

We felt that it should be possible to do better on these fronts by trying to stay closer to the original design of <*mut T>::[read|write]_volatile(), that VolatileT deviated a bit too much from this just for the sake of expressing that T can be read and written atomically.

With this in mind, a possible alternative API design would be to use a trait to express volatile operations...

// This trait is unsafe to implement, because implementors must guarantee that the
// operations are indeed atomic and volatile, with semantics as described by this RFC.
unsafe trait VolatileLoadStore {
    // This operation is unsafe to use because the pointer might not be valid
    unsafe fn load_volatile(self: *const Self) -> Self;

    // This operation is unsafe to use because the pointer might not be valid
    unsafe fn store_volatile(self: *mut Self, val: Self);
}

...and then we would implement that trait for every type which has native machine load and store operations.

Speaking personally, I do like this design direction better than the current design, but that's another major RFC rewrite so I wanted to test waters here before changing all the text.

Here's my quick biased evaluation.

Advantages:

Simpler abstractions, it's just vanilla use of traits without weird wrapper types.
Not based on NonNull, so messing around with pointers in MMIO regions is easier.
No notion of atomic objects, it's atomic accesses all the way down.
Trait-based formalism is more elegant than one VolatileXyz per Xyz machine type.

Drawbacks:

Naming conflict with ptr::[read|write]_volatile(), hard to find a good way to differentiate between the atomic version and the non-atomic version without making things confusing. @mcy was thinking about [read|write]_volatile_once(), but I'm not sure if this convention would be very clear outside of those few people who ever dealt with Linux's internal synchronization primitives. The above [load|store] vs [read|write] proposal is not great either.
Strong typing barrier between volatile and non-volatile accesses must now be provided by external crates instead of being a built-in core/std feature.

Neutral:

Does not resolve the arbitrary_self_type issue.
Does not resolve the dependency on raw pointers.
No impact on semantics.
Compatible with all envisioned extensions of the VolatileT concept.

HadrienG2 · 2020-02-27T07:09:44Z

While I do not have that much time to dedicate to this RFC at the moment, I just wanted to mention that the ongoing discussion about non-dereferenceable references is extremely relevant to its API design.

HadrienG2 · 2020-05-15T18:18:49Z

Alright, 2-3 months later I think it's fair to say that I will not manage to finish this one myself anytime soon, as life got busy with too many other things.

I hope that the notes which I left at the end of this PR thread will be enough to help anyone interested in pushing this forward into taking things over from where I left them. Otherwise, feel free to ask me any question about whatever seems unclear.

HadrienG2 and others added 5 commits October 15, 2019 06:11

First RFC draft for atomic volatile

43c10b0

Some cross-checking

e84c586

More cross-checking

3638bf9

One more argument against volatile ops on atomic

c46335e

Fine-tuning

9cfd45e

HadrienG2 changed the title ~~Interfacing LLVM's "atomic volatile" in Rust~~ RFC: Interfacing LLVM's "atomic volatile" in Rust Oct 16, 2019

HadrienG2 mentioned this pull request Oct 16, 2019

A possible path forward for padding bytes TimelyDataflow/abomonation#32

Open

gnzlbg reviewed Oct 17, 2019

View reviewed changes

HadrienG2 and others added 10 commits October 17, 2019 13:41

Update rfcs/0000-atomic-volatile.md

3b2984f

Emphasize why load/store tearing is bad. Co-Authored-By: gnzlbg <[email protected]>

Update rfcs/0000-atomic-volatile.md

e920abd

Typographical fix Co-Authored-By: gnzlbg <[email protected]>

Follow suggestions from the PR

c423313

Rewording

f0c2381

Typo fix

ca0c36a

Typo

e6bee95

Clarify the exotic hardware situation

dc55924

Typo fix

9b9783e

Propose using a different feature gate for not_atomic stuffé

effdd4b

Clarification

a195f48

Avoid first person

da02234

Amanieu reviewed Oct 17, 2019

View reviewed changes

rfcs/0000-atomic-volatile.md Outdated Show resolved Hide resolved

Amanieu reviewed Oct 17, 2019

View reviewed changes

rfcs/0000-atomic-volatile.md Show resolved Hide resolved

HadrienG2 added 2 commits October 17, 2019 22:08

Clarify the assumption of isolated memory

d32a410

Correct inaccurate section about atomic volatile being an LLVM thing

10fe05e

HadrienG2 added 11 commits November 23, 2019 13:45

Disambiguate shared-memory IPC from shared-memory concurrency

ff9d1f1

Talk about spills

693d6c1

Talk about barrier issues

9cb9328

Talk about limitations of volatile for virtual memory

bfcb98b

Weaken wording of how optimizations interact with volatile

b573dd4

Shorten sentence

b81738e

Do not deprecate old read/write_volatile methods

bab4645

Reorganize unresolved questions & future possibilities

10de5da

Remove ordering, rmw, make unordered, and restructure

e57bcf8

More atomics codegen horror stories

17ab617

Air + wording cleanup

8aa979f

HadrienG2 commented Dec 8, 2019

View reviewed changes

rfcs/0000-atomic-volatile.md Outdated Show resolved Hide resolved

HadrienG2 added 3 commits December 8, 2019 15:54

Link to P0062R1

9114251

Last read before pinging

79f9545

Also link to Box issue

71ebfcd

Turn VolatileBool into a future extension

bb9cefa

HadrienG2 closed this May 15, 2020

RalfJung mentioned this pull request Jul 23, 2020

compilation time (of volatile array read?) is super-quadratic in array size rust-lang/rust#74476

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

RFC: Interfacing LLVM's "atomic volatile" in Rust #212

RFC: Interfacing LLVM's "atomic volatile" in Rust #212

HadrienG2 commented Oct 16, 2019 •

edited

Loading

gnzlbg left a comment

Amanieu commented Oct 17, 2019

Amanieu Oct 17, 2019

gnzlbg Oct 17, 2019

HadrienG2 Oct 17, 2019 •

edited

Loading

RalfJung Nov 20, 2019

asajeffrey commented Oct 17, 2019

HadrienG2 commented Oct 17, 2019 •

edited

Loading

RalfJung commented Oct 17, 2019 •

edited

Loading

HadrienG2 commented Oct 17, 2019 •

edited

Loading

Amanieu commented Oct 17, 2019

comex commented Nov 23, 2019

HadrienG2 commented Nov 23, 2019 •

edited

Loading

HadrienG2 commented Dec 8, 2019 •

edited

Loading

RalfJung commented Dec 26, 2019

comex commented Dec 27, 2019 •

edited

Loading

HadrienG2 commented Jan 4, 2020

HadrienG2 commented Jan 4, 2020 •

edited

Loading

Diggsey commented Jan 5, 2020

HadrienG2 commented Jan 5, 2020 •

edited

Loading

HadrienG2 commented Jan 6, 2020 •

edited

Loading

HadrienG2 commented Feb 27, 2020 •

edited

Loading

HadrienG2 commented May 15, 2020

RFC: Interfacing LLVM's "atomic volatile" in Rust #212

RFC: Interfacing LLVM's "atomic volatile" in Rust #212

Conversation

HadrienG2 commented Oct 16, 2019 • edited Loading

gnzlbg left a comment

Choose a reason for hiding this comment

Amanieu commented Oct 17, 2019

Amanieu Oct 17, 2019

Choose a reason for hiding this comment

gnzlbg Oct 17, 2019

Choose a reason for hiding this comment

HadrienG2 Oct 17, 2019 • edited Loading

Choose a reason for hiding this comment

RalfJung Nov 20, 2019

Choose a reason for hiding this comment

asajeffrey commented Oct 17, 2019

HadrienG2 commented Oct 17, 2019 • edited Loading

RalfJung commented Oct 17, 2019 • edited Loading

HadrienG2 commented Oct 17, 2019 • edited Loading

Amanieu commented Oct 17, 2019

comex commented Nov 23, 2019

HadrienG2 commented Nov 23, 2019 • edited Loading

HadrienG2 commented Dec 8, 2019 • edited Loading

RalfJung commented Dec 26, 2019

comex commented Dec 27, 2019 • edited Loading

HadrienG2 commented Jan 4, 2020

HadrienG2 commented Jan 4, 2020 • edited Loading

Diggsey commented Jan 5, 2020

HadrienG2 commented Jan 5, 2020 • edited Loading

HadrienG2 commented Jan 6, 2020 • edited Loading

HadrienG2 commented Feb 27, 2020 • edited Loading

HadrienG2 commented May 15, 2020

HadrienG2 commented Oct 16, 2019 •

edited

Loading

HadrienG2 Oct 17, 2019 •

edited

Loading

HadrienG2 commented Oct 17, 2019 •

edited

Loading

RalfJung commented Oct 17, 2019 •

edited

Loading

HadrienG2 commented Oct 17, 2019 •

edited

Loading

HadrienG2 commented Nov 23, 2019 •

edited

Loading

HadrienG2 commented Dec 8, 2019 •

edited

Loading

comex commented Dec 27, 2019 •

edited

Loading

HadrienG2 commented Jan 4, 2020 •

edited

Loading

HadrienG2 commented Jan 5, 2020 •

edited

Loading

HadrienG2 commented Jan 6, 2020 •

edited

Loading

HadrienG2 commented Feb 27, 2020 •

edited

Loading