Document movnt needs sfence #1457

workingjubilee · 2023-08-09T13:13:07Z

For every intrinsic that may generate any of the MOVNT family of instructions, specify it must be followed by _mm_sfence.

Also, ask people to not think too hard about what actually happens with write-combining memory buffers. They probably don't want to know, and in terms of the Rust abstract machine, we aren't actually entirely sure yet.

rustbot · 2023-08-09T13:13:12Z

r? @Amanieu

(rustbot has picked a reviewer for you, use r? to override)

workingjubilee · 2023-08-09T13:15:10Z

Also see:

Non-temporal stores (and _mm_stream operations in stdarch) break our memory model rust#114582
nontemporal stores behave incorrectly in their interaction with concurrency primitives llvm/llvm-project#64521

For every intrinsic that may generate any of the MOVNT family of instructions, specify it must be followed by `_mm_sfence`. Also, ask people to not think too hard about what actually happens with write-combining memory buffers. They probably don't want to know, and in terms of the Rust abstract machine, we aren't actually entirely sure yet.

the8472 · 2023-08-09T13:34:17Z

crates/core_arch/src/x86/avx.rs

+/// After using this intrinsic, but before any atomic operations occur, a call
+/// to `_mm_sfence()` must be performed. A safe function that includes unsafe
+/// usage of this intrinsic must always end in `_mm_sfence()`.


I think this is a bit too strict. There are niche scenarios where one would write out data without ever reading it again or at least without reading it again on another thread.
In those cases some later release write would be incidental and only meant to order other, regular writes.

Maybe the entire requirement could be conditional on "if the written memory is intended to be made accessible on another thread through a release operation", with the recommendation "if in doubt, add a fence".

I mean, I don't think it's coherent to leave an unsatisfied obligation hanging unless it's still in unsafe?

That depends on the perspective. Yes, it's an obligation to... restore consistency with the current rust memory model which assumes that all writes must be ordered with a release operation.
But under some unspecified extended model it may be valid to leave some memory locations unordered.
AIUI the purpose of ordering is to avoid data races which are UB. If the another thread never accesses the memory then this is unspecified behavior but not necessarily UB.

E.g. if you're writing bytes to a framebuffer that's concurrently being scanned out by the GPU then the fence doesn't add anything. You're not synchronizing with anything. You're just racing against time. Either the write makes out to the pixels or it doesn't.

I guess the conservative definition is fine for now. But it could be phrased in a way that makes it clear that it may be replaced with a more refined definition at some point.

I'd prefer to tell people to always discharge the obligation (even in the case you mention, the store-store fence would serialize any remaining deferred write-buffers, which is desired if you might run out of time otherwise!) and relax things when we mechanize a better spec.

RalfJung · 2023-08-10T11:58:47Z

crates/core_arch/src/x86/avx.rs

+///
+/// After using this intrinsic, but before any atomic operations occur, a call
+/// to `_mm_sfence()` must be performed. A safe function that includes unsafe
+/// usage of this intrinsic must always end in `_mm_sfence()`.


We are currently moving into a different direction in rust-lang/rust#114582: if we follow @talchas' approach, the rule will be: after using this intrinsic, but (happens-)before any other read or write of this location in any thread, a fence must occur (the fence must be in the same thread that called the intrinsic).

Basically, the write actually happens in another thread via a non-atomic store, so accessing this location may cause a data race. Doing a fence waits for that other thread to complete its store, avoiding the data race.

Hmm. I tried to keep the rationale for the requirement slightly vague as we discuss it, and approach it as more of a "do this or your program explodes into flames" type of warning, which is why it goes on to explain further reads or writes to the location are discouraged. So with that in mind, what is this missing?

I'd prefer if we'd call that UB instead of merely "discouraged".

Ideally even further nontemporal writes (before the next sfence) would be UB... I'd find it strange to have a situation where a nontemporal store would be allowed but a regular store would not.

I opened #1534 with my alternative wording.

workingjubilee · 2023-08-10T21:16:47Z

crates/core_arch/src/x86/avx.rs

+/// Reading and writing to the memory stored-to by any other means, after any
+/// nontemporal store has been used to write to that memory, but before the
+/// use of `_mm_sfence()`, is discouraged. Such reads can lead to pipeline
+/// stalls and yet-unspecified program behavior.


Hm. Well, both x86's movntdq and Arm's stnp retain local-thread serial ordering if you use them twice on the same location, i.e. that using _mm_stream_ps(ptr, a); _mm_stream_ps(ptr, b); still will yield b when you later read it, and coalescing such writes is traditionally allowed by compilers, so I would prefer to retain the more ambiguous verbiage I used here, @RalfJung. Mostly, I said "any other means" to cast a wide net so that people avoid it rather than inviting people to try to reason about in a way that might end in them deciding it's suddenly okay to perform an atomic store or atomic load on the memory from another thread.

I do agree that we should canonically allow _mm_stream_ps(ptr, a); _mm_store_ps(ptr, b); _mm_sfence(); eventually but I would prefer it live in an in-between state until we are clear on our mechanics.

The thing is that the proposed doc just doesn't align with the proposed spec in rust-lang/rust#114582. If you go with @talchas' original proposed spec, the doc should be something like

"After the _mm_stream_ps until the next time this thread calls _mm_sfence, any attempt to access (read or write) this memory from this thread by means other than _mm*_stream* operations is UB. Furthermore, any access from another threads must synchronize-with that _mm_sfence or else any access (read or write) from the other thread is UB, including _mm*_stream* operations.

Basically, the nontemporal writes performed by streaming operations should be considered not ordered even with other operations in the thread they appear in, except with other nontemporal writes, until the next sfence which establishes synchronization with all nontemporal writes of the current thread."

That would allow your example. Crucially it disallows _mm_stream_ps(ptr, a); *ptr = a; _mm_sfence(); and _mm_stream_ps(ptr, a); println!("{:?}", *ptr); _mm_sfence(); which is disallowed by the proposed spec. We should probably have examples in the docs.

ah, _mm_store_ps is morally equivalent to assignment.

Oh sorry, I misread your example and thought it said stream twice.

If we make _mm_store_ps also inline assembly we might be able to come up with a mechanism that allows it. But for regular assignment I'm quite worried that LLVM optimizations will make deductions from seeing an assignment that are incompatible with the nontemporal state of that location.

There is a myth that "you cannot read uninitialized memory" and I think it is harmful. I've encountered tons of confusion caused by people axiomatically thinking "I cannot read uninit memory". Instead we should teach people what actually happens in our spec: when reading memory at a certain type, that memory must be sufficiently "valid" for this type. If you read at bool, it must be 0 or 1. If you read at i32, it must be initialized. If you read at MaybeUninit<_>, it can be anything. I think telling people they can't read from uninit memory is a complete distraction and leads to an unnecessarily complicated mental model. There's a reason that "reading from uninit memory" does not show up in this list.

But anyway, it seems unlikely we will get to an agreement on terminology or teaching philosophy here and that's all rather off-topic anyway. The on-topic question is whether we should allow people to perform regular writes in between the nontemporal write and the fence. I don't see a good motivation for doing that, and it opens some tricky questions that we'd need to carefully figure out before allowing it. For instance, if I do a nontemporal write and then a regular write, do I even still need the fence or is it now guaranteed that the next "release" operation synchronizes everything properly?

Ehnn I mean yes, just...

...Anyways I think, dreadfully, to actually answer your question about the write series (i.e.

movnti [somewhere], something mov [somewhere], something mov [flag], 1

), that on x86 you kiiiinda still need the fence, from the ISA's point of view?

Hmm. The local results should be consistent absolutely, but I am concerned about code that may look more like moving overlapping sizes, so

vmovntdq [somewhere], ymm09 mov [somewhere], r09 mov [flag], 1

The first 8 bytes are guaranteed once the flag is set due to the two regular movs participating in TSO, but the vmovntdq means 24 bytes are in an ambiguous state.

do I even still need the fence or is it now guaranteed that the next "release" operation synchronizes everything properly?

If the answer to that is yes then I don't think we can allow regular writes. A regular write in Rust is guaranteed to be properly released by a release operation, after all.

We can have intrinsics that do "regular write to something that might be in the nontemporal state", but those need the same inline-assembly-and-Rust-replacement-code-spec treatment as streaming writes.

I'd really prefer if we didn't have to do this... the usecases we are aware of don't need this, do they?

Hmm, alright then, I didn't realize that was the specific way the sequencing invariants were formed. It's possible the reality may be amenable to regular writes (assuming a sort of "wholesale adoption of the x86 mechanics into Rust" approach for this case) but I'd have to examine the rules... very closely.

I don't think the majority of use cases need this in practice, correct, so now that we have hashed out that issue as existing, then I am happy to let this one go. I will update the documentation changes here accordingly.

workingjubilee · 2023-08-10T21:33:31Z

Will address the comments after the conversation settles a bit.

bors · 2024-02-21T00:55:34Z

☔ The latest upstream changes (presumably c8ae80e) made this pull request unmergeable. Please resolve the merge conflicts.

rustbot assigned Amanieu Aug 9, 2023

workingjubilee force-pushed the movnt-needs-sfence branch from 57c4239 to 1341b75 Compare August 9, 2023 13:27

workingjubilee force-pushed the movnt-needs-sfence branch from 1341b75 to dc9228d Compare August 9, 2023 13:28

the8472 reviewed Aug 9, 2023

View reviewed changes

RalfJung reviewed Aug 10, 2023

View reviewed changes

workingjubilee commented Aug 10, 2023

View reviewed changes

workingjubilee marked this pull request as draft August 10, 2023 21:33

RalfJung mentioned this pull request Feb 18, 2024

non-temporal stores: document interaction with Rust memory model #1534

Merged

workingjubilee closed this Jun 25, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Document movnt needs sfence #1457

Document movnt needs sfence #1457

workingjubilee commented Aug 9, 2023

rustbot commented Aug 9, 2023

workingjubilee commented Aug 9, 2023

the8472 Aug 9, 2023 •

edited

Loading

workingjubilee Aug 9, 2023

the8472 Aug 9, 2023

the8472 Aug 9, 2023

workingjubilee Aug 9, 2023

RalfJung Aug 10, 2023 •

edited

Loading

workingjubilee Aug 10, 2023

RalfJung Aug 10, 2023

RalfJung Feb 18, 2024

workingjubilee Aug 10, 2023

workingjubilee Aug 10, 2023

RalfJung Aug 11, 2023

workingjubilee Aug 11, 2023

RalfJung Aug 11, 2023 •

edited

Loading

RalfJung Aug 15, 2023 •

edited

Loading

workingjubilee Aug 18, 2023 •

edited

Loading

workingjubilee Aug 18, 2023

RalfJung Aug 19, 2023

workingjubilee Aug 20, 2023

workingjubilee commented Aug 10, 2023

bors commented Feb 21, 2024

Document movnt needs sfence #1457

Document movnt needs sfence #1457

Conversation

workingjubilee commented Aug 9, 2023

rustbot commented Aug 9, 2023

workingjubilee commented Aug 9, 2023

the8472 Aug 9, 2023 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

RalfJung Aug 10, 2023 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

RalfJung Aug 11, 2023 • edited Loading

Choose a reason for hiding this comment

RalfJung Aug 15, 2023 • edited Loading

Choose a reason for hiding this comment

workingjubilee Aug 18, 2023 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

workingjubilee commented Aug 10, 2023

bors commented Feb 21, 2024

the8472 Aug 9, 2023 •

edited

Loading

RalfJung Aug 10, 2023 •

edited

Loading

RalfJung Aug 11, 2023 •

edited

Loading

RalfJung Aug 15, 2023 •

edited

Loading

workingjubilee Aug 18, 2023 •

edited

Loading