Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix nonlocal unsoundness vs. atomic store #1

Merged
merged 1 commit into from
Aug 8, 2023

Conversation

workingjubilee
Copy link
Contributor

MOVNTI, MOVNTDQ, and friends weaken TSO when next to other stores. As most stores are not nontemporal, LLVM uses simple stores when lowering LLVMIR like atomic store ... release on x86. These facts could allow something like the following code to be emitted:

vmovntdq [addr],     ymmreg
vmovntdq [addr+N],   ymmreg
vmovntdq [addr+N*2], ymmreg
vmovntdq [addr+N*3], ymmreg
mov byte ptr [flag], 1 ; producer-consumer flag

But these stores are NOT ordered with respect to each other! Nontemporal stores induce the CPU to use write-combining buffers. These writes will be resolved in bursts instead of at once, and the write may be further deferred until a serialization point. Even a non-temporal write to any other location will not force the deferred writes to be resolved first. Thus, assuming cache-line-sized buffers of 64 bytes, the CPU may resolve these writes in e.g. this actual order:

vmovntdq [addr+N*2], ymmreg
vmovntdq [addr+N*3], ymmreg
mov byte ptr [flag], 1
vmovntdq [addr+N],   ymmreg
vmovntdq [addr],     ymmreg

This could e.g. result in other threads accessing this address after the flag is set, thus accessing memory via safe code that was assumed to be correctly synchronized. This could result in observing tearing or other inconsistent program states, especially as the number of writes, thus the number of write buffers that may begin retiring simultaneously, thus the chance of them resolving in an unfortunate order, increases. If using &mut [u8] to write uninitialized memory is permitted ( per rust-lang/unsafe-code-guidelines#346 ), it could even result in an access to &[u8] actually being reading uninitialized memory in safe code!

To guarantee program soundness, code using nontemporal stores must currently use SFENCE in its safety boundary, unless and until LLVM decides this combination of facts should be considered a miscompilation and motivation to choose lowerings that do not require explicit SFENCE.

For further information, see:

MOVNTI, MOVNTDQ, and friends weaken TSO when next to other stores. As
most stores are not nontemporal, LLVM uses simple stores when lowering
LLVMIR like `atomic store ... release` on x86. These facts could allow
something like the following code to be emitted:

```asm
vmovntdq [addr],     ymmreg
vmovntdq [addr+N],   ymmreg
vmovntdq [addr+N*2], ymmreg
vmovntdq [addr+N*3], ymmreg
mov byte ptr [flag], 1 ; producer-consumer flag
```

But these stores are NOT ordered with respect to each other! Nontemporal
stores induce the CPU to use write-combining buffers. These writes will
be resolved in bursts instead of at once, and the write may be further
deferred until a serialization point. Even a non-temporal write to any
other location will not force the deferred writes to be resolved first.
Thus, assuming cache-line-sized buffers of 64 bytes, the CPU may resolve
these writes in e.g. this actual order:

```asm
vmovntdq [addr+N*2], ymmreg
vmovntdq [addr+N*3], ymmreg
mov byte ptr [flag], 1
vmovntdq [addr+N],   ymmreg
vmovntdq [addr],     ymmreg
```

This could e.g. result in other threads accessing this address after the
flag is set, thus accessing memory via safe code that was assumed to be
correctly synchronized. This could result in observing tearing or other
inconsistent program states, especially as the number of writes, thus
the number of write buffers that may begin retiring simultaneously,
thus the chance of them resolving in an unfortunate order, increases.
If using `&mut [u8]` to write uninitialized memory is permitted ( per
rust-lang/unsafe-code-guidelines#346 ), it could even result in an access
to `&[u8]` actually being reading uninitialized memory in safe code!

To guarantee program soundness, code using nontemporal stores must
currently use SFENCE in its safety boundary, unless and until LLVM
decides this combination of facts should be considered a miscompilation
and motivation to choose lowerings that do not require explicit SFENCE.
@alexcrichton
Copy link
Owner

Thanks! Also some interesting conversation to follow :)

AFAIK no one's using this code in production anywhere, I just had fun writing it up at some point. Still good to keep correct (ish) though!

@alexcrichton alexcrichton merged commit 187c9c9 into alexcrichton:main Aug 8, 2023
4 of 5 checks passed
@workingjubilee workingjubilee deleted the use-sfence branch August 9, 2023 13:39
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants