core: disable `ptr::swap_nonoverlapping_one`'s block optimization on SPIR-V. #83019

eddyb · 2021-03-11T15:11:25Z

SPIR-V primarily supports what it calls the "Logical addressing model" (and AFAIK for graphical shaders it's the only option), and what that implies is that there is no "memory" to uniformly address at some byte/word level, and that you can't really talk about values having a "raw representation" in terms of sequences of bytes. Therefore, the "block"-wise swapping optimization employed by `ptr::swap_nonoverlapping_one` (where a "block" is 32 bytes, currently) is fundamentally incompatible with SPIR-V "memory".

As such, [Rust-GPU](https://github.com/EmbarkStudios/rust-gpu/)'s `rustc_codegen_spirv` backend cannot currently allow the use of `ptr::swap_nonoverlapping_one` - but that comes at a great price, since it's the building block of `mem::{swap,replace}`, and those in turn are used by e.g. `Option::take` and `Range`'s `Iterator` implementation (the latter blocking the use of `for i in 0..n` loops).
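To make the incompatibility concrete, here is a rough sketch of what a block-wise swap looks like. It is not the code in `core`: the name `swap_blockwise` and the exact loop shape are assumptions for illustration, and only the 32-byte block size comes from the description above. The point is that both values are reinterpreted as plain byte sequences, which is exactly what the Logical addressing model does not offer.

```rust
use std::mem::{self, MaybeUninit};
use std::ptr;

/// Block size for this sketch; the description above mentions 32 bytes.
const BLOCK: usize = 32;

/// Hypothetical block-wise swap: treats both values as raw bytes and
/// exchanges them through a fixed-size scratch buffer.
///
/// # Safety
/// `x` and `y` must be valid for reads and writes of `T`, properly
/// aligned, and must not overlap.
pub unsafe fn swap_blockwise<T>(x: *mut T, y: *mut T) {
    // Reinterpret the values as byte sequences (not expressible under
    // SPIR-V's Logical addressing model).
    let (x, y) = (x as *mut u8, y as *mut u8);
    let len = mem::size_of::<T>();

    // The scratch state "on the stack" is one block, not one `T`.
    let mut buf = MaybeUninit::<[u8; BLOCK]>::uninit();
    let tmp = buf.as_mut_ptr() as *mut u8;

    let mut i = 0;
    while i < len {
        // The last chunk may be shorter than a full block.
        let n = usize::min(BLOCK, len - i);
        unsafe {
            ptr::copy_nonoverlapping(x.add(i), tmp, n);
            ptr::copy_nonoverlapping(y.add(i), x.add(i), n);
            ptr::copy_nonoverlapping(tmp, y.add(i), n);
        }
        i += n;
    }
}
```

Note that the temporary is sized by the block rather than by `T`, so a backend that wanted to turn this back into a whole-value swap would have to recognize the entire loop.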

There are 4 options I can see in terms of supporting `ptr::swap_nonoverlapping_one` in `rustc_codegen_spirv`:

- legalize the block-wise swap loop back into swapping whole values, for SPIR-V
  - this is made borderline impossible by the fact that the size of the state "on the stack" is a block, and has to be expanded back to the appropriate size of the value being swapped, so in practice this would have to effectively pattern-match on the exact shape of the block-wise swapping algorithm, as a roundabout way of "patching `core::ptr` on the fly"
- (**this PR**) disable the block-wise swap optimization altogether when `#[cfg(target_arch = "spirv")]`
  - I've tested it and it does in fact allow compiling `for i in 0..n` loops, which was my primary motivation
  - main downside IMO is the fact that `core` now acknowledges an out-of-tree backend
    - as a counterpoint, any attempt to compile Rust to SPIR-V would run into this problem, one way or another
- only enable the block-wise swap optimization on targets where it's been empirically proven to be an improvement
  - would avoid any surprises in terms of potentially-broken/inefficient codegen, in general
  - however, it may be universally applicable (thanks to caches), even if the optimal block size could differ
- move low-level swapping into an intrinsic, where the backend can choose any optimization approach it wants
  - this also has an impact on MIR optimizations (cc @rust-lang/wg-mir-opt) - which currently cannot hope to make sense of e.g. `Option::take` despite it being effectively `_0 = *_1; *_1 = None; return;`
  - long-term this is my preferred approach, and I can start working on it if that's desired, but I wanted to confirm that this swapping optimization is the final blocker for Rust-GPU supporting e.g. range `for` loops
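As a rough illustration of the second option, the sketch below gates a size-based fast path behind `cfg(not(target_arch = "spirv"))` and otherwise falls back to a whole-value swap. The names `swap_one` and `swap_simple` and the 32-byte threshold are assumptions made for this example, and `ptr::swap_nonoverlapping` merely stands in for the block-optimized path; this is not the actual diff.

```rust
use std::{mem, ptr};

/// Whole-value swap: only typed reads and writes of `T`, no byte-level
/// reinterpretation.
///
/// # Safety
/// `x` and `y` must be valid, aligned, and non-overlapping.
unsafe fn swap_simple<T>(x: *mut T, y: *mut T) {
    unsafe {
        let tmp = ptr::read(x);
        ptr::copy_nonoverlapping(y, x, 1);
        ptr::write(y, tmp);
    }
}

/// Hypothetical dispatcher: large values take the optimized path, except
/// on SPIR-V where that path is compiled out entirely.
///
/// # Safety
/// Same requirements as `swap_simple`.
#[allow(unexpected_cfgs)]
pub unsafe fn swap_one<T>(x: *mut T, y: *mut T) {
    #[cfg(not(target_arch = "spirv"))]
    {
        // Assumed threshold: one 32-byte block.
        if mem::size_of::<T>() >= 32 {
            // Stand-in for the block-optimized implementation.
            unsafe { ptr::swap_nonoverlapping(x, y, 1) };
            return;
        }
    }
    unsafe { swap_simple(x, y) }
}
```

With this shape, a SPIR-V build never sees the byte-wise loop at all, at the cost of `core` mentioning a target that lives out of tree.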

r? @nagisa cc @rust-lang/libs

m-ou-se · 2021-03-11T17:59:20Z

> since it's the building block of mem::{swap,replace}

I'm a bit surprised to see swap being used as part of mem::replace or mem::take. There doesn't really seem much use of a swap over a read + write in those functions. The swap makes the code short, but the result seems semantically more complex. If that inhibits some optimizations and also makes things hard for some targets, maybe we should just change that.

SimonSapin · 2021-03-11T18:13:16Z

I’m not saying we shouldn’t change that, but one reason for replace and take to be based on swap is that it centralizes the unsafe code in swap. Purely safe code cannot move !Copy values out of &mut.

m-ou-se · 2021-03-11T18:41:19Z

> Purely safe code cannot move !Copy values out of &mut.

Well you can, with core::mem::replace. ;)

There has to be unsafe code/some intrinsic at some point. Implementing the simpler operation (replace) with the much more complicated operation (swap) doesn't make a whole lot of sense. replace is just read+write, and the primitive for moving out of a &mut. swap is for doing that to two &mut at the same time, which is both more niche and more complicated (as shown by swap_nonoverlapping_bytes).
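A minimal sketch of what a read+write `replace` could look like, assuming nothing beyond `ptr::read` and `ptr::write`. The names `replace_rw` and `take_opt` are made up for this example so they don't clash with the real `mem::replace` and `Option::take`.

```rust
use std::ptr;

/// Hypothetical `replace` built from one read and one write.
pub fn replace_rw<T>(dest: &mut T, src: T) -> T {
    // SAFETY: `dest` is a valid, aligned, exclusive reference. The old
    // value is moved out by `read` and the slot is immediately refilled
    // by `write`, so nothing is dropped twice or left uninitialized.
    unsafe {
        let old = ptr::read(dest);
        ptr::write(dest, src);
        old
    }
}

/// `Option::take` expressed through the sketch above: move the old
/// value out and leave `None` behind.
pub fn take_opt<T>(slot: &mut Option<T>) -> Option<T> {
    replace_rw(slot, None)
}
```

Only one `&mut` is involved here, so there is no second location to read from and no reason to go through `swap`.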

SimonSapin · 2021-03-11T19:23:59Z

This complexity is not fundamental to swap. Before #40454 added optimizations for large size_of it was just three copies:

rust/src/libcore/mem.rs, lines 260 to 274 in a59de37:

    pub fn swap<T>(x: &mut T, y: &mut T) {
        unsafe {
            // Give ourselves some scratch space to work with
            let mut t: T = uninitialized();

            // Perform the swap, `&mut` pointers never alias
            ptr::copy_nonoverlapping(&*x, &mut t, 1);
            ptr::copy_nonoverlapping(&*y, x, 1);
            ptr::copy_nonoverlapping(&t, y, 1);

            // y and t now point to the same thing, but we need to completely forget `t`
            // because it's no longer relevant.
            forget(t);
        }
    }

replace based on ptr::read + ptr::write would also be three copies (unless the implicit copy from a local variable to the return value can be optimized away). I don’t know how well the optimizations from #40454 apply when swap is called by replace but they probably have some effect.
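For reference, the same three-copy shape can be written against today's standard library, where `mem::uninitialized` is deprecated in favor of `MaybeUninit`. This is only a sketch with a made-up name (`swap_three_copies`), not the historical or current `core` code.

```rust
use std::mem::MaybeUninit;
use std::ptr;

/// Hypothetical three-copy swap using `MaybeUninit` as scratch space.
pub fn swap_three_copies<T>(x: &mut T, y: &mut T) {
    // Scratch space for one `T`; `MaybeUninit` never drops its contents,
    // so no `forget` is needed afterwards.
    let mut t = MaybeUninit::<T>::uninit();

    // SAFETY: `x` and `y` are distinct `&mut` references, so they are
    // valid, aligned, and cannot alias each other or the local `t`.
    unsafe {
        ptr::copy_nonoverlapping(&*x, t.as_mut_ptr(), 1); // x -> t
        ptr::copy_nonoverlapping(&*y, x, 1); // y -> x
        ptr::copy_nonoverlapping(t.as_ptr(), y, 1); // t -> y
    }
}
```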

m-ou-se · 2021-03-11T19:43:51Z

A read+write mem::replace only calls copy_nonoverlapping twice, the basic swap implementation three times. The rest is moving an argument into a local variable, or moving a local variable into the return value, which happens in both cases.

(And now with the optimized swap version, the difference is a whole lot more significant.)

nagisa · 2021-03-12T13:54:03Z

@eddyb this is relevant regardless of #83022, correct? Even if replace adjustment fixed the motivating example you had, I imagine you'd still want swap to work when people write spv code, right?

eddyb · 2021-03-12T14:25:24Z

> @eddyb this is relevant regardless of #83022, correct? Even if replace adjustment fixed the motivating example you had, I imagine you'd still want swap to work when people write spv code, right?

Correct, it's just that I'm not aware of mem::swap itself showing up before, it's always been through mem::replace, so if mem::replace works, then dealing with mem::swap is of lower priority.

I still stand by the 4 options in the PR description, but now MIR optimizations on mem::replace (and by extension e.g. Option::take) are no longer a factor, once #83022 lands.

Don't implement mem::replace with mem::swap.

`swap` is a complicated operation, so this changes the implementation of `replace` to use `read` and `write` instead. See rust-lang#83019.

I wrote there:

> Implementing the simpler operation (replace) with the much more complicated operation (swap) doesn't make a whole lot of sense. `replace` is just read+write, and the primitive for moving out of a `&mut`. `swap` is for doing that to *two* `&mut` at the same time, which is both more niche and more complicated (as shown by `swap_nonoverlapping_bytes`).

This could be especially interesting for `Option<VeryLargeStruct>::take()`, since swapping such a large structure with `swap_nonoverlapping_bytes` is going to be much less efficient than `ptr::write()`'ing a `None`.

But also for small values where `swap` just reads/writes using a temporary variable, this makes a `replace` or `take` operation simpler:

![image](https://user-images.githubusercontent.com/783247/110839393-c7e6bd80-82a3-11eb-97b7-28acb14deffd.png)

bors · 2021-03-16T19:24:06Z

☔ The latest upstream changes (presumably #83199) made this pull request unmergeable. Please resolve the merge conflicts.

nagisa · 2021-03-16T19:25:52Z

I'm comfortable with this landing, in that case.

r=me after rebase.

nagisa · 2021-04-04T13:12:26Z

ping @eddyb ^


eddyb · 2021-04-04T19:31:23Z

Oops, lost track of this (after it wasn't a priority anymore).

@bors r=nagisa

bors · 2021-04-04T19:31:24Z

📌 Commit bc6af97 has been approved by nagisa


Rollup of 7 pull requests

Successful merges:

- rust-lang#80525 (wasm64 support)
- rust-lang#83019 (core: disable `ptr::swap_nonoverlapping_one`'s block optimization on SPIR-V.)
- rust-lang#83717 (rustdoc: Separate filter-empty-string out into its own function)
- rust-lang#83807 (Tests: Remove redundant `ignore-tidy-linelength` annotations)
- rust-lang#83815 (ptr::addr_of documentation improvements)
- rust-lang#83820 (Remove attribute `#[link_args]`)
- rust-lang#83841 (Allow clobbering unsupported registers in asm!)

Failed merges:

r? `@ghost` `@rustbot` modify labels: rollup

rust-highfive assigned nagisa Mar 11, 2021

rust-highfive added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Mar 11, 2021

eddyb mentioned this pull request Mar 11, 2021

Bump rust-toolchain to nightly-2021-03-11. EmbarkStudios/rust-gpu#490

Closed

m-ou-se mentioned this pull request Mar 11, 2021

Don't implement mem::replace with mem::swap. #83022

Merged

eddyb mentioned this pull request Mar 15, 2021

Unlock for i in 0..n loops! (by bumping rust-toolchain to nightly-2021-03-17) EmbarkStudios/rust-gpu#493

Merged

nagisa added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Mar 17, 2021

eddyb added 2 commits April 4, 2021 22:26

core: rearrange ptr::swap_nonoverlapping_one's cases (no functional changes). 3c3d3dd

core: disable ptr::swap_nonoverlapping_one's block optimization on SPIR-V. bc6af97

eddyb force-pushed the spirv-no-block-swap branch from 955761b to bc6af97 Compare April 4, 2021 19:30

bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. labels Apr 4, 2021

This was referenced Apr 4, 2021

Rollup of 8 pull requests #83861

Closed

Rollup of 7 pull requests #83864

Merged

bors merged commit 4e3f471 into rust-lang:master Apr 5, 2021

rustbot added this to the 1.53.0 milestone Apr 5, 2021

eddyb deleted the spirv-no-block-swap branch April 6, 2021 07:56

eddyb mentioned this pull request Oct 1, 2023

What guarantees are provided about whether the reference/stdlib docs are valid on target architectures? rust-lang/unsafe-code-guidelines#461

Open
