Replace the local worker queues with st3's #115

james7132 · 2024-04-22T15:26:40Z

Fixes #32. This PR replaces the fixed-size local worker queues with st3's implementation. The queue currently in the crate follows largely the same design, but st3's implementation should use considerably fewer atomic operations.

Performance-wise, this provides a major improvement in most benchmarks, particularly the single-threaded cases, since pushing no longer requires any atomic operations and popping requires only one. There is one major regression, in the multi_thread spawn_one benchmarks (executor::spawn_one and static_executor::spawn_one both take roughly 8x as long), and it's unclear why. single_thread/executor::spawn_batch also regresses, by about 34%. My current working theory is that the atomic-free push to local queues is putting the global queue under higher contention.
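To make the push/pop cost claim concrete, here is a minimal std-only sketch of a single-producer FIFO queue. It is not st3's code and not what this PR adds: `LocalQueue`, `push`, and `take` are hypothetical names, tasks are modelled as `usize` IDs to avoid `unsafe`, and it assumes "atomic operations" above means atomic read-modify-write operations (the sketch still uses plain atomic loads and stores).

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;
use std::thread;

/// Hypothetical fixed-capacity FIFO queue: one owner pushes, any thread may take.
/// Index overflow is ignored; it would need 2^64 pushes to occur.
pub struct LocalQueue {
    slots: Vec<AtomicUsize>,
    head: AtomicUsize, // next position to take from; advanced by compare-exchange
    tail: AtomicUsize, // next position to write to; only the owner stores here
}

impl LocalQueue {
    pub fn new(capacity: usize) -> Self {
        assert!(capacity > 0);
        Self {
            slots: (0..capacity).map(|_| AtomicUsize::new(0)).collect(),
            head: AtomicUsize::new(0),
            tail: AtomicUsize::new(0),
        }
    }

    /// Owner-only push: atomic loads and stores, no read-modify-write.
    /// The single-producer rule is an assumption of this sketch and is not
    /// enforced by the types. Returns the task back if the queue is full.
    pub fn push(&self, task: usize) -> Result<(), usize> {
        let tail = self.tail.load(Ordering::Relaxed);
        let head = self.head.load(Ordering::Acquire);
        if tail - head == self.slots.len() {
            return Err(task);
        }
        self.slots[tail % self.slots.len()].store(task, Ordering::Relaxed);
        // Publish the slot to consumers.
        self.tail.store(tail + 1, Ordering::Release);
        Ok(())
    }

    /// Take the oldest task: one successful compare-exchange per task.
    pub fn take(&self) -> Option<usize> {
        let mut head = self.head.load(Ordering::Acquire);
        loop {
            let tail = self.tail.load(Ordering::Acquire);
            if head == tail {
                return None;
            }
            let task = self.slots[head % self.slots.len()].load(Ordering::Relaxed);
            match self.head.compare_exchange_weak(
                head,
                head + 1,
                Ordering::AcqRel,
                Ordering::Acquire,
            ) {
                Ok(_) => return Some(task),
                // Another thread claimed this position; retry from the new head.
                Err(current) => head = current,
            }
        }
    }
}

fn main() {
    let queue = Arc::new(LocalQueue::new(4));
    for id in 1..=4 {
        queue.push(id).unwrap();
    }
    assert_eq!(queue.push(5), Err(5)); // queue is full

    // A second thread steals from the same queue the owner pops from.
    let stealer = {
        let queue = Arc::clone(&queue);
        thread::spawn(move || queue.take())
    };
    let stolen = stealer.join().unwrap();
    let local = queue.take();
    println!("stolen = {:?}, local = {:?}", stolen, local);
}
```

In this sketch a full local queue simply rejects the task. If an executor sends such overflow to a shared global queue, cheaper local pushes could shift where contention shows up, which is the kind of effect the working theory above points at.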

executor::create        time:   [725.66 ns 726.23 ns 726.98 ns]
                        change: [-32.237% -31.965% -31.775%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 10 outliers among 100 measurements (10.00%)
  1 (1.00%) low mild
  4 (4.00%) high mild
  5 (5.00%) high severe

single_thread/executor::spawn_one
                        time:   [923.93 ns 936.62 ns 950.73 ns]
                        change: [-37.180% -33.978% -30.511%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 11 outliers among 100 measurements (11.00%)
  11 (11.00%) high mild
single_thread/executor::spawn_batch
                        time:   [34.023 µs 36.536 µs 39.778 µs]
                        change: [+22.133% +34.229% +44.880%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 2 outliers among 100 measurements (2.00%)
  2 (2.00%) high severe
single_thread/executor::spawn_many_local
                        time:   [4.6916 ms 4.7248 ms 4.7611 ms]
                        change: [-3.5935% -2.6420% -1.6484%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 11 outliers among 100 measurements (11.00%)
  6 (6.00%) high mild
  5 (5.00%) high severe
single_thread/executor::spawn_recursively
                        time:   [35.261 ms 35.584 ms 35.936 ms]
                        change: [-26.953% -26.069% -25.097%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 3 outliers among 100 measurements (3.00%)
  1 (1.00%) high mild
  2 (2.00%) high severe
single_thread/executor::yield_now
                        time:   [5.4241 ms 5.4290 ms 5.4344 ms]
                        change: [-10.452% -9.0866% -7.9914%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 6 outliers among 100 measurements (6.00%)
  4 (4.00%) high mild
  2 (2.00%) high severe

multi_thread/executor::spawn_one
                        time:   [14.511 µs 14.882 µs 15.177 µs]
                        change: [+674.99% +725.04% +767.31%] (p = 0.00 < 0.05)
                        Performance has regressed.
multi_thread/executor::spawn_batch
                        time:   [53.164 µs 58.006 µs 63.134 µs]
                        change: [-19.348% -12.758% -5.2161%] (p = 0.00 < 0.05)
                        Performance has improved.
multi_thread/executor::spawn_many_local
                        time:   [27.513 ms 27.608 ms 27.705 ms]
                        change: [+1.8542% +2.4549% +3.0788%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 5 outliers among 100 measurements (5.00%)
  5 (5.00%) high mild
Benchmarking multi_thread/executor::spawn_recursively: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 17.7s, or reduce sample count to 20.
multi_thread/executor::spawn_recursively
                        time:   [174.31 ms 174.66 ms 175.04 ms]
                        change: [-1.8165% -1.5293% -1.2438%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 10 outliers among 100 measurements (10.00%)
  3 (3.00%) low mild
  4 (4.00%) high mild
  3 (3.00%) high severe
multi_thread/executor::yield_now
                        time:   [23.860 ms 23.931 ms 23.996 ms]
                        change: [-1.8530% -1.4776% -1.0672%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 6 outliers among 100 measurements (6.00%)
  1 (1.00%) low severe
  5 (5.00%) low mild

single_thread/static_executor::spawn_one
                        time:   [671.98 ns 680.60 ns 689.94 ns]
                        change: [-53.423% -50.975% -48.005%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 12 outliers among 100 measurements (12.00%)
  3 (3.00%) high mild
  9 (9.00%) high severe
single_thread/static_executor::spawn_many_local
                        time:   [4.4846 ms 4.5148 ms 4.5500 ms]
                        change: [-11.369% -10.414% -9.3355%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 6 outliers among 100 measurements (6.00%)
  1 (1.00%) low mild
  1 (1.00%) high mild
  4 (4.00%) high severe
single_thread/static_executor::spawn_recursively
                        time:   [24.470 ms 24.599 ms 24.738 ms]
                        change: [-6.6356% -5.5074% -4.4066%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 6 outliers among 100 measurements (6.00%)
  5 (5.00%) high mild
  1 (1.00%) high severe
single_thread/static_executor::yield_now
                        time:   [5.3366 ms 5.3424 ms 5.3486 ms]
                        change: [-6.6970% -6.4629% -6.2391%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high severe

multi_thread/static_executor::spawn_one
                        time:   [13.876 µs 14.197 µs 14.443 µs]
                        change: [+704.49% +755.40% +805.04%] (p = 0.00 < 0.05)
                        Performance has regressed.
multi_thread/static_executor::spawn_many_local
                        time:   [4.5342 ms 4.6639 ms 4.7927 ms]
                        change: [-21.861% -19.142% -16.242%] (p = 0.00 < 0.05)
                        Performance has improved.
multi_thread/static_executor::spawn_recursively
                        time:   [43.786 ms 44.057 ms 44.264 ms]
                        change: [-0.6134% -0.0025% +0.5506%] (p = 1.00 > 0.05)
                        No change in performance detected.
Found 4 outliers among 100 measurements (4.00%)
  3 (3.00%) low severe
  1 (1.00%) low mild
multi_thread/static_executor::yield_now
                        time:   [23.979 ms 24.052 ms 24.121 ms]
                        change: [+0.4275% +0.8193% +1.1661%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 5 outliers among 100 measurements (5.00%)
  1 (1.00%) low severe
  4 (4.00%) low mild

This is a breaking change: the future returned by Executor::run is no longer Send or Sync.
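One practical consequence, sketched with std only: a future that is not Send cannot be moved to another thread, so each worker thread has to create the future it drives. Everything below is a stand-in rather than async-executor API: `block_on` is a minimal hand-rolled driver, and `run_like` is a hypothetical future made `!Send` by holding an `Rc` across an `.await`.

```rust
use std::future::Future;
use std::pin::pin;
use std::rc::Rc;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};
use std::thread;

/// Waker that unparks the thread blocked in `block_on`.
struct ThreadWaker(thread::Thread);

impl Wake for ThreadWaker {
    fn wake(self: Arc<Self>) {
        self.0.unpark();
    }
}

/// Minimal stand-in for a `block_on` function; polls on the calling thread.
/// Note that it places no `Send` bound on the future.
fn block_on<F: Future>(future: F) -> F::Output {
    let mut future = pin!(future);
    let waker = Waker::from(Arc::new(ThreadWaker(thread::current())));
    let mut cx = Context::from_waker(&waker);
    loop {
        match future.as_mut().poll(&mut cx) {
            Poll::Ready(output) => return output,
            Poll::Pending => thread::park(),
        }
    }
}

/// Hypothetical stand-in for a run future that is not `Send`:
/// the `Rc` kept alive across the `.await` makes the future `!Send`.
async fn run_like(worker_id: usize) -> usize {
    let local_state = Rc::new(worker_id);
    std::future::ready(()).await;
    *local_state * 2
}

/// Spawns `count` threads, each creating and driving its own future.
fn spawn_workers(count: usize) -> Vec<usize> {
    let handles: Vec<_> = (0..count)
        .map(|id| {
            // Only `id` (which is `Send`) crosses the thread boundary; the
            // `!Send` future is created and polled inside the new thread.
            thread::spawn(move || block_on(run_like(id)))
        })
        .collect();
    handles.into_iter().map(|h| h.join().unwrap()).collect()
}

fn main() {
    // Creating `run_like(0)` here and moving it into `thread::spawn`
    // would be rejected by the compiler because the future is not `Send`.
    println!("{:?}", spawn_workers(3));
}
```

Code that currently hands the run future to an API requiring `Send` (for example, a spawn function on a multi-threaded runtime) would need a similar restructuring, assuming the change lands as described.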

james7132 added 4 commits December 4, 2022 00:16

Use st3 as local queues

5427437

Merge branch 'master' into st3

476453e

Use fifo instead of lifo

97fe4a3

Add back in the self-stealing avoidance

0579b9e
