Mutex is slower than standard library's on Intel i9 and ARM CPUs #338
Comments
Did you check against stable rustc or nightly rustc? Nightly rustc has switched from pthreads to using futexes directly on Linux. |
It's stable rustc, version 1.60.0. The futex support seems to have been merged in 2020 (rust-lang/rust#93740). I looked through the source code and confirmed this.
|
The futex-based mutex was merged in March of this year: rust-lang/rust#95035. It will land in rustc 1.62, which is currently nightly. Could you try the benchmarks on nightly too? |
Sure. I ran the benchmarks on:
Intel i7-10750H (6 cores, 12 hyperthreads)
Intel i9-10980XE (18 cores, 36 hyperthreads)
ARM Neoverse-N1 (80 cores)
I had run the test with the wrong number of threads on ARM for stable Rust; here are the re-evaluated numbers.
|
Nightly std locks have a nontrivial performance gain on the Intel chips. The improvement is subtle on the ARM server. They are generally faster than parking_lot, except on my laptop with small thread counts. |
Could you try testing with one small change to see if it makes a difference? Change the if condition in this code:

```rust
// If there is no queue, try spinning a few times
if state & PARKED_BIT == 0 && spinwait.spin() {
    state = self.state.load(Ordering::Relaxed);
    continue;
}
```

to this:

```rust
// Try spinning a few times
if spinwait.spin() {
    state = self.state.load(Ordering::Relaxed);
    continue;
}
```
|
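For readers following along, here is a minimal, self-contained sketch of the difference between the two spin conditions discussed above. The `SpinWait` type and bit constant are simplified stand-ins for parking_lot's internals, not the real implementation; only the shape of the condition is the point.

```rust
use std::hint;

// Simplified stand-in for parking_lot's PARKED_BIT: set when at least
// one thread is sleeping in the wait queue for this lock.
const PARKED_BIT: usize = 2;

// Simplified stand-in for parking_lot's SpinWait: allows a bounded
// number of spin attempts with exponential backoff, then gives up.
struct SpinWait {
    counter: u32,
}

impl SpinWait {
    fn new() -> Self {
        SpinWait { counter: 0 }
    }

    // Returns true while spinning is still worthwhile; false once the
    // caller should fall back to parking (sleeping).
    fn spin(&mut self) -> bool {
        if self.counter >= 10 {
            return false;
        }
        self.counter += 1;
        for _ in 0..(1u32 << self.counter) {
            hint::spin_loop();
        }
        true
    }
}

// Original condition: only spin if no other thread is already parked,
// so spinners do not starve threads that are sleeping in the queue.
fn should_spin_original(state: usize, spinwait: &mut SpinWait) -> bool {
    state & PARKED_BIT == 0 && spinwait.spin()
}

// Tweaked condition: always spin a few times before parking, even if
// other threads are already asleep.
fn should_spin_tweaked(_state: usize, spinwait: &mut SpinWait) -> bool {
    spinwait.spin()
}

fn main() {
    // With PARKED_BIT set, the original condition skips spinning...
    let mut sw = SpinWait::new();
    assert!(!should_spin_original(PARKED_BIT, &mut sw));
    // ...while the tweaked condition still spins.
    let mut sw2 = SpinWait::new();
    assert!(should_spin_tweaked(PARKED_BIT, &mut sw2));
    println!("ok");
}
```

The trade-off, as the benchmark numbers later in the thread suggest, is that unconditional spinning can help at low thread counts but burns CPU (and delays sleeping threads) under heavy contention.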
As you requested. Stable version: rustc 1.60.0 (7737e0b5c 2022-04-04)
Intel i7-10750H (6 cores, 12 hyperthreads)
Intel i9-10980XE (18 cores, 36 hyperthreads)
ARM Neoverse-N1 (80 cores)
Nightly version: rustc 1.62.0-nightly (055bf4ccd 2022-04-25)
Intel i7-10750H (6 cores, 12 hyperthreads)
Intel i9-10980XE (18 cores, 36 hyperthreads)
ARM Neoverse-N1 (80 cores)
|
Adding this tweak turns out to be a bit slower under contention in general, but it's faster for small numbers of threads on the Intel chips. |
I don't think much can be done about this; it's fundamentally part of how parking_lot works. You may want to explore other alternatives like https://github.com/kprotty/usync which is based on Windows's SRWLock. |
FWIW, 5 iterations is a very small amount of work. This makes threads hit the sleeping path almost immediately, and whichever sleeps faster under contention has higher throughput. parking_lot effectively emulates futex in userspace, which handles sleeping under contention worse than futex in the kernel (from what I've seen anecdotally). In practice, lock usage either has some delay between attempts from actual work being done, or the work in the critical section takes longer than 5 floating point additions and multiplications. Would recommend instead trying |
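To illustrate the point about workload size, here is a hedged sketch of a micro-benchmark that puts configurable work both inside and outside the critical section, rather than only a handful of floating point ops under the lock. The `bench` function and its parameters are hypothetical names for illustration, not the benchmark harness used in this thread; it uses `std::sync::Mutex`, but parking_lot's `Mutex` could be swapped in the same way.

```rust
use std::sync::{Arc, Mutex};
use std::thread;
use std::time::Instant;

// Hypothetical micro-benchmark sketch: each thread alternates between
// work under the lock (`work_in`) and independent work outside it
// (`work_out`). Larger `work_out` reduces contention on the mutex.
fn bench(threads: usize, iters: usize, work_in: usize, work_out: usize) -> f64 {
    let lock = Arc::new(Mutex::new(0.0f64));
    let start = Instant::now();
    let handles: Vec<_> = (0..threads)
        .map(|_| {
            let lock = Arc::clone(&lock);
            thread::spawn(move || {
                let mut local = 1.0f64;
                for _ in 0..iters {
                    {
                        // Work inside the critical section.
                        let mut shared = lock.lock().unwrap();
                        for _ in 0..work_in {
                            *shared += local * 1.000_1;
                        }
                    }
                    // Work between critical sections, done without
                    // holding the lock.
                    for _ in 0..work_out {
                        local = local * 1.000_1 + 0.1;
                    }
                }
            })
        })
        .collect();
    for h in handles {
        h.join().unwrap();
    }
    start.elapsed().as_secs_f64()
}

fn main() {
    // Example run: 4 threads, with more work outside the lock than
    // inside, so threads are not hitting the sleeping path immediately.
    let elapsed = bench(4, 1_000, 50, 200);
    assert!(elapsed >= 0.0);
    println!("elapsed: {elapsed:.4}s");
}
```

Comparing runs with small versus large `work_in`/`work_out` values shows how quickly the "whichever sleeps faster wins" regime kicks in when the per-iteration work is tiny.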
PR #419 seems to be able to fix this issue. Before:
After:
|
AMD Ryzen Threadripper PRO 3975WX 32-Cores
|
I ran the mutex benchmark using this command (for 36 system cores, for example):
parking_lot's mutex is faster only on the Intel CPUs and with smaller numbers of threads. It gets drastically slower when all cores/hyperthreads are utilized. Please tell me if anything was done wrong.
Intel i7-10750H (6 cores, 12 threads)
Intel i9-10980XE (18 cores, 36 threads)
ARM Neoverse-N1 (80 cores)