Benchmarking results are unstable for native target due to object pinning done by the Blackhole #114

fzhinkin · 2023-05-22T11:57:43Z

Benchmarking results on native platforms are unstable and measured performance degrades from iteration to iteration.

Issue

While running some benchmarks for the native target, I spotted a problem: performance degrades from iteration to iteration.
The root cause is how the Blackhole is implemented for native: its consume method pins the value it receives: https://github.com/Kotlin/kotlinx-benchmark/blob/master/runtime/nativeMain/src/kotlinx/benchmark/NativeBlackhole.kt#L7

actual class Blackhole {
    actual fun consume(obj: Any?) {
        obj?.pin()
    }
...

References pinned by the blackhole are never unpinned, which creates additional work for the native GC because every pinned reference is a GC root.
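The effect can be illustrated with a toy model. This is only a sketch in plain Kotlin, not the Kotlin/Native GC: PinningBlackholeModel, simulateGcScan and rootsScannedPerIteration are hypothetical names, and the "scan" just counts the roots a collection would have to visit if nothing is ever unpinned.

```kotlin
// Toy model only: this is NOT how the Kotlin/Native GC is implemented.
// It counts how many roots a collection would have to visit when every
// consumed object stays pinned forever.
class PinningBlackholeModel {
    // Stand-in for the runtime's set of pinned references (hypothetical).
    private val pinnedRoots = ArrayList<Any>()

    // Mirrors `obj?.pin()` with no matching unpin: the root set only grows.
    fun consume(obj: Any?) {
        if (obj != null) pinnedRoots.add(obj)
    }

    // A collection has to visit every root; return how many it visited.
    fun simulateGcScan(): Int = pinnedRoots.size
}

// Runs `iterations` rounds of `opsPerIteration` consume calls and records
// the root count seen by a simulated collection at the end of each round.
fun rootsScannedPerIteration(iterations: Int, opsPerIteration: Int): List<Int> {
    val blackhole = PinningBlackholeModel()
    return List(iterations) {
        repeat(opsPerIteration) { i -> blackhole.consume(i.toDouble()) }
        blackhole.simulateGcScan()
    }
}
```

In this model the number of roots visited grows linearly with the number of consume calls made so far, so later iterations pay more per collection than earlier ones.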

The issue could be reproduced with the KMP example from the repository, but I added an additional configuration (204c223) to make the issue more obvious:

$ ./gradlew nativePinningIssueBenchmark
… test.NativeTestBenchmark.sqrtBenchmark
Warm-up #0: 9,217.20 ops/ms
Warm-up #1: 8,086.55 ops/ms
Warm-up #2: 6,463.68 ops/ms
Warm-up #3: 2,530.98 ops/ms
Warm-up #4: 2,163.17 ops/ms
Warm-up #5: 1,716.82 ops/ms
Warm-up #6: 1,456.48 ops/ms
Warm-up #7: 9,395.89 ops/ms
Warm-up #8: 7,259.49 ops/ms
Warm-up #9: 7,420.73 ops/ms
Iteration #0: 450.845 ops/ms
Iteration #1: 395.228 ops/ms
Iteration #2: 319.273 ops/ms
...

As you can see, the measured throughput drifts significantly and tends to degrade from iteration to iteration.

Here's a flame graph built from dtrace profiling results:

Suggested fix

To avoid the issue, NativeBlackhole could use the same approach as JMH's Blackhole and pin objects only under a condition that is always false, for example:

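    // Assumed declarations (not shown in the original snippet).
    private val param1: Int
    private val param2: Int

    init {
        val rnd = Random(getTimeMillis())
        param1 = rnd.nextInt()
        param2 = param1 + 1
    }

    actual fun consume(obj: Any?) {
        // param1 and param2 always differ, so this branch is never taken.
        if (param1 xor param2 == 0) {
            obj?.pin()
        }
    }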

Here's a commit with a suggested workaround: fzhinkin@439e2ac

Note that actual pinning is not necessary: consume could store its argument in a public field instead.
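The public-field variant mentioned above might look roughly like this. It is a minimal sketch, not the code from the linked commit: the class name FieldBlackhole and the field name sink are hypothetical, and it is written as plain Kotlin (no expect/actual) so it can be compiled on its own.

```kotlin
import kotlin.random.Random

// Sketch of a blackhole that publishes to a field instead of pinning.
// `FieldBlackhole` and `sink` are illustrative names, not library API.
class FieldBlackhole {
    // Two values that always differ by one, so their xor is never zero.
    private val param1: Int = Random.nextInt()
    private val param2: Int = param1 + 1

    // Publicly visible sink: a write here would make the value observable,
    // which is what keeps the argument of consume "used".
    var sink: Any? = null

    fun consume(obj: Any?) {
        // Never true at run time, but the operands are not compile-time
        // constants, so the branch is not trivially removable.
        if (param1 xor param2 == 0) {
            sink = obj
        }
    }
}
```

Calling consume on such an instance leaves sink untouched, so no reference is retained and nothing is added to the GC root set.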

With that change, benchmark results are stable:

$ ./gradlew nativePinningIssueBenchmark
… test.NativeTestBenchmark.sqrtBenchmark
Warm-up #0: 35,376.2 ops/ms
Warm-up #1: 39,050.2 ops/ms
Warm-up #2: 39,947.6 ops/ms
Warm-up #3: 39,730.9 ops/ms
Warm-up #4: 39,609.2 ops/ms
Warm-up #5: 40,083.4 ops/ms
Warm-up #6: 40,185.6 ops/ms
Warm-up #7: 39,982.4 ops/ms
Warm-up #8: 39,912.6 ops/ms
Warm-up #9: 40,048.8 ops/ms
Iteration #0: 60,159.8 ops/ms
Iteration #1: 60,255.4 ops/ms
Iteration #2: 60,261.2 ops/ms
Iteration #3: 60,538.9 ops/ms
Iteration #4: 60,840.1 ops/ms


fzhinkin · 2023-05-22T12:02:25Z

It's worth mentioning that, in general, performing consumption under a never-executed branch is not a reliable technique: after inlining, sophisticated compiler optimizations (such as GraalVM's control-flow-sensitive partial escape analysis) may move the actual work performed by the benchmark into that branch. That is very unlikely to be an issue with KMP, at least for now.

fzhinkin · 2023-07-21T10:21:15Z

Created a reproducer that can catch the problem with unstable results: https://github.com/fzhinkin/kotlinx-benchmark-native-blackhole-reproducer/blob/main/build.gradle.kts#L110

Re-implemented the native blackhole without object pinning. Pinning affects performance and leads to unstable results because the GC has to spend more and more time scanning all pinned values.

Instead, consumption of primitive values is implemented by comparing the value for equality with two fields and publishing it if the comparison succeeds. The two fields never hold the same value and one of them is volatile, so the condition is always false, yet it cannot be optimized away because of the volatile read. That should prevent both dead code elimination and the movement of the code computing the consumed value into an effectively unreachable branch.

For objects, identityHashCode is used to obtain an Int value that is then passed to the regular consumption routine. That function is an intrinsic that simply takes the address of the object, so it has no performance impact, yet it still requires the object. Fixes #114
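The approach described in that commit message could be sketched as follows. This is an illustration under stated assumptions, not the code merged in #132: the class and field names are hypothetical, the sketch targets the JVM so it can be run anywhere (there `@Volatile` resolves to `kotlin.jvm.Volatile`, while on Native it would need `kotlin.concurrent.Volatile`), and `System.identityHashCode` stands in for the Kotlin/Native identityHashCode intrinsic.

```kotlin
import kotlin.random.Random

// Sketch of a pinning-free blackhole; all names here are illustrative.
class VolatileCompareBlackhole {
    // Volatile field: it has to be re-read on every call, so the
    // comparison below cannot be folded away at compile time.
    @Volatile
    private var i0: Int = Random.nextInt()

    // Always differs from i0, so no value can be equal to both fields.
    private var i1: Int = i0 + 1

    // Publication target, written only if the (impossible) condition holds.
    var published: Int = 0

    fun consume(value: Int) {
        // Non-short-circuit `and`: both comparisons use the value.
        if ((value == i0) and (value == i1)) {
            published = value
        }
    }

    fun consume(obj: Any?) {
        // JVM stand-in for the native identityHashCode intrinsic: turn the
        // reference into an Int and reuse the primitive path.
        consume(System.identityHashCode(obj))
    }
}
```

Because the condition depends on the consumed value itself, the computation producing that value has to happen before the check, which is the property the commit message relies on to keep it out of the dead branch.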


fzhinkin added the bug label May 22, 2023

qwwdfsad assigned qurbonzoda May 22, 2023

fzhinkin mentioned this issue Jul 21, 2023

Avoid using object pinning in native blackhole #132

Merged

This was referenced Aug 2, 2023

Temporarily disable benchmark's native target Kotlin/kotlinx-io#195

Closed

Enable benchmark's native target Kotlin/kotlinx-io#196

Closed

fzhinkin closed this as completed in #132 Aug 16, 2023

