Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Benchmarking results are unstable for native target due to object pinning done by the Blackhole #114

Closed
fzhinkin opened this issue May 22, 2023 · 2 comments · Fixed by #132
Assignees
Labels
bug Something isn't working

Comments

@fzhinkin
Copy link
Contributor

Benchmarking results on native platforms are unstable and measured performance degrades from iteration to iteration.

Issue

While running some benchmarks for the native target I spotted the problem: performance degrades from iteration to iteration.
The root cause is how the Blackhole is implemented for native: its consume methods pins a value: https://github.com/Kotlin/kotlinx-benchmark/blob/master/runtime/nativeMain/src/kotlinx/benchmark/NativeBlackhole.kt#L7

actual class Blackhole {
    actual fun consume(obj: Any?) {
        obj?.pin()
    }
...

References pinned by the blackhole are never unpinned and that exposes additional work for native GC as every pinned reference is a GC root.

The issue could be reproduced with the KMP example from the repository, but I added an additional configuration (204c223) to make the issue more obvious:

$ ./gradlew nativePinningIssueBenchmark
… test.NativeTestBenchmark.sqrtBenchmark
Warm-up #0: 9,217.20 ops/ms
Warm-up #1: 8,086.55 ops/ms
Warm-up #2: 6,463.68 ops/ms
Warm-up #3: 2,530.98 ops/ms
Warm-up #4: 2,163.17 ops/ms
Warm-up #5: 1,716.82 ops/ms
Warm-up #6: 1,456.48 ops/ms
Warm-up #7: 9,395.89 ops/ms
Warm-up #8: 7,259.49 ops/ms
Warm-up #9: 7,420.73 ops/ms
Iteration #0: 450.845 ops/ms
Iteration #1: 395.228 ops/ms
Iteration #2: 319.273 ops/ms
...

As you can see, measured time drifts significantly, and it degrades with each iteration.

Here's a flame graph built from dtrace profiling results:

Suggested fix

To avoid the issue NativeBlackhole may use the same approach as the JMH's blackhole and pin objects conditionally with a condition that is always false, like:

   init {
         val rnd = Random(getTimeMillis())
         param1 = rnd.nextInt()
         param2 = param1 + 1
     }

     actual fun consume(obj: Any?) {
         if (param1 xor param2 == 0) {
             obj?.pin()
         }
     }

Here's a commit with a suggested workaround: fzhinkin@439e2ac

Note that actual pinning is not necessary and we can save the consume's argument to some public field instead.

With that change benchmarks execution results are stable:

$ ./gradlew nativePinningIssueBenchmark
… test.NativeTestBenchmark.sqrtBenchmark
Warm-up #0: 35,376.2 ops/ms
Warm-up #1: 39,050.2 ops/ms
Warm-up #2: 39,947.6 ops/ms
Warm-up #3: 39,730.9 ops/ms
Warm-up #4: 39,609.2 ops/ms
Warm-up #5: 40,083.4 ops/ms
Warm-up #6: 40,185.6 ops/ms
Warm-up #7: 39,982.4 ops/ms
Warm-up #8: 39,912.6 ops/ms
Warm-up #9: 40,048.8 ops/ms
Iteration #0: 60,159.8 ops/ms
Iteration #1: 60,255.4 ops/ms
Iteration #2: 60,261.2 ops/ms
Iteration #3: 60,538.9 ops/ms
Iteration #4: 60,8401.0 ops/ms
@fzhinkin fzhinkin added the bug Something isn't working label May 22, 2023
@fzhinkin
Copy link
Contributor Author

Worth mentioning that in general, it's not a viable option to perform some sort of consumption under a never-executed branch as after inlining some sophisticated compiler optimizations (like GraalVM's control-flow sensitive partial escape analysis) may move actual work performed by the benchmark to that branch, but it's very unlikely to be an issue with KMP (at least for now).

@fzhinkin
Copy link
Contributor Author

Created a reproducer that can catch the problem with unstable results: https://github.com/fzhinkin/kotlinx-benchmark-native-blackhole-reproducer/blob/main/build.gradle.kts#L110

fzhinkin added a commit that referenced this issue Aug 16, 2023
Re-implemented native blackhole without using object pinning as  
it affects performance and leads to unstable results as each time GC
has to spend more and more time scanning all pinned values.

Instead, primitive values consumption is implemented as a comparison
of the value for equality with two fields and publishing the value in case
when comparison succeeds. The values themselves are never the same and
one of the fields is volatile, thus the condition is always false and it
could not be omitted because of volatility. That should prevent both
dead code elimination and movement of the code computing the consumed
value into an effectively unreachable branch.

For the objects, identifyHashCode is used to obtain an int-value that
is then passed into a regular consumption routine. That function is an
intrinsic that simply gets an address of the object, so it has no
performance impact, yet it requires an object.

Fixes #114
OndrejSliva pushed a commit to OndrejSliva/kotlinx-benchmark that referenced this issue Jan 10, 2024
…e.maven.plugins-maven-surefire-plugin-3.0.0

build(deps): bump maven-surefire-plugin from 3.0.0-M9 to 3.0.0
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants