RyuJIT performance regression with bool to int conversions #4398
Comments
The fact that all 3 methods display the same issue would imply that the issue isn't related to conversion. The conversion that happens in the [...] In the particular case of [...] The funny thing is that if you write
public static int IfThenElse(bool[] bits) {
    int sum = 0;
    for (int i = 0; i < N; i++)
        sum += bits[i] ? 1 : 0;
    return sum;
}
then the results are reversed: the code generated by RyuJIT is faster than the code generated by JIT64. Well, at least on my machine... |
@redknightlois, @mikedn, I think we should avoid working with arrays in this benchmark. Otherwise we measure conversion time plus array element access time plus CPU cache efficiency. |
The size of that array is too small for CPU data caching to impact the benchmark. Besides, if you get rid of the array there's not much left for a proper benchmark.
Actually the only difference is in the case of static arrays, in that case the compiler insists on reloading the array reference every iteration. But that didn't prevent JIT64 from getting a good result. |
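The point about static arrays can be illustrated with a minimal sketch. The class, field, and method names below are hypothetical and not taken from the gist; the sketch only shows the two ways of reaching the array (reading the static field inside the loop versus copying the reference into a local first). Whether the second form changes the generated code depends on the JIT, and as noted above it did not prevent JIT64 from getting a good result.

```csharp
using System;

public static class StaticArraySum
{
    // Hypothetical stand-ins for the benchmark's constant and static array.
    private const int N = 1024;
    private static bool[] Bits = CreateBits();

    private static bool[] CreateBits()
    {
        var bits = new bool[N];
        for (int i = 0; i < N; i++)
            bits[i] = (i % 3) == 0; // every third element is true
        return bits;
    }

    // Reads the static field on every iteration; a JIT may reload the
    // array reference each time instead of keeping it in a register.
    public static int SumViaStaticField()
    {
        int sum = 0;
        for (int i = 0; i < N; i++)
            sum += Bits[i] ? 1 : 0;
        return sum;
    }

    // Copies the reference into a local once, before the loop starts.
    public static int SumViaLocal()
    {
        bool[] bits = Bits;
        int sum = 0;
        for (int i = 0; i < bits.Length; i++)
            sum += bits[i] ? 1 : 0;
        return sum;
    }
}
```

Both methods return the same sum; only the way the array reference is obtained differs, which makes them a reasonable pair to compare under each JIT.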
With BenchmarkDotNet you don't need to use arrays unless you actually want to measure array operations. You can measure just one or two instructions. For example, this benchmark works fine. |
@AndreyAkinshin I don't understand what you're trying to say. If you want to measure how long it takes to sum the elements of an array then you do just that, sum the elements of the array and measure the time. You don't go and measure how long an individual addition operation takes because the result won't tell you anything about how long the sum operation takes. |
@mikedn, but the issue is about the bool to int conversions, not about a sum calculation. |
I already stated in my first post that the conversion code generated by RyuJIT and JIT64 for the IfThenElse case is identical. While the conversion code could be improved, the regression isn't caused by that code but by the surrounding array/loop-specific code, the code that you seem to suggest should be removed. The sum stuff was a simple analogy, much like the benchmark you linked to, even if it's not related in any way to this issue. There is a difference in the conversion code generated for UnsafeConvert, but I didn't bother too much with that case as it's kind of ugly. |
@mikedn I discovered the issue because I had a very tight routine that used to take 2:03 minutes the last time I checked and now takes 3:07 minutes. Nothing was different, but then I remembered that I had installed Visual Studio 2015 on that machine, and I reduced the issue to its core. Then I just added the other alternatives to see how they fared. A difference of up to 1-3% is acceptable, up to 5-7% I can live with, but 55% worse is a performance regression. I probably could have gone the extra mile and found out that the code generated by both is the same, but even if that were the case, the perf regression in this example does exist.
@AndreyAkinshin The name is probably not the best one; as you say, there are many different things at play here besides the conversion itself. So I just went and got rid of the array and used 2 instance variables... definitely not much better.
// BenchmarkDotNet=v0.7.6.0
Repro: https://gist.github.com/redknightlois/045d3022c2c28e1498e2 |
Which one of the 2 benchmarks more accurately represents your actual code? In your latest benchmark the issue is easy to see and easy to fix (at the cost of increased code size). RyuJIT doesn't align certain branch targets like JIT64 did:
JIT64 code for IfThenElse:
RyuJIT code for IfThenElse:
The JIT64 version has a [...] The array version of the benchmark has the same alignment issue, but attempting to fix alignment in that case doesn't appear to help. There's another issue in that case, one which I don't see and which JIT64 apparently avoided by accident, since some variations of that code actually run slower on JIT64. |
The first benchmark, with unsafe conversions, is the one that most resembles the code in question. We rely on unsafe code a lot; if it is of interest, we can measure other places and post repros if time permits.
Love when that happens :D |
The code generated by RyuJIT for UnsafeConvert is worse: it cannot eliminate the local variable because its address has been taken, which means there's an additional memory load/store that could be responsible for the degraded performance. But that code has a bug, it casts to |
@mikedn Yes, that's true, it can read garbage in this version (the original code works on packed arrays of booleans). I will rerun that one and tell you what it looks like on this end. I am testing a bunch of other routines (in isolation) and have some other results. Do you want me to open new issues, or should I post them here? |
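For readers following the "reads garbage" point, here is a hedged sketch of a bool-to-int reinterpretation that touches only the single byte a bool occupies. It is not the code from the gist: it swaps the pointer cast for `Unsafe.As` from `System.Runtime.CompilerServices`, assumes a runtime where that type is available (.NET Core 3.0 or later, or the System.Runtime.CompilerServices.Unsafe package), and assumes `true` is stored as the byte value 1, which holds for bools produced by ordinary C# code. The class and method names are hypothetical.

```csharp
using System;
using System.Runtime.CompilerServices;

public static class BoolToInt
{
    // Baseline: the conditional form discussed earlier in the thread.
    public static int ViaTernary(bool value) => value ? 1 : 0;

    // Reinterprets the bool as a byte and widens it to int.
    // Only one byte is read, so no neighbouring memory is picked up.
    // Assumption: 'true' is represented as 1.
    public static int ViaByteReinterpret(bool value) =>
        Unsafe.As<bool, byte>(ref value);
}
```

Reading the value as a wider type than a byte is what would pick up adjacent bytes; restricting the read to one byte avoids that, at the cost of relying on the 0/1 representation.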
I'm not in a position to want anything from you, it's not like I work for Microsoft 😄. But I think it would make sense to create separate issues if the code you have problems with is different. |
@mikedn LOL, I was under the impression you were (you fake it quite well :D). Well, with the fixed UnsafeConvert there is still a huge difference.
// BenchmarkDotNet=v0.7.6.0
Either the LegacyJit is doing some extra optimization that skews the results and that I am not able to negate, or the difference is huge. I have updated the gist with the corrected code. |
Here are some funny results:
// BenchmarkDotNet=v0.7.6.0
// BenchmarkDotNet=v0.7.6.0
The first result set is from your unchanged code; the second is after I added [...] The real problem isn't the conversion itself (even though the code isn't great, as I stated above); the problem is that RyuJIT doesn't inline that method but JIT64 does. |
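Since the difference comes down to whether the conversion method is inlined, a small sketch of how inlining can be pinned down explicitly may help. `MethodImplOptions.NoInlining` and `MethodImplOptions.AggressiveInlining` are standard .NET attributes; the class and method names are hypothetical, and this is not a claim about which attribute was added in the run above. Note that `AggressiveInlining` is a request the JIT may still decline.

```csharp
using System;
using System.Runtime.CompilerServices;

public static class InliningHints
{
    // The JIT must emit a real call for this method.
    [MethodImpl(MethodImplOptions.NoInlining)]
    public static int ConvertNoInline(bool value) => value ? 1 : 0;

    // Asks the JIT to inline this method where it can.
    [MethodImpl(MethodImplOptions.AggressiveInlining)]
    public static int ConvertInline(bool value) => value ? 1 : 0;

    // Sums the array using one of the two variants, so both JITs can be
    // compared with the inlining decision fixed in advance.
    public static int Sum(bool[] bits, bool useInlined)
    {
        int sum = 0;
        for (int i = 0; i < bits.Length; i++)
            sum += useInlined ? ConvertInline(bits[i]) : ConvertNoInline(bits[i]);
        return sum;
    }
}
```

Fixing the inlining decision on both sides keeps the comparison about the conversion code rather than about each JIT's inlining heuristics.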
@AndreyAkinshin I know it is supposed to be called through a NoInlining call. Any idea why it appears not to honor NoInlining in the first case? |
@mikedn, @redknightlois, it is definitely a BenchmarkDotNet bug. Inlining is a huge source of trouble for benchmarking. But I know how to fix it; I will publish BenchmarkDotNet v0.7.7 today or tomorrow. Just for now, you can use |
@mikedn, @redknightlois, I have published a new version of BenchmarkDotNet: https://www.nuget.org/packages/BenchmarkDotNet/0.7.7
|
@mikedn, @redknightlois, I have also made some improvements for [...] Here are my new results:
|
So, to try to sum this up:
|
@AndreyAkinshin Looks pretty good, small enough for the JIT guys to look into it. 👍 |
Thanks guys, this is great stuff! |
Related: #32716 |
Branch target alignment is being worked on. It doesn't seem like there is much here otherwise that is actionable. I'm going to close this. |
It looks like RyuJIT is almost 30% less efficient than legacy in bool-to-int conversions. All 3 methods show more or less the same issue.
// BenchmarkDotNet=v0.7.6.0
// OS=Microsoft Windows NT 6.2.9200.0
// Processor=Intel(R) Core(TM) i5-2500K CPU @ 3.30GHz, ProcessorCount=4
// CLR=MS.NET 4.0.30319.42000, Arch=64-bit [RyuJIT]
Common: Type=Jit_BoolToInt Mode=Throughput Platform=X64 .NET=Current
Code in question: https://gist.github.com/redknightlois/c1ae5ddc6f73c2e53c9b
category:cq
theme:basic-cq
skill-level:intermediate
cost:medium