Performance degradation in .Net 9 #108058
Comments
Looking at the bot results, I guess it is something specific to my CPU, so I am sharing the disasm results. |
@EgorBo, please review the disasm from @iSazonov.
|
@amanasifkhalid PTAL. |
In the optimized .NET 9 codegen for
G_M12859_IG04: ;; offset=0x003C
       cmp r10d, 16
       ; ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (cmp: 0 ; jcc erratum) 32B boundary ...............................
       ja G_M12859_IG21
       cmp gword ptr [rdx+0x08], 0
       jne G_M12859_IG18
G_M12859_IG10: ;; offset=0x007F
       test rdx, rdx
       ; ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (test: 2 ; jcc erratum) 32B boundary ...............................
       je SHORT G_M12859_IG14
The latter is particularly painful, since it's inside a loop body. For .NET 8, we're comparatively lucky -- I don't see any potential for hitting the JCC erratum mitigation. @iSazonov I believe the JCC erratum mitigation affects Kaby Lake, hence why you're seeing a performance hit locally but not with EgorBot. Could you please try re-running the benchmark with the environment variable |
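For readers unfamiliar with the "32B boundary" annotations in the disasm above, here is a minimal sketch of the condition they flag: a jump (or a macro-fused compare-and-jump pair) that crosses or ends on a 32-byte boundary. The function name `hits_jcc_erratum` is hypothetical, and the instruction lengths in the usage lines are assumed encodings (3 bytes for `test rdx, rdx`, 2 bytes for a short `je`), not values taken from the JIT dump.

```python
BOUNDARY = 32  # bytes; the erratum mitigation is defined on 32-byte boundaries


def hits_jcc_erratum(start: int, length: int, boundary: int = BOUNDARY) -> bool:
    """Return True if the bytes [start, start + length) cross or end on a boundary.

    `start` is the code offset of the jump (or of the first instruction of a
    fused cmp/test + jcc pair) and `length` is its total size in bytes.
    """
    if length <= 0:
        raise ValueError("length must be positive")
    end = start + length  # offset of the first byte after the instruction(s)
    crosses = start // boundary != (end - 1) // boundary
    ends_on_boundary = end % boundary == 0
    return crosses or ends_on_boundary


# IG10 starts at offset 0x7F. Assuming a 3-byte test fused with a 2-byte
# short je, the pair occupies 0x7F..0x83 and straddles the boundary at 0x80.
print(hits_jcc_erratum(0x7F, 3 + 2))  # True
# The same pair placed at 0x80 sits entirely inside one 32-byte window.
print(hits_jcc_erratum(0x80, 3 + 2))  # False
```

This only models the placement rule; whether a given CPU actually pays a penalty depends on its model and microcode.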
@amanasifkhalid Thanks! It seems |
@iSazonov thanks for trying that out. Looking at the codegen with The JIT does have a mechanism for mitigating JCC erratum penalties in loops by adding additional loop padding. Currently, this is only available in Debug/Checked builds of the JIT, and it requires us to disable adaptive loop alignment for all methods, which might muddle improvements. But if this mitigation seems to help, I can tweak the loop alignment configuration to allow us to enable the mitigation on a per-method basis. @iSazonov if I share a Checked build of the .NET 9 RC1 JIT with you, would you be willing to try out the mitigation on your machine? I'm afraid I cannot repro the JCC erratum penalty locally. Thanks! |
Yes, I can. I hope this helps the team improve the JIT. I found that many processor families are affected by this problem, and I suspect the number of such processors still in operation worldwide is huge. So I would like to get a solution, if not in the release, then in 9.0.1. Moreover, in my test the degradation is ~20%; how much is the gain from the new optimizations? 1%? 5%? In other words, don't we lose more than we gain in real scenarios? It may be worth making these optimizations explicitly opt-in until there is a better and more complete solution. |
The potential performance benefit from the new block layout strategy depends on the specific example; for the benchmark you shared, I looked at the old and new layouts, and the hottest paths are identical, so if we do see improvement for this example, it'll be from other .NET 9 enhancements. As far as I can tell, the new block layout isn't any more prone to triggering the JCC erratum penalty than the old one, and vice versa. Neither one tries to explicitly avoid JCC erratum conditions when reordering blocks because it's too early in the JIT to know the actual code offsets of the blocks. The tricky thing is even if the block layout of a method doesn't change, increases/reductions in code size from unrelated optimizations can push/pull a jump to a cache boundary, thus triggering the erratum mitigation. And regardless of the block layout strategy, changes in profile data from run to run can result in different block layouts, which may also trigger the erratum mitigation. Part of the reason we've seen an uptick in JCC erratum-related regressions is the new layout strategy has churned users' code enough to create new JCC erratum mitigation sites -- this churn has likely also removed many instances of JCC erratum penalties, but I wouldn't expect customers to report these improvements. Because of these factors, I don't see justification for disabling the new block layout on the basis of avoiding JCC erratum penalties. However, I think we are justified in trying to enable a compiler-level mitigation in product builds, similar to what MSVC has done. The JIT team has previously considered productizing this (#93243), though one of the tricky parts is determining if the user's CPU is affected by the JCC erratum, and as far as I know, we don't have the ability to detect the user's CPU model during jitting; since a compiler-level mitigation would increase code size, we'd prefer to not enable this for all x64 codegen scenarios. 
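To illustrate why unrelated code-size changes can create or remove erratum sites, and why a padding-based mitigation costs code size, here is a rough sketch. It is not the JIT's or MSVC's algorithm; `touches_boundary` and `padding_to_avoid` are hypothetical helpers, and the 5-byte jump length is an assumption chosen for illustration.

```python
def touches_boundary(start: int, length: int, boundary: int = 32) -> bool:
    """True if the bytes [start, start + length) cross or end on a boundary."""
    end = start + length
    return start // boundary != (end - 1) // boundary or end % boundary == 0


def padding_to_avoid(start: int, length: int, boundary: int = 32) -> int:
    """Smallest number of padding bytes to place before the jump so that it
    neither crosses nor ends on a boundary (a naive linear search)."""
    if not 0 < length < boundary:
        raise ValueError("length must be between 1 and boundary - 1")
    pad = 0
    while touches_boundary(start + pad, length, boundary):
        pad += 1
    return pad


# Suppose earlier code grows by `shift` bytes and moves a 5-byte jump that
# used to sit at offset 0x60. Only some shifts land it on the boundary:
bad_shifts = [s for s in range(32) if touches_boundary(0x60 + s, 5)]
print(bad_shifts)  # [27, 28, 29, 30, 31]

# Padding needed to fix a 5-byte jump that starts at 0x7F:
print(padding_to_avoid(0x7F, 5))  # 1
```

With these assumptions, 5 of the 32 possible placements are affected, so a layout change elsewhere in the method can just as easily remove a site as introduce one, which matches the "random" behavior described in the comment.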
I apologize for the inconvenience the JCC erratum has caused you and other users. Internally, we've faced plenty of similar headaches when triaging benchmark regressions on Intel machines, and since we're seeing users' code affected by it too, I think we're justified in exposing some JIT configurations to mitigate this. cc @dotnet/jit-contrib for other opinions.
Thanks! I've attached a zip of the Checked JIT build below. You can patch this into your RC1 installation by replacing its current Please let me know if you'd like any clarification. |
That's not correct. We pass a lot of information about the current CPU model over JIT/EE interface today, so passing one more bit for JCC erratum would not be a big deal. |
A good example is how we detect CPUs with "bad" AVX512: https://github.com/dotnet/runtime/blob/main/src/coreclr/vm/codeman.cpp#L1613 |
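As a sketch of what "one more bit" of CPU information could be derived from, the snippet below decodes the family/model/stepping fields of a CPUID leaf 1 EAX value and checks them against a lookup table. This is not the runtime's code. `decode_cpuid_signature`, `may_have_jcc_erratum`, and the `ASSUMED_AFFECTED` table are hypothetical; the table is an illustrative placeholder containing only the two family 6 models used by Kaby Lake, with no stepping filter, and is not Intel's published list.

```python
from typing import NamedTuple


class CpuSignature(NamedTuple):
    family: int
    model: int
    stepping: int


def decode_cpuid_signature(eax: int) -> CpuSignature:
    """Decode the EAX value returned by CPUID leaf 1 into display family/model."""
    stepping = eax & 0xF
    base_model = (eax >> 4) & 0xF
    base_family = (eax >> 8) & 0xF
    ext_model = (eax >> 16) & 0xF
    ext_family = (eax >> 20) & 0xFF
    # The extended family is only added when the base family is 0xF.
    family = base_family + ext_family if base_family == 0xF else base_family
    # The extended model only applies to base families 0x6 and 0xF.
    if base_family in (0x6, 0xF):
        model = (ext_model << 4) | base_model
    else:
        model = base_model
    return CpuSignature(family, model, stepping)


# ASSUMPTION: placeholder table for illustration, not Intel's full list.
ASSUMED_AFFECTED = {(0x6, 0x8E), (0x6, 0x9E)}


def may_have_jcc_erratum(eax: int) -> bool:
    sig = decode_cpuid_signature(eax)
    return (sig.family, sig.model) in ASSUMED_AFFECTED


# 0x906E9 is a signature commonly reported by desktop Kaby Lake parts.
print(decode_cpuid_signature(0x906E9))
print(may_have_jcc_erratum(0x906E9))
```

A real check would also need to consider steppings and the full set of affected models from Intel's document.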
I see, thanks for clarifying. So we can easily determine if the target CPU is affected, but the current mitigation is less sophisticated than ideal: If adaptive loop alignment is disabled and |
@amanasifkhalid - was the asmdiff between net8 vs. net9 just with regards to JCC or were there more diffs that could have contributed to the regression? JCC is not a problem on newer Intel hardware and is not a problem on AMD. @iSazonov - If you don't mind sharing, on what hardware are you seeing this? |
@kunalspathak @iSazonov ran the benchmark on Kaby Lake, which I believe is susceptible to the JCC erratum. For |
Ah, just noticed that in the PR description. This jcc doc from Intel confirms that it is affected.
OK, so no diffs other than block layout related changes? Edit: Just a word of caution, |
Nothing else stuck out to me from the diffs in the benchmark report (zip). |
It's not clear to me we actually have an effective mitigation strategy for JCC errata. And as Aman says the impact of it is somewhat "random".
It might be worth investigating with VTune. I can try this but it may take a few days. |
That's my point. We do not and |
@amanasifkhalid Here are the new results for the Checked JIT build. I don't see any change in performance. |
I remember back in the 90s, Intel CPUs were already crashing operating systems due to unforeseen "features" of these processors. Even then, compilers had to compensate for these problems. |
Interesting, Java got Intel jcc mitigation over 4 years ago https://github.com/openjdk/jdk17/blame/5fcf72086ffca85f524fae2d5bd9fd328c9a77e0/src/hotspot/cpu/x86/vm_version_x86.cpp#L1769 |
@iSazonov thanks for giving the Checked build a try -- I'm sorry to hear it didn't help. It looks like the JIT needs a more sophisticated mitigation to make a difference. Unfortunately, we're at a point in the product cycle where additional feature work won't be included in .NET 9. I'm going to keep this issue open, and re-target for .NET 10; based on feedback from you and other users, I think we're justified in pursuing a mitigation.
I hear you. To paraphrase @AndyAyersMS, the JIT is quite sophisticated in some places, and immature in others -- users with compiler backgrounds are sometimes surprised to discover the JIT lacks a particular feature. Historically, we've prioritized optimizations that would most benefit .NET customers. Thus, optimizations that reduce the cost of commonly-used language features would typically take precedence over more standard compiler features. We've recently made an effort to catch up in the latter domain (especially in .NET 9, considering the emphasis placed on loop optimizations and code layout), though clearly we still have some holes to patch. When compared to features that benefit all users, processor-specific features typically take lower priority, though the frequency of JCC errata-related regressions impacting our ability to triage changes to block layout incentivizes us to implement a mitigation. |
In the code I shared above, I manually unrolled the inner loop and got great perf results:
|
Description
I ran a performance comparison of my simple project and was surprised that the code runs noticeably slower on .NET 9.0 than on .NET 8.0 (35 ms vs 28 ms on my notebook).
Moreover, the code (not mine) that I used as a baseline (BinaryTrieBase) performs almost the same, but my code is slower.
The project:
BinaryTrie.zip
Regression?
Yes.
Data