Consider re-compling CoreCLR's unmanaged code to mitigate JCC erratum (Nov. 2019) #13794

damageboy · 2019-11-14T12:24:12Z

Intel has recently released a microcode update that adversely affects the performance of... all jumps (!):

https://www.intel.com/content/dam/support/us/en/documents/processors/mitigations-jump-conditional-code-erratum.pdf

I've personally had the "pleasure" of encountering this whilst benchmarking CoreCLR 3.0 Array.Sort() on clearlinux which seems to have pushed this microcode update earlier than others:

https://twitter.com/damageboy/status/1194751035136450560?s=20

For context, Here's the before/after perf hit on Array.Sort():
Before:

After:

Very specifically, I was hit with a 20% drop in Array.Sort(), and a smaller (by sheer luck I presume?) hit with my own code. Intel claims that the average drop in performance should be in the %0-%4 range.

The erratum claims that the microcode update adversely effects:

In this context, Jump Instructions include all jump types: conditional jump (Jcc), macro-
fused op-Jcc (where op is one of cmp, test, add, sub, and, inc, or dec), direct
unconditional jump, indirect jump, direct/indirect call, and return.

The recommended fix, per the erratum, is to recompile the code with (Section 3.1.1):
-mbranches-within-32B-boundaries

In theory, this could end with a relatively minor(?) CoreCLR runtime release. Or incorporated into the 3.1 release cycle?

In addition, I think we should have a discussion about possibly applying similar (conditional) fixes to the JIT itself w.r.t the generated code on Intel processors, as it doesn't seem this issue is about to simply go away on its own.

I will open a separate issue for the JIT related discussion.

The text was updated successfully, but these errors were encountered:

jkotas · 2019-11-14T12:36:30Z

Thanks for opening the issue on this. We have been looking into this.

stephentoub · 2019-11-14T12:53:27Z

Separately, since you used Array.Sort as an example, it's no longer implemented in C; as of last week the implementation is all C#. I'm curious if that affects your stated impact at all, or if you see a similar affect on Array.Sort with the latest master?

damageboy · 2019-11-14T13:07:57Z

@stephentoub That's a very good question.
I haven't tested with master yet, and I am aware of the C++ -> C# transition for that specific piece of code.

My code itself (in this case the Scalar + Unmanaged variations of IntroSort) is definitely
affected (as seen in the same screenshot), with very wild outcomes:

Scalar, for example, suffers a 13% perf drop.
Unmanaged is hit with a very minor 1.4% drop.

Given that It was pretty much a copy-paste of IntroSort from CoreCLR to begin with, I think this gives you a "preview" of where this might go.

I assume that the effects are random in nature, depending on code generation/allocation address randomness.
But in general, the more branchy your code is, you'll probably get hit harder, which can also be seen in my two examples: these are practically the same functions, except that only one of them has extra bounds-checking that is not eliminated normally (nor did I attempt to optimize it away) and the other doesn't...

My other benchmarks actually end up doing substantially fewer branches (as in orders of
magnitudes fewer branches), so I suspect I won't see a lot of movement on that front...

damageboy · 2019-11-15T04:41:40Z

Just wanted to circle back and to say that I managed to revert the microcode update in a very localized way on clearlinux (or I assume any Linux):

Change the kernel boot parameters to avoid early loading of microcode updates (remove the initrd=/EFI/org.clearlinux/freestanding-00-intel-ucode.cpio from the boot cmd line)
Revert to an older package of microcode from Intel to make sure that late loading of microcode does not sneak the update back in, by updating /lib/firmware/intel-ucode/ to the contents of the 2019-09-18 ucode release

After doing those two steps and (naturally) rebooting, I'm back to older microcode (0xb4) for my kaby-lake processor:

$ grep 'stepping\|model\|microcode' /proc/cpuinfo | head -4
model		: 158
model name	: Intel(R) Core(TM) i7-7700HQ CPU @ 2.80GHz
stepping	: 9
microcode	: 0xb4

Rerunning the same benchmarks now yields the good old results I've been seeing in the past months:

BenchmarkDotNet=v0.12.0, OS=clear-linux-os 31620
Intel Core i7-7700HQ CPU 2.80GHz (Kaby Lake), 1 CPU, 4 logical and 4 physical cores
.NET Core SDK=3.0.100
  [Host]     : .NET Core 3.0.0 (CoreCLR 4.700.19.46205, CoreFX 4.700.19.46214), X64 RyuJIT
  Job-BMTZZL : .NET Core 3.0.0 (CoreCLR 4.700.19.46205, CoreFX 4.700.19.46214), X64 RyuJIT

InvocationCount=3  IterationCount=15  LaunchCount=2  
UnrollFactor=1  WarmupCount=10

Method	N	Mean [us]	Time / N [ns]
ArraySort	100	2.795	27.9545
Scalar	100	4.511	45.1068
Unmanaged	100	4.154	41.5399

ArraySort	1000	35.758	35.7583
Scalar	1000	58.767	58.7666
Unmanaged	1000	42.754	42.7538

ArraySort	10000	507.538	50.7538
Scalar	10000	680.009	68.0009
Unmanaged	10000	516.547	51.6547

ArraySort	100000	5,920.847	59.2085
Scalar	100000	6,678.108	66.7811
Unmanaged	100000	5,585.576	55.8558

ArraySort	1000000	69,018.648	69.0186
Scalar	1000000	76,603.616	76.6036
Unmanaged	1000000	65,503.492	65.5035

ArraySort	10000000	794,042.304	79.4042
Scalar	10000000	882,212.364	88.2212
Unmanaged	10000000	743,297.864	74.3298

danmoseley · 2020-02-28T19:31:06Z

@brianrob is work planned here? Can you provide guidance on what to do with this and #13795

brianrob · 2020-02-28T19:44:59Z

I hope to have all of the data in a format for sharing shortly. The short answer is that I don't think there is any immediate work that needs to be done here, but I will provide more context on this once I have all of the data compiled.

jkotas assigned brianrob Nov 14, 2019

msftgits transferred this issue from dotnet/coreclr Jan 31, 2020

damageboy mentioned this issue Jan 31, 2020

Consider mitigating the JCC erratum in the JIT (Nov. 2019) #13795

Closed

maryamariyan added the untriaged New issue has not been triaged by the area owner label Feb 23, 2020

This was referenced May 1, 2020

JCC Erratum Impact on .NET Core #35730

Open

JCC Erratum Impact on .NET Core dotnet/announcements#153

Open

damageboy closed this as completed May 2, 2020

ghost locked as resolved and limited conversation to collaborators Dec 11, 2020

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Consider re-compling CoreCLR's unmanaged code to mitigate JCC erratum (Nov. 2019) #13794

Consider re-compling CoreCLR's unmanaged code to mitigate JCC erratum (Nov. 2019) #13794

damageboy commented Nov 14, 2019

jkotas commented Nov 14, 2019

stephentoub commented Nov 14, 2019 •

edited

Loading

damageboy commented Nov 14, 2019 •

edited

Loading

damageboy commented Nov 15, 2019 •

edited

Loading

danmoseley commented Feb 28, 2020

brianrob commented Feb 28, 2020

Consider re-compling CoreCLR's unmanaged code to mitigate JCC erratum (Nov. 2019) #13794

Consider re-compling CoreCLR's unmanaged code to mitigate JCC erratum (Nov. 2019) #13794

Comments

damageboy commented Nov 14, 2019

jkotas commented Nov 14, 2019

stephentoub commented Nov 14, 2019 • edited Loading

damageboy commented Nov 14, 2019 • edited Loading

damageboy commented Nov 15, 2019 • edited Loading

danmoseley commented Feb 28, 2020

brianrob commented Feb 28, 2020

stephentoub commented Nov 14, 2019 •

edited

Loading

damageboy commented Nov 14, 2019 •

edited

Loading

damageboy commented Nov 15, 2019 •

edited

Loading