-
Notifications
You must be signed in to change notification settings - Fork 4.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
JIT: remove fallthrough limitations from redundant branch optimizer #97722
JIT: remove fallthrough limitations from redundant branch optimizer #97722
Conversation
Now that `BBJ_COND` no longer has fallthrough semantics, remove a bunch of code that would limit redundant branch jump threading when one of the predecessors was fallthrough. Contributes to dotnet#93020.
Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch Issue DetailsNow that Contributes to #93020.
|
@amanasifkhalid PTAL |
Ah, my email didn't refresh automatically, and I just opened #97724 to do the same thing. I'll close that PR in favor of yours. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since you're here, in Compiler::optRedundantBranches()
, could you please rename the bbNext
and bbJump
locals? Maybe something like bbFalse
and bbTrue
...
Otherwise LGTM, thanks!
Done. |
Diff results for #97722Assembly diffsAssembly diffs for linux/arm64 ran on linux/x64Diffs are based on 2,506,119 contexts (1,007,092 MinOpts, 1,499,027 FullOpts). MISSED contexts: base: 8 (0.00%), diff: 1,199 (0.05%) Overall (-71,864 bytes)
FullOpts (-71,864 bytes)
Assembly diffs for osx/arm64 ran on linux/x64Diffs are based on 2,270,095 contexts (932,669 MinOpts, 1,337,426 FullOpts). MISSED contexts: base: 9 (0.00%), diff: 775 (0.03%) Overall (-66,736 bytes)
FullOpts (-66,736 bytes)
Assembly diffs for windows/arm64 ran on linux/x64Diffs are based on 2,340,158 contexts (938,449 MinOpts, 1,401,709 FullOpts). MISSED contexts: base: 8 (0.00%), diff: 959 (0.04%) Overall (-72,576 bytes)
FullOpts (-72,576 bytes)
Assembly diffs for windows/x64 ran on linux/x64Diffs are based on 2,511,303 contexts (997,391 MinOpts, 1,513,912 FullOpts). MISSED contexts: base: 8 (0.00%), diff: 909 (0.04%) Overall (-50,652 bytes)
FullOpts (-50,652 bytes)
Details here Throughput diffsThroughput diffs for linux/arm64 ran on windows/x64Overall (+0.01% to +0.19%)
FullOpts (+0.01% to +0.22%)
Throughput diffs for linux/x64 ran on windows/x64Overall (+0.01% to +0.19%)
FullOpts (+0.01% to +0.21%)
Throughput diffs for osx/arm64 ran on windows/x64Overall (+0.01% to +0.11%)
FullOpts (+0.01% to +0.14%)
Throughput diffs for windows/arm64 ran on windows/x64Overall (+0.01% to +0.19%)
MinOpts (-0.01% to +0.00%)
FullOpts (+0.01% to +0.22%)
Throughput diffs for windows/x64 ran on windows/x64Overall (+0.01% to +0.19%)
FullOpts (+0.01% to +0.22%)
Details here Throughput diffs for windows/x86 ran on linux/x86Overall (+0.01% to +0.17%)
FullOpts (+0.01% to +0.17%)
Details here |
Diff results for #97722Assembly diffsAssembly diffs for linux/x64 ran on windows/x64Diffs are based on 2,516,692 contexts (991,070 MinOpts, 1,525,622 FullOpts). MISSED contexts: base: 8 (0.00%), diff: 1,217 (0.05%) Overall (-35,373 bytes)
FullOpts (-35,373 bytes)
Details here Assembly diffs for linux/arm ran on windows/x86Diffs are based on 2,239,012 contexts (829,328 MinOpts, 1,409,684 FullOpts). MISSED contexts: base: 71,273 (3.08%), diff: 71,652 (3.10%) Overall (-25,352 bytes)
FullOpts (-25,352 bytes)
Assembly diffs for windows/x86 ran on windows/x86Diffs are based on 2,292,076 contexts (839,658 MinOpts, 1,452,418 FullOpts). MISSED contexts: base: 1 (0.00%), diff: 1,420 (0.06%) Overall (-7,570 bytes)
FullOpts (-7,570 bytes)
Details here Throughput diffsThroughput diffs for linux/arm64 ran on linux/x64Overall (+0.01% to +0.19%)
FullOpts (+0.01% to +0.21%)
Throughput diffs for linux/x64 ran on linux/x64Overall (+0.01% to +0.19%)
FullOpts (+0.01% to +0.21%)
Details here Throughput diffs for linux/arm ran on windows/x86Overall (+0.01% to +0.14%)
FullOpts (+0.01% to +0.15%)
Details here Throughput diffs for linux/arm64 ran on windows/x64Overall (+0.01% to +0.19%)
FullOpts (+0.01% to +0.22%)
Throughput diffs for linux/x64 ran on windows/x64Overall (+0.01% to +0.19%)
FullOpts (+0.01% to +0.21%)
Throughput diffs for osx/arm64 ran on windows/x64Overall (+0.01% to +0.11%)
FullOpts (+0.01% to +0.14%)
Throughput diffs for windows/arm64 ran on windows/x64Overall (+0.01% to +0.19%)
FullOpts (+0.01% to +0.22%)
Throughput diffs for windows/x64 ran on windows/x64Overall (+0.01% to +0.19%)
FullOpts (+0.01% to +0.22%)
Details here |
Diff results for #97722Assembly diffsAssembly diffs for linux/arm64 ran on windows/x64Diffs are based on 2,506,119 contexts (1,007,092 MinOpts, 1,499,027 FullOpts). MISSED contexts: base: 8 (0.00%), diff: 1,199 (0.05%) Overall (-71,864 bytes)
FullOpts (-71,864 bytes)
Assembly diffs for linux/x64 ran on windows/x64Diffs are based on 2,516,692 contexts (991,070 MinOpts, 1,525,622 FullOpts). MISSED contexts: base: 8 (0.00%), diff: 1,217 (0.05%) Overall (-35,373 bytes)
FullOpts (-35,373 bytes)
Assembly diffs for osx/arm64 ran on windows/x64Diffs are based on 2,270,095 contexts (932,669 MinOpts, 1,337,426 FullOpts). MISSED contexts: base: 9 (0.00%), diff: 775 (0.03%) Overall (-66,736 bytes)
FullOpts (-66,736 bytes)
Assembly diffs for windows/arm64 ran on windows/x64Diffs are based on 2,340,158 contexts (938,449 MinOpts, 1,401,709 FullOpts). MISSED contexts: base: 8 (0.00%), diff: 959 (0.04%) Overall (-72,576 bytes)
FullOpts (-72,576 bytes)
Assembly diffs for windows/x64 ran on windows/x64Diffs are based on 2,511,303 contexts (997,391 MinOpts, 1,513,912 FullOpts). MISSED contexts: base: 8 (0.00%), diff: 909 (0.04%) Overall (-50,652 bytes)
FullOpts (-50,652 bytes)
Details here Assembly diffs for linux/arm ran on windows/x86Diffs are based on 2,239,012 contexts (829,328 MinOpts, 1,409,684 FullOpts). MISSED contexts: base: 71,273 (3.08%), diff: 71,652 (3.10%) Overall (-25,352 bytes)
FullOpts (-25,352 bytes)
Assembly diffs for windows/x86 ran on windows/x86Diffs are based on 2,292,076 contexts (839,658 MinOpts, 1,452,418 FullOpts). MISSED contexts: base: 1 (0.00%), diff: 1,420 (0.06%) Overall (-7,570 bytes)
FullOpts (-7,570 bytes)
Details here Throughput diffsThroughput diffs for linux/arm ran on windows/x86Overall (+0.01% to +0.14%)
FullOpts (+0.01% to +0.15%)
Throughput diffs for windows/x86 ran on windows/x86Overall (+0.01% to +0.17%)
FullOpts (+0.01% to +0.17%)
Details here Throughput diffs for linux/arm64 ran on linux/x64Overall (+0.01% to +0.19%)
FullOpts (+0.01% to +0.21%)
Throughput diffs for linux/x64 ran on linux/x64Overall (+0.01% to +0.19%)
FullOpts (+0.01% to +0.21%)
Details here |
Diff results for #97722Assembly diffsAssembly diffs for linux/arm64 ran on windows/x64Diffs are based on 2,506,119 contexts (1,007,092 MinOpts, 1,499,027 FullOpts). MISSED contexts: base: 8 (0.00%), diff: 1,199 (0.05%) Overall (-71,864 bytes)
FullOpts (-71,864 bytes)
Assembly diffs for linux/x64 ran on windows/x64Diffs are based on 2,516,692 contexts (991,070 MinOpts, 1,525,622 FullOpts). MISSED contexts: base: 8 (0.00%), diff: 1,217 (0.05%) Overall (-35,373 bytes)
FullOpts (-35,373 bytes)
Assembly diffs for windows/arm64 ran on windows/x64Diffs are based on 2,340,158 contexts (938,449 MinOpts, 1,401,709 FullOpts). MISSED contexts: base: 8 (0.00%), diff: 959 (0.04%) Overall (-72,576 bytes)
FullOpts (-72,576 bytes)
Assembly diffs for windows/x64 ran on windows/x64Diffs are based on 2,511,303 contexts (997,391 MinOpts, 1,513,912 FullOpts). MISSED contexts: base: 8 (0.00%), diff: 909 (0.04%) Overall (-50,652 bytes)
FullOpts (-50,652 bytes)
Details here Assembly diffs for linux/arm ran on windows/x86Diffs are based on 2,239,012 contexts (829,328 MinOpts, 1,409,684 FullOpts). MISSED contexts: base: 71,273 (3.08%), diff: 71,652 (3.10%) Overall (-25,352 bytes)
FullOpts (-25,352 bytes)
Assembly diffs for windows/x86 ran on windows/x86Diffs are based on 2,292,076 contexts (839,658 MinOpts, 1,452,418 FullOpts). MISSED contexts: base: 1 (0.00%), diff: 1,420 (0.06%) Overall (-7,570 bytes)
FullOpts (-7,570 bytes)
Details here Throughput diffsThroughput diffs for linux/arm ran on windows/x86Overall (+0.01% to +0.14%)
FullOpts (+0.01% to +0.15%)
Details here Throughput diffs for linux/arm64 ran on linux/x64Overall (+0.01% to +0.19%)
FullOpts (+0.01% to +0.21%)
Throughput diffs for linux/x64 ran on linux/x64Overall (+0.01% to +0.19%)
FullOpts (+0.01% to +0.21%)
Details here |
TP impact on benchmarks_pgo is oddly high (0.15%ish) compared to everything else. Will have to dig in. |
Looking at TP impact, I'm puzzled. runtime1 is my changes, merge-base at a102e88, runtime2 is at the merge-base. Main TP script and lab show:
but running locally in detail mode, via
and then
gives a different picture:
@dotnet/jit-contrib anyone ever seen things like this? Am tempted to write this off and just merge. |
The asm diff shows 79 misses in benchmarks.run_pgo in the diff. The way you are running the detailed TP measurement on the base and diff means that the diff is naturally going to execute fewer instructions since those 79 contexts won't be fully accounted for. I think @SingleAccretion has some scripts to create a proper .mcl with the non-missing contexts. You can also try the -details arg to the superpmi.exe invocation under pin. It will produce a .csv that can tell what the most diverging contexts are, so running the detailed pin tool on some of those only might be insightful. |
For reference, this is https://github.com/SingleAccretion/Dotnet-Runtime.Dev#pinps1---diff-the-tp-impact-measured-as-the-count-of-retired-instructions. It's not a standalone script however. |
Thanks, that makes sense. Let me try stripping those. |
Looks like that was indeed the issue; would not have expected 79/104,000 or so would matter that much.
Will look into the most afflicted contexts, could be there are some outliers. |
…e it can remove all preds of a BBJ_COND, and there's no point in retrying RBO from such blocks.
Since jump threading now happens more often, we see more cases where a BBJ_COND loses all its predecessors. The main driver loop was retrying RBO on any BBJ_COND where we changed flow, but if we've removed all preds, this is pointless. There is a bit more TP to be had by actually deleting (or perhaps just nuking the IR) in these blocks. I recall deleting blocks in RBO lead to problems but it might be worth revisiting. I see Assertion Prop at least spending some cycles looking at trees in dead blocks and possibly polluting the assertion table with irrelevant stuff. But I will probably save that for a follow-up. |
OSX build hosts unable to access runtime repo. |
Looked a bit into some of the bigger size regressions since I generally wouldn't expect to see them with more RBO. It appears it is a late flow opt deciding to duplicate more branches (again likely from fewer joints). We might need to revisit this heuristic too. |
Diff results for #97722Assembly diffsAssembly diffs for linux/arm64 ran on windows/x64Diffs are based on 2,505,885 contexts (1,007,092 MinOpts, 1,498,793 FullOpts). MISSED contexts: base: 258 (0.01%), diff: 1,433 (0.06%) Overall (-71,880 bytes)
FullOpts (-71,880 bytes)
Assembly diffs for linux/x64 ran on windows/x64Diffs are based on 2,516,325 contexts (991,070 MinOpts, 1,525,255 FullOpts). MISSED contexts: base: 389 (0.02%), diff: 1,584 (0.06%) Overall (-36,066 bytes)
FullOpts (-36,066 bytes)
Assembly diffs for osx/arm64 ran on windows/x64Diffs are based on 2,270,098 contexts (932,669 MinOpts, 1,337,429 FullOpts). MISSED contexts: base: 26 (0.00%), diff: 772 (0.03%) Overall (-66,720 bytes)
FullOpts (-66,720 bytes)
Assembly diffs for windows/arm64 ran on windows/x64Diffs are based on 2,339,808 contexts (938,449 MinOpts, 1,401,359 FullOpts). MISSED contexts: base: 365 (0.02%), diff: 1,309 (0.06%) Overall (-72,740 bytes)
FullOpts (-72,740 bytes)
Assembly diffs for windows/x64 ran on windows/x64Diffs are based on 2,510,842 contexts (997,391 MinOpts, 1,513,451 FullOpts). MISSED contexts: base: 478 (0.02%), diff: 1,370 (0.05%) Overall (-49,371 bytes)
FullOpts (-49,371 bytes)
Details here Throughput diffsThroughput diffs for linux/arm64 ran on windows/x64Overall (+0.01% to +0.10%)
MinOpts (-0.01% to +0.00%)
FullOpts (+0.01% to +0.12%)
Throughput diffs for linux/x64 ran on windows/x64Overall (+0.01% to +0.10%)
FullOpts (+0.01% to +0.12%)
Throughput diffs for osx/arm64 ran on windows/x64Overall (+0.01% to +0.04%)
MinOpts (-0.00% to +0.01%)
FullOpts (+0.01% to +0.06%)
Throughput diffs for windows/arm64 ran on windows/x64Overall (+0.01% to +0.10%)
MinOpts (-0.00% to +0.01%)
FullOpts (+0.01% to +0.11%)
Throughput diffs for windows/x64 ran on windows/x64Overall (+0.01% to +0.08%)
FullOpts (+0.01% to +0.10%)
Details here Throughput diffs for linux/arm ran on windows/x86Overall (+0.00% to +0.09%)
FullOpts (+0.00% to +0.09%)
Throughput diffs for windows/x86 ran on windows/x86Overall (+0.01% to +0.08%)
FullOpts (+0.01% to +0.09%)
Details here Throughput diffs for linux/arm64 ran on linux/x64Overall (+0.01% to +0.11%)
FullOpts (+0.01% to +0.13%)
Throughput diffs for linux/x64 ran on linux/x64Overall (+0.01% to +0.11%)
FullOpts (+0.01% to +0.12%)
Details here |
Diff results for #97722Assembly diffsAssembly diffs for linux/arm64 ran on windows/x64Diffs are based on 2,505,885 contexts (1,007,092 MinOpts, 1,498,793 FullOpts). MISSED contexts: base: 258 (0.01%), diff: 1,433 (0.06%) Overall (-71,880 bytes)
FullOpts (-71,880 bytes)
Assembly diffs for linux/x64 ran on windows/x64Diffs are based on 2,516,325 contexts (991,070 MinOpts, 1,525,255 FullOpts). MISSED contexts: base: 389 (0.02%), diff: 1,584 (0.06%) Overall (-36,066 bytes)
FullOpts (-36,066 bytes)
Assembly diffs for osx/arm64 ran on windows/x64Diffs are based on 2,270,098 contexts (932,669 MinOpts, 1,337,429 FullOpts). MISSED contexts: base: 26 (0.00%), diff: 772 (0.03%) Overall (-66,720 bytes)
FullOpts (-66,720 bytes)
Assembly diffs for windows/arm64 ran on windows/x64Diffs are based on 2,339,808 contexts (938,449 MinOpts, 1,401,359 FullOpts). MISSED contexts: base: 365 (0.02%), diff: 1,309 (0.06%) Overall (-72,740 bytes)
FullOpts (-72,740 bytes)
Assembly diffs for windows/x64 ran on windows/x64Diffs are based on 2,510,842 contexts (997,391 MinOpts, 1,513,451 FullOpts). MISSED contexts: base: 478 (0.02%), diff: 1,370 (0.05%) Overall (-49,371 bytes)
FullOpts (-49,371 bytes)
Details here Assembly diffs for linux/arm ran on windows/x86Diffs are based on 2,239,029 contexts (829,328 MinOpts, 1,409,701 FullOpts). MISSED contexts: base: 71,275 (3.08%), diff: 71,635 (3.10%) Overall (-24,840 bytes)
FullOpts (-24,840 bytes)
Assembly diffs for windows/x86 ran on windows/x86Diffs are based on 2,292,042 contexts (839,658 MinOpts, 1,452,384 FullOpts). MISSED contexts: base: 47 (0.00%), diff: 1,452 (0.06%) Overall (-9,165 bytes)
FullOpts (-9,165 bytes)
Details here Throughput diffsThroughput diffs for linux/arm64 ran on windows/x64Overall (+0.01% to +0.10%)
MinOpts (-0.01% to +0.00%)
FullOpts (+0.01% to +0.12%)
Throughput diffs for linux/x64 ran on windows/x64Overall (+0.01% to +0.10%)
FullOpts (+0.01% to +0.12%)
Throughput diffs for osx/arm64 ran on windows/x64Overall (+0.01% to +0.04%)
MinOpts (-0.00% to +0.01%)
FullOpts (+0.01% to +0.06%)
Throughput diffs for windows/arm64 ran on windows/x64Overall (+0.01% to +0.10%)
MinOpts (-0.00% to +0.01%)
FullOpts (+0.01% to +0.11%)
Throughput diffs for windows/x64 ran on windows/x64Overall (+0.01% to +0.08%)
FullOpts (+0.01% to +0.10%)
Details here Throughput diffs for linux/arm ran on windows/x86Overall (+0.00% to +0.09%)
FullOpts (+0.00% to +0.09%)
Throughput diffs for windows/x86 ran on windows/x86Overall (+0.01% to +0.08%)
FullOpts (+0.01% to +0.09%)
Details here Throughput diffs for linux/arm64 ran on linux/x64Overall (+0.01% to +0.11%)
FullOpts (+0.01% to +0.13%)
Throughput diffs for linux/x64 ran on linux/x64Overall (+0.01% to +0.11%)
FullOpts (+0.01% to +0.12%)
Details here |
That cut worst-case TP roughly in half. Going to call this good. |
Now that
BBJ_COND
no longer has fallthrough semantics, remove a bunch of code that would limit redundant branch jump threading when one of the predecessors was fallthrough.Contributes to #93020.