-
Notifications
You must be signed in to change notification settings - Fork 4.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
JIT: Clean up gtCloneExpr #97411
JIT: Clean up gtCloneExpr #97411
Conversation
- Remove its ability to replace locals by constants while cloning. Only loop unrolling uses this capability and it can be replaced by a simple visit over the tree after cloning, so that everyone else does not need to pay for it. - Remove its ability to add flags. This is only used by hoisting to add `GTF_MAKE_CSE` flag recursively to the cloned tree. I do not see any good reason not just to only put this on the root node. - Simplify how it propagates flags; simply copy them from the source node instead of having a separate `gtUpdateNodeSideEffects` call for the cloned tree. I do not see why the cloned version should have its flags updated like this when the original shouldn't.
Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch Issue Details
A few diffs expected due to not marking hoisted clones with
|
Diff results for #97411Assembly diffsAssembly diffs for linux/arm64 ran on windows/x64Diffs are based on 2,501,261 contexts (1,003,806 MinOpts, 1,497,455 FullOpts). MISSED contexts: 3,956 (0.16%) Overall (-664 bytes)
FullOpts (-664 bytes)
Assembly diffs for linux/x64 ran on windows/x64Diffs are based on 2,595,036 contexts (1,052,329 MinOpts, 1,542,707 FullOpts). MISSED contexts: 3,599 (0.14%) Overall (-1,262 bytes)
FullOpts (-1,262 bytes)
Assembly diffs for osx/arm64 ran on windows/x64Diffs are based on 2,262,764 contexts (930,876 MinOpts, 1,331,888 FullOpts). MISSED contexts: 3,201 (0.14%) Overall (-672 bytes)
FullOpts (-672 bytes)
Assembly diffs for windows/arm64 ran on windows/x64Diffs are based on 2,318,293 contexts (931,543 MinOpts, 1,386,750 FullOpts). MISSED contexts: 2,601 (0.11%) Overall (-616 bytes)
FullOpts (-616 bytes)
Assembly diffs for windows/x64 ran on windows/x64Diffs are based on 2,492,949 contexts (983,689 MinOpts, 1,509,260 FullOpts). MISSED contexts: 3,862 (0.15%) Overall (-2,177 bytes)
FullOpts (-2,177 bytes)
Details here Assembly diffs for linux/arm ran on windows/x86Diffs are based on 2,237,703 contexts (827,812 MinOpts, 1,409,891 FullOpts). MISSED contexts: 74,543 (3.22%) Overall (-328 bytes)
FullOpts (-328 bytes)
Assembly diffs for windows/x86 ran on windows/x86Diffs are based on 2,298,783 contexts (841,817 MinOpts, 1,456,966 FullOpts). MISSED contexts: base: 2,552 (0.11%), diff: 2,555 (0.11%) Overall (-1,172 bytes)
FullOpts (-1,172 bytes)
Details here Throughput diffsThroughput diffs for linux/arm64 ran on windows/x64Overall (-0.27% to -0.11%)
MinOpts (-0.10% to -0.02%)
FullOpts (-0.27% to -0.11%)
Throughput diffs for linux/x64 ran on windows/x64Overall (-0.24% to -0.12%)
MinOpts (-0.12% to -0.02%)
FullOpts (-0.24% to -0.12%)
Throughput diffs for osx/arm64 ran on windows/x64Overall (-0.18% to -0.11%)
MinOpts (-0.13% to -0.02%)
FullOpts (-0.20% to -0.11%)
Throughput diffs for windows/arm64 ran on windows/x64Overall (-0.25% to -0.11%)
MinOpts (-0.13% to -0.02%)
FullOpts (-0.25% to -0.11%)
Throughput diffs for windows/x64 ran on windows/x64Overall (-0.23% to -0.13%)
MinOpts (-0.15% to -0.02%)
FullOpts (-0.23% to -0.13%)
Details here Throughput diffs for linux/arm ran on windows/x86Overall (-0.19% to -0.13%)
MinOpts (-0.14% to -0.03%)
FullOpts (-0.22% to -0.13%)
Throughput diffs for windows/x86 ran on windows/x86Overall (-0.22% to -0.14%)
MinOpts (-0.20% to -0.03%)
FullOpts (-0.26% to -0.14%)
Details here Throughput diffs for linux/arm64 ran on linux/x64Overall (-0.26% to -0.11%)
MinOpts (-0.08% to 0.00%)
FullOpts (-0.26% to -0.11%)
Throughput diffs for linux/x64 ran on linux/x64Overall (-0.23% to -0.11%)
MinOpts (-0.10% to 0.00%)
FullOpts (-0.23% to -0.11%)
Details here |
/azp run runtime-coreclr jitstress, runtime-coreclr libraries-jitstress, Fuzzlyn |
Azure Pipelines successfully started running 3 pipeline(s). |
cc @dotnet/jit-contrib PTAL @EgorBo, maybe @AndyAyersMS too since you've been looking at Should be ready pending CI. Diffs. Some from the |
// This is used to replace the loop iterator with the constant value when | ||
// unrolling. | ||
// | ||
void Compiler::optReplaceScalarUsesWithConst(BasicBlock* block, unsigned lclNum, ssize_t cnsVal) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not familair with the existing logic - can you explain why it's fine to do propagation like this? can the locals be address exposed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FlowGraphNaturalLoop::AnalyzeIteration
checks for this (it guarantees that the iterator local is only defined once within the loop and that the loop invariant is always true).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM otherwise
Diff results for #97411Assembly diffsAssembly diffs for linux/arm64 ran on windows/x64Diffs are based on 2,501,157 contexts (1,003,806 MinOpts, 1,497,351 FullOpts). MISSED contexts: 4,060 (0.16%) Overall (-1,372 bytes)
FullOpts (-1,372 bytes)
Assembly diffs for linux/x64 ran on windows/x64Diffs are based on 2,595,007 contexts (1,052,329 MinOpts, 1,542,678 FullOpts). MISSED contexts: 3,628 (0.14%) Overall (-2,033 bytes)
FullOpts (-2,033 bytes)
Assembly diffs for osx/arm64 ran on windows/x64Diffs are based on 2,262,709 contexts (930,876 MinOpts, 1,331,833 FullOpts). MISSED contexts: 3,256 (0.14%) Overall (-1,300 bytes)
FullOpts (-1,300 bytes)
Assembly diffs for windows/arm64 ran on windows/x64Diffs are based on 2,318,207 contexts (931,543 MinOpts, 1,386,664 FullOpts). MISSED contexts: 2,687 (0.12%) Overall (-1,208 bytes)
FullOpts (-1,208 bytes)
Assembly diffs for windows/x64 ran on windows/x64Diffs are based on 2,492,912 contexts (983,689 MinOpts, 1,509,223 FullOpts). MISSED contexts: 3,899 (0.16%) Overall (-3,190 bytes)
FullOpts (-3,190 bytes)
Details here Assembly diffs for linux/arm ran on windows/x86Diffs are based on 2,237,690 contexts (827,812 MinOpts, 1,409,878 FullOpts). MISSED contexts: 74,588 (3.23%) Overall (-724 bytes)
FullOpts (-724 bytes)
Assembly diffs for windows/x86 ran on windows/x86Diffs are based on 2,296,274 contexts (841,817 MinOpts, 1,454,457 FullOpts). MISSED contexts: base: 5,093 (0.22%), diff: 5,096 (0.22%) Overall (-1,264 bytes)
FullOpts (-1,264 bytes)
Details here Throughput diffsThroughput diffs for linux/arm64 ran on windows/x64Overall (-0.26% to -0.11%)
MinOpts (-0.10% to -0.02%)
FullOpts (-0.26% to -0.11%)
Throughput diffs for linux/x64 ran on windows/x64Overall (-0.23% to -0.12%)
MinOpts (-0.12% to -0.02%)
FullOpts (-0.23% to -0.12%)
Throughput diffs for osx/arm64 ran on windows/x64Overall (-0.18% to -0.11%)
MinOpts (-0.13% to -0.02%)
FullOpts (-0.20% to -0.11%)
Throughput diffs for windows/arm64 ran on windows/x64Overall (-0.24% to -0.11%)
MinOpts (-0.13% to -0.02%)
FullOpts (-0.24% to -0.11%)
Throughput diffs for windows/x64 ran on windows/x64Overall (-0.23% to -0.12%)
MinOpts (-0.15% to -0.02%)
FullOpts (-0.23% to -0.12%)
Details here Throughput diffs for linux/arm64 ran on linux/x64Overall (-0.25% to -0.10%)
MinOpts (-0.08% to 0.00%)
FullOpts (-0.25% to -0.10%)
Throughput diffs for linux/x64 ran on linux/x64Overall (-0.23% to -0.11%)
MinOpts (-0.10% to 0.00%)
FullOpts (-0.23% to -0.11%)
Details here Throughput diffs for linux/arm ran on windows/x86Overall (-0.19% to -0.13%)
MinOpts (-0.14% to -0.03%)
FullOpts (-0.20% to -0.13%)
Throughput diffs for windows/x86 ran on windows/x86Overall (-0.22% to -0.14%)
MinOpts (-0.20% to -0.03%)
FullOpts (-0.25% to -0.14%)
Details here |
GTF_MAKE_CSE change is good, I had the same thing in the branch I was using to investigate why we skip doing some of them. |
Diff results for #97411Assembly diffsAssembly diffs for linux/arm ran on windows/x86Diffs are based on 2,237,690 contexts (827,812 MinOpts, 1,409,878 FullOpts). MISSED contexts: 74,588 (3.23%) Overall (-724 bytes)
FullOpts (-724 bytes)
Details here |
Failures are #97437 |
GTF_MAKE_CSE
flag recursively to the cloned tree. I do not see any good reason not just to only put this on the root node.gtUpdateNodeSideEffects
call for the cloned tree. I do not see why the cloned version should have its flags updated like this when the original shouldn't.A few diffs expected due to not marking hoisted clones with
GTF_MAKE_CSE
recursively, which sometimes allows us to optimize them away. Some other diffs from the removal ofgtUpdateNodeSideEffects(copy)
ingtCloneExpr
, which can impact cases where we get our flag maintenance wrong (e.g. #13758).