-
Notifications
You must be signed in to change notification settings - Fork 4.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
unroll byref struct copies #86820
unroll byref struct copies #86820
Conversation
Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch Issue Detailsnull
|
Draft Pull Request was automatically closed for 30 days of inactivity. Please let us know if you'd like to reopen it. |
Diff results - lots of interference from the negative diffs hitting all PRs right now linux arm64Diffs are based on 876,006 contexts (105,646 MinOpts, 770,360 FullOpts). Overall (-4,320 bytes) linux x64Diffs are based on 929,778 contexts (110,567 MinOpts, 819,211 FullOpts). Overall (+1,189 bytes) windows arm64Diffs are based on 923,219 contexts (84,344 MinOpts, 838,875 FullOpts). Overall (-4,164 bytes) windows x64Diffs are based on 987,198 contexts (109,803 MinOpts, 877,395 FullOpts). Overall (-4,641 bytes) linux armDiffs are based on 846,987 contexts (105,632 MinOpts, 741,355 FullOpts). MISSED contexts: 33,754 (3.99%) Overall (-10,776 bytes) windows x86Diffs are based on 976,498 contexts (114,341 MinOpts, 862,157 FullOpts). Overall (+2,116 bytes) |
good diff from arm64
|
similar on x64
|
occasional size regression
|
@dotnet/jit-contrib |
/azp run runtime-coreclr gcstress0x3-gcstress0xc |
Azure Pipelines successfully started running 1 pipeline(s). |
Thank you! It looks good to me. CC @t-mustafin @alpencolt |
<Project Sdk="Microsoft.NET.Sdk"> | ||
<PropertyGroup> | ||
<!-- Needed for CLRTestEnvironmentVariable --> | ||
<RequiresProcessIsolation>true</RequiresProcessIsolation> | ||
|
||
<AllowUnsafeBlocks>True</AllowUnsafeBlocks> | ||
<Optimize>True</Optimize> | ||
</PropertyGroup> | ||
<ItemGroup> | ||
<Compile Include="$(MSBuildProjectName).cs" /> | ||
|
||
<CLRTestEnvironmentVariable Include="DOTNET_TieredCompilation" Value="0" /> | ||
<CLRTestEnvironmentVariable Include="DOTNET_JITMinOpts" Value="0" /> | ||
</ItemGroup> | ||
</Project> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any particular reason to set these environment variables and RequiresProcessIsolation
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like this might be my misunderstanding after seeing these set in a number (which isn't actually as many as I thought) of optimization tests. It looks like they are set when we do FileCheck asm checks (ideally we could say something like "do FileCheck X if and only if optimized" rather than "always optimize and do FileCheck X"), but I figured normal asm diffs would be sufficient here.
I think I'm still missing something because I'd expect most tests to run without optimizations due to tiering yet most of the asmdiff contexts have optimizations enabled. Perhaps if I dug into the asmdiff setup I'd find some settings for most/all tests?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I'm still missing something because I'd expect most tests to run without optimizations due to tiering yet most of the asmdiff contexts have optimizations enabled. Perhaps if I dug into the asmdiff setup I'd find some settings for most/all tests?
PR runs of runtime
always run with tiered compilation disabled. In the rolling builds on main
we run all tests both with and without tiered compilation enabled. It's set here:
runtime/eng/pipelines/common/templates/runtimes/run-test-job.yml
Lines 376 to 382 in ac3979a
${{ elseif eq(variables['Build.Reason'], 'PullRequest') }}: | |
scenarios: | |
- no_tiered_compilation | |
${{ else }}: | |
scenarios: | |
- normal | |
- no_tiered_compilation |
So usually (for non file check) we will not set any environment variables and leave it up to these defined scenarios.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, thanks! I hadn't found this test snippet yet.
/azp run runtime-coreclr superpmi-diffs, runtime-coreclr superpmi-replay |
Azure Pipelines successfully started running 2 pipeline(s). |
src/coreclr/jit/layout.cpp
Outdated
unsigned slots = this->GetSlotCount(); | ||
for (unsigned i = 0; i < slots; i++) | ||
{ | ||
if (this->IsGCByRef(i)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: We usually don't prefix with this
unless necessary.
cleaner results; linux arm64Diffs are based on 1,536,168 contexts (480,629 MinOpts, 1,055,539 FullOpts). Overall (-5,232 bytes) linux x64Diffs are based on 1,660,406 contexts (558,006 MinOpts, 1,102,400 FullOpts). Overall (-1,142 bytes) osx arm64Diffs are based on 1,187,983 contexts (429,171 MinOpts, 758,812 FullOpts). Overall (-5,732 bytes) windows arm64Diffs are based on 1,401,929 contexts (440,568 MinOpts, 961,361 FullOpts). Overall (-4,416 bytes) windows x64Diffs are based on 1,661,390 contexts (471,241 MinOpts, 1,190,149 FullOpts). MISSED contexts: 3 (0.00%) Overall (-8,993 bytes) linux armDiffs are based on 1,323,737 contexts (358,680 MinOpts, 965,057 FullOpts). MISSED contexts: 44,200 (3.34%) Overall (-13,518 bytes) windows x86Diffs are based on 1,438,138 contexts (387,507 MinOpts, 1,050,631 FullOpts). No diffs found. |
Extended set of cases: https://csharp.godbolt.org/z/zf8e5qa4G |
Thanks @xtqqczze. I think that I have all of these except for CreateSpan3 (quite similar to CreateSpan2 except with byref instead of |
bool ClassLayout::HasGCByRef() const | ||
{ | ||
unsigned slots = GetSlotCount(); | ||
for (unsigned i = 0; i < slots; i++) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there an intuition that the GCByRef slots are at the beginning or at the end or scattered? In other words, are we more likely to fast-return if we walk them list backwards?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't have any intuition or measurements on this. If necessary, then this could be cached similar to whether the layout has any gc pointers, though if I recall correctly we are out of bits at this size.
If a struct contains a byref, then it is known to be on the stack/regs (not in the heap), so GC write barriers are not required. This adds that case to lower*.cpp and attempts to make the code more similar. I didn't actually factor them (especially with a few subtle differences such as the call to
getUnrollThreshold
).This partially handles #80086. It improves the code for common cases, but since the strategy is not always used, the correctness issue in it is not completely handled. Next step is to apply the fix for that and see how bad the regressions are; this change will reduce the impact.
Example: