Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

TrimMode partial very slow to link #3083

Closed
markphillips100 opened this issue Oct 25, 2022 · 33 comments
Closed

TrimMode partial very slow to link #3083

markphillips100 opened this issue Oct 25, 2022 · 33 comments

Comments

@markphillips100
Copy link

I've been using self-contained trimming to publish apps in linux docker images in .NET 6 for a little while now, and the link step usually takes about a minute or so. There are references to my own libs and 3rd parties that don't publish themselves as trimmable, so as expected, I get a ton of warnings, and of course the code isn't trimmed being that this is all .NET 6.

Moving along to .NET 7 RC.2 and when I use the default TrimMode "full" the app breaks as expected as all those warnings will be ignored and code removed. The link step is roughly the same length of time to complete though.

As documentation suggests, I switch to "partial" to obtain the previous .NET 6 behaviour. This time code is not stripped and the app works but the linker step takes 20 times longer. Literally 20 minutes to complete.

Anyone else experience this or know what might be going on? I don't have a repo I can supply for a reproducible example unfortunately, only my private code.

In the meantime I thought I might be able to use "full" trim mode if I resolve the warnings. Whilst I can set IsTrimmable false on my libs I don't know of a way of telling the linker to ignore a 3rd party lib that hasn't yet supplied an appropriate IsTrimmable. Is there some undocumented approach here?

@marek-safar
Copy link
Contributor

/cc @sbomer

@vitek-karas
Copy link
Member

@jtschuster for the perf question - I know that we fixed some perf issues recently, were the issues present in RC2? (and are we shipping the linker with them in 7.0 GA?)

@jtschuster
Copy link
Member

jtschuster commented Oct 25, 2022

@vitek-karas The worst of the regression was fixed here and should be included in RC1, but there was still a net regression and the fix has yet to be merged in. That fix slightly modifies behavior (though really just fixes buggy behavior), though, so I'm not sure if we could get it in GA.

@sbomer
Copy link
Member

sbomer commented Oct 25, 2022

I don't know of a way of telling the linker to ignore a 3rd party lib that hasn't yet supplied an appropriate IsTrimmable. Is there some undocumented approach here?

You can set <IsTrimmable>false</IsTrimmable> on specific assemblies following this example https://learn.microsoft.com/en-us/dotnet/core/deploying/trimming/trimming-options?pivots=dotnet-6-0#trimming-settings-for-individual-assemblies.

@markphillips100
Copy link
Author

@sbomer thanks for your suggestion. I completely missed that in the docs and will give it a try (in a few days hopefully) as if that at least gets me a working "full" trim mode build for .NET 7 when released then happy to incorporate trimmable deps as and when they support it.

@vitek-karas
Copy link
Member

@jtschuster could you please try to see if we can repro the perf regression as mentioned by @markphillips100 ? If it's that severe I think we should at least consider doing something about it for 7.0.

@jtschuster
Copy link
Member

Sure, I'll take a look. @markphillips100 if you're able to share some of the 3rd party libraries you're using, that could help me reproduce the issue.

@jtschuster
Copy link
Member

Compiler generated code analysis could also be causing a slowdown. @markphillips100 do you use yield return, local functions, and anonymous functions a lot throughout your codebase?

@sbomer are there any other bottlenecks related to compiler generated code I'm missing?

@sbomer
Copy link
Member

sbomer commented Oct 26, 2022

It's definitely possible that this could be related to compiler-generated code handling. The potential performance issues I'm aware of in that area are:

  • Compiler-generated code is currently the only place we do a fixpoint-style analysis (iterating over the group of methods until the analysis converges). Certain patterns could make that analysis expensive by requiring many passes. This could in theory be an issue for non-compiler-generated code too, but we don't do a fixpoint analysis there currently.

  • Compiler-generated code analysis makes a conservative approximation of values for hoisted locals, by tracking all values assigned in the group of methods. Certain patterns could result in large MultiValues with many tracked values. I think this can also happen in non-compiler-generated code, but the conservative analysis might make it more likely for compiler-generated code.

#2977 is one example of the latter issue, but it's probably easy to come up with others. We might need a general way to stop the analysis if it takes too many passes or tracks too many values, and emit a general warning like "this method was too complex to analyze".

I don't think #2979 is in 7.0 so it might be that issue, but it's hard to be sure - I think we need some kind of repro.

@sbomer
Copy link
Member

sbomer commented Oct 26, 2022

if that at least gets me a working "full" trim mode build for .NET 7 when released then happy to incorporate trimmable deps as and when they support it.

@markphillips100 if you are looking to disable trimming for all assemblies that don't have the [assembly: AssemblyMetadata("IsTrimmable", "True") attribute, that is already the behavior of partial. The docs I linked are useful if you want to disable trimming case-by-case. Just wanted to clarify that in case it saves you some trouble.

@markphillips100
Copy link
Author

@jtschuster I've used all 3 approaches in my libs that I reference, but not extensively. I can't speak for the 3rd party libs of course.

@sbomer Thanks for the heads up. I'm guessing that by applying this trim disabling during full mode I just makes the process approach partial's behaviour and so would encounter the same issue.

I think I'll try and come up with a repro if I can, at least see if I can isolate most of my code and perhaps pin down a smaller set of 3rd party libs just to make it easier on you all. Bit side-tracked on other work maybe the rest of this week so likely the weekend before I respond with what I come up with.

@jtschuster
Copy link
Member

@markphillips100 One other thing that could help us narrow down the issue is if you try to use the latest linker build in your project. We just merged a couple of changes that improve performance, and if they fix the slowdown you're seeing, we could look into porting them back to .Net 7.

To do this, add a package reference to the latest linker build in your .csproj file:

  <ItemGroup>
    <PackageReference Include="Microsoft.Net.ILLink.Tasks" Version="7.0.100-1.22527.4" />
  </ItemGroup>

Then, add a nuget.config file if you don't already have one by running dotnet new nugetconfig in the same directory as the .csproj, and add the dotnet8 package sources by adding the <add key=...> line below.

<configuration>
  <packageSources>
    <add key="dotnet8" value="https://pkgs.dev.azure.com/dnceng/public/_packaging/dotnet8/nuget/v3/index.json" />
  </packageSources>
</configuration>

Hopefully trimming will be a bit faster.

@markphillips100
Copy link
Author

@jtschuster that might be the quicker option to try. I'll hopefully get to it later today or at least over the weekend.

@markphillips100
Copy link
Author

@jtschuster I tried the dotnet8 linker with partial trim mode setting and I think we have some success.

I thought code was still trimmed but turns out AutoMapper does some private reflection in the version I use and this causes an issue with Enumerable.MaxFloat in dotnet7. So not related to trimming I believe. I was able to comment the code registering AutoMapper to continue the test.

Automapper issue aside, the time for linking in partial mode is around 1 minute, and the app is working. I'd call that a success.

@markphillips100
Copy link
Author

Just for time comparison completeness, full trim mode links in approximately 10 seconds. My app breaks due to warnings trimmed of course, as expected which is fine.

@marek-safar
Copy link
Contributor

Hmm, I think 1 minute is still way too long.

@markphillips100
Copy link
Author

I'm running on an i7-8700k (not overclocked) just for added perspective. I did have "normal" msbuild logging too so will switch to minimal logging and try again tmrw in case that makes some difference.

@jtschuster
Copy link
Member

jtschuster commented Oct 31, 2022

@markphillips100 Thanks for testing that for us, it's good to hear it's back to around .net6 speeds.

@vitek-karas I've ported the virtual method marking change back to the release branch if we want to try to get the fix through servicing (#3094). That should have the biggest impact, but the ExportedType caching had a significant improvement as well (#3075) and is a simple enough change that we could port it back too.

@vitek-karas
Copy link
Member

I think we should base this on data. Ideally we would have some measurements of:

  • How long it took to trim a given app in .NET 6 (using the default settings - to partial trimming)
  • How long does it take to trim the same app in .NET 7 using compatible settings (so partial trimming)
  • How much each of the two changes helps in the above case
  • And maybe additionally, how long does it to take to trim the app in .NET 7 using the new default (full trimming) - just for context.

Taking servicing fixing is always a compromise between the risk of taking a change (depends on the complexity of the change as well) and the benefits. For example, it's not very likely we would take a servicing change which improves perf by 5%, but we would definitely like to take one which improves it by 50%.

So the main question is - how much of a regression we shipped in .NET 7 and how close can we get to .NET 6 with either of the two fixes.

@jtschuster
Copy link
Member

Profiling the linker locally with VS on a hello world console app, I got the following results:

release/6.0.x release/7.0 #3094
2846 cpu ms 5088 cpu ms 3528 cpu ms

#3094 takes 69% as long as release/7.0, a 1.45x speedup

This is probably where we would see the least improvement, since the virtual method issues scaled particularly poorly (in ASP.Net benchmark build, the time went from 87310 cpu ms to 32011 cpu ms (a 2.72x speedup) after #3073), so I think with a 30+% reduction in execution time on hello world, we should try to take it to servicing.

I didn't test full vs partial on hello world since that should be the same with the framework fully trim compatible, and I don't know of another great benchmark project to test.

@vitek-karas
Copy link
Member

Another thing we can try to measure is the sfx project in runtime repo. It is not a good benchmark for trimming per se (as it removes very little), but it's a good benchmark for analysis, since it runs through all of BCL's code. And its setup should be nearly identical between 6 and 7.

@markphillips100
Copy link
Author

I've just tried the 7.0.3 SDK's linker for my partial trimming issue and unfortunately there's no real improvement. The dotnet build is fine but dotnet publish reports Time Elapsed 00:17:14.54.

If I simply switch to <PackageReference Include="Microsoft.Net.ILLink.Tasks" Version="8.0.100-1.22612.2" /> the time reports Time Elapsed 00:00:35.33

In case it helps, my csproj trim-relevant property settings are (and have always been since issue opened) as follows.
PublishSelfContained is set to true when I want things partial trimmed.

    <RuntimeIdentifier Condition=" '$(PublishSelfContained)' == 'true'">linux-musl-x64</RuntimeIdentifier>
    <PublishTrimmed Condition=" '$(PublishSelfContained)' == 'true'">true</PublishTrimmed>
    <TrimMode Condition=" '$(PublishSelfContained)' == 'true' AND '$(TargetFramework)' == 'net7.0'">partial</TrimMode>
    <EnableTrimAnalyzer>true</EnableTrimAnalyzer>

So I guess the question is, what further behavioural difference is there between the current 7.0.3 linker and 8.0.100-1.22612.2?

@markphillips100
Copy link
Author

@jtschuster I created a repro sample app to demonstrate the relative linker times between dotnet 7 and 8 and when including a single package reference (there are still packages referenced by this package though of course). The app shows the considerable difference a partial trim mode setting has for dotnet 7.0.103 when specifically including Microsoft.EntityFrameworkCore.SqlServer only.

I don't think whatever the issue might be is isolated only to this package, just that it demonstrates an added greater latency than other packages I include in my projects. During my initial phase of attempting to isolate a package for demo purposes the publish times for me swung from 48s with no packages referenced up to almost 6 minutes. This package clearly doesn't contribute the full 6 minutes, but maybe it's a start to try and isolate the problem?

Sample app repo is here

@jtschuster
Copy link
Member

@markphillips100 Thank you for the repro project, it really helps. It looks like the version of the SDK in your sample repro's global.json is 7.0.103, though, not 7.0.300 (Which is still in preview. You'll need to install from https://github.com/dotnet/installer#table). Running the tests on my machine, the 7.0.103 linker took about 30 seconds, the 7.0.300-preview sdk took about 16 seconds, and .net 8 sdk took about 13 seconds. It's still a little different (likely due to differences in runtime libraries and other smaller linker changes), but it the 7.0.300 linker should be at least in the same ballpark as the 8.0 linker for your larger project that was taking 17 minutes to trim with the 7.0.103 linker.

It is interesting that the efcore reference slows the linker down 3x, I'll look more into that.

Also, I apologize for making it sound like the 7.0.300 sdk was fully released, I'll update the pinned issue to make it clearer.

@markphillips100
Copy link
Author

Ah so I think I got my versions mixed up too. I shall try the preview.

I'd be really interested to know why specifically that efcore was slower too. Thanks

@markphillips100
Copy link
Author

@jtschuster do you know of a dockerfile for the 7.0.300-preview sdk as I ultimately want to test out the selfcontained builds? I don't see one here https://mcr.microsoft.com/en-us/product/dotnet/nightly/sdk/tags is all.

@jtschuster
Copy link
Member

@markphillips100 I'm not sure, I'm pretty unfamiliar with our docker ecosystem. @richlander @agocke @vitek-karas Do you know if we publish docker images for preview servicing releases (i.e. .Net SDK 7.0.300)?

@markphillips100
Copy link
Author

@jtschuster I may just test a local publish only for now. No real need to do a full docker test to achieve the test goal. I'll let you know how that goes.

@markphillips100
Copy link
Author

@jtschuster just tried a comparison locally publishing, with my usual trim mode settings, between 7.0.200 and 7.0.300-preview.23122.5. Times were approximately 6 minutes and 1 minute respectively. A significant improvement. Great work and massive thanks to all.

Without trimming enabled both sdk versions are much faster at approximately 3 seconds. My assumption is that is always to be expected.

Most of my projects reference the efcore libs, either directly or indirectly, so would be good to understand why those have more of an impact (as shown in that repro demo). Not pointing them out exclusively, but just that they have been shown to slow the process down.

@jtschuster
Copy link
Member

@markphillips100 It looks like the EFCore package just uses a lot of reflection which causes the linker to keep a lot of extra code and leads to much more of the runtime libraries being analyzed. From your repro project, the total size of the trimmed .dll's in the publish folder is 2.2mb without efcore, and over 20mb with efcore. So, the linker is slower because it basically has to analyze nearly 10x as much code because of the dependencies of efcore and the reflection that it does.

@markphillips100
Copy link
Author

Okay. That explains it then. I guess that's not something EFCore can optimize away.

Thanks for the explanation.

@vitek-karas
Copy link
Member

EF is working on some AOT compatibility: dotnet/efcore#29754
Being AOT compatible also means being trim compatible. It's very likely that bringing up trim compatibility will also reduce size. But it's obviously a very complex thing to do, so will take time.

@agocke
Copy link
Member

agocke commented Feb 23, 2023

Looks like we're back to 6.0 perf and full trimming should be even faster when the libraries include compatibility. I'll close this out as it doesn't look like there's anything else to do for now.

@agocke agocke closed this as completed Feb 23, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

6 participants