-
Notifications
You must be signed in to change notification settings - Fork 1.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Speed up incremental package asset resolution #2020
Conversation
Whenever project.assets.json changes (or more rarely, a setting influencing ResolvePackageAssets changes), write out a binary cache file for the next incremental build. The file has only the items that used by the build and they are written in a format that can be deserialized to MSBuild ITaskItem[] with minimal time and allocation.
Related: #1711 (resolves it for normal builds, more work to do for dependencies node / design-time build)_ |
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It doesn't look like you handle InvalidDataException here - I would have thought we'd want to throw away the cache in that case?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
InvalidDataException doesn't derive from IOException - which you may have thought it did.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch. Need to add tests for this. I did think it derived from IOException.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed and test coverage added.
private static BinaryReader OpenCacheFile(string path, byte[] settingsHash) | ||
{ | ||
var stream = File.Open(path, FileMode.Open, FileAccess.Read, FileShare.Read); | ||
var reader = new BinaryReader(stream, TextEncoding, leaveOpen: false); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why isn't this just a using?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because we want to keep the reader open for the caller to own unless the header is out-of-date.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is great, awesome effort.
_bufferedMetadata = new List<int>(); | ||
|
||
Directory.CreateDirectory(Path.GetDirectoryName(task.ProjectAssetsCacheFile)); | ||
var stream = File.Open(task.ProjectAssetsCacheFile, FileMode.Create, FileAccess.ReadWrite, FileShare.None); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you need to worry about concurrent writes here? I see you have None there, but if it happens, will this bubble up? Is that what you intend?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The cache file is in Intermediate directory, and has msbuildprojectname in it. If we double write here, project configuration is busted. We'd double write a bunch of other stuff too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good.
|
||
private void WriteItemGroups() | ||
{ | ||
// NOTE: Order (alphabetical by group name) must match writer. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did you mean must match reader?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes copy paste error. Thanks
My results are similar to @nguerrera:
The improvement is significantly larger for a single-proc build, even on a relative basis. |
The symptom that every task is slower in parallel builds is reminiscent of the logging bug. We should check that it hasn't regressed. It's very strange to me that all that cumulative time in ResolvePackageAssets (and ReportAssetsLogMessages) is gone but only such a small fraction of those savings accrue to elapsed time. |
I agree we should do more investigation of the relative performance between We should look at things like:
CC: @rainersigwald |
Took a quick look at the /detailedsummary output of an /m:8 no-op incremental build on sdk 2.1.100 (msbuild 15.6.82.30579)
That's definitely not great, but I'd expect it to do better than a 2x speedup. Looks like the node builds are densely packed for the first 44 seconds of the build and only then does the utilization graph start to have holes. Definitely deserves a deeper look. |
@dsplaisted Would you mind reviewing this one as well? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems like it could be a good idea to have unit-level tests for reading and writing the cache file.
@@ -59,58 +53,7 @@ public DiagnosticsHelper(ILog log) | |||
|
|||
_diagnosticMessages.Add(diagnostic); | |||
|
|||
if (logToMSBuild) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't you delete the logToMSBuild
parameter to this method if you're removing the functionality?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed.
/// dependency node in the solution explorer) | ||
/// Raise log messages in the assets file as DiagnosticMessage items | ||
/// that can be consumed downstream (e.g. by the dependency node in | ||
/// the solution explorer) | ||
/// </summary> | ||
public sealed class ReportAssetsLogMessages : TaskBase |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this task be deleted entirely? Based on this comment, I would think so:
On top of that, we merge ReportAssetsLogMessages into ResolvePackageAssets, which allows for project.assets.json to not be read at all if it has not changed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No. It is still used at design-time to add the diagnostics to nodes in the dependencies tree in the IDE. The old ResolvePackageDependencies is in the same boat...
It's next on the list to look at speeding up the depenencies node, but in the meantime, this has to stay.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we just use the ResolvePackageAssets
task to load the log messages for the designer, so we get the benefits of caching?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, but I'm leaving the design-time changes out of this PR as much as possible. The messages are the least of your worries there right now. I'd rather tackle the whole design-time resolution together.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
#1711 is what I'm leaving open to tie up the loose ends in the design time case.
@@ -26,6 +26,7 @@ Copyright (c) .NET Foundation. All rights reserved. | |||
<!-- Project Assets File --> | |||
<PropertyGroup> | |||
<ProjectAssetsFile Condition="'$(ProjectAssetsFile)' == ''">$(BaseIntermediateOutputPath)/project.assets.json</ProjectAssetsFile> | |||
<ProjectAssetsCacheFile Condition="'$(ProjectAssetsCacheFile)' == ''">$(IntermediateOutputPath)$(MSBuildProjectName).assets.cache</ProjectAssetsCacheFile> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we base this on Path.GetDirectoryName($(ProjectAssetsFile))
instead of hard coding the IntermediateOutputPath
? This will work better with changes we are working on to use MSBuildExtensionsPath
more consistently (NuGet/NuGet.Client#2056).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No. It is vital that we use a path that is unique to the current TFM (and to a lesser extent the current configuration). Each TFM caches the assets for that TFM and only that TFM. The assets file can move around to wherever it wants and this file can stay in the IntermediateOutputPath, so there's no maintenance issue. I can add a clarifying comment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I had mistaken IntermediateOutputPath
for BaseIntermediateOutputPath
in the line you added. Now it makes sense.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added a clarifying comment.
[Required] | ||
public string ProjectAssetsFile { get; set; } | ||
|
||
/// <summary> | ||
/// Path to assets.cache file in intermediate directory. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should be next to the project.assets.json file, but not necessarily in the intermediate directory.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No. See above.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will remove intermediate directory from this comment. It's the path to the cache file. The targets choose the location (but they need to pick a safe per-TFM, per-Configuration location).
|
||
/// <summary> | ||
/// Indicate to MSBuild ResolveAssemblyReferences that |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Incomplete comment here.
|
||
for (int i = 0; i < metadataCount; i++) | ||
{ | ||
string key = _metadataStringTable[_reader.ReadInt32()]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you think it's worth throwing something better than an IndexOutOfRangeException
if the index is wrong here? Should we try to throw an exception saying the assets file cache appears to be corrupt in cases like this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This may be worth defending against. I will file a bug.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Filed #2049
} | ||
private void WriteItem(string itemSpec) | ||
{ | ||
FlushMetadata(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems like it would be quite easy to end up writing the metadata out of order, since metadata is flushed before an item is written instead of afterwards. What's the reason for buffering the metadata items anyway? Would it be more straightforward to have WriteMetadata
write directly to the stream, and in WriteItems
call the writeMetadata
callback after the call to WriteItem
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The reason for buffering is because we don't know how many metadata items will be written and we need to prepend the count. We have the same problem for items but there we write a placeholder and seek back to set the count. Seeking costs flushing the file stream buffer, though so we don't want to do that for every item. I was trying hard to avoid allocations in the writer so as not to over-penalize the first, uncached build. I will give some thought to how to improve this, but I won't hold on this for the initial PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As discussed offline another approach to this would have been to make the first run go from assets.json to ITaskItems directly (as before), but also write out the cache file for the incremental builds and incremental builds alone to use. The main reason why I did not write it that way was to make it so that full builds and incremental builds get the items from the same code so that there won't be mysterious bugs that only show up on incremental builds. This means that the entire test suite is stressing the cache reading/writing now.
There is potential to improve the readability of the code and the performance of the first build here, but I am reluctant to tackle that and reset the preview2 measurements that have been taken with this approach, nor do I want to lose the coverage property.
Filed #2048 to think about changing the approach.
{ | ||
if (!asset.IsPlaceholderFile() && (filter == null || filter(asset))) | ||
if (!NuGetUtils.IsApplicableAnalyzer(file, _task.ProjectLanguage)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This class combines two things: Logic for how we interpret the contents of the lock file, and the format we use to cache it. It seems like it would be cleaner to separate these two things into two different classes. Is this possible, or do they need to be mixed in order to maintain the perf gains from the cache?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
They are mixed but the methods should be reasonably separated enough. This was indeed for perf so that we don't end up with any intermediate data to allocate when going from assets.json -> assets.cache.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is covered by #2048. See #2020 (comment)
p => p.ContentFiles, | ||
filter: asset => !string.IsNullOrEmpty(asset.PPOutputPath), | ||
setup: (asset, item) => | ||
private byte[] HashSettings() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we have any way of helping to keep this method up to date, so that whenever there is a property that would affect the result we make sure to add it to the hash?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've added test coverage for that in 7b0dbb0
There was actually a bug that this fixes: if
This would also have been easy to get wrong over time, so the test is definitely valuable. Great feedback!
File.SetLastWriteTimeUtc(assetsJsonPath, DateTime.UtcNow); | ||
buildCommand.Execute().Should().Pass(); | ||
var cacheWriteTime3 = File.GetLastWriteTimeUtc(assetsCachePath); | ||
cacheWriteTime3.Should().NotBe(cacheWriteTime2); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it possible that a build would run fast enough that these times are actually the same? Should the test do a short sleep if DateTime.UtcNow
hasn't increased at each stage?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed.
It certainly wouldn't hurt, but as noted elsewhere every single integration test in the suite is going to be writing/reading from the cache. In fact, it is designed to go through the same path in full builds as incremental builds for exactly that reason. I noted in #2048 that if we take a different approach we will need much more testing for the incremental case. |
@dotnet-bot Test OSX10.12 Debug Filed #2052 The flakiness predates this change. I'm disabling the test on OS X if it fails again. |
Whenever project.assets.json changes (or more rarely, a setting influencing ResolvePackageAssets changes), write out a binary cache file for the next incremental build. The file has only the items that used by the build and they are written in a format that can be deserialized to MSBuild ITaskItem[] with minimal time and allocation.
On top of that, we merge ReportAssetsLogMessages into ResolvePackageAssets, which allows for project.assets.json to not be read at all if it has not changed.
Some early measurement below using https://github.com/mikeharder/dotnet-cli-perf/tree/master/scenarios/web/large/core
First observation, the total size of all assets.cache files is 5 MB vs. 37 MB for assets.json.
I will send a private drop to our internal perf alias as wll.
Incremental build --no-restore /m:1
Saving 10% overall here and reducing ResolvePackageAssets+ReportAssetsLogMessage (now combined into ResolvePackageAssets) from ~4 seconds to ~250 milliseconds.
Incremental parallel build --no-restore
Overall saving is less in parallel case, but time in (ResolvePackageAssets + ReportAssetsLogMessage) goes down from ~7.5 seconds to ~ 600 milliseconds.
It is interesting that just about every task is taking significantly more cumulative time when building in parallel. But, of course, some (most???) of the 7 seconds difference is being recuperated by parallelism.
Clean parallel build --no-restore (cleaned and restored beforehand)
This case shows the slowdown of (ResolvePackageAssets + ReportAssetsLogMessage) from ~8.8 seconds to ~9.7 seconds in the first run of ResolvePackageAssets where the cache needs to be generated. I have some ideas on how to narrow this down significantly, but note that it does not impact the inner loop case where project.assets.json would not be changing between every build.
cc @mikeharder @davkean @livarcocc @dsplaisted