-
Notifications
You must be signed in to change notification settings - Fork 127
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Change referenced assemblies processing to be on demand in MarkStep #1164
Comments
/cc @vitek-karas |
This removes the easily removable parts of the TypeMap step. Contributes to dotnet#1164.
This removes the easily removable parts of the TypeMap step. Contributes to #1164.
I plan to break the fix up into the following phases:
|
Looks good!
Since we basically just remove code it should be fine. We can theoretically overmark in this case, but it's very very unlikely. Also - there should be basically no dependencies from the already marked code into the newly loaded assembly, since otherwise we would have loaded the dependency earlier. |
Few topics/questions which I think are worth thinking before diving into implementation
|
Do you mean unresolved refs that are part of the static assembly reference closure? I think these should be handled the same as they are now - either skipped, or treated as an error (depending on --skip-unresolved).
Nothing would change in Phase 1 - such XML would be processed up-front, and it could load new assemblies, but those will not have their own XML processed. In Phase 2 delay loaded assemblies would have their dependencies XML processed recursively (if marked). This XML can pull in new delay loaded assemblies, etc. It will only support XML that adds new dependencies - not substitutions which would modify already loaded assemblies, same goes for attributes XML. In this phase I suggest not to process delay loaded substitutions/attributes. Substitutions from the static reference closure are still processed as before. In Phase 3 we could add partial support for delay loaded substitutions/attributes. We could limit them to modifying the containing assembly, or we could do as Vitek suggests and allow modifying already loaded assemblies, but without re-processing them - so it could lead to over-marking. I would prefer limiting the modifications since I find this easier to reason about.
I think we will need a similar extensibility mechanism for the delay-loaded per-assembly logic. I am imagining that there is a list of per-assembly steps where custom steps can be added. Maybe we could add something like: interface IAssemblyStep {
void ProcessAssembly (AssemblyDefinition assembly);
} Alternatively, maybe we could use
I think the latter - we want MarkStep to be able to add assemblies, for example when they are referenced by reflection, and have most of the processing logic run for these delay loaded assemblies (but not logic that would require re-analyzing already loaded assemblies).
I don't think they would need special considerations except that some steps need to traverse exported types. Do you foresee issues here? |
I think this is an interesting question - where exactly in the pipeline is not that interesting, but there will HAVE to be a point after which modifications are not allowed. Maybe a better way to put this would be "Once Sweep step starts, that's it". It's not that we could not handle it on the Cecil side, but we need to be able to freeze the annotations, so that sweep and similar steps can reason about them as a whole. For example the "Sealer" step is like that - we would only want to run it once "Everything" is annotated. That kind of partially answers the problem with custom steps - we probably need two kinds of custom steps:
|
I would vote for only allowing substitutions within the assembly that defines the thing being substituted. Otherwise we're opening ourselves interesting problems (linker will do substitutions based on value X returned from a method and eliminate branches at all callsites, then assembly Foo walks in and changes X to Y, invalidating linker's original assumptions). It can be dealt with I guess, but do we really need to? |
I agree that substitutions should only work on the same assembly (tricky for those on command line, but those will "preload" all the necessary code, so it should be doable). The potential problem is that linker then does constant propagation across methods (and assemblies) - my original point was about the fact that constant propagation will basically never go "back" into the already processed (loaded) code. So if we limit substitutions like Michal suggests, it should be OK to also limit the constant propagation to only the newly loaded assembly and any future loaded assemblies, but no need to revisit already loaded assemblies. I think we should write down the flow - the order of things which should happen - ignore the "step" design, just write down the actions we know need to happen. Something like:
And then put together a couple of evil cases (where new assemblies are requested in the middle of the above) and how the system should handle those to get the correct result. THEN we can figure out how to actually implement it (steps, or something else, and so on). |
I don't have a good mental model of how the constant propagation works - but I though it will also propagate actual simple bodies (i.e. no XML involved, but if the body is |
I mistakenly read this as "processing lazy substitutions is fine because they will only remove code, so by not revisiting we are at worst overmarking", but now I see that you meant "not processing lazy substitutions is fine". :) I think we're all on the same page - we should disallow substitutions that modify already loaded assemblies, no revisiting required. |
One catch - this means that the processing pipeline needs to work "in reverse" - that is:
It's recursive - so we might have trouble with deep callstacks... In a way it would be better if we could front-load loading references - maybe this will work:
|
I agree that we need to load the assemblyrefs up-front - doing otherwise would require us to scan method bodies to find callees before inlining. I don't think the recursion will be a huge issue. Dependency XMLs can't be processed immediately anyway - we need to wait until the assembly gets marked. Since we're only looking at static assemblyrefs, I don't think there will be loops (if the input is valid). I wasn't planning to process those assemblies in reverse order though - I was going to more or less stick with the current design of RemoveUnreachableBlocksStep and iteratively do inlining over the new assemblies. I think the performance should be no worse than what we do today - we can separate performance from correctness for now. In a perfect world we would not only do bottom-up processing of assemblies, but also bottom-up processing of methods, to avoid re-scanning for new inline candidates. |
OK - the bottom-up can potentially mean that we can run the constant propagation one assembly at a time - which will make it a LOT faster. But I agree that it's mostly a perf optimization, so we can look into that later. |
One thing to consider is that the "constant propagation step" currently happens upfront, and potentially runs on method bodies the linker is not even going to look at (they might be unreferenced). An option to consider would be to ban linker code from accessing the |
It cannot be handled the same way. Today the errors are handled upfront with delay-loading it can happen anywhere so with What is I think even more problematic is that you could end up resolving and hence triggering error which should never happen. Consider an example where you resolve attribute but the attribute should not be resolved at all because one of the attribute arguments triggers assembly load which suppressed the attribute processing and its second attribute arguments point to unavailable assembly.
One thing to consider is that if there is assembly ref to the facade and we are in the mode of removing facades we should probably never resolve delay-loaded assembly reference to facade but that can be complicated if for example, facade assembly embeds XML descriptors.
I think it's interesting if you consider that someone can inject a custom step and trigger assembly load from anywhere (e.g. last step in the pipeline).
Substitutions are not a problem at all. What is problematic is that existing XML descriptors can and they do in wild reference other assemblies, which with delay loading and your current proposal can be loaded anytime which means even after MarkStep. Second problematic areas can be attributes exclusion, how do you ensure that we don't mark custom attributes if they can have they attribute XML file which describes that they should be removed loaded anytime?
This should change for multiple reasons to happen inside MarkStep and body evaluation would be triggered by marking. I tried to make that change a few months ago but didn't finish it. I could bring it back once I have time for coding. |
With
I think we should treat this the same as substitutions - it's another case where delay loaded XML is trying to modify an already loaded assembly, so it shouldn't be supported.
Good point - I think today facades can have XML that's processed even if the facade is removed. This is also the behavior that would make sense to me for delay loaded facades - I think their XML should be processed if any typerefs that go through the facade are marked, even if the facade isn't kept. I added a note to Phase 2.1 for this.
I'm not sure we should be trying to enforce such ordering constraints - they feel very specific to what the step wants to do, and would probably require making parts of the API for custom steps immutable or restricting where custom steps can be inserted. Could you provide some examples of problematic behaviors we want to prevent? The way I look at it is that custom steps will need to be designed to run correctly at a given point in the pipeline, with the understanding that previous steps won't run again. For example something which modifies an assembly in memory after OutputStep should not expect those changes to be written to disk. Similarly, a step which processes some XML after MarkStep should not expect newly marked code to be processed again by MarkStep.
Like above, I'd say that if a step after MarkStep loads a new assembly, it needs to be aware that MarkStep won't re-run. I hope to design it so that calling
How would we reconcile bottom-up constant propagation with top-down marking? Are we ok with only propagating constants into direct callers? My current feeling is that this is orthogonal to the plan above, right? |
As I wrote earlier that logic today has no information about which assembly is missing. Secondly, there are cases where no error is reported when Resolve returns null which presumably would now need some extra state to detect missing assembly case.
No, this is the exact opposite. Delay loaded assembly has information which influences already processed types.
What I was trying to steer you towards is having separate APIs for assembly/type resolving. One which can trigger assembly load and one which never will. I'm concern that without that we could end up with hard to trace bugs unless we ensure the whole pipeline can be reliably processed from any point which I don't think is doable. Imagine a situation where e.g. SweepStep triggers new assembly load, do you suggest we should re-processed everything?
Correct, it's not related to this plan but as this topic was mentioned by other I shared what I think we should do there. |
This introduces the ability to add assemblies to LinkContext lazily and have them processed in MarkStep. For now, the only additional processing is to load references and run the TypeMap logic. Later, embedded XML processing and constant propagation will also happen for lazy assemblies. This is Phase 1 outlined in dotnet#1164 (comment). Fixes dotnet#943 Fixes dotnet#1079
This introduces the ability to add assemblies to LinkContext lazily and have them processed in MarkStep. For now, the only additional processing is to load references and run the TypeMap logic. Later, embedded XML processing and constant propagation will also happen for lazy assemblies. This is Phase 1 outlined in dotnet#1164 (comment). Fixes dotnet#943 Fixes dotnet#1079
This introduces the ability to add assemblies to LinkContext lazily and have them processed in MarkStep. For now, the only additional processing is to load references and run the TypeMap logic. Later, embedded XML processing and constant propagation will also happen for lazy assemblies. This is Phase 1 outlined in dotnet#1164 (comment). Fixes dotnet#943 Fixes dotnet#1079
I would love to restrict embedded XML to the same assembly only, but it may not be practical. For example currently it's easy to add embedded XML into the app - VS has UI to do that, you just need to know the name. Passing the same file to linker on command line is a much more complicated process for the user, involving modifying the project file. We may need to come up with a bit more relaxed rule which allows the app much more freedom in usage of the XMLs. But in general I agree we should try to limit the reach of the XMLs such that they don't create these issues. |
IMHO if a user needs to make an XML, we already failed to provide a good user experience. Whether they need to do extra work to hook it up doesn't change the fact that we already took them to a place where they need to figure out strange file formats with odd name mangling rules that is disconnected from IDE and refactoring tools. I wouldn't look at the XML as a user scenario. |
I absolutely agree it's not a mainline scenario, but in the foreseeable future, there will be NuGets and other components which are not trim friendly. Some people may want to use XMLs to annotate assemblies they don't have source code for (for example a NuGet) - but I guess for such case some more csproj magic doesn't matter that much. |
What behavior do we want for a dynamically loaded assembly that's unused? For example, The current behavior allows the linker to remove |
|
The plan above was still controversial so I've created a doc with a new design that avoids loading references up-front in most cases: #1626 Hopefully iterating on the design doc will be a better way to get to an agreement. |
This introduces the ability to add assemblies to LinkContext lazily and have them processed in MarkStep. For now, the only additional processing is to load references and run the TypeMap logic. Later, embedded XML processing and constant propagation will also happen for lazy assemblies. This is Phase 1 outlined in dotnet#1164 (comment). Fixes dotnet#943 Fixes dotnet#1079
This introduces the ability to add assemblies to LinkContext lazily and have them processed in MarkStep. For now, the only additional processing is to load references and run the TypeMap logic. Later, embedded XML processing and constant propagation will also happen for lazy assemblies. This is Phase 1 outlined in dotnet#1164 (comment). Fixes dotnet#943 Fixes dotnet#1079
This introduces the ability to add assemblies to LinkContext lazily and have them processed in MarkStep. For now, the only additional processing is to load references and run the TypeMap logic. Later, embedded XML processing and constant propagation will also happen for lazy assemblies. This is Phase 1 outlined in dotnet#1164 (comment). Fixes dotnet#943 Fixes dotnet#1079
This introduces the ability to add assemblies to LinkContext lazily and have them processed in MarkStep. For now, the only additional processing is to load references and run the TypeMap logic. Later, embedded XML processing and constant propagation will also happen for lazy assemblies. This is Phase 1 outlined in dotnet#1164 (comment). Fixes dotnet#943 Fixes dotnet#1079
This introduces the ability to add assemblies to LinkContext lazily and have them processed in MarkStep. For now, the only additional processing is to load references and run the TypeMap logic. Later, embedded XML processing and constant propagation will also happen for lazy assemblies. This is Phase 1 outlined in dotnet#1164 (comment). Fixes dotnet#943 Fixes dotnet#1079
This introduces the ability to add assemblies to LinkContext lazily and have them processed in MarkStep. For now, the only additional processing is to load references and run the TypeMap logic. Later, embedded XML processing and constant propagation will also happen for lazy assemblies. This is Phase 1 outlined in dotnet#1164 (comment). Fixes dotnet#943 Fixes dotnet#1079
This introduces the ability to add assemblies to LinkContext lazily and have them processed in MarkStep. For now, the only additional processing is to load references and run the TypeMap logic. Later, embedded XML processing and constant propagation will also happen for lazy assemblies. This is Phase 1 outlined in dotnet#1164 (comment). Fixes dotnet#943 Fixes dotnet#1079
This introduces the ability to add assemblies to LinkContext lazily and have them processed in MarkStep. For now, the only additional processing is to load references and run the TypeMap logic. Later, embedded XML processing and constant propagation will also happen for lazy assemblies. This is Phase 1 outlined in dotnet#1164 (comment). Fixes dotnet#943 Fixes dotnet#1079
This introduces the ability to add assemblies to LinkContext lazily and have them processed in MarkStep. For now, the only additional processing is to load references and run the TypeMap logic. Later, embedded XML processing and constant propagation will also happen for lazy assemblies. This is Phase 1 outlined in dotnet#1164 (comment). Fixes dotnet#943 Fixes dotnet#1079
Closing as implemented in 6.0 |
…ker#1186) This removes the easily removable parts of the TypeMap step. Contributes to dotnet/linker#1164. Commit migrated from dotnet/linker@e9f0009
For historical reasons ILLinker assembly resolution logic was very explicit and done as a single step in the pipeline
LoadReferencesStep
this was easy to implement but it does not work well with the more advanced features which are have been and will keep be adding to the linker.There are several drawbacks which would be addressed if the linker was able to resolve the assembly reference on demand during MarkStep (support for on-demand assembly resolution in any step is not required at this point) as well as potential speed improvement for loading and analyzing fewer assemblies. It'd also get ILLinker closer to csc behaviour and require only necessary references for processing to be specified and not all possible references as it's required today.
Related issues which would be addressed by this
The text was updated successfully, but these errors were encountered: