Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Consider marking linked assemblies on the output #1194

Open
vitek-karas opened this issue May 15, 2020 · 26 comments
Open

Consider marking linked assemblies on the output #1194

vitek-karas opened this issue May 15, 2020 · 26 comments
Labels

Comments

@vitek-karas
Copy link
Member

As far as I can tell linker doesn't mark the assemblies which it linked (as in changed) in any way. Some tools might want to detect linked applications to avoid problems with missing APIs. One such example is the startup hook feature.

One such mechanism could be to inject an assembly level attribute, something like:

[assembly: AssemblyMetadata("ILLinkProcessed", "True")]

Size impact of such change should be trivial.

@marek-safar
Copy link
Contributor

What would this attribute mean? ILLinker has several modes of how to process assemblies as some of them don't really modify the content.

@vitek-karas
Copy link
Member Author

We could have a metadata value which indicates that linker removed some parts (when we do actual Sweep step on it). And then we could have a different value for cases where we do rewrite the assembly but we don't intentionally remove anything (so it goes through Cecil read->write roundtrip, but we don't sweep).

The scenario is above mentioned cases where something tries to load a "plugin" into potentially linked app. Currently there's no indication that the app is linked and so it seems as if it's safe to load the plugin. In reality linked apps are likely not suitable for plugin loading and it should not be attempted (as the code may fail in unexpected ways or change behavior).

@marek-safar
Copy link
Contributor

We could go with

[assembly: AssemblyMetadata("ILLinkProcessed", "link")] // storing the ACTION on the assembly (except for skip)

The question remains what to do when the assembly is processed multiple times with different modes (e.g. link first then copy)

@vitek-karas
Copy link
Member Author

I think we will need a way to disable the marking - for cases like our own build of libraries. We do "link" the assemblies in that step, but we work hard to make sure that doesn't break anything. Or rather, we should definitely not ship all assemblies in the framework marked as "link' - that would defeat the purpose.

We could read the attribute which is already there and only make it "more strict" (so copy->link but not the other way round).

@samsp-msft
Copy link
Member

I think this should be about modification rather than linking. What if it was a counter of how many times it had been morphed, and if a tool morphs it in any way, then the counter would be incremented.

The other question is whether there should be some form of report from the linker to say what has been trimmed, and possibly what the roots are for each member that kept it?

@vitek-karas
Copy link
Member Author

The other question is whether there should be some form of report from the linker to say what has been trimmed, and possibly what the roots are for each member that kept it?

Linker can already produce a report which describes the root for everything it keeps. It doesn't have a report for things which were removed though.

@richlander
Copy link
Member

What is the scenario in which maintaining a counter inside the assembly would be useful, @samsp-msft? The roots might better be recorded in a PDB.

@samsp-msft
Copy link
Member

The reason I suggested a counter is that conversation further up indicated that multiple tools may have modified the assembly. What you want to know is has it been modified (could possibly be done by checking file size, but a counter is an alternative.

@richlander
Copy link
Member

What's the code you would write at the very end to take advantage of that counter?

@gjuttla
Copy link

gjuttla commented Aug 28, 2020

The other question is whether there should be some form of report from the linker to say what has been trimmed, and possibly what the roots are for each member that kept it?

The more general case (the information whether an assembly was trimmed) would be very useful for CLR profilers.

Trimmed assemblies present a challenge for a CLR profiler which performs instrumentation (IL rewriting, metadata changes) because there is no guarantee about the existence of types anymore. One safety net would be to disable instrumentation completely if a trimmed assembly is detected.

Maybe just by adding a custom attribute on the assembly?

@marek-safar
Copy link
Contributor

Trimmed assemblies present a challenge for a CLR profiler which performs instrumentation (IL rewriting, metadata changes) because there is no guarantee about the existence of types anymore.

I don't understand this comment. Could you describe the scenario in more details?

@gjuttla
Copy link

gjuttla commented Aug 31, 2020

The .NET component of our monitoring solution is a CLR profiler. We inject our code into our customers' applications at runtime. This includes adding new references to types and assemblies. Now with trimmed assemblies, the situation arises where we are injecting code with references to types and assemblies which have been trimmed from the customer's application and therefore cannot be resolved.

With an annotation on the assembly indicating that a certain assembly was trimmed, we could skip the modification of the assembly at all and prevent possible crashes.

@marek-safar
Copy link
Contributor

the situation arises where we are injecting code with references to types and assemblies which have been trimmed from the customer's application and therefore cannot be resolved.

It's still unclear to me which part is causing the problems. If you can create a new valid type-reference then the code should work, if you are saying that you are creating type-reference without checking it's valid at runtime then perhaps better fix is add a check there.

With an annotation on the assembly indicating that a certain assembly was trimmed, we could skip the modification of the assembly at all and prevent possible crashes.

Do you think that would work for you? You are basically saying that your tooling will never work when trimming step is used (either automatically or manually) even if illinker does not trim anything.

@gjuttla
Copy link

gjuttla commented Sep 1, 2020

It's still unclear to me which part is causing the problems. If you can create a new valid type-reference then the code should work, if you are saying that you are creating type-reference without checking it's valid at runtime then perhaps better fix is add a check there.

The instrumentation happens on the profiler callback ModuleLoadStarted. In almost all cases, there is already some kind of service call active in our end customer's application (web request, backend service call). Our whole instrumentation procedure has to be performant to keep the instrumentation overhead as low as possible. Therefore our type and assembly references are added unchecked. Adding a check for every type and its members would be computationally expensive.

Do you think that would work for you?

Yes, a new attribute denoting assembly trimming would work for us. Detecting a custom attribute of a certain type would be efficient. Detecting the general type AssemblyMetadata and accessing the values from unmanaged code is only partially possible and expensive.

Mirroring the configuration options of the trimming functionality by the end-user, the special attribute would contain similar information. No attribute on the assembly when trimming is disabled. E.g.

[assembly: AssemblyTrimmed(TrimMode.CopyUsed)]

or

[assembly: AssemblyTrimmed(TrimMode.Link)]

(Or usage with numeric value denoting the trimming granularity: 0 -> assembly-level trimming, 1 -> member-level trimming.)

Of course, the downside of this approach is that this new attribute has to be present in the runtime.

You are basically saying that your tooling will never work when trimming step is used (either automatically or manually) even if illinker does not trim anything.

Currently, our profiler is not compatible with trimmed assemblies regardless of the trimming granularity. So our initial goal would be to disable instrumentation if we detected a marked assembly. We will have to research various approaches for full compatibility.

@MichalStrehovsky
Copy link
Member

Would this really be sufficient?

The injected attribute means "linker touched the assembly you're currently looking at" (e.g. the one that triggered the ModuleLoadStarted event). This would be useful if you're injecting references either to things that live in assemblies that are already loaded (and you saw a ModuleLoadStarted event in the past), or are in the process of being loaded right now.

It will not work if the injected reference happens to be in a module that hasn't been loaded yet. And if that module is the one that was actually trimmed, the profiler is out of luck - it doesn't know trimming happened there and it just generated code that will throw MissingXXXException at runtime. It feels like this is exactly the kind of scenario profilers will find themselves in, because they tend to be injecting references to profiling libraries that the user code wasn't compiled against.

[assembly: AssemblyTrimmed(TrimMode.CopyUsed)]

I don't think we'll want a scheme where linker needs to read/modify/write assemblies (that would otherwise be just copied over) just to say "I didn't do anything to this". There are various reasons why CopyUsed could be used. One of the reasons could be C++/CLI that generates assemblies that are not possible to roundtrip (they contain non-.NET data that we don't know how to reconstruct after injecting the attribute).

@akoeplinger
Copy link
Member

Does your profiler inject itself dynamically without interaction from user code?
If there is some sort of registration/setup method the user code calls you could use the [DynamicDependency] attribute to make sure your dependencies are marked.

@gjuttla
Copy link

gjuttla commented Sep 4, 2020

Would this really be sufficient?

@MichalStrehovsky
Yes, this would be sufficient if [assembly: AssemblyTrimmed(TrimMode.Link)] is applied on framework and application assemblies and [assembly: AssemblyTrimmed(TrimMode.CopyUsed)] only on application assemblies which were modified.

The injected attribute means "linker touched the assembly you're currently looking at" (e.g. the one that triggered the ModuleLoadStarted event). This would be useful if you're injecting references either to things that live in assemblies that are already loaded (and you saw a ModuleLoadStarted event in the past), or are in the process of being loaded right now.

It will not work if the injected reference happens to be in a module that hasn't been loaded yet. And if that module is the one that was actually trimmed, the profiler is out of luck - it doesn't know trimming happened there and it just generated code that will throw MissingXXXException at runtime. It feels like this is exactly the kind of scenario profilers will find themselves in, because they tend to be injecting references to profiling libraries that the user code wasn't compiled against.

Yes, such cases can occur, but with the help of the additional attribute, these cases would be avoidable. If a profiler detects a CoreLib with marked with [assembly: AssemblyTrimmed(TrimMode.Link)] it should disable instrumentation globally and not inject anything into the CoreLib assembly and every subsequent loaded assembly.

I don't think we'll want a scheme where linker needs to read/modify/write assemblies (that would otherwise be just copied over) just to say "I didn't do anything to this". There are various reasons why CopyUsed could be used. One of the reasons could be C++/CLI that generates assemblies that are not possible to roundtrip (they contain non-.NET data that we don't know how to reconstruct after injecting the attribute).

I agree. This attribute should not be applied on assemblies which were not actually modified. However, this would make sense on application assemblies when the trim mode is CopyUsed and the System.Runtime facade reference is exchanged with the System.Private.CoreLib reference. In such cases, a profiler cannot inject profiling libraries built against System.Runtime anymore and should also disable instrumentation globally.

The usage of AssemblyTrimmed in a profiler could then be simplified to disable instrumentation as soon as the first AssemblyTrimmed attribute is detected.

@akoeplinger
Yes, the profiler injects itself dynamically without user interaction/setup.

@gjuttla
Copy link

gjuttla commented Nov 18, 2020

Hi all,
I was wondering if you had given the comment above some thought.

@vitek-karas
Copy link
Member Author

@gjuttla I think there's sort of an agreement - your description doesn't go against what @MichalStrehovsky suggested above. Regardless of which link action linker applies to an assembly, if it does write it (not direct copy) it would add an attribute. Unfortunately the consumer of such attribute can make only limited decisions based on what link action linker did. Since even just a simple "copy" could mean changes to the assembly, for example if linker removes facades. So maybe just a simple "linker touched this" attribute, without any parameter. It sounds like that would suffice for your use case.

As for the attribute, we would probably add it to the core framework, but technically it doesn't have to be there. We could declare the attribute in the assembly where it's used. For example some special nullability attributes are added this way by the compiler All of the tools consuming this attribute should rely solely on the namespace.typename of the attribute and not on which assembly it comes from. This is already true for all attribute the linker itself recognizes.

@gjuttla
Copy link

gjuttla commented Nov 19, 2020

Thanks for the update.
Yes, a simple "linker touched this" attribute would definitely suffice for profilers and would be a welcomed addition.

@vitek-karas
Copy link
Member Author

@marek-safar @MichalStrehovsky - what do you think - is it something we should do? Would mean adding a new public API attribute into runtime and then use it in the linker. Alternatively we could generate the attribute type into the linked assembly, but they goes sort of against the linker purpose (make things small). Or we could use the AssemblyMetadata for this, something like:

[assembly:AssemblyMetadata("Trimmed", "true")]

For example framework assemblies use this for:

[assembly: AssemblyMetadata("Serviceable", "True")]
[assembly: AssemblyMetadata("RepositoryUrl", "git://github.com/dotnet/runtime")]

@gjuttla
Copy link

gjuttla commented Nov 20, 2020

From a profilers perspective, detecting a custom attribute of a certain type (namespace.typename) would be efficient. Detecting the general type AssemblyMetadata and accessing the values from unmanaged code is only partially possible and expensive.

@marek-safar
Copy link
Contributor

Alternatively we could generate the attribute type into the linked assembly

I don't think that's a good idea. The attribute should survive linking so that means we'll have hundreds of copies of in every runtime pack.

I think this needs a better proposal about how it should work but I can imagine making this work by using AssemblyMetadataAttribute. Few questions to move this further

  • What would be the exact meaning of such an attribute? We "trim" every runtime assembly during the build and I don't see what a value by adding the attribute for that.

  • Would the process be cumulative? Should every linker run adds its own "stamp".

  • Something we "save" assembly which could rewrite it, is that same as "trimmed" marker?

@vitek-karas
Copy link
Member Author

My take on the questions (good ones):

  • The meaning would be "the content of the assembly has been modified and it's not guaranteed that it is equivalent to the assembly on input" - which can also be seen as "not equivalent to the assembly from its original source: NuGet, framework, runtime pack, ...".
  • dotnet/runtime usage of linker is meant to not change the functionality provided by the assembly, so those should not add such attribute (we would probably need a command line option to do that explicitly).
  • Ideally the process would be "cumulative" in the sense that if the attribute is set to true, subsequent processing should not change it (or remove it). I would hope we don't have scenarios where the same assembly goes through linker more than once (ignoring the special run which happens in dotnet/runtime). But we could also move this responsibility to the consumer and simply add an attribute every time.
  • Linker should "play it safe", and so the first implementation would simply mark all assemblies for which we call WriteAssembly. If later on we find a case where this is not desirable we can revisit this.

@marek-safar
Copy link
Contributor

The meaning would be "the content of the assembly has been modified and it's not guaranteed that it is equivalent to the assembly on input"

Hmm, this might be tricky to implement. Maybe we could limit that to the type or its metadata have been changed.

Linker should "play it safe", and so the first implementation would simply mark all assemblies for which we call WriteAssembly.

Would that work with the deterministic output if we forcibly "save" assembly and inject the attribute so in the process changing it from the original?

@vitek-karas
Copy link
Member Author

Deterministic output only means that given the same inputs we produce the same outputs. I don't expect the decision to "save" would be non-deterministic. And thus insertion of the attribute would also be deterministic.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

7 participants