Add proposal for enhanced dataflow analysis #988

MichalStrehovsky · 2020-03-11T14:59:23Z

This would let us expand on what reflection use can the linker reason about.

The main problem it's trying to tackle is whether we can build a solution that has:

Actionable warnings when code is not linker-safe
Gives confidence that when there are no warnings, no problems will occur at runtime

Being able to make arbitrary code linker-safe is an explicit non-goal – from our past experience we know we can’t provide tools that meet goal 1 & 2 for that: it would be solving the halting problem.

Cc @vitek-karas @marek-safar

This would let us expand on what reflection use can the linker reason about. The main problem it's trying to tackle is whether we can build a solution that has: 1) Actionable warnings when code is not linker-safe 2) Gives confidence that when there are no warnings, no problems will occur at runtime Being able to make arbitrary code linker-safe is an explicit non-goal – from our past experience we know we can’t provide tools that meet goal 1 & 2 for that: it would be solving the halting problem.

mrvoorhe · 2020-03-11T15:48:32Z

docs/design/reflection-flow.md

+
+* Linker-safe: most code will fall into this category. Linking can be done safely based on information in the static callgraph.
+* Potentially unsafe: call to the method is unsafe if the linker cannot reason about a parameter value (e.g. `Type.GetType` with a type name string that could be unknown)
+* Always unsafe: calls to these methods are never safe (e.g. `Assembly.ExportedTypes`).


I find this example to be an interesting one.

Consider

int totalExportedTypes = 0; foreach (var type in typeof(int).Assembly.ExportedTypes) { totalExportedTypes++; } if (totalExportedTypes != <some expected number>) throw;

typeof(int) is easily seen by the linker. Assembly is easy. and so is ExportedTypes. The linker could mark all of the exported types. The behavior of the linked assemblies would match the original program.

So the linker could correctly mark all of the dependencies of the reflection API, however, doing so would result in over preserving in most real world use cases since I imagine inside that loop there would normally be some sort of filtering happening on the types.

I think it's confusing to call anything just linker-unsafe - linker could keep the entire app and everything would be safe. So maybe a better way is to say that we choose not to support the pattern in the linker as we're not able to achieve good trimming results with it.

Code patterns like this example are typical in Dependency Injection (or similar) scenarios - I think that we can potentially provide different ways of solving that problem. But that's outside of this proposal.

confusing to call anything just linker-unsafe

Would it be better to talk about this linker-compatible/incompatible or linker-friendly/unfriendly?

safe/unsafe meaning typically relates to security. This has not much to do with security.

We've been using linker-friendly in some places already - so I'd be happy with that.

The linker could mark all of the exported types

It could, but it negates a lot of benefits of the linking (one unfortunate API call and suddenly none of the public types get linked away). It doesn't seem to be worth the effort to make this work.

The problem is also that once we have the type back, people are going to do other things with it (not just count them). So we either need to understand IEnumerable<Type> or just keep all members to be sure.

"safe/unsafe meaning typically relates to security" I agree that it's good to not pollute the use of this convention, even though that's all it is.

Yes, I agree, we wouldn't want to support the case of assembly getting types.

However, what about typeof(Foo).GetFields()? in our linker we will mark all of the fields of Foo if we see this. Our rational was, if you are getting the fields of the specific type you probably care about those fields and the odds of an undesired ripple effect are much lower. I will say I was/am on the fence about it. I'm curious what stance you guys take.

However, what about typeof(Foo).GetFields()

I think keeping all fields on Foo in this case would align with how member access is treated in the proposed MemberKinds enum - type-level granularity feels more acceptable than assembly level granularity.

I think the current proposal is to mark this with AllFields which would get us basically the same behavior as you already have. I personally think that we should be very careful about including more types than "necessary", but I'm much more comfortable including more members.

mrvoorhe · 2020-03-11T15:58:54Z

docs/design/reflection-flow.md

+    static void Main()
+    {
+        // SAFE: _otherType was annotated as "some type that has all methods kept"
+        _otherType.GetMethod("RandomMethod");


Can you elaborate on this example more or point to where the pattern is used in CoreLib so I can understand the need a little better?

On the surface this pattern could become always safe by putting the GetType and GetMethod together.

Type.GetType("Foo, FooAssembly").GetMethod("RandomMethod");

Not sure which pattern in CoreLib had @MichalStrehovsky in mind, but Lazy<T> almost fits this as well. Some of the .ctors which don't take a valueCreator effectively need the T to have a default .ctor (and be instantiatable). Internally it's implemented by eventually calling the Activator.CreateInstance on it.

Now the way the code is currently written can't be directly annotated with the proposed attributes (since it actually doesn't store the Type instead it stores a boolean and relies on the T), but it could be refactored relatively easily to fit the pattern.

I put GetMethod, this this one is more interesting with Activator.CreateInstance. E.g. here:

https://github.com/dotnet/runtime/blob/0a2f0e6da98a784601b37d19417b930a33b2c478/src/coreclr/src/System.Private.CoreLib/src/Internal/Runtime/InteropServices/ComActivator.cs#L610

Of course they could cache the ConstructorInfo instead, eliminating the need for the annotation.

But the building block (being able to annotate a field) is necessary to get custom attributes that have System.Type properties or constructor arguments working. I would like to make those work because they're pretty common.

it could be refactored relatively easily to fit the pattern.

Would this refactoring make Lazy<T> more expensive?

Our linker would detect and correctly mark all of the reflection calls in LicenseInteropProxy() without the help of any annotations. So those are OK.

You are right that object licContext = Activator.CreateInstance(_licInfoHelper)!; is the tricky part.

Thinking out loud here. If we detected the type that is assigned to the field, then we could store that knowledge. Then when we come across the Activator.CreateInstance having the field passed to it we could mark the ctor of all types that had been assigned to that field. So I think we could actually automatically handle this scenario.

I guess what I'm dancing around is I'm trying to figure out is the pros/cons of this proposal vs attributes such as [RequiredMember] that I've discussed here #797 (comment). We are accumulating support for a variety of these attributes to start trying them out. I don't want to introduce things that would conflict with the wider .NET ecosystem but at the same time our use cases are slightly different which is why I'm trying to understand exactly what scenarios you are trying to solve.

In that case we can store just the Type - it will seem redundant since the T is already there, but if we call it "typeToCreate" or something along those lines it should be OK.

Storing Type would be better, but it still not clear how it would be done without introducing regression. Lazy<T> is perf critical e.g. for FSharp, and great deal of work went into making it as lean as possible.

I think Lazy<T> would be solvable by one of two ways:

Mark the Lazy<T> constructor overload that makes it call into CreateInstance<T> as LinkerUnsafe & mark the CreateViaDefaultConstructor() as "linker shut up, this is safe".

Or,

We can implement the extension Vitek added as a TODO to the doc:

class Lazy<[DynamicallyAccessedMembers(MemberKinds.DefaultConstructor)] T> { // ... }

This would ensure the Activator.CreateInstance<T> within lazy is not marked linker unsafe.

The drawback of the second approach is that now every use of Lazy<T> from generic code needs to be annotated because the annotations are contagious, even though the default constructor might not be needed if a factory delegate was provided:

void Frob<T>() { // This is now unsafe because the T is not marked [DynamicallyAccessedMembers(MemberKinds.DefaultConstructor)] // To fix it, we need to mark the T in Frob too Lazy<T> lazy = ... }

The problem with annotations that just say "keep this" is that they don't have a connection to the specific reflection API usage within the method - they cannot be used in validating that the actual usage is safe. The linker either has to trust that the annotation on the method itself

Yes, I agree that is a down side. We have users doing things that would fall into the "you should use source generators" category. Many of them want something that is better than nothing which is where [RequireMember] style attributes come in.

I like the double checking that the connection allows, but I'm wondering if the fact that the connection can be made means that the linker can just auto detect these things. @MichalStrehovsky Is there a use case where the connection is maintained but the linker cannot auto detect the usage? As I mentioned in my previous comment, the example you pointed to could be automatically detected I believe.

To some degree it's a tradeoff between complexity in the linker and explicit behavior in the API. For example, in some cases linker may be able to "propagate" these annotations to callers (over several jumps) - but then if one of the callers doesn't satisfy the necessary requirement (for example it passes in a Type which can't be figured out) the warning message would be relatively confusing as there would be no notion of it in the source code. The warning would say something like "The parameter inputType of GetSomeData requires a known type with all its fields" but the method GetSomeData would have nothing on it which would hint at that fact. Making the annotations explicit means that the behavior is to some degree self-documenting.

As for cases which we can't hope the linker to figure out: Casts (to and from object specifically), collections, and so on. In these cases the annotations could be used to hint the linker what's going on, but I don't think the linker could figure it out on its own.

docs/design/reflection-flow.md

Co-Authored-By: Petr Onderka <[email protected]>

MichalStrehovsky · 2020-03-20T15:00:29Z

dotnet/runtime#33861
dotnet/runtime#33862
dotnet/runtime#30902

Are the API proposals with the attribute additions.

vitek-karas · 2020-03-26T17:49:53Z

I think there's one more thing we should make part of the proposal - and that's the behavior of some of the core reflection methods with regard to the proposed attributes. Namely (and similar APIs):

Type.GetMethod
Type.GetProperty
Type.GetField
Type.GetMember
Type.GetNestedType

If the linker can determine the name and type statically it will mark only that member, but what happens if it can't determine these values statically? For example, would we mark the Type in Type.GetMethod as DynamicallyAccessedMemberKind.Methods ?

I think we'll have to try it, but my gut feel would be something like:

For methods, nested types and members the linker will NOT automatically mark all of them (unless the developer does that manually by annotating the type parameter as such). Meaning that typeof(Foo).GetMethod(unknownString) will issue a warning.
For fields, properties and events the linker would fallback to "mark all" semantics, so typeof(Foo).GetProperty(unknownString) would not warn, instead if would mark all properties on the type.

Reasoning is that including all fields is cheap, properties should be relatively small and not bring in more dependencies (same for events), Including all methods effectively means including the entire type... so that is "dangerous".

Thoughts?

vitek-karas · 2020-03-26T17:51:01Z

And another detail to specify: What is the behavior of these annotations with regard to Static/Instance members. For example if a type is annotated as DynamicallyAccessedMemberKind.Properties - does that mean only instance properties, or both instance and static properties?

MichalStrehovsky · 2020-03-27T08:55:23Z

We already put in a precedent for constructors in the prototype (Activator.CreateInstace). If we can't figure out which specific constructor it is, we demand all. If we're going to do the same for properties, I think there's value in being consistent and doing the same thing for methods too. Maybe we should log an issue on that and play with it on some real apps (how often does that kick in and what's the impact).

And another detail to specify: What is the behavior of these annotations with regard to Static/Instance members. For example if a type is annotated as DynamicallyAccessedMemberKind.Properties - does that mean only instance properties, or both instance and static properties?

I'll add a note - it would be both static and instance. We have enough bits that we could expose additional flags on DynamicallyAccessedMemberKind - DynamicallyAccessedMemberKind.PublicInstanceProperties, DynamicallyAccessedMemberKind.PublicStaticProperties, DynamicallyAccessedMemberKind.PublicProperties, etc.

The reason why I went with the public/non-public split is because the "overloads for lazy people" (Type.GetMethod(string), et al) default to searching for public static+instance and there might be enough use of that specific overload in the wild to make it worthwhile. Adding the static angle might be too much noise. I think we'll see in the API review for the enum.

The other thing to think about is that these APIs (except for the one that deal with ctors) search in base types too, so if we don't know the specific name, this could be referring to any member from the bases too.

docs/design/reflection-flow.md

jkotas · 2020-04-07T15:13:31Z

docs/design/reflection-flow.md

+* `Type.GetProperty`
+* `Type.GetField`
+* `Type.GetMember`
+* `Type.GetNestedType`


Are GetDeclaredMethod, etc. intentionally missing in this list?

No, I added this as a response to Vitek's comment - it's not trying to be an exhaustive list. We already e.g. special case methods on System.Linq.Expressions, so this will be a pretty long list. I'll add "for example" because trying to keep this up-to-date would be a vain effort.

jkotas · 2020-04-07T15:16:44Z

docs/design/reflection-flow.md

+* `Type.GetField`
+* `Type.GetMember`
+* `Type.GetNestedType`
+


"Are going ..." reads like a question, but it is not actually question. Maybe say "The following methods are going to be special cased by linker:" in front of the list instead.

Co-Authored-By: Jan Kotas <[email protected]>

Co-Authored-By: Petr Onderka <[email protected]> Co-Authored-By: Jan Kotas <[email protected]> Commit migrated from dotnet/linker@e4568b5

MichalStrehovsky requested a review from marek-safar as a code owner March 11, 2020 14:59

mrvoorhe reviewed Mar 11, 2020

View reviewed changes

svick reviewed Mar 17, 2020

View reviewed changes

docs/design/reflection-flow.md Outdated Show resolved Hide resolved

Update docs/design/reflection-flow.md

69c55f6

Co-Authored-By: Petr Onderka <[email protected]>

This was referenced Mar 20, 2020

Add DynamicallyAccessedMembersAttribute dotnet/runtime#33861

Closed

Add RequiresUnreferencedCodeAttribute dotnet/runtime#33862

Closed

vitek-karas mentioned this pull request Mar 30, 2020

[feature/dataflow] Port Expression.* pattern recognition to new infra #1039

Merged

tillig mentioned this pull request Mar 30, 2020

Pipeline discussion autofac/Autofac#1096

Closed

eerhardt mentioned this pull request Apr 3, 2020

System.Text.Json.Serialization.JsonStringEnumConverter isn't linker-safe dotnet/runtime#34449

Closed

MichalStrehovsky added 2 commits April 7, 2020 15:09

/s/safe/friendly

0f9d0e9

Add note on intrinsic handling

6e7e143

jkotas reviewed Apr 7, 2020

View reviewed changes

docs/design/reflection-flow.md Outdated Show resolved Hide resolved

jkotas reviewed Apr 7, 2020

View reviewed changes

marek-safar mentioned this pull request Apr 8, 2020

Lazy doesn't preserve contstructor. #1071

Closed

MichalStrehovsky and others added 2 commits April 9, 2020 21:22

Update docs/design/reflection-flow.md

e897582

Co-Authored-By: Jan Kotas <[email protected]>

Update reflection-flow.md

417ad0e

MichalStrehovsky mentioned this pull request Apr 23, 2020

Add SuppressLinkerWarningAttribute dotnet/runtime#35339

Closed

eerhardt mentioned this pull request Apr 24, 2020

Add new attributes for ILLink analysis. dotnet/runtime#35387

Merged

Update based on API review

acd3597

marek-safar merged commit e4568b5 into dotnet:master Apr 30, 2020

MichalStrehovsky deleted the flowDoc branch May 1, 2020 07:57

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add proposal for enhanced dataflow analysis #988

Add proposal for enhanced dataflow analysis #988

MichalStrehovsky commented Mar 11, 2020

mrvoorhe Mar 11, 2020

vitek-karas Mar 11, 2020

jkotas Mar 11, 2020

jkotas Mar 11, 2020

vitek-karas Mar 11, 2020

MichalStrehovsky Mar 11, 2020

ericsampson Mar 12, 2020

mrvoorhe Mar 12, 2020

MichalStrehovsky Mar 12, 2020

vitek-karas Mar 12, 2020

mrvoorhe Mar 11, 2020

vitek-karas Mar 11, 2020

MichalStrehovsky Mar 11, 2020

jkotas Mar 12, 2020

mrvoorhe Mar 12, 2020 •

edited

Loading

vitek-karas Mar 12, 2020

jkotas Mar 12, 2020

MichalStrehovsky Mar 12, 2020

mrvoorhe Mar 13, 2020

vitek-karas Mar 13, 2020

MichalStrehovsky commented Mar 20, 2020

vitek-karas commented Mar 26, 2020

vitek-karas commented Mar 26, 2020

MichalStrehovsky commented Mar 27, 2020 •

edited

Loading

jkotas Apr 7, 2020

MichalStrehovsky Apr 9, 2020

jkotas Apr 7, 2020

Add proposal for enhanced dataflow analysis #988

Add proposal for enhanced dataflow analysis #988

Conversation

MichalStrehovsky commented Mar 11, 2020

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

mrvoorhe Mar 12, 2020 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

MichalStrehovsky commented Mar 20, 2020

vitek-karas commented Mar 26, 2020

vitek-karas commented Mar 26, 2020

MichalStrehovsky commented Mar 27, 2020 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

mrvoorhe Mar 12, 2020 •

edited

Loading

MichalStrehovsky commented Mar 27, 2020 •

edited

Loading