Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add proposal for enhanced dataflow analysis #988

Merged
merged 7 commits into from
Apr 30, 2020

Conversation

MichalStrehovsky
Copy link
Member

This would let us expand on what reflection use can the linker reason about.

The main problem it's trying to tackle is whether we can build a solution that has:

  1. Actionable warnings when code is not linker-safe
  2. Gives confidence that when there are no warnings, no problems will occur at runtime

Being able to make arbitrary code linker-safe is an explicit non-goal – from our past experience we know we can’t provide tools that meet goal 1 & 2 for that: it would be solving the halting problem.

Cc @vitek-karas @marek-safar

This would let us expand on what reflection use can the linker reason about.

The main problem it's trying to tackle is whether we can build a solution that has:

1) Actionable warnings when code is not linker-safe
2) Gives confidence that when there are no warnings, no problems will occur at runtime

Being able to make arbitrary code linker-safe is an explicit non-goal – from our past experience we know we can’t provide tools that meet goal 1 & 2 for that: it would be solving the halting problem.

* Linker-safe: most code will fall into this category. Linking can be done safely based on information in the static callgraph.
* Potentially unsafe: call to the method is unsafe if the linker cannot reason about a parameter value (e.g. `Type.GetType` with a type name string that could be unknown)
* Always unsafe: calls to these methods are never safe (e.g. `Assembly.ExportedTypes`).
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I find this example to be an interesting one.

Consider

int totalExportedTypes = 0;
foreach (var type in typeof(int).Assembly.ExportedTypes)
            {
                 totalExportedTypes++;
            }
if (totalExportedTypes != <some expected number>)
 throw;

typeof(int) is easily seen by the linker. Assembly is easy. and so is ExportedTypes. The linker could mark all of the exported types. The behavior of the linked assemblies would match the original program.

So the linker could correctly mark all of the dependencies of the reflection API, however, doing so would result in over preserving in most real world use cases since I imagine inside that loop there would normally be some sort of filtering happening on the types.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's confusing to call anything just linker-unsafe - linker could keep the entire app and everything would be safe. So maybe a better way is to say that we choose not to support the pattern in the linker as we're not able to achieve good trimming results with it.

Code patterns like this example are typical in Dependency Injection (or similar) scenarios - I think that we can potentially provide different ways of solving that problem. But that's outside of this proposal.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

confusing to call anything just linker-unsafe

Would it be better to talk about this linker-compatible/incompatible or linker-friendly/unfriendly?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

safe/unsafe meaning typically relates to security. This has not much to do with security.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We've been using linker-friendly in some places already - so I'd be happy with that.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The linker could mark all of the exported types

It could, but it negates a lot of benefits of the linking (one unfortunate API call and suddenly none of the public types get linked away). It doesn't seem to be worth the effort to make this work.

The problem is also that once we have the type back, people are going to do other things with it (not just count them). So we either need to understand IEnumerable<Type> or just keep all members to be sure.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"safe/unsafe meaning typically relates to security" I agree that it's good to not pollute the use of this convention, even though that's all it is.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I agree, we wouldn't want to support the case of assembly getting types.

However, what about typeof(Foo).GetFields()? in our linker we will mark all of the fields of Foo if we see this. Our rational was, if you are getting the fields of the specific type you probably care about those fields and the odds of an undesired ripple effect are much lower. I will say I was/am on the fence about it. I'm curious what stance you guys take.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

However, what about typeof(Foo).GetFields()

I think keeping all fields on Foo in this case would align with how member access is treated in the proposed MemberKinds enum - type-level granularity feels more acceptable than assembly level granularity.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the current proposal is to mark this with AllFields which would get us basically the same behavior as you already have. I personally think that we should be very careful about including more types than "necessary", but I'm much more comfortable including more members.

static void Main()
{
// SAFE: _otherType was annotated as "some type that has all methods kept"
_otherType.GetMethod("RandomMethod");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you elaborate on this example more or point to where the pattern is used in CoreLib so I can understand the need a little better?

On the surface this pattern could become always safe by putting the GetType and GetMethod together.

Type.GetType("Foo, FooAssembly").GetMethod("RandomMethod");

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure which pattern in CoreLib had @MichalStrehovsky in mind, but Lazy<T> almost fits this as well. Some of the .ctors which don't take a valueCreator effectively need the T to have a default .ctor (and be instantiatable). Internally it's implemented by eventually calling the Activator.CreateInstance on it.

Now the way the code is currently written can't be directly annotated with the proposed attributes (since it actually doesn't store the Type instead it stores a boolean and relies on the T), but it could be refactored relatively easily to fit the pattern.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I put GetMethod, this this one is more interesting with Activator.CreateInstance. E.g. here:

https://github.com/dotnet/runtime/blob/0a2f0e6da98a784601b37d19417b930a33b2c478/src/coreclr/src/System.Private.CoreLib/src/Internal/Runtime/InteropServices/ComActivator.cs#L610

Of course they could cache the ConstructorInfo instead, eliminating the need for the annotation.

But the building block (being able to annotate a field) is necessary to get custom attributes that have System.Type properties or constructor arguments working. I would like to make those work because they're pretty common.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it could be refactored relatively easily to fit the pattern.

Would this refactoring make Lazy<T> more expensive?

Copy link
Contributor

@mrvoorhe mrvoorhe Mar 12, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Our linker would detect and correctly mark all of the reflection calls in LicenseInteropProxy() without the help of any annotations. So those are OK.

You are right that object licContext = Activator.CreateInstance(_licInfoHelper)!; is the tricky part.

Thinking out loud here. If we detected the type that is assigned to the field, then we could store that knowledge. Then when we come across the Activator.CreateInstance having the field passed to it we could mark the ctor of all types that had been assigned to that field. So I think we could actually automatically handle this scenario.

I guess what I'm dancing around is I'm trying to figure out is the pros/cons of this proposal vs attributes such as [RequiredMember] that I've discussed here #797 (comment). We are accumulating support for a variety of these attributes to start trying them out. I don't want to introduce things that would conflict with the wider .NET ecosystem but at the same time our use cases are slightly different which is why I'm trying to understand exactly what scenarios you are trying to solve.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In that case we can store just the Type - it will seem redundant since the T is already there, but if we call it "typeToCreate" or something along those lines it should be OK.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Storing Type would be better, but it still not clear how it would be done without introducing regression. Lazy<T> is perf critical e.g. for FSharp, and great deal of work went into making it as lean as possible.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think Lazy<T> would be solvable by one of two ways:

Mark the Lazy<T> constructor overload that makes it call into CreateInstance<T> as LinkerUnsafe & mark the CreateViaDefaultConstructor() as "linker shut up, this is safe".

Or,

We can implement the extension Vitek added as a TODO to the doc:

class Lazy<[DynamicallyAccessedMembers(MemberKinds.DefaultConstructor)] T>
{
    // ...
}

This would ensure the Activator.CreateInstance<T> within lazy is not marked linker unsafe.

The drawback of the second approach is that now every use of Lazy<T> from generic code needs to be annotated because the annotations are contagious, even though the default constructor might not be needed if a factory delegate was provided:

void Frob<T>()
{
    // This is now unsafe because the T is not marked [DynamicallyAccessedMembers(MemberKinds.DefaultConstructor)]
    // To fix it, we need to mark the T in Frob too
    Lazy<T> lazy = ...
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The problem with annotations that just say "keep this" is that they don't have a connection to the specific reflection API usage within the method - they cannot be used in validating that the actual usage is safe. The linker either has to trust that the annotation on the method itself

Yes, I agree that is a down side. We have users doing things that would fall into the "you should use source generators" category. Many of them want something that is better than nothing which is where [RequireMember] style attributes come in.

I like the double checking that the connection allows, but I'm wondering if the fact that the connection can be made means that the linker can just auto detect these things. @MichalStrehovsky Is there a use case where the connection is maintained but the linker cannot auto detect the usage? As I mentioned in my previous comment, the example you pointed to could be automatically detected I believe.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To some degree it's a tradeoff between complexity in the linker and explicit behavior in the API. For example, in some cases linker may be able to "propagate" these annotations to callers (over several jumps) - but then if one of the callers doesn't satisfy the necessary requirement (for example it passes in a Type which can't be figured out) the warning message would be relatively confusing as there would be no notion of it in the source code. The warning would say something like "The parameter inputType of GetSomeData requires a known type with all its fields" but the method GetSomeData would have nothing on it which would hint at that fact. Making the annotations explicit means that the behavior is to some degree self-documenting.

As for cases which we can't hope the linker to figure out: Casts (to and from object specifically), collections, and so on. In these cases the annotations could be used to hint the linker what's going on, but I don't think the linker could figure it out on its own.

@MichalStrehovsky
Copy link
Member Author

dotnet/runtime#33861
dotnet/runtime#33862
dotnet/runtime#30902

Are the API proposals with the attribute additions.

@vitek-karas
Copy link
Member

I think there's one more thing we should make part of the proposal - and that's the behavior of some of the core reflection methods with regard to the proposed attributes. Namely (and similar APIs):

  • Type.GetMethod
  • Type.GetProperty
  • Type.GetField
  • Type.GetMember
  • Type.GetNestedType

If the linker can determine the name and type statically it will mark only that member, but what happens if it can't determine these values statically? For example, would we mark the Type in Type.GetMethod as DynamicallyAccessedMemberKind.Methods ?

I think we'll have to try it, but my gut feel would be something like:

  • For methods, nested types and members the linker will NOT automatically mark all of them (unless the developer does that manually by annotating the type parameter as such). Meaning that typeof(Foo).GetMethod(unknownString) will issue a warning.
  • For fields, properties and events the linker would fallback to "mark all" semantics, so typeof(Foo).GetProperty(unknownString) would not warn, instead if would mark all properties on the type.

Reasoning is that including all fields is cheap, properties should be relatively small and not bring in more dependencies (same for events), Including all methods effectively means including the entire type... so that is "dangerous".

Thoughts?

@vitek-karas
Copy link
Member

And another detail to specify: What is the behavior of these annotations with regard to Static/Instance members. For example if a type is annotated as DynamicallyAccessedMemberKind.Properties - does that mean only instance properties, or both instance and static properties?

@MichalStrehovsky
Copy link
Member Author

MichalStrehovsky commented Mar 27, 2020

We already put in a precedent for constructors in the prototype (Activator.CreateInstace). If we can't figure out which specific constructor it is, we demand all. If we're going to do the same for properties, I think there's value in being consistent and doing the same thing for methods too. Maybe we should log an issue on that and play with it on some real apps (how often does that kick in and what's the impact).

And another detail to specify: What is the behavior of these annotations with regard to Static/Instance members. For example if a type is annotated as DynamicallyAccessedMemberKind.Properties - does that mean only instance properties, or both instance and static properties?

I'll add a note - it would be both static and instance. We have enough bits that we could expose additional flags on DynamicallyAccessedMemberKind - DynamicallyAccessedMemberKind.PublicInstanceProperties, DynamicallyAccessedMemberKind.PublicStaticProperties, DynamicallyAccessedMemberKind.PublicProperties, etc.

The reason why I went with the public/non-public split is because the "overloads for lazy people" (Type.GetMethod(string), et al) default to searching for public static+instance and there might be enough use of that specific overload in the wild to make it worthwhile. Adding the static angle might be too much noise. I think we'll see in the API review for the enum.

The other thing to think about is that these APIs (except for the one that deal with ctors) search in base types too, so if we don't know the specific name, this could be referring to any member from the bases too.

* `Type.GetProperty`
* `Type.GetField`
* `Type.GetMember`
* `Type.GetNestedType`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are GetDeclaredMethod, etc. intentionally missing in this list?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, I added this as a response to Vitek's comment - it's not trying to be an exhaustive list. We already e.g. special case methods on System.Linq.Expressions, so this will be a pretty long list. I'll add "for example" because trying to keep this up-to-date would be a vain effort.

* `Type.GetField`
* `Type.GetMember`
* `Type.GetNestedType`

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"Are going ..." reads like a question, but it is not actually question. Maybe say "The following methods are going to be special cased by linker:" in front of the list instead.

@marek-safar marek-safar merged commit e4568b5 into dotnet:master Apr 30, 2020
@MichalStrehovsky MichalStrehovsky deleted the flowDoc branch May 1, 2020 07:57
tkapin pushed a commit to tkapin/runtime that referenced this pull request Jan 31, 2023
Co-Authored-By: Petr Onderka <[email protected]>
Co-Authored-By: Jan Kotas <[email protected]>

Commit migrated from dotnet/linker@e4568b5
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants