Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feature request: C# compiler should automatically store statically definable Expression-based lambda expressions as lazy, statically initialized fields #2404

Closed
marksmeltzer opened this issue Apr 7, 2019 · 22 comments
Assignees
Milestone

Comments

@marksmeltzer
Copy link

marksmeltzer commented Apr 7, 2019

Problem: lambda expression trees are always created fresh at runtime, even when they could be cached. This has substantial runtime impact on Linq-to-Sql performance, as well as other use cases involving dynamic lamda expressions.

The C# compiler is already smart enough to compile this code:

public static void Main ()
{
  Function<string> func = () => "hello world";
  Console.WriteLine( func() );
}

Into something that looks like this behind the scenes:

private static Function<string>_funcField;
public static void Main ()
{
  Function<string> func = _funcField;
  if (func == null)
  {
    func = ()=> "hello world";
    var currentFunc = Interlocked.CompareExchange( _funcField, func, null );
    func = currentFunc ?? func;
  }
  Console.WriteLine( func() );
}

We should have the same thing happen for lambda expressions trees!

The benefits of doing this for lambda definition can be enormous in certain use cases. Currently, even conceptually static lambdas will be dynamically allocated unless they are specifically stored in static fields.

  1. Systems that dynamically compile or process user-defined lambdas would be able to check whether the specified lambda already exists in some cache based on instance identity. If the lambda is already cached, some expensive operations could be avoided. Currently we would have to do a full walk of the lambda to calculate a deep hashcode based on all expression properties – or worse, to get the hashcode based on the “ToString” result of the lambda
  2. Code that passes a statically defined lambda as a parameter, etc., frequently or in loops would execute much faster with much less runtime allocation

My best work around currently is to pass performance sensitive lambdas with a wrapper function that returns the lambda. Then I can check runtime identity on that delegate since it should be statically cached as described above, and then I can maintain a cache based on the delegate’s identity.

That workaround works, and achieves the desired performance characteristics, but it is very cumbersome and should not be necessary! Doing this lifting automatically at compilation time would benefit all developers using lambda expressions.

  Function<Expression<Function<string>>> func = () => ()=> "hello world";
  //func will be statically cached, and now I can pass it around to performance sensitive code, since the code can maintain a cache based on the identity of the func instance
  //however, this line will always produce a new instance, and thus could only be called on cache misses:
  var expression = func();

Writing that kind of wrapper code is non-intuitive to the extreme, and it could easily be handled by the compiler.

Moreover, if this feature is added, it would be extremely useful to provide a new corollary property on Expression called IsStaticallyDefined or some such. That would allow the code that maintains a cache to decide whether to skip storing the specified lambda in the identity cache, for instance.

Regards,

@CyrusNajmabadi
Copy link
Member

there is already a PR for this: dotnet/roslyn#6642

Primary issue is that this can both be a breaking change (because some code may have depended on multiple instances of the delegate being created), as well as it potentially violates the .net security model (which tracks information like what stack were you on when the delegate was created). These are all being discussed, and i could imagine something changes in the future to allow this to go through. However, in the interim, this language proposal isn't really needed as the issue is understood with the right people looking at it.

Thanks!

@svick
Copy link
Contributor

svick commented Apr 7, 2019

@CyrusNajmabadi I believe this issue is about caching expression trees, so I don't think that PR is directly related. (Though it's possible someone could depend on multiple instances of an expression tree being created, so it's not completely unrelated either.)

@marksmeltzer
Copy link
Author

marksmeltzer commented Apr 7, 2019 via email

@CyrusNajmabadi
Copy link
Member

@CyrusNajmabadi I believe this issue is about caching expression trees, so I don't think that PR is directly related.

Right. But i was placing it in the same bucket because (to me at least), it's effectively the same thing. Is there a single delegate/expression-tree created.

(Though it's possible someone could depend on multiple instances of an expression tree being created, so it's not completely unrelated either.)

Right. This would be a potential break. And it would have to be evaluated if that was ok. To make this happen, it would be absolutely vital that the entire Expression future of the API and language was specified to be immutable.

@marksmeltzer
Copy link
Author

marksmeltzer commented Apr 7, 2019

Expression trees and delegates have almost nothing to do with each other, other than (a) C# offers similar mechanisms for declaring both as lamdas, and (b) some expressions trees can be compiled to delegates.

The internals and compiler support are very different sub-systems.

The only context where it makes sense to compare them is in the declaration syntax: they should have roughly the same syntax capabilities.

@CyrusNajmabadi
Copy link
Member

Expression trees and delegates have almost nothing to do with each other, other than (a) C# offers similar mechanisms for declaring both as lamdas, and (b) soarme expressions trees can be compiled to delegates.

I mean... i disagree :) As you mentioned, through the language they're exposed in an extremely similar mechanism. Expr trees were introduced right when we pushed hard in the language for the idea of being able ot easily pass code around. Delegates allow you to do that in an in-memory form. Expr-trees are useful for the same, jsut allowing for deeper introspection to enable other scenarios.

And, in the context of this issue, for both, there is hte open question of: should the language specify that it is open to the impl that you may get the same instance in memory, and given the constraints of the real world, should this be something the compiler should move to doing.

@CyrusNajmabadi
Copy link
Member

The only context where it makes sense to compare them is in the declaration syntax: they should have roughly the same syntax capabilities.

note: i was one of hte people who helped design and implement these originally. IMO from the time, it was very desirable to me to think of them as effectively nearly the same thing. Just that the Expr form was the introspectable version, and the the delgate-form was not. There was even an open question of would we have the second, or would that just be an optimization step performed on the first. That didn't happen mostly due to both the extra perf cost of that step, as well as the desire to minimize the surface-area of expr-trees. But, from a design perspective, we really were trying to think of them as being very close to each other (with the split only really necessary because of practical real-world time/design concerns).

@marksmeltzer
Copy link
Author

marksmeltzer commented Apr 8, 2019

I get your point about design intent.

I have also commented on dotnet/roslyn#6642, and I'm following the related issues.

In my mind, this request and the other have similar goals in terms of performance improvements, but the big, no huge, difference is that there is a very effective and simple to implement workaround for the delegate caching issue.

Not so for the expression tree caching problem. And the hack I am using is arguably an abuse.

Secondly, I don't want this request lumped with the aforementioned delegate request because the considerations blocking that issue from being delivered do not apply here.

Feel free to correct me if I am around about security implications, etc. I'm not aware of any such risks.

@marksmeltzer
Copy link
Author

marksmeltzer commented Apr 8, 2019

@CyrusNajmabadi, if there is some concern about expression trees being mutable, add a Freeze() method and an IsFrozen property. In that case, the compiler optimization could kick in only when the lambda declaration is done in an explicit manner - something like this:

Expression<Function<string>>.Immutable staticallyCachedImmutableExpression = () => "hello world";

The Immutable above would be a simple wrapper struct with implicit conversion operators back to the expression. And the compiler can check for that explicit pattern to implement the caching pattern.

The constructor for the Immutable struct would simply assert that IsFrozen on the passed expression is true.

That solves the compatibility issue as well: it would be entirely opt-in.

@marksmeltzer
Copy link
Author

marksmeltzer commented Apr 8, 2019

Although the solution I just proposed has an obvious defect: it doesn't lend well to inline expression trees.

I could live with something like this, however, in the worst case where this was the only solution:

SomethingCool( Expession.Freeze( () => "hello world" ) );

Same idea here: the compiler would recognize the pattern and implement the static cache if the expression meets the cachability requirements.

Although I would greatly prefer the simpler approach that requires immutable expressions:

SomethingCool( () => "hello world" );

However, the original suggestion should allow for opt-in if an appropriate overload is defined:

public void SomethingCool( Expession<Function<string>>.Immutable cachableExpression) {} 
... 

SomethingCool( () => "hello world" ); //should trigger the caching pattern in the compiler! 

As an author of APIs that accept expressions, I would happily provide those additional APIs in order to boost runtime performance.

Additionally, overloads for those APIs could be added to Linq-to-Sql (EF) for improved performance.

Caveat: this workaround only applies if (a) mutability is a genuine concern, and (b) preserving the default create-a-new-instance-everytime-by-default behavior is required for some reason.

Regards,

@marksmeltzer
Copy link
Author

@CyrusNajmabadi any thoughts?

@CyrusNajmabadi
Copy link
Member

@marksmeltzer I think it would be great for the language to open itself to the idea that these can be cached and for the impl to follow. I have no issues with that. I think it would be fine for both delegates and expr-trees.

@marksmeltzer
Copy link
Author

@CyrusNajmabadi do you think this could move forward without waiting on dotnet/roslyn#6642? I don't think the expression trees have the same potential security implications.

@CyrusNajmabadi
Copy link
Member

@marksmeltzer There's a bit of uncertainty on my part if this should actually be a language issue (i.e. this would require a language change), or if this can just be done on the compiler side. It would probably be good to hve a knowledgeable LDM member answer that.

If this is a language issue, it would have to stay here, and it would only move forward if an LDM member 'championed it.

If this can just be treated as a compiler optimization, you would have to go through a process over at dotnet/roslyn to see how to move forward with this.

@marksmeltzer
Copy link
Author

marksmeltzer commented Apr 11, 2019

@CyrusNajmabadi Ah, I see your point.

In cases like this, that could imply both a clarification to the C# spec as well as implementation to the corresponding C# compiler, which proposal leads the way forward?

I can make an equivalent proposal to the roslyn project... but this seemed to me like it should be driven from the C# language spec.

@CyrusNajmabadi
Copy link
Member

Since there is an open question. I would leave it here. It can always be closed in the future if it's felt to not be a language issue.

@gafter gafter self-assigned this Apr 11, 2019
@gafter gafter added this to the Likely Never milestone Apr 29, 2019
@gafter gafter changed the title Feature request: C# compiler should automatically store statically definable lambda expressions as lazy, statically initialized fields Feature request: C# compiler should automatically store statically definable Expression-based lambda expressions as lazy, statically initialized fields Apr 29, 2019
@gafter
Copy link
Member

gafter commented Apr 29, 2019

The LDM looked at this today and decided that it doesn't rise to the level of deserving a language feature. If it is important in your application, you can do it yourself.

@guillaume86
Copy link

@marksmeltzer did you create a roslyn issue? Would gladly upvote it as I like to (ab)use Expressions trees.

@marksmeltzer
Copy link
Author

@guillaume86 good idea. Have not done it yet. Will let you know.

@marksmeltzer
Copy link
Author

@gafter Thanks for the info. Could you (briefly) provide some info on what "do it yourself" means?

(As discussed in the description, there's not currently a reasonable means of achieve a performant result from the language itself. The trick I mentioned using delegate expression mediators are quite the hack and aren't scalable as they require APIs to expose Func<Expression> accepting variants.)

@gafter
Copy link
Member

gafter commented Jun 10, 2019

What I mean by "do it yourself" is to define your own static field and initialize it with a compiled expression tree. Lazily, if you need that.

@olmobrutall
Copy link

I know is an old issue, but wouldn't this feature play very nice with https://learn.microsoft.com/en-us/dotnet/csharp/language-reference/proposals/csharp-9.0/static-anonymous-functions

I think it will solve the backwards compatibility problem and, being consistent, be very discoverable:

//Creates always a new expression tree
Expression<Func<Author, object>> name = a => a.Name; 

//Lazy and cached expression tree
Expression<Func<Author, object>> name = static a => a.Name; 

Static expression trees have the same restrictions as static anonymous functions: You can not du closures over local variables.

You could also mix them in the same IQueryable query.

db.Authors.Where(a => a.Name == name).Select(static a => new { a.Id, a.Name, a.DateOfBirth });

And the performace improvements will be woth ti, since Selects are the most common and complex expression trees but they are typically not closing over local variables.

An Analyzer could also suggest to introduce static.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

6 participants