Track macro dependencies at the library level for improved invalidation #55784
There are a few complications, at least in the analyzer.
```dart
var methods = await builder.methodsOf(superclassDeclaration);
methods[0].returnType;
```

...the method could have its …

This should be an …
Interesting point; this could be a challenge, but it would anyway result in an error in the user code, right? So maybe we don't invalidate the macro, but the user does have to do something to resolve it either way. So it might be OK.
It does sound like you would need a separate cache for macro execution results (which happen to be serializable already). To simplify things, you could even group the dependencies of all the macros that run on a single file into one cached result. Then you either re-use the entire macro augmentation file or you don't; in the re-use case, the contents of the cache could be the literal resulting augmentation file, which would also avoid the merging and optimizing/formatting etc. The downside is that if any macro in the file has to re-run, you have to re-run all of them. But I think this would still have a huge positive effect overall, and quite possibly simplify things further. It might even be faster on the whole, if you can cache the entire resulting merged and optimized augmentation file.
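A minimal sketch of that idea, with illustrative names only (not the analyzer's real types), might look like:

```dart
/// Hypothetical per-file macro result cache. The cached value is the
/// literal merged augmentation file, so a cache hit also skips merging,
/// optimizing, and formatting.
class _CacheEntry {
  final Map<Uri, String> depSignatures; // dependency library -> API signature
  final String mergedAugmentation;
  _CacheEntry(this.depSignatures, this.mergedAugmentation);
}

class MacroAugmentationCache {
  final _entries = <Uri, _CacheEntry>{};

  /// Returns the cached merged augmentation for [library], or null if any
  /// recorded dependency's API signature changed, in which case all macros
  /// in the file re-run.
  String? lookup(Uri library, String Function(Uri) currentSignature) {
    final entry = _entries[library];
    if (entry == null) return null;
    final unchanged = entry.depSignatures.entries
        .every((e) => currentSignature(e.key) == e.value);
    return unchanged ? entry.mergedAugmentation : null;
  }

  void store(Uri library, Map<Uri, String> deps, String merged) {
    _entries[library] = _CacheEntry(deps, merged);
  }
}
```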
Yes, more incremental invalidation in general is a task for another time. I just want to get to a state where macros don't severely regress performance for incremental edits to a single file, when lots of libraries with macros depend on that file.
To clarify, I also want this optimization to work for large library cycles - we should still have a good invalidation story that is more granular than library cycles, ideally. There is a concern that lots of applications are just one big library cycle today.
I updated the section above to capture the OmittedTypeAnnotation case as well as the new types being introduced case.
Nothing super interesting yet, but some base to build upon. For now my thinking is that I need to get back from the macro framework the list of …
The caching today is always per macro application (and per phase). So it should be safe to do it on your end. Although the macro executor code could also do it, which might help with the eventual CFE implementation, if this is already done.
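In other words, the cache key implied here is roughly the following (a sketch with illustrative types; the real code differs):

```dart
// Hypothetical shape of the existing cache key: one cached result per
// macro application, per phase.
enum MacroPhase { types, declarations, definitions }

class MacroCacheKey {
  final Uri library; // library containing the application
  final int applicationIndex; // which application within that library
  final MacroPhase phase;

  const MacroCacheKey(this.library, this.applicationIndex, this.phase);

  @override
  bool operator ==(Object other) =>
      other is MacroCacheKey &&
      other.library == library &&
      other.applicationIndex == applicationIndex &&
      other.phase == phase;

  @override
  int get hashCode => Object.hash(library, applicationIndex, phase);
}
```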
I think the implementation is not as easy as described. Or at least there is complexity behind "except for the results that depended on the library which caused the invalidation". Or I don't understand the proposed solution, and think that there is implied simplicity where simplicity is not implied; or the opposite - I see complexity and don't understand that the solution is actually simple. I'm looking from the perspective of loading libraries from the file system cache, after a restart, with zero or more files possibly changed since the file system cache was filled. Not live updates of currently loaded libraries after a single file change; that does not fit the current analyzer architecture well. Let's consider two APIs. We are running macros in the library …
There are a few pieces here. First, the … Second, the … Third, the …
How do we get the `A` here?

```dart
import 'a.dart';
import 'c2.dart';
import 'c3.dart';

@Macro1()
class X extends A {}
```

If … Otherwise, we will use … Here, the fact that … And this does not necessarily even require a compile-time error. I can imagine:

```dart
// c2.dart
import 'y.dart';

@Macro2() // generate A if Y extends Z
class C2 extends Y {}
```

```dart
// c3.dart
import 'y.dart';

@Macro3() // generate A if Y does not extend Z
class C3 extends Y {}
```

So that changes to 'y.dart' affect what the macros in 'c2.dart' and 'c3.dart' generate. So, it looks to me that almost always we can reuse either all macro results in the library cycle, or none. We will still mostly reuse macro results, and a change to a deep dependency will almost never invalidate every transitive consumer. But if we run macros, we will run them for the whole library cycle. Obviously, let me know if there are holes in my logic.
Importantly, this does not add a dependency on all the files in the library cycle, only the immediate imports. Yes, it would add more dependencies, but as long as we are considering the final library API from those dependencies (after macro execution), we should usually still get incremental invalidation even within library cycles, unless one of the immediate dependencies' macro outputs was affected by the change. This does somewhat break down when using "convenience libraries" which export tons of other libraries, etc. But it should still scale based on the immediate imports plus their exports.
The dependency on the library defining …
What specifically makes the library cycles different here? I am not really understanding what concretely makes them different from other import structures. In other words, let's say …
This is correct; strictly speaking, we depend only on …

The implementation would be simpler if we redo macro results for the whole library cycle. But I guess the main question here is whether we can afford it, or whether we think there are large library cycles with many libraries that have macros applied. Although in this case, because we depend on immediate imports, we will still with high probability redo all macros, because there is a high probability that one of the immediate imports does use macros. So, if there are a few libraries with macros in the cycle, it does not matter much if we redo them all; and if there are many such libraries, we will redo them all anyway.
I'm a bit confused by "considering the final library API from those dependencies". When dependencies of a library are inside the same library cycle, at the moment when we are about to start building the elements, there is no final API yet. We can know top-level declarations from user code, but we don't know what macros will generate this time.
Yes, with library-level dependencies, exports are an additional complication.
Yes, I supposed that macros in … But my point was that external changes might cause changes to macros, and so …
Well, in this example, it is true that if … We can do per-library dependencies; I just wondered whether this would make a significant difference in performance, while adding complication to the implementation.
There is an assumption that large library cycles do exist in practice, and that we need to support them well. This is not an assumption based on evidence as far as I know, but more just based on an educated guess. It is at least plausible to assume these large library cycles do exist, given the fact that we do not tell users to split up their applications into several packages, and so it would be very easy to create large library cycles in apps without a lot of discipline in sub-package library structure.
Ah, this is the piece that I was missing. I think we actually require, for library cycles, that you run each phase across all libraries in the cycle before continuing to the next phase in any of them. They are treated essentially as if they are one large library, from this perspective. Because of that, you cannot run all the macros on just the one library in the cycle which was invalidated, in order to see if other things should be invalidated based on that result changing. Is that correct?
Yes, I can see that it might happen.
Correct, we run each phase across all libraries in the cycle, before continuing to the next phase in any of them.
Yes.
Given that, it does sound like a more complicated problem than I anticipated. @davidmorgan has some related ideas to share at the sync-up today, so we can try to evaluate whether it solves this problem any better or not.
Thanks Konstantin, thanks Jake. Per our chat yesterday: I think we should reframe this discussion in terms of data, which is simpler than dependencies. So the question is: what data can the macro host provide; what data does the macro need; how does it change as we run through the phases; when and how can input to one macro change due to output of another macro?

Right now I don't understand enough about the execution model inside the analyzer or the CFE @johnniwinther, so I am surely saying things that don't make sense :) ... one thing I'd like to get to is an "implementation doc" that describes what a macro host does in sufficient detail that it covers what is happening both in the analyzer and the CFE, including any performance-relevant differences. For example, if the analyzer and CFE necessarily take different approaches with different performance characteristics, then that might need considering for the design as a whole.

I realize that I have been unhelpfully working in docs instead of on GitHub; let me try to pivot to using GitHub more :)

Suggestion: maybe we start a series of "implementation docs" living next to the feature spec and expanding on what is specified with implementation detail/design; would that be a good way to make progress? https://github.com/dart-lang/language/blob/main/working/macros/feature-specification.md

I can go ahead and create some wrong outlines to be filled in with the actual details, if that's a good way to start ;) Or, any other approach someone would like to suggest. Thanks.
I shared a small document with you. Discussing dependencies is the same as discussing data: you get a dependency because you asked for a piece of data. Anything that could affect the provided answer is a dependency, e.g. … So, we still need to decide what to do with …
Thanks @scheglov, much appreciated :) I sent a PR adding your notes next to the macros spec, dart-lang/language#3852, so we can refer to it and evolve it into a more general doc including the CFE. @johnniwinther, a description like this of what the CFE does would be great, please, so we can start to understand the differences :)

Re: data or deps, you are right that the difference does not matter for correctness. However, the optimization from thinking about data is likely to be very large :) ... it is in the nature of macros that they consume only a small fraction of the information from their dependencies, so almost all reruns triggered by changed dependencies are unnecessary. As an extreme example, most macros consume only a small fraction of the library they are applied in: probably a single class, when the library might have tens or hundreds of classes. So the percentage of same-library changes that actually need a rerun is probably already <10%, even before looking at imports.

Correctness questions are of course also important :) I'm going to keep notes on the investigation around data-oriented macros on language issue #3706; I have started to check in exploratory code there that we can use for experiments around performance and correctness.
https://github.com/dart-lang/language/tree/main/working/macros/dart_model now has a runnable example comparing the current macro implementation vs the data model approach on a huge library cycle. The trivial macros in the example only want the list of field names, so the data model approach can easily know that the list of field names did not change, and only rerun for one file. There is a pretty big gap in complexity between this and something like the JSON macro; I will see next how far I can push it and what useful results I can get along the way :)
Maybe you are thinking about dependencies as whole libraries or imports. For the purpose of fine-grained invalidation, the problem is not the dependencies, but the outputs of macros.
I have put some more thought into what I briefly mentioned in the macro implementation sync-up. In order to work around the library cycle issue, it seems to me like we need to cache the macro outputs from each phase. In the situation mentioned above, with deps like A -> B -> C -> A, assuming macro applications in all 3 libraries, and an API change to C, we could do essentially the following: …
@scheglov does that make sense? Does it sound feasible?
We used to run the declarations phase for all libraries of the cycle at once, IIRC because we can introspect other types. During the types phase, I think we cannot reference any local identifier, right?

```dart
/*macro*/ class MyMacro implements ClassTypesMacro {
  const MyMacro();

  @override
  FutureOr<void> buildTypesForClass(declaration, ClassTypeBuilder builder) {
    builder.declareType(
      'X',
      DeclarationCode.fromParts([
        'class X extends ',
        declaration.superclass!.code,
        ' {}',
      ]),
    );
  }
}
```

```dart
@MyMacro()
class B extends A {}
```

This macro throws an exception …
We cannot resolve … We could theoretically do:

```dart
var c = declaration.library.uri.resolve('c.dart');
builder.declareType(
  'X',
  DeclarationCode.fromParts([
    'class X extends ',
    await builder.resolveIdentifier(c, 'C'),
    ' {}',
  ]),
);
```

But I think it should not be allowed to resolve identifiers from the same library cycle, because there is no defined order in which the types phase macros are applied across libraries. Maybe … So, it seems to me that the types phase in …
For questions of correctness related to ordering, I think we should consider moving to a model with no ordering, #3858 ... we can now compare using test cases; hopefully this can be a way to push to a convincing decision one way or the other.

Regarding performance, I am now 90% convinced that evolving the current execution model into an optimal or near-optimal one is not possible. Fine-grained requests, and tracking dependencies dynamically based on them, cannot be as fast in a serialization-based execution model as broad requests and data structures that can be diffed / recombined. So I think it would be good to refocus effort on poking holes in the proposed new approach: if there are reasons it won't work, let's find them as quickly as possible; if it will work, let's switch to it. I am ready to add functionality/experiments/benchmarks as needed :) and welcome suggestions.

What convinced me is the benchmark in my exploratory code, documented in the README, in which I apply three trivial macros to a 67k LOC library cycle with 64 files and 640 macro applications. In my exploratory code, the "macros" introspect by making three RPCs and receiving three responses. Not three responses per macro: three responses for the whole codebase. The data is then chopped up and passed to a generate method that is as nice to write as any other generate method:

```dart
String generateFor(Interface clazz) {
  final result = StringBuffer();
  result.writeln('augment class ${clazz.name} {');
  result.writeln('  bool operator==(Object other) =>');
  result.writeln('      other is ${clazz.name}');
  for (final field in clazz.members.values.where((m) => m.isField)) {
    result.writeln('      && other.${field.name} == this.${field.name}');
  }
  result.writeln(';}');
  return result.toString();
}
```

Then the macros make another RPC each with the results, for a total of nine messages; this is a big example, so they are big results, about 3.5 MB of data:
So the current implementation's number of messages on startup is 640 * 3 * 2 * (# introspection calls) + 640 * 3 for the results: many thousands (for example, at just three introspection calls per application per phase, that is 640 * 3 * 2 * 3 + 640 * 3 = 13,440 messages), when it's possible to cut this down to nine. Then when you go and change one file, the current implementation has to rerun everything because it's a library cycle, while the exploratory implementation just reruns one application: six messages sent, three updates to macro input and three (much smaller this time) updates to macro output.
So again it's thousands of requests vs just a few, and the analyzer only has to do work on the output …

A complex macro could need hundreds of introspection calls in realistic cases, so there will be large codebases where the current implementation will need the best part of a million(!) messages passed, redescribing most of the package from scratch, while the exploratory implementation still needs six, describing only what changed. As I said, this feels pretty conclusive to me -- looking forward to hearing what others think. Thanks!
I still don't actually fully understand what exactly is being proposed with the data model approach, enough to be able to really provide good feedback at this time. I think we need some much more specific details about what is actually being proposed, to understand what the potential pitfalls might be.
@jakemac53 That's fair :) I'm short on time before the weekend starts here; let me summarize quickly, and I can do further writeup as needed on Monday. Summary added to the query-like API issue, since it belongs better there :) Feel free to throw any questions/observations there, too. Thanks!
@scheglov This identifier resolution is only happening in the augmentation library creation phase, right? It seems likely that we will start implicitly including the "parent" library import scope in augmentation files (with the move to the "part" file unification). This might allow us to not do these … We could even more generally rely on this kind of resolution for identifiers which came from the original library. Would that help here? I think the specific issue is that the import might change in the augmentation?
Ah, interesting.

```dart
import 'c2.dart';
import 'c3.dart';

@Macro1()
class X extends A {}
```

Well, when we talk about reusing macro-generated augmentations, we want to make sure that the reused code means the same as the code we would create if we re-ran macros in the library. It can mean something different (e.g. when …). Yeah, we can still have a conflict; here the method `foo` conflicts with the import prefix `foo`:

```dart
import 'dart:math' as foo;

class A {
  final pi = foo.pi;
  void foo() {}
}
```

So, this probably will solve the issue with types.
No, I don't think this is OK for UX.
This saves another 120 ms.

Bug: #55784
Change-Id: I4b195f977b7e49cc5b07dcecd4836b3f25399428
Reviewed-on: https://dart-review.googlesource.com/c/sdk/+/372820
Commit-Queue: Konstantin Shcheglov <[email protected]>
Reviewed-by: Brian Wilkerson <[email protected]>
I'm not quite sure I follow. If there are two unrelated macros running on the same library, why is it clearer to merge the output into one file? We see this today with built_value output: it has multiple generators, value types and serialization, and the output is merged into one file. Okay, it works. But usually when I go to the generated code I'm looking for either the value type code or the serializer code, not both, because they do completely different things. So as a user I would not be sad if the output were split into value-type generated code and serializer generated code. It would be clearer. Here is some example large output that is merged :)

Back to the question of performance: I'm not 100% clear on how these numbers relate to incremental performance. If we know a macro needs rerunning for just one library, which parts can we avoid rerunning for the whole library cycle? Which parts are forced to rerun for the whole library cycle? Thanks.
They are related by being applied to the same classes.
If …
Do you want to scare me with 7469 lines of generated code?
Here are new numbers, which get closer to the previously outlined theoretical ones; we skip some extra parsing.
These numbers are for the case when we don't have any cached macro-generated results. Here, … Which gives … We need to run such operations as …
Another thought that occurred to me is that it might not be worth attempting to reuse macro results. Here we can save at most … Combined with the complexity of the implementation, this makes the idea less palatable. OTOH, we already improved performance by avoiding parsing code more often than necessary.
It's a specific example where I do end up looking at generated code and would prefer it split by generator, not merged. It's true the augmentations (or parts, in this case) are always related to the same underlying library, but they're much less closely related to each other. Anyway, I don't think splitting up is a great feature :) ... except if it turns out to bring a big performance improvement, then it might be worth considering. We could always come back to it later if so. It's odd to think of the IDE UX affecting the compile / incremental compile so much; it is a bit more strongly coupled than I'd like. But then, source code is exactly on that boundary between tooling and human use :) Thanks.
I do agree with Morgan that if merging the files is costing a lot, we should re-evaluate it. We could, for example, still present a merged file to the user, but not have it be used by the CFE/analyzer, possibly with a source map back to the real files.
I don't know which macro this is or how many requests it is making. But for other macros it would likely be much more worthwhile, so we need to keep that in mind. Another question: it seems like in the non-macro case you were able to avoid any re-parsing? Is that accurate? If so, it seems like, to be competitive, we would also need to get to that same state. Could we cache merged+parsed augmentation results, in the case that all the results that would be merged together are identical?
Another thought that makes it even less palatable. Because I wanted to see the impact of my changes to macro processing (no extra parsing, maybe caching), I used …
This is for …
Here is what we currently pay for merging.
Yes, for the non-macro case we don't have to process …
Yes, I planned on caching merged code too, in the case when all partial macro-generated results were reused. That's why I initially subtracted the corresponding … You say "cache merged+parsed augmentation results". I have not thought about serializing parsed ASTs recently. This might be faster than parsing. Or not. I don't know; I need to try implementing it to see the numbers.
There will always be some additional cost to macros.
For a non-cached result, absolutely, there will have to be a cost, but I am less concerned about that case. The big question here is around incremental changes: how cheap can we make them, and can it ultimately be equivalent to user-written code if the macro doesn't re-run? I am a bit confused about the answer to that question currently. Are there extra costs when macros are involved, even in the cached macro result case? It seems to me that the answer is yes, but it isn't clear to me where that extra time is going.
Yes, there are going to be extra costs even if we reuse cached results.
This saves another 160 ms.

Bug: #55784
Change-Id: If626d616999b85edaaa4028cbe5fc70fbc1bf86a
Reviewed-on: https://dart-review.googlesource.com/c/sdk/+/372960
Commit-Queue: Konstantin Shcheglov <[email protected]>
Reviewed-by: Brian Wilkerson <[email protected]>
Hopefully this isn't bad, but it definitely would depend on how sophisticated it was.
Yes, but I think that is fine. Especially if we can say "for which the result changed" - this is a potentially important difference. We should be able to re-use the merged result if the macro output was the same, even if the macro was re-run.
Is this because of different logic or overhead from the augmentation declarations? Are we comparing identical code (functionally)?
Is this more expensive for macro code than user code?
Hopefully. To know it with more precision, I will need to implement it. One of the costs that I know about right now: when the macro asks for methods, for example, the returned declarations are "syntactic", and depend on the tokens of all these methods. My plan is to combine these tokens into a single signature; usually we use MD5 for this. But it is somewhat expensive to compute. I need to look at whether we can micro-optimize it.
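As a rough sketch of that plan (assuming `md5` from `package:crypto`; the real signature scheme may differ):

```dart
import 'dart:convert';
import 'package:crypto/crypto.dart';

/// Hypothetical sketch: fold the lexemes of a declaration's tokens into one
/// MD5 signature, so "did any token change?" becomes a single comparison.
String tokenSignature(Iterable<String> lexemes) {
  final bytes = <int>[];
  for (final lexeme in lexemes) {
    bytes.addAll(utf8.encode(lexeme));
    bytes.add(0); // separator, so ['ab', 'c'] differs from ['a', 'bc']
  }
  return md5.convert(bytes).toString();
}
```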
True, if the result is the same after the re-run we can still reuse the merged code.
I presume that this is the same functionality; @davidmorgan can confirm or refute this. It looks like there are a few differences. Macro-generated code has indentation, and uses prefixes:

```dart
augment library 'package:package_under_test/a16.dart';

import 'dart:core' as prefix0;

augment class A0 {
  external A0.fromJson(prefix0.Map<prefix0.String, prefix0.Object?> json);
  external prefix0.Map<prefix0.String, prefix0.Object?> toJson();
  augment A0.fromJson(prefix0.Map<prefix0.String, prefix0.Object?> json, )
    : this.a0 = json['a0'] as prefix0.int?,
      this.a1 = json['a1'] as prefix0.int?,
```

Manual code is un-indented and not prefixed:

```dart
if (a97 != null) result['a97'] = a97;
if (a98 != null) result['a98'] = a98;
if (a99 != null) result['a99'] = a99;
return result;
}

factory A0.fromJson(Map<String, Object?> json) {
return A0._(
a0: json['a0'] as int,
a1: json['a1'] as int,
a2: json['a2'] as int,
```

Manual code does not use augmentations; it has all the code for …
Yeah, I was wondering if it is more expensive to put code in an augmentation versus all in the main library?
…to declarations. This saves us about 120 ms out of 2000.

Bug: #55784
Change-Id: Ib352a78b39706b07fc2ee0d5b2c3e6e77d98119c
Reviewed-on: https://dart-review.googlesource.com/c/sdk/+/373000
Commit-Queue: Konstantin Shcheglov <[email protected]>
Reviewed-by: Brian Wilkerson <[email protected]>
It is hard to quantify. My best guess was to measure the operations that merge members from augmentations into the …
Re: #55784 (comment) Decided to see if there is anything better than MD5.
For comparison, on Rust …
It looks like theoretically it could be …
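For context, a micro-benchmark along these lines is easy to sketch (sizes and iteration counts here are arbitrary; this is not the benchmark used above):

```dart
import 'dart:math';
import 'dart:typed_data';
import 'package:crypto/crypto.dart';

// Hypothetical micro-benchmark: time MD5 over 1 MiB of pseudo-random bytes.
void main() {
  final random = Random(42);
  final data = Uint8List.fromList(
      List<int>.generate(1 << 20, (_) => random.nextInt(256)));
  final sw = Stopwatch()..start();
  for (var i = 0; i < 100; i++) {
    md5.convert(data);
  }
  sw.stop();
  print('MD5 over 1 MiB x 100: ${sw.elapsedMilliseconds} ms');
}
```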
Given that the MD5 implementation is much closer, it seems like this library could possibly just be optimized. I also noticed that the 64-bit and 32-bit versions are much faster, and probably would be fine to use here.
I don't understand what you mean by … And if we compare Dart's version of …, I suspect that …
OTOH, I might be wrong blaming MD5 per se.
The rest are other operations that feed data into it.
Some of it is …
However, interestingly, here we might be able to do less work when using macros :-) Any generated code is already based on the user-written code and the signatures of libraries with macros. So, we don't need these …
It takes about …
Hm... I probably over-estimated the savings for the macro version.
Yes
Right, my point was that if you compare MD5 on Dart and Rust, it is much closer (<4x). You could be right that the algorithm is just more amenable - but it seems worth checking on the Dart implementation to see if it is doing something slow (if we want to dig into optimizing this).
Sure, MD5 on Rust is not so much faster than MD5 on Dart. Which, given the measurements (under "expand" above): …
Bug: #55784
Change-Id: Ibe83b4eee8bbf47ee79193244f9233cdb1b6aece
Reviewed-on: https://dart-review.googlesource.com/c/sdk/+/373161
Commit-Queue: Konstantin Shcheglov <[email protected]>
Reviewed-by: Phil Quitslund <[email protected]>
The Problem
Today, whenever a library changes (its API, at least), all macros in all libraries that depend on that library (transitively) get re-run.
In pathological cases, this can result in incremental analysis times which are O(# affected libraries * average time to run macros per library). We can demonstrate today that if the number of affected macros reaches into the hundreds, this is noticeably slow (multiple seconds).
The need is most pressing here for the analyzer, since it will have to deal with this situation any time a character is typed in any file. Eventually, the CFE will also want to implement this for the incremental compiler, to improve hot reload performance.
Solution
We should track the dependencies of macro applications and only re-execute them if those dependencies change.
To start, "dependency" here should just be tracked at the library granularity. This should be sufficient to get what we want, without adding a huge amount of memory overhead. If desired we can later investigate more fine grained dependency tracking.
Implementation Details
For most API requests this should be relatively easy to implement - all `Declaration` objects sent to macros already have a `Library` associated with them. We should track all of those libraries in a set (this could even be done in shared code, potentially). Once a macro is done, that set of libraries should be saved in such a way that it can be linked to the specific macro application, via an expando or map etc.

Whenever a library is invalidated by the API signature of a dependency changing, it should re-use the macro results it got previously, except for the results that depended on the library which caused the invalidation. Macros in the current library will always re-run, because they are always provided some declaration from the current library, and thus always depend on it.
The end goal of this should be that in most cases only the macros in the file which was edited are re-run, or maybe a handful of others, but not all macros that might have been able to introspect on the changed library.
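A minimal sketch of this bookkeeping, with hypothetical names (the real code would live in the shared macro executor or the analyzer):

```dart
/// Hypothetical per-application dependency tracking: every Declaration
/// handed to a macro contributes its library URI; the set is stored
/// against the application when it finishes.
class MacroApplicationDeps {
  final _deps = <String, Set<Uri>>{}; // application id -> library URIs
  final _current = <Uri>{};

  /// Called by the introspection layer for every declaration returned.
  void recordLibrary(Uri libraryUri) => _current.add(libraryUri);

  /// Called when the macro application finishes.
  void finish(String applicationId) {
    _deps[applicationId] = Set.of(_current);
    _current.clear();
  }

  /// A cached result may be reused only when the changed library is not
  /// among the application's recorded dependencies.
  bool canReuse(String applicationId, Uri changedLibrary) =>
      !(_deps[applicationId]?.contains(changedLibrary) ?? true);
}
```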
Harder cases
StaticType
Macros can ask for a `StaticType`, which encapsulates knowledge about the entire type hierarchy. We will need to record the libraries for all the types in that hierarchy when returning static types.
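A rough sketch of that recording step, using stand-in types rather than the real macro API:

```dart
/// Stand-in for a resolved interface type; the real API differs.
class TypeNode {
  final Uri libraryUri;
  final TypeNode? superType;
  final List<TypeNode> interfaces;
  TypeNode(this.libraryUri, {this.superType, this.interfaces = const []});
}

/// Collects every library contributing to [type]'s hierarchy, so that a
/// StaticType answer can record all of them as dependencies.
Set<Uri> librariesInHierarchy(TypeNode type) {
  final seen = <TypeNode>{};
  final result = <Uri>{};
  void visit(TypeNode? t) {
    if (t == null || !seen.add(t)) return;
    result.add(t.libraryUri);
    visit(t.superType);
    t.interfaces.forEach(visit);
  }
  visit(type);
  return result;
}
```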
If we want the library tracking to be handled in shared code, we would need to add an `Iterable<Library>` field to the `StaticTypeImpl`, so that we know which libraries a static type depends on.

Constant evaluation
If/when we enable constant evaluation, we will also need to track the library dependencies of that, for any const identifiers that were read.
Identifiers in Code objects
Identifiers themselves could also change which library they came from, which would affect the imports in the generated macro augmentations.
For all calls to `resolveDeclaration` and `resolveIdentifier` in the augmentation library code, we should track the URIs of the resolved identifiers as well, and add those to the set of dependencies.

OmittedTypeAnnotation
When types are omitted, they might be inferred. We handle that in the macro APIs through the special `OmittedTypeAnnotation` object, which we later allow you to infer through a separate call. When you do that inference, we need to record the libraries that the inference depended on as proper dependencies, which could get complicated.
New types being introduced in immediate imports
If a new type is added to the import scope, it could change how existing type annotations resolve, without a change to the libraries those types came from.
This usually would result in an error that has to be resolved anyway, but for core types it would not, and there might also be other situations like that, so we need to think carefully about it.
We could say that everything in the import scope is an implicit dependency, as an example (so all immediate imports and their transitive exports); a sketch of computing that set follows.
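A sketch of computing that implicit dependency set, again with stand-in types:

```dart
/// Stand-in library node; the real analyzer model differs.
class LibraryNode {
  final Uri uri;
  final List<LibraryNode> imports;
  final List<LibraryNode> exports;
  LibraryNode(this.uri, {this.imports = const [], this.exports = const []});
}

/// Everything in the import scope as an implicit dependency: all immediate
/// imports plus the transitive closure of their exports.
Set<Uri> implicitImportScopeDeps(LibraryNode library) {
  final result = <Uri>{};
  void addWithExports(LibraryNode lib) {
    if (!result.add(lib.uri)) return; // already visited
    lib.exports.forEach(addWithExports);
  }
  library.imports.forEach(addWithExports);
  return result;
}
```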