File based metaprogramming #1864

lrhn · 2021-09-20T12:38:30Z

This is a (sketch/draft/initial) proposal for an approach to meta-programming/macros which puts very few restrictions on the code you can use to write macros (no circularity, basically) and focuses on ensuring that the result of compilation can always be represented by a consistent set of source files.

Dart File-Based Macros

Author: [email protected]
Version: 1.0

Goals

This is a proposal for a fully-featured and general meta-programming functionality for Dart. The feature supports generating program code at compile- or analysis-time in a consistent and predictable way, and doing so using plain idiomatic Dart code.

The generated code will be inspectable during development, and it will be possible to use the same meta-programming framework to generate the code ahead-of-time and distribute the generated code instead of the generator.

Fundamentals

A Dart program is compiled, or analyzed, in a compilation environment.

A Dart program is defined by a set of libraries, which are loaded from a consistent set of files. Each library has a URL (possibly more if it contains parts), and the mapping from URL to source file is uniquely and consistently defined at compile-time by the compilation environment.

Further, the compilation environment defines the mapping from package: URLs to source files, the availability of dart:-URL libraries, and may define certain "environment declarations" or flags which are available to the compilation process.

The most central requirement here is that this environment is consistent across the entire program. Accessing the same file more than once must always yield the same source code, and accessing the same environment declaration more than once must always give the same value (if any).

Meta-programming works by adding files to the compilation environment during compilation. A file can be added at most once, and it cannot be changed after it has been added. Source files may reference missing files, and the program is considered incomplete until all required source files have been provided.
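The write-once rule can be illustrated with a minimal sketch. The `CompilationEnvironment` class and its members are hypothetical names used only for illustration; the proposal does not define such an API.

```dart
/// Hypothetical model of a compilation environment whose files are
/// write-once. Not part of the proposal; names are illustrative only.
class CompilationEnvironment {
  final Map<Uri, String> _files = {};

  /// Adds [source] at [url]. A file can be added at most once and
  /// cannot be changed afterwards.
  void addFile(Uri url, String source) {
    if (_files.containsKey(url)) {
      throw StateError('File already exists and cannot be changed: $url');
    }
    _files[url] = source;
  }

  /// Returns the source at [url], or null while the file is still missing.
  String? read(Uri url) => _files[url];

  /// The program is complete once every required file has been provided.
  bool isComplete(Iterable<Uri> requiredFiles) =>
      requiredFiles.every(_files.containsKey);
}
```

Because reads never observe a change to an existing file, accessing the same URL twice always yields the same source, which is the consistency requirement stated above.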

This requires some tools to be able to handle partial, incomplete programs until the program has been completed. At least the compiler needs to be able to detect, trigger and run the macros that should generate the missing files.

Language extensions

To support this feature, the language also needs:

Partial class/mixin/extension declarations. Multiple declarations of the same class/mixin/extension are allowed in the same library, as long as at most one is not marked partial (or part to reuse a built-in identifier). The same member can be declared in multiple parts, as long as at most one is non-abstract and non-external (and exactly one if the class is not abstract).
```
class Foo {
  int get x;
}
extension Ext<R> on something<R> {
  int get y;  // whoa, abstract extension member!
}
mixin Mix<E> on List<E>, Queue<E> {
  int get z;
}
// in another part file
part class Foo implements Bar { // Add extra interface
  int get x => 42; // Implement abstract from other part
  int bar() => x; // Add new member.
}
part extension Ext<T> { // At least one needs an `on` type, all must agree
  int get y => 42;
}
part mixin Mix<T> implements QueueList<T> {
  int get z => 42;  
}
```
Needs the part prefix in order to omit otherwise required parts of the declaration. Part classes cannot apply mixins, only members (because if two part classes both apply mixins, they have no natural application order).
Part files with local import and part directives (so generated files can trigger more generated files).

Macros

A macro is a Dart script. That is, a Dart library with a main method. The macro script is triggered by the compiler when encountering a special annotation in the program being compiled. The annotation includes the URL of the script to be run.

The compiler then compiles that script in the same compilation environment as the annotated program. It is, as usual, an error if the macro script cannot be compiled for any reason, including missing source files which haven't been generated yet. The macro script, or the libraries it depends on, can contain other macro annotations, and trigger other macros to be compiled and run.

It's a compile-time error if a macro annotation with a URL denoting the same macro script is encountered during compilation of the macro script (which includes all the libraries transitively imported by the script library, and the libraries of all macros transitively used by the macro script itself). That is, you cannot have a cyclic dependency between macro scripts, with macro annotations being a new kind of dependency.

When the macro script has been compiled, it is run. The script's main method is invoked with two arguments. The first is a list containing a single String that points out the annotation which originally triggered it: a string representation of the source URL containing the annotation, followed by a fragment pointing to the position, like #offset=4123&line=124&column=2. The second argument is a "macro context" object. The type of this object is defined in dart:macros as an abstract interface. The implementation is implicitly included in the macro script compilation. (Providing the macro context as an argument makes it possible to mock it for testing, instead of, say, getting it as Macros.currentContext.)
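As a rough sketch of how a macro might decode such a location string, the helper below uses only `dart:core`. The `AnnotationLocation` class and `parseLocation` function are hypothetical, and the sketch assumes the fragment always carries the `offset`, `line`, and `column` fields shown in the example above.

```dart
/// Hypothetical holder for a decoded annotation location.
class AnnotationLocation {
  final Uri file;
  final int offset;
  final int line;
  final int column;
  AnnotationLocation(this.file, this.offset, this.line, this.column);
}

/// Parses a location such as
/// "package:app/main.dart#offset=4123&line=124&column=2".
/// Throws if any of the three assumed fields is missing or not an integer.
AnnotationLocation parseLocation(String location) {
  final uri = Uri.parse(location);
  // The fragment uses query-string syntax, so it can be split the same way.
  final fields = Uri.splitQueryString(uri.fragment);
  return AnnotationLocation(
    uri.removeFragment(),
    int.parse(fields['offset']!),
    int.parse(fields['line']!),
    int.parse(fields['column']!),
  );
}
```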

The goal of running a macro is to provide one or more missing files to the compilation environment. It's technically possible to not generate any code at all, and just act as an extra validation step, but most macros will want to actually do something.

The macro context object provides functionality to:

Read any file in the compilation environment.
Parse and perform (preliminary) type inference on any existing library, providing a type-annotated AST. The type inference is preliminary because some types might still be missing until all missing files have been created. This is a default implementation, the macro can choose to use its own instead.
Access the annotation itself. Either the annotation object can be accessed directly (which requires it to be deserializable in the macro isolate, so the types of all objects making up the annotation must exist in the macro isolate) or indirectly through a mirrors-like introspection feature (which always works, but might be asynchronous).
Write files to the compilation environment (and create directories if necessary). These files, usually only one, can be cached in-memory by the compiler. It might be possible to tell the compiler to do a write-through. It's possible for a macro to check if the file already exists, and thereby support pre-generating, or even have the macro not get called if the file exists.
Tell when the macro is done. That will kill the isolate (using the sendAndExit feature to report back to the compiler).

We can provide helper libraries for analyzing source files, for generating source files, but in the end, the macro just writes a file. They can generate the content in whatever way they want, as long as they write it back through the macro context.

Example macro:

library my_enum_macro;

import "dart:macros";
import "my_enum.dart" show MyEnum;

void main(List<String> location, MacroContext context) async {
  var annotationLocation = location.first;
  var annotatedFile = Uri.parse(annotationLocation).removeFragment();
  var annotation = (await context.annotation) as MyEnum;
  assert(annotation.macroScriptUri == "package:my_enum/macro/my_enum_macro.dart");
  // Example annotation:
  //  @MyEnum("foo-enum.g.dart", name: "Foo", elements: ["foo", "bar", "baz"])
  //  part "foo-enum.g.dart";
  var targetFileUrl = annotatedFile.resolve(annotation.targetFile); // foo-enum.g.dart
  var targetFile = context.file(targetFileUrl); // A `File` object, but not necessarily one from `dart:io`.
  if (targetFile.existsSync()) {
    return context.close(noFilesWritten: true); // Avoid warning for not writing any files.
  }
  var index = 0;  
  var name = annotation.name;
  var elements = annotation.elements;
  targetFile.writeAsStringSync("""
    class $name {
       ${[for(var element in elements) 
            "static const ${element} = $name._(${index++});"].join("\n")}
      static const List<$name> values = [${elements.join(", ")}];
      final int index;
      const ${annotation.name}._(this.index);
    }
  """); // Written "synchronously" but not committed until calling `context.close()`
  context.close();         
}

If a macro script ends (isolate terminates) without it calling context.close, then it's assumed to have failed. That causes a compile-time error, and none of its files become visible to other compilation phases.

If the macro calls close without writing any files, it causes a warning to be printed, unless it passes noFilesWritten: true to say that it's deliberate.
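The close semantics lend themselves to a small in-memory fake, which is also the kind of mock the macro context argument makes possible. This is only a sketch: `FakeMacroContext`, `write`, `committed`, `warnings`, and `succeeded` are hypothetical names, and `write` simplifies the `context.file(...)` API used in the example above.

```dart
/// Hypothetical in-memory stand-in for a macro context, for tests only.
class FakeMacroContext {
  final Map<Uri, String> _pending = {};

  /// Files that have become visible to the (imaginary) compiler.
  final Map<Uri, String> committed = {};
  final List<String> warnings = [];
  bool _closed = false;

  /// Buffers a file; nothing is visible to the compiler until [close].
  void write(Uri url, String source) {
    if (_closed) throw StateError('Macro context is already closed');
    _pending[url] = source;
  }

  /// Commits buffered files and marks the macro as successfully completed.
  void close({bool noFilesWritten = false}) {
    if (_closed) throw StateError('Macro context is already closed');
    if (_pending.isEmpty && !noFilesWritten) {
      warnings.add('Macro closed without writing any files');
    }
    committed.addAll(_pending);
    _pending.clear();
    _closed = true;
  }

  /// False until [close] is called; a macro whose isolate ends in that
  /// state is treated as failed and none of its files are committed.
  bool get succeeded => _closed;
}
```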

Parallelism

The compiler tries to parallelize macros as much as possible. Macros are run in turns, in topological dependency order. When a program has been read as far as possible given the available source files, then all its macro annotations are found.

Then each of these macros is compiled as well, transitively, if necessary (if the macro has been seen before, it might already be ready to run).

Then the compiler checks whether this compilation has added any files needed by the current program. This is possible but unlikely, and probably something we should warn about if it happens: if there are no cyclic dependencies between scripts, then it's odd that a file needed by the current program is also needed by something it depends on.

Then all the macro annotations are processed by running their associated macro. These macros only see the files as they existed before this step. They can, and will, run in parallel. If two macros try to write the same file, it causes a compile-time error when they have both closed.

After this step (or during, as soon as files are available), the compiler parses the newly generated files that are imported by the existing program, and incorporates them into the program. If they contain new macro annotations, those are scheduled for the next step.

When the first turn is completely done, the next turn starts running all new annotations added in the files generated by the first turn. And so on. (Should we start earlier, immediately when one macro completes, scan its files, add just those to the available files for its new macro invocations?)

Algorithm

To compile a script:

1. Read main library.
2. Read transitively imported libraries. Some files might be missing.
3. Scan them for annotations which extend Macro from dart:macros (which has a constructor taking a String as argument which is the URI of the macro script). Resolve and create those constants, if possible. If any argument contains an unresolved identifier, delay trying to resolve it until the next iteration.
4. For each macro URI, compile it as a script. Then run it to generate new files, giving it access to only the files existing at step 2.
5. If any new files have been created, goto 2. If the generated files have introduced declarations which would change the resolution of existing macro annotations, it's a compile-time error.
6. Otherwise the program is considered complete, compile it as normal. Any missing files or unresolved identifiers are now a compile-time error.
7. The compiler writes statistics telling which macros used how much time.
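The loop above can be sketched as a fixed-point iteration over a map of files. This is a toy model under loud assumptions: files are plain strings keyed by path, a macro annotation is detected by a naive substring search for `@Name`, and `MacroScript` and `compileWithMacros` are hypothetical names. Real annotation resolution and script compilation are left out entirely.

```dart
/// Hypothetical macro: given the annotated file and a snapshot of the
/// files visible to this turn, returns the new files it wants to write.
typedef MacroScript = Map<String, String> Function(
    String annotatedFile, Map<String, String> snapshot);

/// Runs macro turns until a turn produces no new files.
Map<String, String> compileWithMacros(
    Map<String, String> sources, Map<String, MacroScript> macros) {
  final files = Map<String, String>.of(sources);
  final processed = <String>{}; // "file@macro" pairs that have already run.
  while (true) {
    // Macros in one turn only see the files that existed before the turn.
    final snapshot = Map<String, String>.unmodifiable(files);
    final generated = <String, String>{};
    for (final entry in snapshot.entries) {
      for (final name in macros.keys) {
        // Naive annotation detection, for illustration only.
        if (!entry.value.contains('@$name')) continue;
        if (!processed.add('${entry.key}@$name')) continue;
        macros[name]!(entry.key, snapshot).forEach((path, source) {
          // A file can be added at most once, across all macros.
          if (files.containsKey(path) || generated.containsKey(path)) {
            throw StateError('File written more than once: $path');
          }
          generated[path] = source;
        });
      }
    }
    if (generated.isEmpty) return files; // Fixed point: nothing new to scan.
    files.addAll(generated);
  }
}
```

Macros are run sequentially here for simplicity; because every macro in a turn reads the same immutable snapshot, the order within a turn does not affect what they see.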

To run a macro:

Compile its script (if not already done).
Create a new isolate which includes an implementation of MacroContext from dart:macros (which can be a thin wrapper around calls into the compiler), then call the main of the macro script with the location of the annotation to process and the macro context implementation.
The code can introduce new files using context.file(...) or similar API.
When the code calls context.close, the macro shuts down and commits the files written so far to the compiler.
If the macro isolate ends without calling context.close, then it's a compile-time error. The isolate treats uncaught errors as fatal by default.
If the isolate manages to hang, not exiting and not closing, then the compiler will kill it after "some time" and report it as a compile-time error. A long-running macro can call context.keepAlive() occasionally to tell the macro system that it knows that it's slow and it wants more time.

Reusable macros

The API allows a list of locations to be passed to the main function of a macro. A macro annotation can request handling multiple annotations: if a step contains more than one occurrence of the same macro, and that macro has passed a number larger than one to the Macro constructor's optional {int maxTasks = 1} parameter, it can get more than one annotation URL passed to it.
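One way a compiler could honor maxTasks is to split the pending annotation locations of a macro into batches, each handed to one macro instance. The proposal does not say how occurrences are grouped, so the in-order chunking below and the `batchLocations` name are assumptions made for illustration.

```dart
import 'dart:math' show min;

/// Splits [locations] into consecutive batches of at most [maxTasks]
/// entries each. Hypothetical helper; grouping order is an assumption.
List<List<String>> batchLocations(List<String> locations, int maxTasks) {
  if (maxTasks < 1) {
    throw ArgumentError.value(maxTasks, 'maxTasks', 'must be positive');
  }
  return [
    for (var start = 0; start < locations.length; start += maxTasks)
      locations.sublist(start, min(start + maxTasks, locations.length)),
  ];
}
```

With the default maxTasks of 1, every batch holds a single location, which matches the one-annotation-per-instance behavior described earlier.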

The Macro Annotation

/// Class that macro triggering annotations must *extend*.
abstract class Macro {
  /// URI of the macro script.
  final String macroScriptUri;
    
  /// Maximal number of annotations given to each instance of the macro.
  ///
  /// Defaults to one, meaning that each macro instance processes 
  /// just one annotation.
  /// Having the same macro instance process multiple occurrences of 
  /// its macro annotations reduces the number of separate isolates being run,
  /// but may increase latency due to the lower parallelism. 
  /// Fast macros that occur often are most likely to benefit from increasing 
  /// this number.
  final int maxTasks;
  
  /// A specific file that this macro is intended to generate.
  ///
  /// If this file *already exists*, the macro script is not compiled and run.  
  /// If the macro generates this file, it is written to disk (if possible) and
  /// retained for later compilations.
  final String? permanentFile;  
    
  const Macro(this.macroScriptUri, {this.maxTasks = 1, this.permanentFile}) : assert(maxTasks > 0);
}

Example macro annotation:

class MyEnum extends Macro {
  final String name;
  final List<String> elements; 
    
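  // Const, with named parameters, so it can be used as the annotation
  // @MyEnum("foo-enum.g.dart", name: "Foo", elements: ["foo", "bar", "baz"]).
  const MyEnum(String partFile, {required this.name, required this.elements})
      : super("package:my_enum/macro/my_enum_macro.dart", permanentFile: partFile);

  String get targetFile => permanentFile!;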
}


rrousselGit · 2021-09-20T12:57:34Z

That sounds interesting.

Am I correct in thinking that this proposal is complementary to https://github.com/jakemac53/macro_prototype? As in, this describes a low-level API, and the prototype describes a simplified API?

Or is this a separate proposal with different constraints?

If a macro script ends (isolate terminates) without it calling context.close, then it's assumed to have failed. That causes a compile-time error, and none of its files become visible to other compilation phases.

I'm sure there's a good reason for this, but what is it?
I would naturally expect the code that invokes our macro function to be able to perform the "close" when the isolate stops.

lrhn · 2021-09-20T13:02:13Z

I'd say this is basically a strawman, so that I can ask "how is this-or-that better than just using a file?" :)

For the required close call, I do worry about detecting errors early. A macro hanging (or just dying due to an asynchronous deadlock) may mean that we only have half the files we need. Or we might even end up with partially written files (depending on the API). The error message for that is not going to be helpful, so I want to force the macro to report that it has successfully completed. (And also, if it doesn't add any files, it has to say that it wasn't a mistake).
Doesn't catch all errors, but it makes some particularly hard-to-detect error cases easier to spot and explain to users.

rrousselGit · 2021-09-20T13:32:23Z

I see, thanks!

As a potential challenge to this approach, how would we deal with expression macros? jakemac53/macro_prototype#29
A part of me really wants the ability to use macros within function bodies such that we can write things like:

fn() {
  final value = @macro(expression);
  final value2 = @macro(expression2);
}

But with a file-based approach, and especially the "a file cannot be changed after addition" constraint, I don't see an obvious way to update the function.

In particular, I wish to be able to define:

@provider
State another(...) {}
@provider
Again oneMoreTime(...) {}

@provider
int $count(CountRef ref) {
  State value = @watch(another);
  num value2 = @watch(another.select((Again a) => a.number));
  @listen<State>(another, (State value) {
    print('another changed $value');
  });
}

and at compilation transform it into:

@provider
int $count(CountRef ref) {
  State value = ref.another.watch();
  num value2 = ref.oneMoreTime.select((Again a) => a.number);
  ref.another.listeners1 = (State value) {
    print('another changed $value');
  };
}

// generated
final count = Provider((ref) => $count(ref));

class CountRef {
  get another;
  get oneMoreTime;
  ...
}

But it's not obvious how part would allow us to achieve that.

Levi-Lesches · 2021-09-20T14:22:19Z

@lrhn, aside from using main instead of an abstract Macro.generate function, and this proposal including more concrete implementation details, is this much different than #1565 in terms of user experience?

lrhn · 2021-09-20T16:06:19Z

Using main gives me one of the goals I had with this: To keep a very strict separation between code being run by the macro and the code of the program using the macro. The two are fundamentally separate, one does not yet compile while the other is running. And the macro implementation code should not exist at all in the resulting program.

Having the macro implementation code exist on the same object used as the annotation means that that separation becomes much harder. We'll need special rules to recognize macro annotations and treat them differently from other annotations. My approach does not treat them differently, they're just normal annotations. They get recognized by the tool (the compiler), but that's what annotations are for.

I also try to define which environment the macro runs in, without inventing a completely new kind of execution mode. It runs as a fresh isolate, same as every other isolate. You can test it by simply importing the library and calling main with a string list and mock macro context.

As for user experience, possibly not that much difference. The goal was to actually specify a consistent model for source and execution, keep it simple (no need to add lots of new language features just to support it, the ones suggested are generally useful features), keep it unrestricted (don't restrict the code you run in the macros, don't try to prevent errors - just detect them).

And no plans for expression macros. I am wondering if there is a way to apply a "mixin"/wrapper to other declarations, which is again something which could be generally useful.

Levi-Lesches · 2021-09-20T18:11:55Z

Having the macro implementation code exist on the same object used as the annotation means that that separation becomes much harder. We'll need special rules to recognize macro annotations and treat them differently from other annotations.

Would it be possible to detect an annotation is a macro because it extends the Macro abstract class (or a subclass of that), in the same way that await only applies to Futures and await for only works on streams?

lrhn · 2021-09-21T10:25:47Z

Since macro detection happens at compile-time, we can do whatever the compiler can do. For a constant object, it can certainly check whether the annotation implements a specific Macro interface.

It can even check whether the runtime type extends the Macro class, not just implements it. That's what I require here, and it's actually stricter than the type checking we do for await. The reason that I require a subclass, not just an implementation, is that it ensures that the "macro script URL" is not a getter, so the compiler is guaranteed to be able to read the URL out of the constant without running user code.

Heck, we could even make the macro trigger pragma("macro", ...) and define:

class Macro extends pragma {
  Macro(...) : super("macro", {...});
}

as a shorthand. The compiler can figure that out.
However, it requires more processing, and it requires resolving types, not just names, before recognizing macro annotations, which is an issue the current specification has: It allows an annotation triggering a macro before resolving types (the first of the three steps), which means it needs something special to resolve macro annotation types before resolving other program types. That's not necessarily a problem, but it needs to be specified how you recognize a macro annotation. (I think the easiest approach is to not use the same syntax for macros as for annotations, because they are not the same anyway, then the special-casing isn't surprising.)

That await only works for futures just means that the operand must implement Future<X> for some X, which means being any subtype of Future<dynamic>. It's just like await is a function with type T await<T>(FutureOr<T> value). The actual future can be a user-type which has no relation to the platform _Future implementation class. Similarly for await for, the stream operand must implement Stream<dynamic>, but can be any class doing so.

munificent · 2021-09-21T22:42:57Z

This is a really interesting strawman and I'm glad we're exploring how we could make static metaprogramming simpler and more easily implementable.

I'm definitely interested in some kind of layered approach where static metaprogramming is a combination of:

1. A set of lower-level file-based language features that let you compose a Dart program out of a mixture of human-authored and generated files. Think "partial classes" or more powerful part files. That's your "language extensions" here.
2. A macro system that detects some annotation-like syntax on declarations, finds some associated imperative Dart code that defines what those annotations do, executes them, and produces some files for 1.

I like what you have here for 1 (the easy part), but I'm not sold on 2. A couple of specific concerns:

Introspecting on incomplete libraries

If any new files have been created, goto 2. If the generated files have introduced declarations which would change the resolution of existing macro annotations, it's a compile-time error.

This touches on one of the core challenges. Before static metaprogramming has run, you have some human-authored code that may refer to code that hasn't been generated yet. How do you handle it gracefully?

Here, you only talk about incomplete code in the macro annotations themselves, but macros also need to introspect over the in-progress program and the results of that introspection also need to be well-defined. You briefly mention:

Parse and perform (preliminary) type inference on any existing library, providing a type-annotated AST.

But I think this is pretty hard and probably needs more attention. Consider:

import 'generated_by_macro.dart';

class A {
  var foo = 'field';
  var bar = foo;
}

Here, it is possible to infer bar to have type String before the macro runs and generates that file. But the generated file could end up containing:

var foo = 123;

So now the previous preliminary type inference has changed.

We might be able to extend what you said above and say it's a compile-time error if a macro generates code that changes the resolution of any identifiers, but that's probably too brittle too. A user might intend for the macro to generate code that shadows some other code. Or they might get themselves into that state by what should be a non-breaking refactoring, but would become breaking if the macro system conservatively makes it an error.

Macro application syntax

The goal with static metaprogramming isn't to allow users to completely seamlessly extend the Dart language in new and unforeseen ways, but we do want uses of it to look fairly graceful. My bellwether for doing this feature right is if users can use it to define data classes using macros without needing built-in language support.

Given that data classes are a native feature in Kotlin, our benchmark is pretty high. I don't think users would think we "solved" data classes if they had to write:

// my_library.dart:

@Data("my_library_some_data_type.g.dart")
class SomeDataType { ... }

@Data("my_library_another_thing.g.dart")
class AnotherThing { ... }

In particular, requiring a unique URL in every macro application means macro authors can't provide simple "copy and paste this into your app" examples on how to use the macro.

This just looks like exposing too much plumbing to me, but maybe I'm missing something.

lrhn · 2021-09-22T09:39:45Z

@munificent
ACK. The type of bar changing (it won't in this example, but would if bar was defined in class B extends A) is a problem.

One solution, which has been discussed before during the conditional imports design ~~phase~~years, is to have part files which are not allowed to introduce new top-level names. It's patch files, essentially. They can introduce members into existing (partial) classes, or provide implementation for existing external declarations, but they cannot introduce top-level names that weren't there before. If we say that initially missing part files cannot introduce new top-level names, then ... it's probably useless. It definitely won't work for generated libraries, and it won't allow you to generate the builders of built_values.

It's also very dangerous to say that a generated file will somehow have lower precedence than an implicit this. reference. That means that pre-generating the file (which I do want to be possible) will change the meaning. Not great.

Another solution is to treat unqualified identifiers that are not in the lexical scope as unknown as long as a library has any missing parts (which includes any imported libraries having missing parts). The unknown is what we would already do with unresolved identifiers that may be filled in by a later generated file, similarly to class A { var foo = toBeGenerated; } being partially analyzed.
It is annoying, because it would mean you would have to write var b = this.foo; if your macro needs the type of b (but we also do have a lint for type-annotating public API, so it's not a big step).
It's probably fair since even if we don't introduce a top-level declaration with the same name, later code generation might introduce a new superclass member with that name and a more specific type:

import "c.dart" show C;
class A {
  final num foo;
  A(this.foo);
}
class B extends C /* which extends A */ {
  var bar = foo;
}

Then c.dart generates a new part which introduces int get foo; to C and that changes the type of bar as well.
That would also apply to var b = this.foo, so it's not even only unqualified identifiers, it's also instance members.

For the macro application syntax, I think we can do things to help it, like allowing it to request all annotations of the same macro in one library to be passed to the same macro execution. That would allow it to have a default name, so you could just write:

// my_library.dart
part "my_library.data.g.dart"; // or whatever the default would be.

@Data() 
class SomeDataType { ... }

@Data()
class AnotherThing { ... }

Then the data-class macro would be invoked with both annotations (because the Data class's super constructor invocation includes something like perLibrary: true), both with a null target name since no argument is passed, which means it uses the default of currentLibraryUrl.replaceFirst(RegExp(r"\.dart$"), ".data.g.dart").
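A minimal sketch of that default-name derivation, assuming the library URL is available as a `Uri`. The `defaultTargetFor` name is hypothetical, and the `.data.g.dart` suffix is just the "whatever the default would be" placeholder from the example above.

```dart
/// Derives the default generated-part URL for a library.
/// Hypothetical helper; the suffix is an assumed convention.
Uri defaultTargetFor(Uri libraryUrl) {
  // Replace a trailing ".dart" with ".data.g.dart". The dot is escaped so
  // that only a literal ".dart" suffix matches.
  return Uri.parse(
      libraryUrl.toString().replaceFirst(RegExp(r'\.dart$'), '.data.g.dart'));
}
```

A URL that does not end in `.dart` is returned unchanged by this sketch; a real implementation would have to decide whether that case is an error.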

So, I think that's something that can be addressed, while also being usable.
(I'm a little worried about how much "dispatch logic" I'm building into the Macro annotation, though. Maybe it needs some kind of groupBy functionality, but I want to avoid running user code on the annotation object.)

Levi-Lesches · 2021-09-24T00:06:56Z

Then the data-class macro would be invoked with both annotations

Would it be simpler if the macro was called once per annotation, and all the outputs of macro annotations within the same file are collected by the compiler and put into the same file? So you'd have Data(SomeDataType), Data(AnotherThing), and any other annotation within my_library.dart output to some temporary location, which are then collected and put into my_library.g.dart.

lrhn · 2021-09-24T07:55:32Z

"Collecting" source sounds tricky. I guess you can collect all the import/part declarations first and the member declarations afterwards, then check for duplicates. It still means that the generator is not writing one file, entirely controlled by itself. It's writing "Dart snippets" that will be combined by someone else into a file. The generator doesn't control the scope. If it wants to inject an int _fooHelper(String x) => ...; helper function, it needs to know that it only happens once.

I'd rather tell the macro code how much it needs to generate, then let it create the one file (or more files, if necessary). If it needs parallelism, it can spawn its own isolates as needed.

munificent · 2021-10-26T22:07:24Z

To be able to do so, the macro has to call a compiler to parse the class definition (string parameter) which returns AST.

This is the sticking point. In most of the use cases we have, the macro doesn't just need to see the class definition as a raw AST (pure syntax), it needs to see it in actually resolved form so that it can introspect over the types of fields and methods, walk the superclass hierarchy, etc.

davidmorgan · 2024-06-27T09:20:23Z

This looks obsolete, if not please reopen. Thanks!

lrhn added the feature and static-metaprogramming labels Sep 20, 2021

davidmorgan closed this as completed Jun 27, 2024
