Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[native_assets_builder] Add hook/link.dart // Tree shaking #153

Open
dcharkes opened this issue Oct 5, 2023 · 13 comments
Open

[native_assets_builder] Add hook/link.dart // Tree shaking #153

dcharkes opened this issue Oct 5, 2023 · 13 comments
Assignees
Labels
P1 A high priority bug; for example, a single project is unusable or has many test failures package:native_assets_builder

Comments

@dcharkes
Copy link
Collaborator

dcharkes commented Oct 5, 2023

The hook/build.dart is run before kernel compilation.

We should consider running a hook/link.dart after kernel compilation. In AOT we would have access to tree-shaking information that would enable us to:

  1. Not include an asset at all if it's not referenced from Dart code.
  2. Shrinking the contents of an asset if only a subset of its contents are used.

Use cases:

  1. If a dynamic library (native code asset) is not used at all, don't bundle it.
  2. For localization, we want to bundle a message file, and we want to tree-shake that message file based on what is reachable from the code. (If we would have icons, it would probably follow something similar to locazation.) https://github.com/dart-lang/sdk/blob/main/pkg/compiler/doc/resource_identifiers.md

For use case 2, we want to standardize the annotations and output format for what is reachable, so that any user can implement tree-shaking for any type of resource they want to bundle (e.g. icons, images, localization, nothing is special.)

Flutter tracking issue:

Dart tracking issue:

@dcharkes
Copy link
Collaborator Author

dcharkes commented Oct 11, 2023

Notes from discussion with @mkustermann:

Maybe we should not

  1. conflate user-defines and tree-shaking "inverse meta data", and
  2. run builds after kernel compilation.

Instead, we should consider

  1. having user-defines being fully global (not scoped by only having them visible to dependencies),
  2. doing "build" in order from dependencies to dependents before the Dart build, and
  3. doing "tree-shaking" in reverse order after the Dart build.

Step 2 and 3 would be similar to C++ builds where the build-step is separate from the linking step.

The data-flow in 2 would be as meta-data currently works.
The data-flow in 3 would be in reverse order ("reverse meta data").

A reason to make it work similar to a C++ build is that it would be easier to migrate the logic in Bazel, if Dart were to ever move to the Bazel build system. Also g3 requires custom build rules instead of a Dart script, so staying closer to what other build systems do is beneficial there as well.

This approach should also cover the use case for package:messages. Packages with a data/messages.json would declare it in the the tree-shaking phase, and package:messages would pick up this inverse metadata and output the data assets.

We have multiple options for what file this tree-shaking script should be in:

  • link.dart / tree_shake.dart (next to build.dart)
  • build.dart and introduce a --linking phase.

@mosuem Any thoughts about this approach?

@dcharkes
Copy link
Collaborator Author

dcharkes commented Oct 11, 2023

Having a build phase and tree-shake phase does raise the question what the CLI interface should be for the tree-shake phase.

  • Should both the build phase and tree-shake phase be able to output code assets and data assets? Or should the code and data assets output in the build phase be input to the tree-shake phase?
  • Should the build meta-data also flow to the tree-shake phase? Or should info from build -> tree-shake be a separate data structure? Or should it only be data/code assets? (Data assets can be arbitrary files, so it could contain anything.)

Theoretically, we could do the code tree-shaking also in the tree-shake phase instead of baking it in to the SDK.

@mosuem
Copy link
Member

mosuem commented Oct 11, 2023

What does in reverse order mean?

But generally, I believe this would correspond to the initial idea of having a build_after.dart.

I have no thoughts on whether having a second script is better or worse than different calls to build.dart. Also, there is the still the question of whether the latter would be implemented through a CLI or an implementation of an interface having build(); and tree_shake(); methods.
An advantage to different scripts we discussed before would be the lower overhead of running only the available scripts, instead of running a build.dart with possibly empty instructions.

@dcharkes
Copy link
Collaborator Author

What does in reverse order mean?

Dependencies get the "meta data" / "config" from the dependants. E.g. package:foo depends on package:messages, so the config from package:foo data/messages.json flows to package:messages.

@dcharkes
Copy link
Collaborator Author

dcharkes commented Oct 12, 2023

Notes from discussion with @mosuem:

The interaction between build.dart and link.dart should probably be as follows:

  1. build.dart outputs code/data assets if they should not be tree-shaken
  2. build.dart outputs "meta data" which are keyed on target package link.dart for tree-shaking by that package link.dart
  3. link.dart outputs code/data assets

build.dart meta data output is keyed on target package, and link.dart of the target package gets the meta data.
link.dart output from one package is not visible to other link.dart (We don't see any any situation in which one package's link.dart depends on another package's link.dart.)

BuildOutput should contain Map<String packageName, Object config> linkInput (and BuildOutput should have what it currently already has).
LinkInput should contain List<Object> inputFromBuild in addition to some of what BuildInput already has (target OS, etc.)
LinkOutput should contain data & code assets.

Data assets output by either build.dart or link.dart should be bundled in Dart and Flutter apps and accessible by https://api.flutter.dev/flutter/services/AssetBundle-class.html. TODO: figure out what the different bundles mean in Flutter.

The implementation plan would have multiple stages:

  1. Add build.dart to dart2js and dart2wasm. It can't do anything yet because we don't support js/wasm code assets.
  2. Add data assets to build.dart, add AssetBundle to a dart: library, and consume build.dart data assets in Flutters AssetBundle.
  3. Add link.dart CLI to this package, and introduce a linking phase in both DartDev/FlutterTools for tree shaking.

edit: This plan only works when deferred loading is not considered. Deferred loading means something else in dart2js than in Flutter, which makes it hard to abstract over it.

@mosuem
Copy link
Member

mosuem commented Oct 17, 2023

Sounds good. The deferred loading can be just delegated to the packages own AssetBundle implementation based on a the mapping between loading units and call sites reported from that packages link.dart.

@dcharkes
Copy link
Collaborator Author

Sounds good. The deferred loading can be just delegated to the packages own AssetBundle implementation based on a the mapping between loading units and call sites reported from that packages link.dart.

If I remember correctly, the shaker would output:

  • splitting_info.json contains K OR dart-library-uri OR package-name -> file-name
  • messages_1.json K -> V
  • messages_2.json K -> V

We still need to flash out how that works with the different deferred loading mechanisms in Flutter and dart2js.

  • Putting the actual keys in splitting_info.json is going to blow up that file, which nullifies the benefits of deferred loading.
  • package-name would be preferential, because that would only be one const string. However, I believe flutter splits on dart library. I don't know about dart2js.
  • library uri (dart file name) seems to be a middle ground, but you'd really only want to use that with code-gen. You don't want users to write their library uri inside the dart file the whole time. (With @Natives we default the optional assetId argument to default to library uri to circumvent this issue.)

Did you give this aspect more thought yet @mosuem?

@dcharkes
Copy link
Collaborator Author

Or did we want to depend on "component name"? But also that requires code generation. E.g. a package author does not know what the component name is where their file ends up in when that package is used inside an app that defines a split in components.

@mosuem
Copy link
Member

mosuem commented Oct 17, 2023

If I remember correctly, the shaker would output:

  • splitting_info.json contains K OR dart-library-uri OR package-name -> file-name
  • messages_1.json K -> V
  • messages_2.json K -> V

The shaker would output whatever the package needs to load the component. So for package:messages, the link.dart would would build a MessagesResourcesAssets object using the information from the LinkInput/BuildOutput. This object could then be serialized into the AssetBundle. See the example at the example PR.
In this case, it builds a Map<int, int> from message indices to parts. Probably doing it the other way around, Map<int, Set<int>> parts to set of message indices, is smaller (but has a worse lookup time, so we might want to build the Map<int, int> at startup).

  • Putting the actual keys in splitting_info.json is going to blow up that file, which nullifies the benefits of deferred loading.

I don't think it is such a large file, it should be basically number of used message indices*sizeof(int).

@dcharkes
Copy link
Collaborator Author

Having build.dart specify the linker that an asset is destined for covers our initial use cases.
We can later extend it to have applications overwrite that. #994 (comment)

@dcharkes dcharkes changed the title [native_assets_builder] Tree shaking [native_assets_builder] Add link.dart // Tree shaking Mar 15, 2024
@dcharkes dcharkes changed the title [native_assets_builder] Add link.dart // Tree shaking [native_assets_builder] Add hook/link.dart // Tree shaking Apr 4, 2024
@dcharkes dcharkes added the P1 A high priority bug; for example, a single project is unusable or has many test failures label Apr 4, 2024
@dcharkes
Copy link
Collaborator Author

  • Not include an asset at all if it's not referenced from Dart code.
  • Shrinking the contents of an asset if only a subset of its contents are used.

Both of these can be done in a link hook.

And the first one of these can be done by the Dart/Flutter SDK if all uses of a certain asset ID are const.

@mosuem suggested we should have some kind of enum for BundlingBehavior to control whether the Dart/Flutter SDK throw assets out if their assetId does not occur as a const in the program:

  • alwaysKeep The link hook writer will take care of throwing the full asset out if it is not used based on the tree shaking information
  • onlyKeepIfConstReferenced The Dart and Flutter SDK may throw this asset out if there is no reference to it with a const assetId. (NativeCodeAssets currently do not support an API that enables dlopen with a dynamic asset id, @Native is always const asset id, so NativeCodeAssets always work like this.)

@mkustermann suggested to tie this behavior to what hook the asset is output from:

  • build hooks are onlyKeepIfConstReferenced by default.
  • link hooks are alwaysKeep by default. The link hook writer will have thought about whether the asset needs to be included already.

@mosuem
Copy link
Member

mosuem commented May 24, 2024

I would even introduce two modes instead of onlyKeepIfConstReferenced, for a total of three possible values.

enum Treeshake {
  /// Never treeshake by the SDK, only manually in link hooks.
  never,
  /// Treeshake if `loadBytes` never has a non-const arguments passed
  /// and no const argument matches the asset id.
  safe,
  /// Treeshake even if `loadBytes` has non-const arguments passed, but
  /// none of the known const ones matches the asset id.
  unsafe;
}

When using the third mode, it is entirely possible that assets are treeshaken which are actually used, which would only be noticed at runtime. But in the safe mode, any package in the dependency tree which does a single loadBytes with a non-const argument poisons the treeshaking information, and results in nothing being treeshaken.

+1 on the defaults.

@dcharkes
Copy link
Collaborator Author

safe

And then we should probably add a warning on the stderr or stdout when building and one of the hooks reports assets with safe but there are dynamic uses of loadBytes somewhere in the app. "Warning: your app size might have significantly increased due to ....".

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
P1 A high priority bug; for example, a single project is unusable or has many test failures package:native_assets_builder
Projects
None yet
Development

No branches or pull requests

2 participants