Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add a custom steps API to run logic on mark #1774

Merged
merged 12 commits into from
Mar 2, 2021

Conversation

sbomer
Copy link
Member

@sbomer sbomer commented Jan 22, 2021

This adds a new API which lets custom steps register an action to be called when a member gets marked.

Older notes below - but note that the design has changed from a pipeline that runs per-assembly to a registration-based model.


These steps are initialized once when the pipeline processing begins, and they get called for each marked assembly.

This includes a change that allows per-assembly custom steps, and makes SubStepDispatcher a per-assembly step. The per-assembly custom steps do not currently support insertion at arbitrary points in the pipeline. They are run in the order added. (Do we need to support arbitrary insertion?)

Steps which use SubStepsDispatcher will no longer be called on the entire closure of loaded assemblies.

#1666 will turn more of the built-in steps into per-assembly steps, but here the goal is just to make the API available.

The per-assembly step initialization happens before any of the normal pipeline steps run. (A per-assembly step could in theory load assemblies and operate on the world at this point, but the expected use is just to do some simple setup of global state.)

The --custom-step option still exists, but with #1666, "global" custom steps will no longer get access to all assemblies, unless they do the work to load assemblies explicitly, or run after MarkStep (in which case they will see everything referenced from used code and possibly more).

Contributes to #1735.

src/linker/Linker.Steps/IPerAssemblyStep.cs Outdated Show resolved Hide resolved
namespace Mono.Linker.Steps
{

public interface IPerAssemblyStep
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This needs much better name. Looking at the code this runs during Mark step only.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any suggestions? I tentatively changed it to IMarkAssemblyStep.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With the registration-based model I instead called it IMarkHandler

src/linker/Linker.Steps/SubStepsDispatcher.cs Outdated Show resolved Hide resolved
src/linker/Linker/Driver.cs Outdated Show resolved Hide resolved
/// Extensibility point for custom steps that run during MarkStep,
/// for every marked assembly.
/// </summary>
public interface IMarkAssemblyStep
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure this is the right step forward. You are adding ondemand step in the way it's not very clear to me how it should be used. I'm also not sure OnAssemblyMarked is actually the point where the custom steps would like to hook up. I'd expect them to ignore assembly-mark completely and hook up into OnTypeMarked or OnMethodMarked instead.

Secondly, does this also mean there is no way to actually mark assembly when it's loaded via custom step?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My current thinking:

  • We want the ability for custom steps to react to marking of certain things, for starters assemblies, but eventually also types and maybe even methods.
  • Each item type will need its pipeline at least internally - we simply need a list of custom steps to run for each item type - for example for each assembly. Whether we explicitly expose the notion of pipelines or not is currently a secondary concern (I think we might as well)
  • To not hurt perf, if a custom step wants to react to only assemblies, it should not need to in any way subscribe to type/methods. SubStepDispatcher solves this by making all sub-steps implement everything, but then it uses an enum to tell the system which of these the step is interested in. I would choose a more declarative approach. Have an interface for each, so (names obviously pending discussion):
    • IMarkAssemblyStep
    • IMarkTypeStep
    • IMarkMethodStep
  • If we want to we could also create base interface IMarkStep which would have the Initialize method and all of the above would "inherit" from it.
  • Each interface would have basically one method, something like OnAssemblyMarked/OnTypeMarked/OnMethodMarked. The semantics is rather obvious.
  • Note that there's a difference between loading and assembly and marking an assembly - steps are allowed to load assemblies as they please (they can also freely call .Resolve()) and they are even allowed to mark additional assemblies. Once assembly is marked and the MarkStep gets to it, it would call all custom steps implementing IMarkAssemblyStep and "notify" them.
  • As discussed in other PRs, marking in custom steps would be always deferred (using Annotations and actual marking happens once the MarkStep gets to it).
  • I think we're doing OnMarkAssembly first because we think we'll need it to implement some of existing steps in the linker. Once we start porting over Xamarin steps we should definitely consider adding OnMarkType or maybe even OnMarkMethod (or maybe OnMarkMember ? not sure) - I think assembly and type should suffice for .NET 6, but we'll see.

Other considerations:

  • Existing IStep infra would remain basically intact. The main difference is that it won't automatically get the entire assembly closure. It could load the closure on demand, but it won't ever be guaranteed to be complete (that's why we're doing Enable processing of dynamically referenced assemblies #1666) if the step us running before MarkStep.
  • Even the new custom steps could in theory load the closure aggressively, but it's not recommended for perf reasons. That said, linker won't react to assembly loading as an event with anything. The only event which has meaning is marking an assembly.
  • "marking an assembly" in the above means that linker will preserve the assembly in some form in the output (trimmer or not). We will need to make sure that the OnMarkAssembly gets called for all assemblies written to the output (I don't know if today we internally actually mark copy assemblies for example, if not we should fix that).
  • There will be certain overlap between the "preserve" mechanics in annotations (SetPreserve and AddPreservedMethod) and the OnMarkType. It will be possible to implement the preserve semantics with OnMarkType and simple marking. I don't think this is inherently bad, just two options which will end up doing the same thing.
  • We could allow one step to potentially implement more than one such interface - they would all share the same instance (we would ever only create one instance of the custom step), but it could belong to multiple pipelines (this gets a bit tricky with ordering though, not sure yet)

Copy link
Member Author

@sbomer sbomer Jan 25, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I pretty much agree with Vitek's thoughts - I see us moving in a direction where custom logic will just extend the marking logic by adding conditional dependencies like "source -> destination" which means "if source is marked, destination should be marked". The current mechanism for this is AddPreservedMethod, which requires custom steps to discover the "source" types by scanning for them eagerly, but in the future extension points like IMarkTypeStep/OnMarkType would be better since they avoid an extra scan of unmarked types.

IMarkAssemblyStep (or whatever we name it) would let most existing steps work with little to no modification, even when we add lazy loading, as long as they only use AddPreservedMethod/SetPreserve. The scope of the scanning will be reduced to marked assemblies only.

Any steps which need to scan for types to Mark unconditionally (rather than conditionally AddPreservedMethod/SetPreserve) will have two options:

  • If it is enough to discover types in marked assemblies, they can use IMarkAssemblyStep without further modification
  • If they need to discover types in otherwise unmarked assemblies, they can continue using IStep and look at the loaded closure (or explicitly load it, when we turn on lazy loading).

I don't have a strong preference on whether the hooks should be declared using interfaces or an enum similar to SubStepTargets. Either way, if we allow one custom step to hook into multiple events, we need to decide how to handle the filtering capability of ISubStep. The IsActiveFor check is nice because it currently prevents unnecessary calls to the substep for types/members/etc for assemblies where it's not active. With IMarkAssemblyStep it just works the same as it does today. But if we add, say IMarkMethod, then we would want to avoid calling the custom step method logic for a marked method inside of an assembly for which it's not active. Essentially some of the SubStepDispatcher logic would become built-in to MarkStep or the linker pipeline. I think just adding IMarkAssemblyStep is a good way to make progress, while leaving that possibility open.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think there are two needs which might be better to handle separately.

  1. Run additional processing after assembly load

It's something that is needed primarily by linker steps. I'm not convinced we need to expose this functionality publicly at this point to custom steps therefore this can be implemented as simple
method calls. The important part is that this logic can be triggered from any step.
Something like this might be enough.

class AssemblyResolver
{
    public override AssemblyDefinition Resolve (AssemblyNameReference name, ReaderParameters parameters)
    {
        // some existing code which loads assembly

        // This could be even refactored into single class, etc.
        BlacklistStep.Process (context, assembly);
        LinkAttributesStep.Process (context, assembly);
    }
}
  1. Conditional Marking

Many of the existing custom steps include logic which marks additional code based on a specific
pattern. This API should be exposed but I think we need to validate it first by trying to
rewrite a few of the existing custom steps. I'd like to see API along following the quick draft.

public interface IMarkStepProcessing
{
    void Initialize (LinkContext context, MarkProcessingRegistration register);
}

public class MarkProcessingRegistration
{
    public void AsemblyProcessed (Action<AssemblyDefinition> action);
    public void TypeProcessed (Action<TypeDefinition> action);
}

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree we should try to convert the steps which linker does always into non-step code - just like we've converted const prop and basically all of XML loading. Maybe there will be something which would still make sense as a "step" - we'll see as we go through these.

I also agree that it sounds like a good idea to try to convert a couple of Xamarin steps and design the new extensibility along the way.

  1. I would like to avoid linker doing almost anything on assembly load. I think the current idea is that linker would do nothing eagerly. LinkAttributes will be loaded on demand when somebody asks for linker attributes on something from an assembly which didn't load them yet. Substitutions will be loaded also on demand once the constant prop touches something from a new assembly. Descriptors should only be processed when the assembly is marked so it will be called from the MarkAssembly (or nearby). There might be some tweaks to this with regard to copy assemblies and their dependencies, but hopefully not.

  2. This looks good - I'm not sure I like the word "processed", but that's naming - and I suck at it 😉

Base automatically changed from master to main February 4, 2021 22:14
@sbomer
Copy link
Member Author

sbomer commented Feb 22, 2021

I made some local changes to test xamarin-android with the new registration-based API: dotnet/android@master...sbomer:illinkMarkHandler. In my analysis of the custom steps, most of them "just work" if we call them only for marked members, because they ultimately either:

  • Just call AddPreserved* to conditionally keep members of types (which with the new API will just happen when the type is marked)
  • In the case of FixAbstractMethodsStep, inject missing interface implementations which will only be kept if they are marked for some other reason.

Most of the steps have been updated in a straightforward way based on the above.

The exceptions are:

  • ApplyPreserveAttribute: this step globally marks members with PreserveAttribute.
    • The step is only active for non-SDK link assemblies. Usually we root non-SDK assemblies, but it will be a problem if linking all assemblies.
    • For now, I updated it to use the new custom step API on assembly mark - so it will scan for the attribute in all marked assemblies - but we should investigate whether this is good enough.
  • PreserveExportedTypes: similar to the above, this globally marks members with Export*Attribute. It's used for java interop in cases where the java methods aren't referenced statically - like from xml, or for special serialization methods.
    • It's only active for non-SDK assemblies, so like above it will be a problem only if linking all assemblies.
    • Like above, I've made it scan marked assembies.
  • PreserveApplications: globally scans for ApplicationAttribute on types/assemblies, and sets the TypePreserve for any types referenced by the attribute.
    • This step technically does a global scan, but it's likely to work on marktype/markassembly - since presumably, ApplicationAttribute is only used on types/assembies that are already kept.
    • I've updated it to use the new customstep API based on this assumption.

@sbomer sbomer changed the title Add per-assembly pipeline Add a custom steps API to run logic on mark Feb 22, 2021
src/linker/ref/Linker.Steps/MarkContext.cs Show resolved Hide resolved
Comment on lines 13 to 15
protected MarkHandlerDispatcher (IEnumerable<ISubStep> subSteps) => throw null;

public void Add (ISubStep substep) => throw null;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, do we have any use case for this? It seems to me this is a more complicated version of MarkContext methods.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a convenience helper for steps that want to scan for all members (not just marked members) in marked assemblies. It is basically a copy of SubStepsDispatcher that works only for marked assemblies, to make it easier for existing substeps to transition - this way they don't have to re-write the code that scans all members in an assembly.

It's really only used for ApplyPreserveAttribute and PreserveExportedTypes (here) - so maybe it's not worth keeping around.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd rather see the public interface simple. Looking quickly at the XA code is seems that both are not needed

ApplyPreserveAttribute - that's normal step that should just run before MarkStep, it does not need any of the changes
PreserveExportedTypes - should run only when a type is marked

Copy link
Member Author

@sbomer sbomer Feb 26, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If ApplyPreserveAttribute runs before MarkStep it won't see the reference closure and it won't run for dynamically referenced assemblies - is it safe to look for the attribute only in root assemblies? (Do we not care about the link-all case?)

My understanding of ExportAttribute (from https://docs.microsoft.com/en-us/xamarin/android/platform/java-integration/working-with-jni#exportattribute-and-exportfieldattribute) was that it is used on members which are reachable from java interop, but maybe that link doesn't tell the whole story. What is it that marks the required types?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, at a closer look, they need to run once assembly is marked. Thinking about it more it's probably fine to have both approaches available.

Few suggestions though.

  • I'd rename this type to be MarkSubStepsDispatcher to make it clear this type should be used to batch all custom mark substeps.
  • Ensure MarkHandlerDispatcher.Initialize is always called (the user type can implement explicitly IMarkHandler again).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the second suggestion - not sure if this is what you meant, but I changed Initialize to be a virtual method so that the user type can call base.Initialize.

src/linker/Linker/Driver.cs Outdated Show resolved Hide resolved
sbomer added 11 commits March 1, 2021 12:33
These steps are initialized once when the pipeline processing begins, and
they get called for each marked assembly.

This includes a change that allows per-assembly custom steps, and makes
SubStepDispatcher a per-assembly step. The per-assembly custom steps do
not currently support insertion at arbitrary points in the pipeline.
They are run in the order added.

Steps which use SubStepsDispatcher will no longer be called on the
entire closure of loaded assemblies.
- Rename IPerAssemblyStep to IMarkAssemblyStep
- Keep SubStepsDispatcher
- Add test for IMarkHandler
- Add missing license header
- MarkHandlerDispatcher -> MarkSubStepsDispatcher
- Make Initialize virtual so that it can be called by derived types
Copy link
Contributor

@marek-safar marek-safar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Few more comments and LGTM

src/linker/Linker.Steps/MarkSubStepsDispatcher.cs Outdated Show resolved Hide resolved
substeps.Add (substep);
}

public void Initialize (LinkContext context, MarkContext markContext)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should be virtual (in ref code as well)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually made it virtual this time... thanks.

src/linker/Linker.Steps/MarkSubStepsDispatcher.cs Outdated Show resolved Hide resolved
- Always initialize substep lists
- Actually make Initialize virtual (and test this)
- Remove default ctor and Add method
@sbomer sbomer merged commit 40abde7 into dotnet:main Mar 2, 2021
jonpryor pushed a commit to dotnet/android that referenced this pull request Jun 2, 2021
Context: dotnet/linker#1953
Context: dotnet/linker#1774
Context: dotnet/linker#1774 (comment)
Context: dotnet/linker#1979
Context: dotnet/linker#2019
Context: dotnet/linker#2045
Context: dotnet/java-interop@2573dc8
Context: #5870
Context: #5878

Update the custom linker steps to use the new `IMarkHandler` API
which runs logic during "MarkStep".

(See [this list][0] of pipeline steps for additional context.)

As part of this, I've removed the workaround that loads all referenced
assemblies up-front and simplified some of the linker MSBuild targets.
Some of the `MonoDroid.Tuner` steps were duplicated and changed to
implement the new `IMarkHandler` interface, to avoid touching the
`MonoDroid.Tuner` code.

In my analysis of the custom steps, most of them "just work" if we
call them only for marked members, because they ultimately either:

  - Just call `AddPreserved*()` to conditionally keep members of types
    (which with the new API will just happen when the type is marked)

  - In the case of `FixAbstractMethodsStep()`, inject missing interface
    implementations which will only be kept if they are marked for some
    other reason.

Most of the steps have been updated in a straightforward way based on
the above.

The exceptions are:

  - `AddKeepAlivesStep` needs to run on all code that survived
    linking, and can run as a normal step.

  - `ApplyPreserveAttribute`: this step globally marks members with
    `PreserveAttribute`.
    - The step is only active for non-SDK link assemblies.  Usually we
      root non-SDK assemblies, but it will be a problem if linking all
      assemblies.
    - For now, I updated it to use the new custom step API on assembly
      mark -- so it will scan for the attribute in all marked
      assemblies -- but we should investigate whether this is good
      enough.
    - This can't be migrated to use `IMarkHandler` because it needs
      to scan code for attributes, including unmarked code.

  - `PreserveExportedTypes`: similar to the above, this globally marks
    members with `Export*Attribute`.  It's used for java interop in
    cases where the java methods aren't referenced statically, like
    from xml, or for special serialization methods.
    - It's only active for non-SDK assemblies, so like above it will
      be a problem only if linking all assemblies.
    - Like above, I've made it scan marked assemblies.

  - `PreserveApplications`: globally scans for `ApplicationAttribute`
    on types/assemblies, and sets the `TypePreserve` annotation for
    any types referenced by the attribute.
    - This step technically does a global scan, but it's likely to work
      on "mark type"/"mark assembly", as `ApplicationAttribute` is only
      used on types/assembies that are already kept.
    - I've updated it to use the new `IMarkHandler` API.

Additionally, as per dotnet/java-interop@2573dc8c, stop using
`TypeDefinitionCache` everywhere and instead use `IMetadataResolver`
to support caching `TypeDefinition`s across shared steps.
For .NET 6, `LinkContextMetadataResolver` implements the
`IMetadataResolver` interface in terms of `LinkContext`.

[0]: https://github.com/mono/linker/blob/main/src/linker/Linker/Driver.cs#L714-L730
agocke pushed a commit to dotnet/runtime that referenced this pull request Nov 16, 2022
This adds a new API which lets custom steps register an action to be called when a member gets marked.

This includes a helper, MarkSubStepsDispatcher, which behaves similarly to SubStepDispatcher and is used to scan in marked assemblies.

* PR feedback

- Rename IPerAssemblyStep to IMarkAssemblyStep
- Keep SubStepsDispatcher

* Use --custom-step for both step interfaces

* Change to a registration-based model

* Rename test file

* Cleanup

* Add another handler for marked methods

* PR feedback

- Add test for IMarkHandler
- Add missing license header

* Add a warning to the API docs

* PR feedback

- MarkHandlerDispatcher -> MarkSubStepsDispatcher
- Make Initialize virtual so that it can be called by derived types

* Fix analyzer warning

* PR feedback

- Always initialize substep lists
- Actually make Initialize virtual (and test this)
- Remove default ctor and Add method

Commit migrated from dotnet/linker@40abde7
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants