Additional Target Hooks RFC #10

Mousius · 2021-07-14T14:47:06Z

This is the an initial RFC for adding additional hooks onto the Target to
allow splitting up some of the compile flow but also unifying the
registration of these additional functions.

This is the an initial RFC for adding additional hooks onto the `Target` to allow splitting up some of the compile flow but also unifying the registration of these additional functions.

Mousius · 2021-07-16T13:47:22Z

CC: @tqchen @jroesch

I think this overlaps with the TE Compiler work, would be good to get your feedback 😸

jroesch · 2021-08-02T20:00:08Z

Thanks for the work on this, after thinking about it for a while I believe the proposed hooks runs counter to goals of the TECompiler and "unified lowering" refactor that we've been working on in pieces. Our design goal is to not allow arbitrary customization of "lowering" but instead seal it behind a stable interface. The lowering process should be a straight forward mapping from TE -> TIR, and then any necessary customization should be possible in resulting passes which are allowed to view the entire program including both the Relay code and TIR code.

The goal is to remove the need for many slightly different code paths through the central parts of the compiler and instead provide standard interfaces that allow targets, and users to customize the compiler without having to touch any of the flows or modify the data structures. The proliferation of customizations in the old compile engine for example makes it nearly impossible to refactor the compiler in its current state, and has lead to the world where we need lots of per-flow customization because the compiler is insufficiently factored into a common prefix of code.

I plan on providing a longer form document on this hopefully this week that argues this more in depth but I believe the tenants should be customization of the compiler happens via passes, passes only can produce IRModules, and all data is stored in the IRModule. Today's compiler requires a little more work to get to this world, but I think we should push in this direction.

Now that doesn't mean the desire to have per target behavior is necessarily bad, but I think instead of overriding the lowering code with arbitrary functions we could provide hooks that allow the insertion of target specific passes at the "correct" time in the compiler flow. In this world the proposed hooks should return a Pass object which can be inserted after the appropriate phases.

Happy to discuss more, and have others such as @electriclilies @csullivan @mbs-octoml

Mousius · 2021-08-03T22:30:58Z

Hi @jroesch,

It'd be great to discuss this further, as there's some interesting points you've raised.

The lowering process should be a straight forward mapping from TE -> TIR, and then any necessary customization should be possible in resulting passes which are allowed to view the entire program including both the Relay code and TIR code.

In the case of library adaptations such as CMSIS-NN, translating through Relay->TE->TIR isn't required as we can do a more straight forward translation directly from Relay->TIR without any of the other pass overheads. In effect, our hook here is a single Pass to run over any marked functions rather than aggregating other Passes so conceptually they're similar.

The goal is to remove the need for many slightly different code paths through the central parts of the compiler and instead provide standard interfaces that allow targets, and users to customize the compiler without having to touch any of the flows or modify the data structures.

I believe we're on the same page here, the eventual goal being to register and use the hooks directly as target properties that can be accessed as part of a standard flow. The hooks proposed here are slightly larger than you envisaged but should allow a standard and straight forward interface for users to customise the flow of the compiler. The initial implementation isn't as clean as I'd like due to the issues you've raised already with the current compile engine.

I plan on providing a longer form document on this hopefully this week that argues this more in depth but I believe the tenants should be customization of the compiler happens via passes, passes only can produce IRModules, and all data is stored in the IRModule. Today's compiler requires a little more work to get to this world, but I think we should push in this direction.

The hooks proposed could in fact be such Passes, going IRModule -> IRModule rather than taking a Relay Function. As stated above, it would be more optimal to be given a IRModule without TIR, just with the Relay attached.

Now that doesn't mean the desire to have per target behavior is necessarily bad, but I think instead of overriding the lowering code with arbitrary functions we could provide hooks that allow the insertion of target specific passes at the "correct" time in the compiler flow. In this world the proposed hooks should return a Pass object which can be inserted after the appropriate phases.

I'm interested to understand how this would appear to users in a way that allows them to compose the compiler passes efficiently, being able to write a pipeline of passes seems more intuitive than providing many hooks to insert at different times? Potentially the hooks should return an array of passes or one aggregate pass to ensure the contract is upheld?

manupak · 2021-08-04T11:52:42Z

Hi @jroesch @Mousius ,

I think having Pass'es (instead of functions) could also work.

I guess what we are after is the ability to assemble and re-use TVM's compilation passes on a target basis.

apache/tvm#8110 (comment)
Continuing the above comment, this also removes various places otherwise devs need to insert custom passes. Therefore, centralizing them closer to the target seems much cleaner to me. Therefore, it becomes straightforward to follow what passes are being run in the lowering for a given target while also not running passes that are not relevant (that are currently run anyway due to them having identity effect on IRModule in the absense of special attr/nodes/vars in the IRModule).

I believe the tenants should be customization of the compiler happens via passes, passes only can produce IRModules, and all data is stored in the IRModule. Today's compiler requires a little more work to get to this world, but I think we should push in this direction.

I think we broadly agrees on this. @Mousius ?

The proliferation of customizations in the old compile engine for example makes it nearly impossible to refactor the compiler in its current state, and has lead to the world where we need lots of per-flow customization because the compiler is insufficiently factored into a common prefix of code.

While we agree that its better to share the many passes that are required, but does it need to mandate a monolith of TIR passes being applied irrespective of the target ? We'd like to hear you thoughts (I think this has a connection to the so-called phases in lower implementation, right ?). While we could codify the requirement, is it absolutely necessary ? Also will it (having target based pass pipeline) preclude us from having the code factored into a common prefix ?

but I think instead of overriding the lowering code with arbitrary functions we could provide hooks that allow the insertion of target specific passes at the "correct" time in the compiler flow.

The arbitrary function mentioned here could be a pass pipeline (and we could make it a requirement by making it a Pass as you suggested). However, it is interesting know if we can define a universally 'correct' time for the flow considering targets beyond CPUs and GPUs (w/o needing to do inverse lowering to recreate high-level abstraction that was forced lowered due to mandatory passes). We would love to hear if you have an alternative suggestion to support target-based lowering

areusch

thanks @Mousius ! I did a thorough review in light of our discussion early last week. summarizing that discussion here for posterity:

Ultimately we have a desire to maintain a single lowering pipeline for AOT, Graph, VM executors
In the unified lowering pipeline, we hope to bring everything into a single IRModule, including the AOT main function. This allows most TIR-based analysis and transformation (e.g. memory planning, TIR-based operator fusion, etc) to be represented as compiler Passes
In order for these passes to have something solid to work against, we generally want to consolidate the number of ways people generate TIR.
However we realize the typical scheduling flow doesn't work for some Targets and agree with the shortcomings discussed here in requiring these Targets to reinvent the entire compliation pipeline. That seems worse than allowing some variance in generated TIR.
Therefore, while ultimately we think all TIR transforms should be Passes, this particular case is essentially an override to the LowerTE pass introduced in the TE-Compiler refactor PR. Therefore, we support it being a hook with a well-defined signature.

rfcs/0010-additional-target-hooks.md

Change-Id: I578145c37f9a10b4c15ed64fa86d6d7c2fade04e

xqdan · 2021-08-24T13:19:05Z

This is a great disscussion here. Actually, we are supporting a DSA with TVM, let me share my practice.

1, We only re-use some of tvm relay or tir passes, less than 10 passes, such storage flatten, we don't need most of tvm passes, keep them in our flow means wasting compilation time.
2, We develop passes, and enhance tvm passes for our target, such as stroage rewrite.
3, We develop a hybrid ir on tir, in which we can do unified memory allocation and schedule insnstruction across operators.

So we are excited to see tvm can do these in main line.

What we want to have:
1, customzation compilation flow for both relay and tir flow.
2, unified ir to view both graph and tir, supporting inter/intra primfunc pass developing.
3, also need to consider which place we should put our target code if we want to upstream in the future.

Mousius

Just coming back to this thread, I believe there’s a way to introduce the hooks in a less intrusive way to the LowerTEPass, by placing it just above it in build_module.cc. This should mean each Target can register a relay_to_tir Pass which is ran there rather than having to wire it via LowerTEPass and any PrimFunc will be skipped as they’re not relay functions anymore.

Any thoughts on this @jroesch / @csullivan / @mbs-octoml? I've updated this RFC from @areusch's comments and added the above proposal.

rfcs/0010-additional-target-hooks.md

mbs-octoml · 2021-08-26T16:17:51Z

One core distinction here is that the tir_to_runtime hook isn't a pass, it's an override of code generation.

Ah, excellent, that's where we've been trying to get to but when we started this conversation the refactoring to support it seemed to much to foist upon you. But we've now (thanks Lily!) extended IRModule to be able to support a mix of Functions and target-specific PrimFuncs, so it is now conceptually possible to run multiple lowing passes in series and have each only focus on lowering the Functions they care about based on the compiler tagging. With luck it's just a matter of working through the usual surprises and glitches.

mbs-octoml · 2021-08-26T16:50:13Z

And I think the just-another-pass approach implies some of the private machinery in te_compliler needs to be hoisted to be reusable for all lowering-like passes. Eg LowerTensorExprMutator. So instead of
monolithic lowering + target-specific callbacks
we have
target-specific lowering passes + built-in lowering pass
which share impl via, hopefully, conventional subclassing.

mbs-octoml · 2021-09-01T17:12:27Z

This LGTM but I'm thinking going back to relay_to_tir being a Function->PrimFunc packed func as you implemented in the companion PR is better since it makes it easier for all the caching, call rewriting, dynamic shape handling and other bookkeeping to be shared. Have you tried out the pass approach and convinced youself that reuse is still possible?

Mousius · 2021-09-06T13:55:53Z

Hi @mbs-octoml, sorry I missed a few replies!

The reason I hoisted it outside of the LowerTE pass is that it effectively bypasses it anyway, LowerExternalFunctions in te_compiler.cc is a massive bypass which starts there and goes around most of TVM before it ends up as a runtime::Module right at the very end. Effectively what's happening is the "default" relay_to_tir is LowerTE and then Targets register their own variants. I think where @jroesch was going with the (IRModule, Function) -> (IRModule, GlobalVar) is that you will want to produce many PrimFuncs to implement one Function and thus you need the entire module, by just introducing this as a Pass we won't have to introduce a new signature specifically for this hook.

mbs-octoml · 2021-09-10T16:19:57Z

(IRModule, Function) -> (IRModule, GlobalVar)
I'm still is favor of this signature since it's a cheep and cheerful way to ensure we don't end up with N ways to implement the lower-and-rewrite-calls machinery embedded in te_complier. I think my earlier question still stands:
Have you tried out the pass approach and convinced youself that reuse is still possible?

Based on the discussions around these hooks, we now have a better idea of how to introduce them into the codebase.

Mousius · 2021-09-22T11:31:40Z

@mbs-octoml / @areusch apologies for the delay, there was a fair amount to tweak in here to reflect all the discussions around this RFC. Please could you take another look as I think this is lined up now 😸

mbs-octoml

LGTM, just some nits for clarity.

We're working on cleaning up the py/cpp build split so your TIRToRuntime hook should indeed be on the unique choke point.

rfcs/0010-target-registered-compiler-flow-customisation.md

Mousius · 2021-09-23T09:38:07Z

Thanks @mbs-octoml, I've incorporated this back into the document 😸

mbs-octoml · 2021-09-23T21:14:50Z

LGTM again (CI will go faster that way, right?)

Mousius · 2021-09-27T10:54:43Z

@areusch is there anything outstanding from you on this RFC? It seems ready to merge after the changes you've requested.

areusch · 2021-09-27T20:20:35Z

thanks for working through this with us @Mousius ! excited to see this land in TVM!

Update target registration and add pass phases

Additional Target Hooks RFC

f37b9bf

This is the an initial RFC for adding additional hooks onto the `Target` to allow splitting up some of the compile flow but also unifying the registration of these additional functions.

Mousius mentioned this pull request Jul 30, 2021

Implementation of relay_to_tir target hook apache/tvm#8423

Merged

Update signatures and integration points for Target Hooks

ba40aa1

areusch requested changes Aug 19, 2021

View reviewed changes

Update hooks definition and lowering process

a90c4af

Change-Id: I578145c37f9a10b4c15ed64fa86d6d7c2fade04e

Mousius commented Aug 26, 2021

View reviewed changes

rfcs/0010-additional-target-hooks.md Outdated Show resolved Hide resolved

rfcs/0010-additional-target-hooks.md Outdated Show resolved Hide resolved

rfcs/0010-additional-target-hooks.md Outdated Show resolved Hide resolved

Re-draw API design for RelayToTIR and RelayToRuntime

6eb2695

Based on the discussions around these hooks, we now have a better idea of how to introduce them into the codebase.

mbs-octoml approved these changes Sep 22, 2021

View reviewed changes

Provide further clarification on hooks and minor text fixes

47aff33

areusch approved these changes Sep 27, 2021

View reviewed changes

areusch merged commit a6f9a25 into apache:main Sep 27, 2021

Mousius deleted the additional-target-hooks branch September 28, 2021 08:51

ccjoechou mentioned this pull request Feb 2, 2022

[RFC][BYOC] Marvell ML/AI Accelerator Integration #48

Open

MichaelJKlaiber added a commit to MichaelJKlaiber/tvm-rfcs that referenced this pull request May 24, 2022

Merge pull request apache#10 from PaulPalomeroBernardo/patch-1

ac4dc76

Update target registration and add pass phases

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Additional Target Hooks RFC #10

Additional Target Hooks RFC #10

Mousius commented Jul 14, 2021

Mousius commented Jul 16, 2021

jroesch commented Aug 2, 2021

Mousius commented Aug 3, 2021

manupak commented Aug 4, 2021

areusch left a comment

xqdan commented Aug 24, 2021 •

edited

Loading

Mousius left a comment

mbs-octoml commented Aug 26, 2021

mbs-octoml commented Aug 26, 2021

mbs-octoml commented Sep 1, 2021

Mousius commented Sep 6, 2021

mbs-octoml commented Sep 10, 2021

Mousius commented Sep 22, 2021

mbs-octoml left a comment

Mousius commented Sep 23, 2021

mbs-octoml commented Sep 23, 2021

Mousius commented Sep 27, 2021

areusch commented Sep 27, 2021

Additional Target Hooks RFC #10

Additional Target Hooks RFC #10

Conversation

Mousius commented Jul 14, 2021

Mousius commented Jul 16, 2021

jroesch commented Aug 2, 2021

Mousius commented Aug 3, 2021

manupak commented Aug 4, 2021

areusch left a comment

Choose a reason for hiding this comment

xqdan commented Aug 24, 2021 • edited Loading

Mousius left a comment

Choose a reason for hiding this comment

mbs-octoml commented Aug 26, 2021

mbs-octoml commented Aug 26, 2021

mbs-octoml commented Sep 1, 2021

Mousius commented Sep 6, 2021

mbs-octoml commented Sep 10, 2021

Mousius commented Sep 22, 2021

mbs-octoml left a comment

Choose a reason for hiding this comment

Mousius commented Sep 23, 2021

mbs-octoml commented Sep 23, 2021

Mousius commented Sep 27, 2021

areusch commented Sep 27, 2021

xqdan commented Aug 24, 2021 •

edited

Loading