-
Notifications
You must be signed in to change notification settings - Fork 81
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Additional Target Hooks RFC #10
Conversation
This is the an initial RFC for adding additional hooks onto the `Target` to allow splitting up some of the compile flow but also unifying the registration of these additional functions.
Thanks for the work on this, after thinking about it for a while I believe the proposed hooks runs counter to goals of the TECompiler and "unified lowering" refactor that we've been working on in pieces. Our design goal is to not allow arbitrary customization of "lowering" but instead seal it behind a stable interface. The lowering process should be a straight forward mapping from TE -> TIR, and then any necessary customization should be possible in resulting passes which are allowed to view the entire program including both the Relay code and TIR code. The goal is to remove the need for many slightly different code paths through the central parts of the compiler and instead provide standard interfaces that allow targets, and users to customize the compiler without having to touch any of the flows or modify the data structures. The proliferation of customizations in the old compile engine for example makes it nearly impossible to refactor the compiler in its current state, and has lead to the world where we need lots of per-flow customization because the compiler is insufficiently factored into a common prefix of code. I plan on providing a longer form document on this hopefully this week that argues this more in depth but I believe the tenants should be customization of the compiler happens via passes, passes only can produce IRModules, and all data is stored in the IRModule. Today's compiler requires a little more work to get to this world, but I think we should push in this direction. Now that doesn't mean the desire to have per target behavior is necessarily bad, but I think instead of overriding the lowering code with arbitrary functions we could provide hooks that allow the insertion of target specific passes at the "correct" time in the compiler flow. In this world the proposed hooks should return a Happy to discuss more, and have others such as @electriclilies @csullivan @mbs-octoml |
Hi @jroesch, It'd be great to discuss this further, as there's some interesting points you've raised.
In the case of library adaptations such as CMSIS-NN, translating through Relay->TE->TIR isn't required as we can do a more straight forward translation directly from Relay->TIR without any of the other pass overheads. In effect, our hook here is a single
I believe we're on the same page here, the eventual goal being to register and use the hooks directly as target properties that can be accessed as part of a standard flow. The hooks proposed here are slightly larger than you envisaged but should allow a standard and straight forward interface for users to customise the flow of the compiler. The initial implementation isn't as clean as I'd like due to the issues you've raised already with the current compile engine.
The hooks proposed could in fact be such
I'm interested to understand how this would appear to users in a way that allows them to compose the compiler passes efficiently, being able to write a pipeline of passes seems more intuitive than providing many hooks to insert at different times? Potentially the hooks should return an array of passes or one aggregate pass to ensure the contract is upheld? |
I think having Pass'es (instead of functions) could also work. I guess what we are after is the ability to assemble and re-use TVM's compilation passes on a target basis. apache/tvm#8110 (comment)
I think we broadly agrees on this. @Mousius ?
While we agree that its better to share the many passes that are required, but does it need to mandate a monolith of TIR passes being applied irrespective of the target ? We'd like to hear you thoughts (I think this has a connection to the so-called phases in lower implementation, right ?). While we could codify the requirement, is it absolutely necessary ? Also will it (having target based pass pipeline) preclude us from having the code factored into a common prefix ?
The arbitrary function mentioned here could be a pass pipeline (and we could make it a requirement by making it a Pass as you suggested). However, it is interesting know if we can define a universally 'correct' time for the flow considering targets beyond CPUs and GPUs (w/o needing to do inverse lowering to recreate high-level abstraction that was forced lowered due to mandatory passes). We would love to hear if you have an alternative suggestion to support target-based lowering |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks @Mousius ! I did a thorough review in light of our discussion early last week. summarizing that discussion here for posterity:
- Ultimately we have a desire to maintain a single lowering pipeline for AOT, Graph, VM executors
- In the unified lowering pipeline, we hope to bring everything into a single IRModule, including the AOT main function. This allows most TIR-based analysis and transformation (e.g. memory planning, TIR-based operator fusion, etc) to be represented as compiler Passes
- In order for these passes to have something solid to work against, we generally want to consolidate the number of ways people generate TIR.
- However we realize the typical scheduling flow doesn't work for some Targets and agree with the shortcomings discussed here in requiring these Targets to reinvent the entire compliation pipeline. That seems worse than allowing some variance in generated TIR.
- Therefore, while ultimately we think all TIR transforms should be Passes, this particular case is essentially an override to the
LowerTE
pass introduced in the TE-Compiler refactor PR. Therefore, we support it being a hook with a well-defined signature.
Change-Id: I578145c37f9a10b4c15ed64fa86d6d7c2fade04e
This is a great disscussion here. Actually, we are supporting a DSA with TVM, let me share my practice. 1, We only re-use some of tvm relay or tir passes, less than 10 passes, such storage flatten, we don't need most of tvm passes, keep them in our flow means wasting compilation time. So we are excited to see tvm can do these in main line. What we want to have: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just coming back to this thread, I believe there’s a way to introduce the hooks in a less intrusive way to the LowerTEPass
, by placing it just above it in build_module.cc. This should mean each Target can register a relay_to_tir
Pass which is ran there rather than having to wire it via LowerTEPass
and any PrimFunc
will be skipped as they’re not relay functions anymore.
Any thoughts on this @jroesch / @csullivan / @mbs-octoml? I've updated this RFC from @areusch's comments and added the above proposal.
Ah, excellent, that's where we've been trying to get to but when we started this conversation the refactoring to support it seemed to much to foist upon you. But we've now (thanks Lily!) extended IRModule to be able to support a mix of Functions and target-specific PrimFuncs, so it is now conceptually possible to run multiple lowing passes in series and have each only focus on lowering the Functions they care about based on the compiler tagging. With luck it's just a matter of working through the usual surprises and glitches. |
And I think the just-another-pass approach implies some of the private machinery in te_compliler needs to be hoisted to be reusable for all lowering-like passes. Eg LowerTensorExprMutator. So instead of |
This LGTM but I'm thinking going back to relay_to_tir being a Function->PrimFunc packed func as you implemented in the companion PR is better since it makes it easier for all the caching, call rewriting, dynamic shape handling and other bookkeeping to be shared. Have you tried out the pass approach and convinced youself that reuse is still possible? |
Hi @mbs-octoml, sorry I missed a few replies! The reason I hoisted it outside of the |
|
Based on the discussions around these hooks, we now have a better idea of how to introduce them into the codebase.
@mbs-octoml / @areusch apologies for the delay, there was a fair amount to tweak in here to reflect all the discussions around this RFC. Please could you take another look as I think this is lined up now 😸 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, just some nits for clarity.
We're working on cleaning up the py/cpp build split so your TIRToRuntime hook should indeed be on the unique choke point.
Thanks @mbs-octoml, I've incorporated this back into the document 😸 |
LGTM again (CI will go faster that way, right?) |
@areusch is there anything outstanding from you on this RFC? It seems ready to merge after the changes you've requested. |
thanks for working through this with us @Mousius ! excited to see this land in TVM! |
Update target registration and add pass phases
This is the an initial RFC for adding additional hooks onto the
Target
toallow splitting up some of the compile flow but also unifying the
registration of these additional functions.