[RFC][DISCUSS] Tuple-related Fusion #3039
Comments
@tqchen where is %2?
There might be some code omitted, but the idea is to show the problem when dealing with duplicate values in return tuples.
The output tensor is scheduled twice in compute_engine. There are related problems and discussions in #2932.
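A minimal Python sketch of that pattern (the ops and shapes here are assumptions, not the omitted snippet):

```python
# Hypothetical reproduction sketch; ops and shapes are assumptions,
# not the snippet omitted from the comment above.
from tvm import relay

x = relay.var("x", shape=(10, 10))
y = relay.exp(x)
# The same tensor appears twice in the return tuple, so when this
# function is lowered, the output of exp gets scheduled more than once.
func = relay.Function([x], relay.Tuple([y, y]))
print(func)
```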
We can update TOPI to prevent scheduling the same op twice. It is partially done in #1556.
@masahi Can we prevent passing duplicated tensors instead? It looks like we would otherwise need to change all schedules for all targets in TOPI, right?
It seems #2412 is also related to this issue. @srkreddy1238 mentioned that enabling fusion caused an LLVM error. When fusion support for tuples was added in #2187, I handled the case where the fused group returns a tuple by making a new function that returns the tuple (see my comment). Is this what needs to be changed? @tqchen @zhiics @vinx13
One possible way is to make the fusor smarter: forbid fusing a node into a TupleNode if the tuple is the last node in its group, but still allow fusing the TupleNode into subsequent injective ops. Then, on a second iteration, the other node can fuse into the tuple node, because the group no longer has the tuple as its return value. A toy sketch of this rule follows.
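A toy model of that two-pass rule, assuming a union-find-style grouping similar to what the fusor uses; all names here are hypothetical, and this is only a sketch of the idea, not the actual fuse_ops implementation:

```python
# Toy two-pass fusion sketch; Node/root/fuse are hypothetical helpers,
# not the real relay fusor data structures.

class Node:
    def __init__(self, name, kind):
        self.name, self.kind = name, kind
        self.parent = self  # union-find style group pointer

def root(n):
    while n.parent is not n:
        n = n.parent
    return n

def fuse(src, dst):
    root(src).parent = root(dst)

# dataflow: exp -> tuple -> concatenate (an injective consumer)
exp = Node("exp", "injective")
tup = Node("tuple", "tuple")
cat = Node("concatenate", "injective")
edges = [(exp, tup), (tup, cat)]

def is_group_output(node):
    # A node is its group's return value if no consumer shares its group.
    return all(root(dst) is not root(node)
               for src, dst in edges if src is node)

def fuse_pass():
    for src, dst in edges:
        if dst.kind == "tuple" and is_group_output(dst):
            continue  # rule: never fuse into a Tuple that is returned
        fuse(src, dst)

fuse_pass()  # pass 1: exp stays out; the tuple fuses into concatenate
assert root(exp) is not root(tup)
fuse_pass()  # pass 2: the tuple is now intermediate, so exp fuses in
assert root(exp) is root(cat)
```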
We should be aware that if we disable tuple fusion when the tuple is the return value, we might lose some efficiency. A function from our fusion test cases would then be split into more primitive calls (the original before/after snippets are omitted here; a hypothetical stand-in follows).
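A hypothetical stand-in for the elided example (ops and shapes are assumptions): a function returning a tuple of two injective consumers of a shared intermediate. With tuple-return fusion, all of it can lower into one primitive function; with fusion stopped at the returned tuple, the branches end up in separate primitive calls.

```python
# Hypothetical test-case stand-in; ops and shapes are assumptions.
from tvm import relay

x = relay.var("x", shape=(8, 8))
y = relay.add(x, relay.const(1.0))
# Both sqrt and exp consume the shared intermediate, and the tuple
# of the two results is the function's return value.
out = relay.Tuple([relay.sqrt(y), relay.exp(y)])
func = relay.Function([x], out)
print(func)
```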
BTW, I am not certain that stopping fusion at return tuples will fully solve the problem, because it looks to me like we will still have two identical tensors in the tuple, right? Am I missing something?
@zhiics It looks like the tuple with duplicated tensors is only problematic if it is the return value of a subfunction (i.e. a function that is lowered to TOPI and compiled by TVM). If we lift the tuple out of the subfunction and put it under the global function, it seems to work fine. The test (omitted here; an approximate sketch follows) works locally for me.
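The omitted test is not reproduced here, but the shape of the rewrite is roughly as follows (hand-written, approximate Relay text in the comments; not an actual compiler dump):

```python
# Approximate, hand-written Relay text sketching the lifting described
# above; op choice and numbering are assumptions.
#
# Before (problematic): the duplicated tuple is the return value of the
# fused primitive subfunction:
#
#   %0 = fn (%p0, Primitive=1) {
#     %1 = exp(%p0);
#     (%1, %1)
#   };
#   %0(%x)
#
# After (works): the subfunction returns a single tensor, and the
# duplicated tuple lives in the global function, where it is harmless:
#
#   %0 = fn (%p0, Primitive=1) { exp(%p0) };
#   %1 = %0(%x);
#   (%1, %1)
```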
The tuple is now lifted out of subfunction %0.
@masahi I see, thanks. Another option is probably to use a copy operator if there are duplicates in the return tuple.
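A rough sketch of that option (the dedup helper is hypothetical; relay.copy is Relay's identity/copy op, which forces a distinct output tensor):

```python
# Sketch of the copy-op idea; dedup_tuple_fields is a hypothetical
# helper, not an existing pass.
from tvm import relay

def dedup_tuple_fields(tup):
    seen, fields = set(), []
    for f in tup.fields:
        if id(f) in seen:
            # Duplicate field: route it through copy so that each
            # tuple element is a distinct tensor at the low level.
            fields.append(relay.copy(f))
        else:
            seen.add(id(f))
            fields.append(f)
    return relay.Tuple(fields)

x = relay.var("x", shape=(4,))
y = relay.exp(x)
print(dedup_tuple_fields(relay.Tuple([y, y])))  # -> (exp(x), copy(exp(x)))
```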
@tqchen I'm looking into the following approach (snippet omitted): I think this approach requires minimal code change. The patch only involves updating some pointers (parent, root_ref, etc.).
@masahi In your example of two strided_slice ops, given that we do not yet cooperatively fuse the two parallel strided_slice ops together at the moment, the second version is mostly as efficient as the first one.
Implemented in #3092.
(Original issue description)
Recently, a few problems have arisen with respect to the fusor and tuples. The main problem is the incompatibility of the calling convention when dealing with tuples.
This creates tension about the point at which we should run the fusion algorithm. Originally, both Tuple and TupleGetItem were opaque, meaning we would not fuse them. Over time, we started to fuse tuple-related nodes, because they are useful as intermediate values and can be optimized away in primitive operations.
However, a tuple itself is bad as a return value and can cause a number of problems for low-level code generation. In particular, we can get functions like the following (note the duplication in the return values; the original snippet is omitted, so an approximate sketch is shown):
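```python
# Approximate shape of the problematic fused function, written out by
# hand (assumed op and shapes; not the snippet from the original issue):
#
#   fn (%p0: Tensor[(10, 10), float32], Primitive=1) {
#     %0 = add(%p0, %p0);
#     (%0, %0)   // the same tensor appears twice in the return value
#   }
#
# Low-level codegen then sees two outputs backed by one tensor, which
# breaks the usual one-buffer-per-output calling convention.
```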
Ideally, what we want is to fuse a TupleNode into follow-up consumer nodes when it is an intermediate node, but not when it is the return value, so that we do not suffer from this problem (since the TupleNode itself can never be the final master node).
Let us use this thread for consolidated discussion of the related topics.