[RFC][DISCUSS] Tuple-related Fusion #3039

Closed
tqchen opened this issue Apr 17, 2019 · 14 comments
@tqchen
Member

tqchen commented Apr 17, 2019

Recently, a few problems have arisen with the fusor and tuples. The main problem is the incompatibility of calling conventions when dealing with tuples.

  • At the lowest level, the primitive function is not aware of tuples; we simply flatten everything, including inputs and outputs, whenever there is a tuple input or tuple return value (see the sketch after this list).
  • The function that calls into the primitive function is aware of tuples.
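
For illustration, here is a minimal standalone sketch with the low-level (2019-era) TVM API, independent of Relay, showing the flattened signature of a two-output primitive function; the names and shapes are made up:

import tvm

A = tvm.placeholder((32,), name="A")
B0 = tvm.compute((32,), lambda i: tvm.log(A[i]), name="B0")
B1 = tvm.compute((32,), lambda i: A[i] * 2.0, name="B1")
s = tvm.create_schedule([B0.op, B1.op])
# The lowered primitive function is tuple-unaware: its argument list is the
# flattened (A, B0, B1), with both outputs passed as destination buffers.
print(tvm.lower(s, [A, B0, B1], simple_mode=True))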

This creates tension about the point at which we should run the fusion algorithm. Originally, both Tuple and TupleGetItem were opaque, which meant we did not fuse them. Over time, we started to fuse tuple-related nodes because they are useful as intermediate values and can be optimized away inside primitive operations.

However, a tuple itself is a bad return value and can cause a number of problems for low-level code generation. In particular, we can get functions like the following (note the duplication in the return values):

fn (%data: Tensor[(1, 32, 32, 3), float32]) -> (Tensor[(1, 32, 32, 3), float32], Tensor[(1, 32, 32, 3), float32]) {
  %3 = fn (%p0: Tensor[(1, 32, 32, 3), float32], __dict__=meta[StrMap][0]) -> (Tensor[(1, 32, 32, 3), float32], Tensor[(1, 32, 32, 3), float32]) {
    %0 = log(%p0)
    %1 = (%0, %0)
    %1
  }
  %4 = %3(%data)
  %5
}

Ideally, we want to fuse a TupleNode into its follow-up consumer nodes when it is an intermediate node, but not when it is the return value. That way we won't suffer from this problem (since a TupleNode itself can never be the final master node).
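
To illustrate the distinction, here is a minimal sketch with the Relay Python API of that era (pass and builder names may differ in later releases). The first function only uses a tuple as an intermediate value feeding concatenate, so fusing it into its consumer is desirable; the second returns the tuple directly:

import tvm
from tvm import relay

x = relay.var("x", shape=(1, 16), dtype="float32")
a = relay.exp(x)
b = relay.log(x)

# Tuple as an intermediate value: concatenate wraps its inputs in a Tuple in
# the IR, and the whole chain can become a single primitive function.
intermediate = relay.Function([x], relay.concatenate([a, b], axis=1))

# Tuple as the return value: fusing it makes the fused primitive function
# return a tuple, which the low-level calling convention handles poorly.
as_return = relay.Function([x], relay.Tuple(tvm.convert([a, b])))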

Let us use this thread for a consolidated discussion of the related topics.

@MarisaKirisame
Contributor

MarisaKirisame commented Apr 17, 2019

@tqchen where is %2?
Also, why is the example bad for codegen? I don't get it.

@zhiics
Member

zhiics commented Apr 18, 2019

@MarisaKirisame

@tqchen where is %2?

There might be some code omitted, but the idea is to show the problem when dealing with duplicate values in return tuples.

why is the example bad for codegen

The output tensor is scheduled twice in the compile engine here:
https://github.com/dmlc/tvm/blob/552d4aa3587aa3a0443d050ceb324386489ff793/src/relay/backend/compile_engine.cc#L128

There are some problems and discussions in #2932

@masahi
Member

masahi commented Apr 18, 2019

We can update TOPI to prevent scheduling the same op twice. This was partially done in #1556.
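
As a rough sketch (not the actual TOPI code) of what such a guard could look like for a generic injective schedule, using the 2019-era tvm.create_schedule API and a scheduled_ops bookkeeping list similar to what some TOPI schedules use:

import tvm

def schedule_injective_dedup(outs):
    """Schedule injective outputs while tolerating duplicated outputs."""
    outs = [outs] if isinstance(outs, tvm.tensor.Tensor) else outs
    scheduled_ops = []
    unique_outs = []
    for out in outs:
        if out.op not in scheduled_ops:
            scheduled_ops.append(out.op)
            unique_outs.append(out)
    s = tvm.create_schedule([x.op for x in unique_outs])
    for out in unique_outs:
        # Each distinct output is scheduled exactly once, so a duplicated
        # return value such as (%0, %0) no longer gets split/fused twice.
        s[out].fuse(*s[out].op.axis)
    return s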

@zhiics
Member

zhiics commented Apr 18, 2019

@masahi Can we prevent passing duplicated tensors instead? It looks like we would otherwise need to change all schedules for all targets in topi, right?

@masahi
Member

masahi commented Apr 18, 2019

@zhiics I don't know a solution for your problem that doesn't break other parts (like breaking the calling convention, as you and @vinx13 said). I'll try fixing the fuser so that a tuple like (%0, %0) won't appear as the return value of a fused function.

@masahi
Member

masahi commented Apr 18, 2019

It seems #2412 is also related to this issue. @srkreddy1238 mentioned that enabling fusion caused an LLVM error.

When fusion support for tuples was added in #2187, I handled the cases where

  • the tuple is the root of its group
  • the tuple is fused with other ops before it

by making a new function that returns the tuple (see my comment). Is this what needs to be changed? @tqchen @zhiics @vinx13

@tqchen
Member Author

tqchen commented Apr 18, 2019

One possible way is to make the fusor smarter: forbid fusing a node into a TupleNode when the tuple is the last node in its group, but allow the TupleNode to fuse into subsequent injective ops. Then, in the second iteration, the other node can fuse into the tuple node, because the group no longer has the tuple as its return value.
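
A pseudocode sketch of that rule follows; the helper and field names here are hypothetical and do not mirror the actual fuse_ops implementation:

# Hypothetical sketch of the proposed rule, not the real fuse_ops code.
def try_fuse(producer_group, consumer_group):
    # Never fuse a producer into a group whose root is a TupleNode that would
    # end up as the return value of the fused function.
    if is_tuple(consumer_group.root) and consumer_group.is_output:
        return False
    # The TupleNode itself may still sink into a following injective group;
    # on the next iteration its producers can join too, because the tuple is
    # no longer the group's return value.
    return patterns_compatible(producer_group, consumer_group)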

@masahi
Member

masahi commented Apr 18, 2019

We should be aware that if we disable tuple fusion when the tuple is the return value, we might lose some efficiency.

So this function, taken from our fusion test cases,

fn (%x: Tensor[(64, 64), float32]) -> (Tensor[(32, 64), float32], Tensor[(32, 64), float32]) {
  %2 = fn (%p0: Tensor[(64, 64), float32], __dict__=meta[StrMap][0]) -> (Tensor[(32, 64), float32], Tensor[(32, 64), float32]) {
    %0 = strided_slice(%p0, begin=[0, 0], end=[32, 64], strides=[1, 1])
    %1 = strided_slice(%p0, begin=[32, 0], end=[64, 64], strides=[1, 1])
    (%0, %1)
  }
  %2(%x)
}

will become this

fn (%x: Tensor[(64, 64), float32]) -> (Tensor[(32, 64), float32], Tensor[(32, 64), float32]) {
  %0 = fn (%p0: Tensor[(64, 64), float32], __dict__=meta[StrMap][0]) -> Tensor[(32, 64), float32] {
    strided_slice(%p0, begin=[0, 0], end=[32, 64], strides=[1, 1])
  }
  %1 = %0(%x)
  %2 = fn (%p01: Tensor[(64, 64), float32], __dict__=meta[StrMap][1]) -> Tensor[(32, 64), float32] {
    strided_slice(%p01, begin=[32, 0], end=[64, 64], strides=[1, 1])
  }
  %3 = %2(%x)
  (%1, %3)
}
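
For reference, this is roughly how that test case is constructed with the Relay Python API of that era (pass entry points such as relay.ir_pass.fuse_ops may differ in later versions):

import tvm
from tvm import relay

x = relay.var("x", shape=(64, 64), dtype="float32")
upper = relay.strided_slice(x, begin=[0, 0], end=[32, 64], strides=[1, 1])
lower = relay.strided_slice(x, begin=[32, 0], end=[64, 64], strides=[1, 1])
func = relay.Function([x], relay.Tuple(tvm.convert([upper, lower])))
func = relay.ir_pass.infer_type(func)
# Run fusion and print the result to see whether the tuple stays inside the
# fused function or gets lifted out.
fused = relay.ir_pass.fuse_ops(func, opt_level=2)
print(relay.ir_pass.infer_type(fused))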

@zhiics
Member

zhiics commented Apr 18, 2019

BTW, I am not certain that not fusing the return tuple will fully solve the problem, because it looks to me like we will still have two identical tensors in the tuple, right? Am I missing something?

@masahi
Member

masahi commented Apr 18, 2019

@zhiics It looks like the tuple with duplicated tensors is only problematic if it is the return value of a subfunction (i.e. a function that is lowered to topi and compiled by TVM). If we lift the tuple out of the subfunction and put it under the global function, it seems to work fine. The test below works on my local machine.

import tvm
from tvm import relay

data = relay.var("data", relay.ty.TensorType((1, 32, 32, 3), "float32"))
log = relay.log(data)
func = relay.Function([data], relay.Tuple(tvm.convert([log, log])))
func = relay.ir_pass.infer_type(func)
with relay.build_config(opt_level=3):
    graph, lib, params = relay.build(func, target="llvm")

The tuple is now lifted out of subfunction %0.

fn (%data: Tensor[(1, 32, 32, 3), float32]) -> (Tensor[(1, 32, 32, 3), float32], Tensor[(1, 32, 32, 3), float32]) {
  %0 = fn (%p0: Tensor[(1, 32, 32, 3), float32], __dict__=meta[StrMap][0]) -> Tensor[(1, 32, 32, 3), float32] {
    log(%p0)
  }
  %1 = %0(%data)
  (%1, %1)
}

@zhiics
Member

zhiics commented Apr 18, 2019

@masahi I see, thanks. Another option is probably to use a copy operator if there are duplicates in the return tuple.
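
A rough sketch of that idea, assuming the identity copy operator relay.copy is used as the deduplication point:

import tvm
from tvm import relay

data = relay.var("data", relay.ty.TensorType((1, 32, 32, 3), "float32"))
log = relay.log(data)
# Return a copy as the second element so the tuple no longer refers to the
# same tensor twice.
func = relay.Function([data], relay.Tuple(tvm.convert([log, relay.copy(log)])))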

@masahi
Member

masahi commented Apr 18, 2019

@tqchen I'm looking into the following approach:

  • Initially, the tuple and its fields are marked as Injective, so that they can be fused as much as possible.
  • Later, if we detect that a tuple is the root of its group in this line, we do all the necessary patchwork to make each tuple field the root of its own group and return the tuple itself as an isolated node.

I think this approach requires minimal code changes. The patchwork only involves updating some pointers (parent, root_ref, etc.).
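
In pseudocode, the patchwork could look roughly like the following; parent and root_ref come from the comment above, everything else is hypothetical, and the real pass lives in C++ in fuse_ops.cc:

# Hypothetical sketch of the patchwork, not the real fuse_ops code.
def isolate_tuple_root(tuple_node, group_of):
    """Detach a Tuple that ended up as the root of its fusion group."""
    for field in tuple_node.fields:
        g = group_of[field].find_root()
        g.parent = None        # each field becomes the root of its own group
        g.root_ref = field
    group_of[tuple_node].parent = None
    group_of[tuple_node].pattern = "opaque"   # keep the tuple itself unfused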

@tqchen
Member Author

tqchen commented Apr 18, 2019

@masahi In your example with the two strided_slice ops, given that we do not yet cooperatively fuse the two parallel strided_slice ops together at the moment, the second version is mostly as efficient as the first one.

@tqchen
Member Author

tqchen commented Apr 29, 2019

implemented in #3092

@tqchen tqchen closed this as completed Apr 29, 2019