[RFC][DISCUSS] Tuple-related Fusion #3039

Closed
tqchen opened this issue Apr 17, 2019 · 14 comments
@tqchen
Member

tqchen commented Apr 17, 2019

Recently, a few problems have arisen with the fusor and tuples. The main problem is the incompatibility of calling conventions when dealing with tuples.

  • At the lowest level, the primitive function is not aware of tuples; we simply flatten everything, including inputs and outputs, whenever there is a tuple input or tuple return value (see the sketch after this list).
  • The function that calls into the primitive function is aware of tuples.
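
For illustration, here is a minimal standalone sketch with the low-level (2019-era) TVM API, independent of Relay, showing the flattened signature of a two-output primitive function; the names and shapes are made up:

import tvm

A = tvm.placeholder((32,), name="A")
B0 = tvm.compute((32,), lambda i: tvm.log(A[i]), name="B0")
B1 = tvm.compute((32,), lambda i: A[i] * 2.0, name="B1")
s = tvm.create_schedule([B0.op, B1.op])
# The lowered primitive function is tuple-unaware: its argument list is the
# flattened (A, B0, B1), with both outputs passed as destination buffers.
print(tvm.lower(s, [A, B0, B1], simple_mode=True))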

This creates tension about the point at which we should run the fusion algorithm. Originally, both Tuple and TupleGetItem were opaque, which meant we did not fuse them. Over time, we started to fuse tuple-related nodes because they are useful as intermediate values and can be optimized away inside primitive operations.

However, a tuple itself is a bad return value and can cause a number of problems for low-level code generation. In particular, we can get functions like the following (note the duplication in the return values):

fn (%data: Tensor[(1, 32, 32, 3), float32]) -> (Tensor[(1, 32, 32, 3), float32], Tensor[(1, 32, 32, 3), float32]) {
  %3 = fn (%p0: Tensor[(1, 32, 32, 3), float32], __dict__=meta[StrMap][0]) -> (Tensor[(1, 32, 32, 3), float32], Tensor[(1, 32, 32, 3), float32]) {
    %0 = log(%p0)
    %1 = (%0, %0)
    %1
  }
  %4 = %3(%data)
  %5
}

Ideally, we want to fuse a TupleNode into its follow-up consumer nodes when it is an intermediate node, but not when it is the return value. That way we won't suffer from this problem (since a TupleNode itself can never be the final master node).
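
To illustrate the distinction, here is a minimal sketch with the Relay Python API of that era (pass and builder names may differ in later releases). The first function only uses a tuple as an intermediate value feeding concatenate, so fusing it into its consumer is desirable; the second returns the tuple directly:

import tvm
from tvm import relay

x = relay.var("x", shape=(1, 16), dtype="float32")
a = relay.exp(x)
b = relay.log(x)

# Tuple as an intermediate value: concatenate wraps its inputs in a Tuple in
# the IR, and the whole chain can become a single primitive function.
intermediate = relay.Function([x], relay.concatenate([a, b], axis=1))

# Tuple as the return value: fusing it makes the fused primitive function
# return a tuple, which the low-level calling convention handles poorly.
as_return = relay.Function([x], relay.Tuple(tvm.convert([a, b])))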

Let us use this thread for a consolidated discussion of the related topics.

@MarisaKirisame
Contributor

MarisaKirisame commented Apr 17, 2019

@tqchen where is %2?
Also, why is the example bad for codegen? I don't get it.

@zhiics
Member

zhiics commented Apr 18, 2019

@MarisaKirisame

@tqchen where is %2?

There might be some code omitted, but the idea is to show the problem when dealing with duplicate values in return tuples.

why is the example bad for codegen

The output tensor is scheduled twice in the compile engine here:
https://github.com/dmlc/tvm/blob/552d4aa3587aa3a0443d050ceb324386489ff793/src/relay/backend/compile_engine.cc#L128

There are some problems and discussions in #2932

@masahi
Member

masahi commented Apr 18, 2019

We can update TOPI to prevent scheduling the same op twice. This was partially done in #1556.
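
As a rough sketch (not the actual TOPI code) of what such a guard could look like for a generic injective schedule, using the 2019-era tvm.create_schedule API and a scheduled_ops bookkeeping list similar to what some TOPI schedules use:

import tvm

def schedule_injective_dedup(outs):
    """Schedule injective outputs while tolerating duplicated outputs."""
    outs = [outs] if isinstance(outs, tvm.tensor.Tensor) else outs
    scheduled_ops = []
    unique_outs = []
    for out in outs:
        if out.op not in scheduled_ops:
            scheduled_ops.append(out.op)
            unique_outs.append(out)
    s = tvm.create_schedule([x.op for x in unique_outs])
    for out in unique_outs:
        # Each distinct output is scheduled exactly once, so a duplicated
        # return value such as (%0, %0) no longer gets split/fused twice.
        s[out].fuse(*s[out].op.axis)
    return s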

@zhiics
Member

zhiics commented Apr 18, 2019

@masahi Can we prevent passing duplicated tensors instead? It looks like we would otherwise need to change all schedules for all targets in topi, right?

@masahi
Member

masahi commented Apr 18, 2019

@zhiics I don't know a solution for your problem that doesn't break other parts (like breaking the calling convention, as you and @vinx13 said). I'll try fixing the fuser so that a tuple like (%0, %0) won't appear as the return value of a fused function.

@masahi
Member

masahi commented Apr 18, 2019

It seems #2412 is also related to this issue. @srkreddy1238 mentioned that enabling fusion caused an LLVM error.

When fusion support for tuples was added in #2187, I handled the cases where

  • the tuple is the root of its group
  • the tuple is fused with other ops before it

by making a new function that returns the tuple (see my comment). Is this what needs to be changed? @tqchen @zhiics @vinx13

@tqchen
Member Author

tqchen commented Apr 18, 2019

One possible way is to make the fusor smarter: forbid fusing a node into a TupleNode when the tuple is the last node in its group, but allow the TupleNode to fuse into subsequent injective ops. Then, in the second iteration, the other node can fuse into the tuple node, because the group no longer has the tuple as its return value.
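
A pseudocode sketch of that rule follows; the helper and field names here are hypothetical and do not mirror the actual fuse_ops implementation:

# Hypothetical sketch of the proposed rule, not the real fuse_ops code.
def try_fuse(producer_group, consumer_group):
    # Never fuse a producer into a group whose root is a TupleNode that would
    # end up as the return value of the fused function.
    if is_tuple(consumer_group.root) and consumer_group.is_output:
        return False
    # The TupleNode itself may still sink into a following injective group;
    # on the next iteration its producers can join too, because the tuple is
    # no longer the group's return value.
    return patterns_compatible(producer_group, consumer_group)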

@masahi
Member

masahi commented Apr 18, 2019

We should be aware that if we disable tuple fusion when the tuple is the return value, we might lose some efficiency.

So this function, taken from our fusion test cases,

fn (%x: Tensor[(64, 64), float32]) -> (Tensor[(32, 64), float32], Tensor[(32, 64), float32]) {
  %2 = fn (%p0: Tensor[(64, 64), float32], __dict__=meta[StrMap][0]) -> (Tensor[(32, 64), float32], Tensor[(32, 64), float32]) {
    %0 = strided_slice(%p0, begin=[0, 0], end=[32, 64], strides=[1, 1])
    %1 = strided_slice(%p0, begin=[32, 0], end=[64, 64], strides=[1, 1])
    (%0, %1)
  }
  %2(%x)
}

will become this

fn (%x: Tensor[(64, 64), float32]) -> (Tensor[(32, 64), float32], Tensor[(32, 64), float32]) {
  %0 = fn (%p0: Tensor[(64, 64), float32], __dict__=meta[StrMap][0]) -> Tensor[(32, 64), float32] {
    strided_slice(%p0, begin=[0, 0], end=[32, 64], strides=[1, 1])
  }
  %1 = %0(%x)
  %2 = fn (%p01: Tensor[(64, 64), float32], __dict__=meta[StrMap][1]) -> Tensor[(32, 64), float32] {
    strided_slice(%p01, begin=[32, 0], end=[64, 64], strides=[1, 1])
  }
  %3 = %2(%x)
  (%1, %3)
}
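
For reference, this is roughly how that test case is constructed with the Relay Python API of that era (pass entry points such as relay.ir_pass.fuse_ops may differ in later versions):

import tvm
from tvm import relay

x = relay.var("x", shape=(64, 64), dtype="float32")
upper = relay.strided_slice(x, begin=[0, 0], end=[32, 64], strides=[1, 1])
lower = relay.strided_slice(x, begin=[32, 0], end=[64, 64], strides=[1, 1])
func = relay.Function([x], relay.Tuple(tvm.convert([upper, lower])))
func = relay.ir_pass.infer_type(func)
# Run fusion and print the result to see whether the tuple stays inside the
# fused function or gets lifted out.
fused = relay.ir_pass.fuse_ops(func, opt_level=2)
print(relay.ir_pass.infer_type(fused))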

@zhiics
Member

zhiics commented Apr 18, 2019

BTW, I am not certain that not fusing the return tuple will fully solve the problem, because it looks to me like we will still have two identical tensors in the tuple, right? Am I missing something?

@masahi
Member

masahi commented Apr 18, 2019

@zhiics It looks like the tuple with duplicated tensors is only problematic if it is the return value of a subfunction (i.e. a function that is lowered to topi and compiled by TVM). If we lift the tuple out of the subfunction and put it under the global function, it seems to work fine. The test below works on my local machine.

import tvm
from tvm import relay

data = relay.var("data", relay.ty.TensorType((1, 32, 32, 3), "float32"))
log = relay.log(data)
func = relay.Function([data], relay.Tuple(tvm.convert([log, log])))
func = relay.ir_pass.infer_type(func)
with relay.build_config(opt_level=3):
    graph, lib, params = relay.build(func, target="llvm")

The tuple is now lifted out of subfunction %0.

fn (%data: Tensor[(1, 32, 32, 3), float32]) -> (Tensor[(1, 32, 32, 3), float32], Tensor[(1, 32, 32, 3), float32]) {
  %0 = fn (%p0: Tensor[(1, 32, 32, 3), float32], __dict__=meta[StrMap][0]) -> Tensor[(1, 32, 32, 3), float32] {
    log(%p0)
  }
  %1 = %0(%data)
  (%1, %1)
}

@zhiics
Member

zhiics commented Apr 18, 2019

@masahi I see, thanks. Another option is probably to use a copy operator if there are duplicates in the return tuple.
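
A rough sketch of that idea, assuming the identity copy operator relay.copy is used as the deduplication point:

import tvm
from tvm import relay

data = relay.var("data", relay.ty.TensorType((1, 32, 32, 3), "float32"))
log = relay.log(data)
# Return a copy as the second element so the tuple no longer refers to the
# same tensor twice.
func = relay.Function([data], relay.Tuple(tvm.convert([log, relay.copy(log)])))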

@masahi
Member

masahi commented Apr 18, 2019

@tqchen I'm looking into the following approach:

  • Initially, the tuple and its fields are marked as Injective, so that they can be fused as much as possible.
  • Later, if we detect that a tuple is the root of its group in this line, we do all the necessary patchwork to make each tuple field the root of its own group and return the tuple itself as an isolated node.

I think this approach requires minimal code changes. The patchwork only involves updating some pointers (parent, root_ref, etc.).
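
In pseudocode, the patchwork could look roughly like the following; parent and root_ref come from the comment above, everything else is hypothetical, and the real pass lives in C++ in fuse_ops.cc:

# Hypothetical sketch of the patchwork, not the real fuse_ops code.
def isolate_tuple_root(tuple_node, group_of):
    """Detach a Tuple that ended up as the root of its fusion group."""
    for field in tuple_node.fields:
        g = group_of[field].find_root()
        g.parent = None        # each field becomes the root of its own group
        g.root_ref = field
    group_of[tuple_node].parent = None
    group_of[tuple_node].pattern = "opaque"   # keep the tuple itself unfused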

@tqchen
Member Author

tqchen commented Apr 18, 2019

@masahi In your example with the two strided_slice ops, given that we do not yet cooperatively fuse the two parallel strided_slice ops together at the moment, the second version is mostly as efficient as the first one.

@tqchen
Member Author

tqchen commented Apr 29, 2019

implemented in #3092

@tqchen tqchen closed this as completed Apr 29, 2019