
[Bug.Relay.InferType] Type Inference Report Mismatch after One Operator Is Removed #8432

Closed
Johnson9009 opened this issue Jul 9, 2021 · 5 comments
Labels
flow:relay The overall lowering flow for tvm.relay.build, including BYOC core, excluding tvm.driver.build. relay:op src/relay/op

Comments

@Johnson9009
Contributor

Standard Output and Error Message

[00:32:22] /home/zhaqia01/workspaces/tvm/src/ir/transform.cc:616: PrintIR():
#[version = "0.0.5"]
def @main(%x1: Tensor[(1, 224, 224, 3), int8], %x2: Tensor[(7, 7, 3, 64), int8]) -> Tensor[(1, 224, 224, 64), int32] {
  %0 = nn.pad(%x1, 0 /* ty=int32 */, pad_width=[[0, 0], [3, 3], [3, 3], [0, 0]]) /* ty=Tensor[(1, 230, 230, 3), int8] */;
  nn.conv2d(%0, %x2, padding=[0, 0, 0, 0], data_layout="NHWC", kernel_layout="HWIO", out_dtype="int32") /* ty=Tensor[(1, 224, 224, 64), int32] */
}

The Relay type checker is unable to show the following types match.
In particular dimension 1 conflicts: 218 does not match 224.dimension 2 conflicts: 218 does not match 224.
The Relay type checker is unable to show the following types match.
In particular `Tensor[(1, 224, 224, 64), int32]` does not match `Tensor[(1, 218, 218, 64), int32]`
note: run with `TVM_BACKTRACE=1` environment variable to display a backtrace.
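For reference, the two conflicting numbers in the message can be reproduced with the usual stride-1 convolution size formula. This is standalone illustrative arithmetic, not TVM code:

```python
def conv_out(in_size, kernel, pad=0):
    """Output spatial extent for a stride-1, dilation-1 convolution."""
    return in_size + 2 * pad - kernel + 1

# With the explicit nn.pad (3 on each side): 224 -> 230, then a 7x7 kernel.
padded = 224 + 3 + 3                # 230
print(conv_out(padded, 7))          # 224, the shape the 1st InferType reports

# After SimplifyPad drops nn.pad but the conv keeps padding=[0, 0, 0, 0]:
print(conv_out(224, 7))             # 218, the shape Conv2DRel now computes
```

So 224 is the result with the explicit pad folded in, and 218 is the result Conv2DRel derives once the pad is removed but the conv2d attrs are unchanged.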

Reproduce Test Case

import tvm
from tvm import relay


class PadSimplifier(relay.ExprMutator):
    def __init__(self):
        super().__init__()

    def visit_call(self, call):
        call = super().visit_call(call)
        if ((call.op != relay.op.get("nn.conv2d")) or
            (not isinstance(call.args[0], relay.Call)) or
            (call.args[0].op != relay.op.get("nn.pad"))):
            return call

        conv2d = call
        pad, weight = conv2d.args
        data, pad_value = pad.args
        if ((pad.attrs.pad_mode != "constant") or
            (not isinstance(pad_value, relay.Constant)) or
            (pad_value.data.numpy() != 0)):
            return conv2d

        # To reproduce the issue, just return the conv2d operator without
        # folding the removed padding into its attrs.
        return relay.Call(conv2d.op, [data, weight], conv2d.attrs, conv2d.type_args, conv2d.span)


@relay.transform.function_pass(opt_level=0)
class SimplifyPad:
    def transform_function(self, func, ir_mod, pass_ctx):
        return PadSimplifier().visit(func)


dtype = "int8"
dshape = (1, 224, 224, 3)
kshape = (7, 7, 3, 64)

x1 = relay.var("x1", shape=dshape, dtype=dtype)
x2 = relay.var("x2", shape=kshape, dtype=dtype)
expr = relay.nn.pad(x1, [[0, 0], [3, 3], [3, 3], [0, 0]])
expr = relay.nn.conv2d(expr, x2, data_layout="NHWC", kernel_layout="HWIO", out_dtype="int32")
nn_mod = tvm.IRModule.from_expr(expr)

passes = [
    relay.transform.InferType(),
    tvm.transform.PrintIR(),
    SimplifyPad(),
]

with tvm.transform.PassContext(opt_level=3):
    nn_mod = tvm.transform.Sequential(passes)(nn_mod)
print(nn_mod)

Current Clue

We can see that the first InferType pass works well: before the "SimplifyPad" pass, the inferred output shape of "nn.conv2d" is (1, 224, 224, 64). The "SimplifyPad" pass then removes the "nn.pad" operator. Because "SimplifyPad" is a function pass, an InferType pass is executed automatically afterwards, and the error happens in this 2nd InferType pass.

pass_ctx->diag_ctx.value().Render();
pass_ctx->diag_ctx = previous;
// TODO(@jroesch): move away from eager type checking for performance reasons
// make issue.
return transform::InferType()(updated_mod);
}

When the 2nd InferType calls the function "Conv2DRel", the value of the parameter "types" is "[TensorType([1, 224, 224, 3], int8), TensorType([7, 7, 3, 64], int8), TensorType([1, 224, 224, 64], int32)]". The last item of "types" may be wrong, because during the 1st InferType the value of this parameter is "[TensorType([1, 230, 230, 3], int8), TensorType([7, 7, 3, 64], int8), IncompleteTypeNode(0, 0x5d6e270)]".

bool Conv2DRel(const Array<Type>& types, int num_inputs, const Attrs& attrs,
const TypeReporter& reporter) {
ICHECK_EQ(types.size(), 3);
const auto* data = types[0].as<TensorTypeNode>();

The type-solver code is hard to understand, so I want to know: is this a bug, or is the pass I wrote missing something?
Thanks a lot.

@Johnson9009
Contributor Author

@jwfromm @jroesch @tqchen Can help to check this issue? Thanks.

@Johnson9009
Contributor Author

After a day of debugging, I found the root cause of this issue: it is the "if" statement below.

if (f->ret_type.defined()) {
rtype = this->Unify(f->ret_type, rtype, GetRef<Function>(f)->span);
}

According to the git history, it was added by PR #2437.

Using the above test case to describe why the issue happens: before the 1st invocation of the "InferType" pass, the types of the function are something like below.

def @main(%x1: Tensor[(1, 224, 224, 3), int8], %x2: Tensor[(7, 7, 3, 64), int8]) -> IncompleteTypeNode(0, 0xYYYYYY) {
  %0 = nn.pad(%x1, 0 /* ty=int32 */, pad_width=[[0, 0], [3, 3], [3, 3], [0, 0]]) /* ty=IncompleteTypeNode(0, 0xYYYYYY) */;
  nn.conv2d(%0, %x2, padding=[0, 0, 0, 0], data_layout="NHWC", kernel_layout="HWIO", out_dtype="int32") /* ty=IncompleteTypeNode(0, 0xYYYYYY) */
}

The return type of the function is not defined, so f->ret_type.defined() evaluates to "false", and everything is good. After the 1st invocation of the "InferType" pass, the types of the function are something like below.

def @main(%x1: Tensor[(1, 224, 224, 3), int8], %x2: Tensor[(7, 7, 3, 64), int8]) -> Tensor[(1, 224, 224, 64), int32] {
  %0 = nn.pad(%x1, 0 /* ty=int32 */, pad_width=[[0, 0], [3, 3], [3, 3], [0, 0]]) /* ty=Tensor[(1, 230, 230, 3), int8] */;
  nn.conv2d(%0, %x2, padding=[0, 0, 0, 0], data_layout="NHWC", kernel_layout="HWIO", out_dtype="int32") /* ty=Tensor[(1, 224, 224, 64), int32] */
}

After my pass "SimplifyPad", the "nn.pad" operator is removed, and then the 2nd invocation of the "InferType" pass happens. The types of the function at this time are something like below.

def @main(%x1: Tensor[(1, 224, 224, 3), int8], %x2: Tensor[(7, 7, 3, 64), int8]) -> Tensor[(1, 224, 224, 64), int32] {
  nn.conv2d(%x1, %x2, padding=[0, 0, 0, 0], data_layout="NHWC", kernel_layout="HWIO", out_dtype="int32") /* ty=IncompleteTypeNode(0, 0xYYYYYY) */
}

The key difference at this time is the return type of the function: it is defined and is "Tensor[(1, 224, 224, 64), int32]". Because the last expression of this function is "nn.conv2d", the return type of "nn.conv2d" is the return type of the function. With the code of line 557~559, the return type of "nn.conv2d" is changed from "IncompleteTypeNode(0, 0xYYYYYY)" to "Tensor[(1, 224, 224, 64), int32]" as well.

Then the function "Conv2DRel" is called to infer the return type of this "nn.conv2d", but the last item of its parameter "types" is "Tensor[(1, 224, 224, 64), int32]" instead of "IncompleteTypeNode(0, 0xYYYYYY)". The type inference logic of "Conv2DRel" concludes that the return type of "nn.conv2d" should be "Tensor[(1, 218, 218, 64), int32]", so the logic of "tvm::relay::TypeSolver::Unifier::Unify" sees the return type of "nn.conv2d" inferred to be two different types and reports the error message.
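The conflict can be modeled with a toy unifier (hypothetical names, not the actual TypeSolver API): a free type variable unifies with anything, but two different concrete shapes cannot be unified.

```python
class IncompleteType:
    """Stands in for Relay's IncompleteTypeNode: a free type variable."""
    pass

def unify(lhs, rhs):
    """Minimal structural unifier: an incomplete type unifies with anything;
    two concrete shapes must be identical."""
    if isinstance(lhs, IncompleteType):
        return rhs
    if isinstance(rhs, IncompleteType):
        return lhs
    if lhs != rhs:
        raise TypeError(f"unable to unify {lhs} and {rhs}")
    return lhs

# 1st InferType: conv2d's output starts incomplete, so Conv2DRel's result wins.
print(unify(IncompleteType(), (1, 224, 224, 64)))   # (1, 224, 224, 64)

# 2nd InferType: the stale function ret_type was already unified into the
# conv2d output, so unifying Conv2DRel's fresh result now conflicts.
try:
    unify((1, 224, 224, 64), (1, 218, 218, 64))
except TypeError as e:
    print(e)
```

In the real solver the first step is what the f->ret_type unification does, and the second step is what Conv2DRel's reported type triggers.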

After this analysis: the issue happens whenever a pass changes the shape of the return type of a Relay function. In other words, if your pass does not change the final shape of the function's return type, the issue will not be triggered.
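One way to sidestep the stale return type (a sketch based on the test case above; parameter defaults assumed, not verified against every TVM version) is to rebuild the function without its declared ret_type, so that f->ret_type.defined() is "false" again when the implicit InferType runs:

```python
import tvm
from tvm import relay

# Hypothetical variant of the SimplifyPad pass from the test case: return a
# fresh relay.Function with no declared return type, so InferType re-derives
# it from scratch instead of unifying against the stale
# Tensor[(1, 224, 224, 64), int32].
@relay.transform.function_pass(opt_level=0)
class SimplifyPad:
    def transform_function(self, func, ir_mod, pass_ctx):
        new_body = PadSimplifier().visit(func.body)
        # Omitting ret_type lets the implicit InferType start from an
        # IncompleteType rather than the old function return type.
        return relay.Function(func.params, new_body)
```

This is only a workaround for the pass author, not a fix for the underlying unification behavior.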

@slyubomirsky @jroesch @tqchen I don't know whether we can just remove the "if" statement at L557-L559 to fix this issue. What are your opinions?

Thanks.

@tqchen
Member

tqchen commented Jul 20, 2021

@slyubomirsky @jroesch @altanh would be great if you can help followup on this

@areusch areusch added the needs-triage PRs or issues that need to be investigated by maintainers to find the right assignees to address it label Oct 19, 2022
@Lunderberg
Contributor

Confirmed that the error still occurs in main with the example script.

@Lunderberg Lunderberg added flow:relay The overall lowering flow for tvm.relay.build, including BYOC core, excluding tvm.driver.build. relay:op src/relay/op and removed needs-triage PRs or issues that need to be investigated by maintainers to find the right assignees to address it labels Oct 28, 2022
@sisleyli
Contributor

Hi @Lunderberg @Johnson9009, I had the same problem and found a way to work around it.
When we inherit from ExprMutator to modify the original model, we can override VisitType to erase all type information except on variables, and then use InferType to infer the other nodes' types.
Does that make sense to you? I hope this method can help.

@tqchen tqchen closed this as completed Sep 20, 2024
5 participants