
[Bug.Relay.InferType] Type Inference Report Mismatch after One Operator Is Removed #8432

Closed
Johnson9009 opened this issue Jul 9, 2021 · 5 comments
Labels
flow:relay The overall lowering flow for tvm.relay.build, including BYOC core, excluding tvm.driver.build. relay:op src/relay/op

Comments

@Johnson9009
Contributor

Standard Output and Error Message

[00:32:22] /home/zhaqia01/workspaces/tvm/src/ir/transform.cc:616: PrintIR():
#[version = "0.0.5"]
def @main(%x1: Tensor[(1, 224, 224, 3), int8], %x2: Tensor[(7, 7, 3, 64), int8]) -> Tensor[(1, 224, 224, 64), int32] {
  %0 = nn.pad(%x1, 0 /* ty=int32 */, pad_width=[[0, 0], [3, 3], [3, 3], [0, 0]]) /* ty=Tensor[(1, 230, 230, 3), int8] */;
  nn.conv2d(%0, %x2, padding=[0, 0, 0, 0], data_layout="NHWC", kernel_layout="HWIO", out_dtype="int32") /* ty=Tensor[(1, 224, 224, 64), int32] */
}

The Relay type checker is unable to show the following types match.
In particular dimension 1 conflicts: 218 does not match 224.dimension 2 conflicts: 218 does not match 224.
The Relay type checker is unable to show the following types match.
In particular `Tensor[(1, 224, 224, 64), int32]` does not match `Tensor[(1, 218, 218, 64), int32]`
note: run with `TVM_BACKTRACE=1` environment variable to display a backtrace.
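For reference, the two conflicting numbers in the message can be reproduced with the usual stride-1 convolution size formula. This is standalone illustrative arithmetic, not TVM code:

```python
def conv_out(in_size, kernel, pad=0):
    """Output spatial extent for a stride-1, dilation-1 convolution."""
    return in_size + 2 * pad - kernel + 1

# With the explicit nn.pad (3 on each side): 224 -> 230, then a 7x7 kernel.
padded = 224 + 3 + 3                # 230
print(conv_out(padded, 7))          # 224, the shape the 1st InferType reports

# After SimplifyPad drops nn.pad but the conv keeps padding=[0, 0, 0, 0]:
print(conv_out(224, 7))             # 218, the shape Conv2DRel now computes
```

So 224 is the result with the explicit pad folded in, and 218 is the result Conv2DRel derives once the pad is removed but the conv2d attrs are unchanged.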

Reproduce Test Case

import tvm
from tvm import relay


class PadSimplifier(relay.ExprMutator):
    def __init__(self):
        super().__init__()

    def visit_call(self, call):
        call = super().visit_call(call)
        if ((call.op != relay.op.get("nn.conv2d")) or
            (not isinstance(call.args[0], relay.Call)) or
            (call.args[0].op != relay.op.get("nn.pad"))):
            return call

        conv2d = call
        pad, weight = conv2d.args
        data, pad_value = pad.args
        if ((pad.attrs.pad_mode != "constant") or
            (not isinstance(pad_value, relay.Constant)) or
            (pad_value.data.numpy() != 0)):
            return conv2d

        # To reproduce the issue, just return the conv2d operator without
        # folding the removed padding into its attrs.
        return relay.Call(conv2d.op, [data, weight], conv2d.attrs, conv2d.type_args, conv2d.span)


@relay.transform.function_pass(opt_level=0)
class SimplifyPad:
    def transform_function(self, func, ir_mod, pass_ctx):
        return PadSimplifier().visit(func)


dtype = "int8"
dshape = (1, 224, 224, 3)
kshape = (7, 7, 3, 64)

x1 = relay.var("x1", shape=dshape, dtype=dtype)
x2 = relay.var("x2", shape=kshape, dtype=dtype)
expr = relay.nn.pad(x1, [[0, 0], [3, 3], [3, 3], [0, 0]])
expr = relay.nn.conv2d(expr, x2, data_layout="NHWC", kernel_layout="HWIO", out_dtype="int32")
nn_mod = tvm.IRModule.from_expr(expr)

passes = [
    relay.transform.InferType(),
    tvm.transform.PrintIR(),
    SimplifyPad(),
]

with tvm.transform.PassContext(opt_level=3):
    nn_mod = tvm.transform.Sequential(passes)(nn_mod)
print(nn_mod)

Current Clue

We can see that the first InferType pass works well: before the "SimplifyPad" pass, the inferred output shape of "nn.conv2d" is (1, 224, 224, 64). The "SimplifyPad" pass then removes the "nn.pad" operator. Because "SimplifyPad" is a function pass, an InferType pass is executed automatically afterwards, and the error happens in this 2nd InferType pass.

pass_ctx->diag_ctx.value().Render();
pass_ctx->diag_ctx = previous;
// TODO(@jroesch): move away from eager type checking for performance reasons
// make issue.
return transform::InferType()(updated_mod);
}

When the 2nd InferType calls the function "Conv2DRel", the value of the parameter "types" is "[TensorType([1, 224, 224, 3], int8), TensorType([7, 7, 3, 64], int8), TensorType([1, 224, 224, 64], int32)]". The last item of "types" may be wrong, because during the 1st InferType the value of this parameter is "[TensorType([1, 230, 230, 3], int8), TensorType([7, 7, 3, 64], int8), IncompleteTypeNode(0, 0x5d6e270)]".

bool Conv2DRel(const Array<Type>& types, int num_inputs, const Attrs& attrs,
const TypeReporter& reporter) {
ICHECK_EQ(types.size(), 3);
const auto* data = types[0].as<TensorTypeNode>();

The type-solver code is hard to understand, so I want to know: is this a bug, or is the pass I wrote missing something?
Thanks a lot.

@Johnson9009
Contributor Author

@jwfromm @jroesch @tqchen Can help to check this issue? Thanks.

@Johnson9009
Contributor Author

After a day of debugging, I found the root cause of this issue: it is the "if" statement below.

if (f->ret_type.defined()) {
rtype = this->Unify(f->ret_type, rtype, GetRef<Function>(f)->span);
}

According to the git history, it was added by PR #2437.

Using the above test case to describe why the issue happens: before the 1st invocation of the "InferType" pass, the types of the function are something like below.

def @main(%x1: Tensor[(1, 224, 224, 3), int8], %x2: Tensor[(7, 7, 3, 64), int8]) -> IncompleteTypeNode(0, 0xYYYYYY) {
  %0 = nn.pad(%x1, 0 /* ty=int32 */, pad_width=[[0, 0], [3, 3], [3, 3], [0, 0]]) /* ty=IncompleteTypeNode(0, 0xYYYYYY) */;
  nn.conv2d(%0, %x2, padding=[0, 0, 0, 0], data_layout="NHWC", kernel_layout="HWIO", out_dtype="int32") /* ty=IncompleteTypeNode(0, 0xYYYYYY) */
}

The return type of the function is not defined, so f->ret_type.defined() evaluates to "false", and everything is good. After the 1st invocation of the "InferType" pass, the types of the function are something like below.

def @main(%x1: Tensor[(1, 224, 224, 3), int8], %x2: Tensor[(7, 7, 3, 64), int8]) -> Tensor[(1, 224, 224, 64), int32] {
  %0 = nn.pad(%x1, 0 /* ty=int32 */, pad_width=[[0, 0], [3, 3], [3, 3], [0, 0]]) /* ty=Tensor[(1, 230, 230, 3), int8] */;
  nn.conv2d(%0, %x2, padding=[0, 0, 0, 0], data_layout="NHWC", kernel_layout="HWIO", out_dtype="int32") /* ty=Tensor[(1, 224, 224, 64), int32] */
}

After my pass "SimplifyPad", the "nn.pad" operator is removed, and then the 2nd invocation of the "InferType" pass happens. The types of the function at this time are something like below.

def @main(%x1: Tensor[(1, 224, 224, 3), int8], %x2: Tensor[(7, 7, 3, 64), int8]) -> Tensor[(1, 224, 224, 64), int32] {
  nn.conv2d(%x1, %x2, padding=[0, 0, 0, 0], data_layout="NHWC", kernel_layout="HWIO", out_dtype="int32") /* ty=IncompleteTypeNode(0, 0xYYYYYY) */
}

The key difference at this time is the return type of the function: it is defined and is "Tensor[(1, 224, 224, 64), int32]". Because the last expression of this function is "nn.conv2d", the return type of "nn.conv2d" is the return type of the function. With the code of line 557~559, the return type of "nn.conv2d" is changed from "IncompleteTypeNode(0, 0xYYYYYY)" to "Tensor[(1, 224, 224, 64), int32]" as well.

Then the function "Conv2DRel" is called to infer the return type of this "nn.conv2d", but the last item of its parameter "types" is "Tensor[(1, 224, 224, 64), int32]" instead of "IncompleteTypeNode(0, 0xYYYYYY)". The type inference logic of "Conv2DRel" concludes that the return type of "nn.conv2d" should be "Tensor[(1, 218, 218, 64), int32]", so the logic of "tvm::relay::TypeSolver::Unifier::Unify" sees the return type of "nn.conv2d" inferred to be two different types and reports the error message.
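The conflict can be modeled with a toy unifier (hypothetical names, not the actual TypeSolver API): a free type variable unifies with anything, but two different concrete shapes cannot be unified.

```python
class IncompleteType:
    """Stands in for Relay's IncompleteTypeNode: a free type variable."""
    pass

def unify(lhs, rhs):
    """Minimal structural unifier: an incomplete type unifies with anything;
    two concrete shapes must be identical."""
    if isinstance(lhs, IncompleteType):
        return rhs
    if isinstance(rhs, IncompleteType):
        return lhs
    if lhs != rhs:
        raise TypeError(f"unable to unify {lhs} and {rhs}")
    return lhs

# 1st InferType: conv2d's output starts incomplete, so Conv2DRel's result wins.
print(unify(IncompleteType(), (1, 224, 224, 64)))   # (1, 224, 224, 64)

# 2nd InferType: the stale function ret_type was already unified into the
# conv2d output, so unifying Conv2DRel's fresh result now conflicts.
try:
    unify((1, 224, 224, 64), (1, 218, 218, 64))
except TypeError as e:
    print(e)
```

In the real solver the first step is what the f->ret_type unification does, and the second step is what Conv2DRel's reported type triggers.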

After this analysis: the issue happens whenever a pass changes the shape of the return type of a Relay function. In other words, if your pass does not change the final shape of the function's return type, the issue will not be triggered.
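One way to sidestep the stale return type (a sketch based on the test case above; parameter defaults assumed, not verified against every TVM version) is to rebuild the function without its declared ret_type, so that f->ret_type.defined() is "false" again when the implicit InferType runs:

```python
import tvm
from tvm import relay

# Hypothetical variant of the SimplifyPad pass from the test case: return a
# fresh relay.Function with no declared return type, so InferType re-derives
# it from scratch instead of unifying against the stale
# Tensor[(1, 224, 224, 64), int32].
@relay.transform.function_pass(opt_level=0)
class SimplifyPad:
    def transform_function(self, func, ir_mod, pass_ctx):
        new_body = PadSimplifier().visit(func.body)
        # Omitting ret_type lets the implicit InferType start from an
        # IncompleteType rather than the old function return type.
        return relay.Function(func.params, new_body)
```

This is only a workaround for the pass author, not a fix for the underlying unification behavior.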

@slyubomirsky @jroesch @tqchen I don't know whether we can just remove the "if" statement at L557-L559 to fix this issue. What are your opinions?

Thanks.

@tqchen
Member

tqchen commented Jul 20, 2021

@slyubomirsky @jroesch @altanh would be great if you can help followup on this

@areusch areusch added the needs-triage PRs or issues that need to be investigated by maintainers to find the right assignees to address it label Oct 19, 2022
@Lunderberg
Contributor

Confirmed that the error still occurs in main with the example script.

@Lunderberg Lunderberg added flow:relay The overall lowering flow for tvm.relay.build, including BYOC core, excluding tvm.driver.build. relay:op src/relay/op and removed needs-triage PRs or issues that need to be investigated by maintainers to find the right assignees to address it labels Oct 28, 2022
@sisleyli
Contributor

Hi @Lunderberg @Johnson9009, I had the same problem and found a way to work around it.
When we inherit from ExprMutator to modify the original model, we can override VisitType to erase all type information except on variables, and then use InferType to infer the other nodes' types.
Does that make sense to you? I hope this method can help.

@tqchen tqchen closed this as completed Sep 20, 2024
5 participants