[Torch, CI] Support PyTorch 1.7 #6594

masahi · 2020-09-30T01:48:10Z

Let's summarize steps needed to update our PyTorch support for the latest version, 1.6 (the CI is at 1.4).

Quantization support: It is completely broken due to a bug introduced in 1.6 (see state_dict for quantized models pytorch/pytorch#42497). I found a simple workaround for that problem (finally!) and the fix is WIP. The representation of quantized weights and APIs of quantized ops have changed. To support both 1.6 and older versions, we need to handle both cases (with version check) for now.
torchvision detection model support: Need to add converters for aten::tensor, aten::empty, aten::numel and aten::IntImplicit. All of these ops seem trivial. I found that other missing ops I mentioned in [Frontend][Pytorch] Improve Pytorch frontend for object detection models #6449 (comment) can be removed using torch._C._jit_pass_onnx_function_substitution(...) pass. We also need to deal with one function batched_nms that was used to be traced but now it is scripted, see https://github.com/pytorch/vision/blob/6e10e3f88158f12b7a304d3c2f803d2bbdde0823/torchvision/ops/boxes.py#L42-L43
We need to solve a typing problem involving If and Any type.
Disable or fix tests that no longer work. For example, tracing googlenet is broken on Torch 1.5 or higher (see googlenet fires assert for not being an nn.Module with torch.jit.trace() pytorch/vision#2161), so we need to disable googlenet tests.
Upgrade CI docker image

cc @t-vi @siju-samuel @kevinthesun @yongwww @alexwong @tqchen

The text was updated successfully, but these errors were encountered:

masahi · 2020-10-01T04:30:26Z

@kevinthesun @yongwww I've taken a look at supporting detection models in 1.6. It turned out we need to convert a function batched_nms that was traced in 1.4, but now it is scripted. This brings a very nasty typing problem.

Looking at this code,
https://github.com/pytorch/vision/blob/6e10e3f88158f12b7a304d3c2f803d2bbdde0823/torchvision/ops/boxes.py#L75-L86

One of the types of if branch is a zero dim empty tensor of dtype int64 https://github.com/pytorch/vision/blob/6e10e3f88158f12b7a304d3c2f803d2bbdde0823/torchvision/ops/boxes.py#L76
while the type of other branch is a dynamic 1D tensor of dtype int32 https://github.com/pytorch/vision/blob/6e10e3f88158f12b7a304d3c2f803d2bbdde0823/torchvision/ops/boxes.py#L85-L86

We cannot type check the converted Relay model with If, because dtype of two branches are different. Even if I force the dtype to be the same, if one branch is Any shape while the other is static, Relay type inference chooses a static shape for the type of If. So the return type of above function becomes a zero dim tensor.

Is there a way to turn a static shape tensor to dynamic, when one of If branch is dynamic while other is static?

masahi · 2020-10-01T07:10:01Z

@yongwww It seems torchvision nms op https://github.com/pytorch/vision/blob/master/torchvision/ops/boxes.py#L35 returns int64 indices, while nms in Relay returns int32 bit indices. We need to cast the output indices to int64, that should resolve the one of typing problem (two branches of If having different dtypes, int32 and int64).

kevinthesun · 2020-10-10T01:14:25Z

@masahi That shape problem coming from If branch looks like a Relay type inference issue to me. Type inference should generate dynamic shape in this case.

kevinthesun · 2020-10-10T01:59:32Z

@masahi

import tvm  
from tvm import relay

dtype = "float32"
branch_a = relay.var("a", shape=(relay.Any(),), dtype=dtype)
branch_b = relay.var("b", shape=(0,), dtype=dtype)
cond = relay.var("cond", shape=(), dtype="bool")

out = relay.If(cond, branch_a, branch_b)

mod = tvm.IRModule()
mod["main"] = relay.Function([cond, branch_a,branch_b], out)
mod = relay.transform.InferType()(mod)
print(mod["main"])

Will generate

fn (%cond: bool, %a: Tensor[(0), float32], %b: Tensor[(0), float32]) -> Tensor[(0), float32] {
  if (%cond) {
    %a
  } else {
    %b
  }
}

while input a should be with shape(?,). There is an issue when inferring such an expression.

masahi · 2020-10-10T02:10:50Z

@kevinthesun thanks for pointing this out and providing a minimum test case. I'm going to take a look at what's going on inside type checker and hopefully fix it. With this fixed, I can send a PR to support PyTorch 1.6 mask rcnn and faster rcnn.

masahi · 2020-10-10T02:32:36Z

Finding suspicious code was easy, here, when lhs is IntImmNode and rhs is AnyNode, ulhs is returned.

https://github.com/apache/incubator-tvm/blob/98c2096f4944bdbdbbb2b7b20ccd35c6c11dfbf6/src/relay/analysis/type_solver.cc#L195-L198

Returning urhs there seems to fix this issue. I'm not sure if this is intended or a bug.

masahi · 2020-10-10T02:40:40Z

@kevinthesun @jroesch @lixiaoquan @MarisaKirisame I found that this change was introduced in #5795

If I make that above change, that effectively undos that PR and breaks the test case introduced there. Basically, what we want from type checking If involving both static and dynamic shape are complete opposite to the motivation of #5795.

What should we do about this? I think #5795 should be reverted, since type inference is supposed to pick the most general types.

kevinthesun · 2020-10-10T02:56:15Z

Hmmm, I think that PR is an optimization for some cases but is actually not correct since static dim is just a case of Any but not vise versa. The case provided in that PR only stands when False is fed with (Any, 1) tensor at runtime.

In TF frontend there are lots of cases when we want to make expression as static as possible. My guessing is that False actually has shape (Any, 1) but somehow not fully optimized to be less dynamic. Usually such kind of optimization is done in the frontend.

@lixiaoquan We should revert it?

masahi · 2020-10-10T04:14:49Z

I've sent the revert to #6658

With this and other minor fixes, I can finally run torchvision detection models from v1.6. See #6659.

zhiqwang mentioned this issue Sep 30, 2020

[Torch] Support aten::tensor, aten::empty and aten::numel #6535

Closed

masahi mentioned this issue Sep 30, 2020

[Torch, Quantization] Necessary workaround to prepare for 1.6 update #6602

Merged

masahi mentioned this issue Oct 10, 2020

Revert "[Relay] Keep fixed dim when unifying dynamic shape (#5795)" #6658

Merged

masahi mentioned this issue Oct 26, 2020

[Relay][Frontend][Onnx] If Operator Support #6730

Merged

masahi changed the title ~~[Torch, CI] Support PyTorch 1.6~~ [Torch, CI] Support PyTorch 1.7 Oct 27, 2020

This was referenced Nov 1, 2020

[Torch, CI] Update to PyTorch 1.7 #6811

Closed

[CI] Torch 1.7 update to mainline #6828

Merged

tqchen closed this as completed in #6828 Nov 3, 2020

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[Torch, CI] Support PyTorch 1.7 #6594

[Torch, CI] Support PyTorch 1.7 #6594

masahi commented Sep 30, 2020 •

edited

Loading

masahi commented Oct 1, 2020 •

edited

Loading

masahi commented Oct 1, 2020 •

edited

Loading

kevinthesun commented Oct 10, 2020

kevinthesun commented Oct 10, 2020 •

edited

Loading

masahi commented Oct 10, 2020

masahi commented Oct 10, 2020

masahi commented Oct 10, 2020 •

edited

Loading

kevinthesun commented Oct 10, 2020 •

edited

Loading

masahi commented Oct 10, 2020 •

edited

Loading

[Torch, CI] Support PyTorch 1.7 #6594

[Torch, CI] Support PyTorch 1.7 #6594

Comments

masahi commented Sep 30, 2020 • edited Loading

masahi commented Oct 1, 2020 • edited Loading

masahi commented Oct 1, 2020 • edited Loading

kevinthesun commented Oct 10, 2020

kevinthesun commented Oct 10, 2020 • edited Loading

masahi commented Oct 10, 2020

masahi commented Oct 10, 2020

masahi commented Oct 10, 2020 • edited Loading

kevinthesun commented Oct 10, 2020 • edited Loading

masahi commented Oct 10, 2020 • edited Loading

masahi commented Sep 30, 2020 •

edited

Loading

masahi commented Oct 1, 2020 •

edited

Loading

masahi commented Oct 1, 2020 •

edited

Loading

kevinthesun commented Oct 10, 2020 •

edited

Loading

masahi commented Oct 10, 2020 •

edited

Loading

kevinthesun commented Oct 10, 2020 •

edited

Loading

masahi commented Oct 10, 2020 •

edited

Loading