
Compiling yolov5 #253

Closed
Ownmarc opened this issue Mar 23, 2021 · 19 comments

Ownmarc commented Mar 23, 2021

Hey, I am looking to run the yolov5 model (https://github.com/ultralytics/yolov5) on an inf1 instance for inference.

I am first trying to get the original COCO model to compile, but I am hitting the following error. I have followed several AWS tutorials (YOLOv4 and ResNet) and am trying to compile on a c5.xlarge instance (4 vCPUs with 8 GB of RAM) using the Ubuntu 18 DLAMI in the aws_neuron_pytorch_p36 Python env.

One thing I noticed is that neuron-cc requires numpy <= 1.18.4 while yolov5 requires numpy >= 1.18.5. I first made sure the model would run correctly by updating numpy to 1.18.5, and then downgraded numpy to 1.18.4 per the neuron-cc requirement before compiling/converting the model.

I am not exactly sure where to look to debug this (if that is possible at all) and would welcome any hints.
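
For reference, the overall sequence I am running looks roughly like this (a sketch only; the torch.hub line is just an illustration of how the model could be loaded, I actually load my local COCO checkpoint):

```python
import torch
import torch.neuron

# Illustration of model loading only; any way of obtaining the yolov5 nn.Module works here.
model = torch.hub.load('ultralytics/yolov5', 'yolov5s')
model = model.eval()

fake_image = torch.zeros([1, 3, 608, 608], dtype=torch.float32)

# First, check which operators torch-neuron supports for this model
torch.neuron.analyze_model(model, example_inputs=[fake_image])

# Then attempt the actual compilation
model_neuron = torch.neuron.trace(model, example_inputs=[fake_image], compiler_args="-O2")
```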

fake_image = torch.zeros([1, 3, 608, 608], dtype=torch.float32)
Here is the output of torch.neuron.analyze_model(model, example_inputs=[fake_image]):

/home/ubuntu/yolov5/models/yolo.py:48: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  if self.grid[i].shape[2:4] != x[i].shape[2:4]:
/home/ubuntu/anaconda3/envs/aws_neuron_pytorch_p36/lib/python3.6/site-packages/torch/jit/_trace.py:940: TracerWarning: Encountering a list at the output of the tracer might cause the trace to be incorrect, this is only valid if the container structure does not change based on the module's inputs. Consider using a constant container instead (e.g. for `list`, use a `tuple` instead. for `dict`, use a `NamedTuple` instead). If you absolutely need this and know the side effects, pass strict=False to trace() to allow this behavior.
  _force_outplace,
INFO:Neuron:The following operations are currently supported in torch-neuron for this model:
INFO:Neuron:prim::TupleConstruct
INFO:Neuron:aten::permute
INFO:Neuron:aten::slice
INFO:Neuron:prim::Constant
INFO:Neuron:prim::ListConstruct
INFO:Neuron:aten::pow
INFO:Neuron:aten::max_pool2d
INFO:Neuron:aten::upsample_nearest2d
INFO:Neuron:aten::Int
INFO:Neuron:aten::mul
INFO:Neuron:aten::_convolution
INFO:Neuron:prim::NumToTensor
INFO:Neuron:aten::sub
INFO:Neuron:aten::sigmoid
INFO:Neuron:aten::silu
INFO:Neuron:prim::TupleUnpack
INFO:Neuron:aten::expand
INFO:Neuron:aten::contiguous
INFO:Neuron:aten::copy_
INFO:Neuron:aten::size
INFO:Neuron:aten::view
INFO:Neuron:aten::cat
INFO:Neuron:aten::select
INFO:Neuron:aten::add
INFO:Neuron:100.00% of all operations (including primitives) (2369 of 2369) are supported
INFO:Neuron:100.00% of arithmetic operations (304 of 304) are supported

and then I run the compile step, model_neuron = torch.neuron.trace(model, example_inputs=[fake_image], compiler_args="-O2"), which fails with the following trace:

/home/ubuntu/yolov5/models/yolo.py:48: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  if self.grid[i].shape[2:4] != x[i].shape[2:4]:
/home/ubuntu/anaconda3/envs/aws_neuron_pytorch_p36/lib/python3.6/site-packages/torch/jit/_trace.py:940: TracerWarning: Encountering a list at the output of the tracer might cause the trace to be incorrect, this is only valid if the container structure does not change based on the module's inputs. Consider using a constant container instead (e.g. for `list`, use a `tuple` instead. for `dict`, use a `NamedTuple` instead). If you absolutely need this and know the side effects, pass strict=False to trace() to allow this behavior.
  _force_outplace,
INFO:Neuron:All operators are compiled by neuron-cc (this does not guarantee that neuron-cc will successfully compile)
INFO:Neuron:Number of arithmetic operators (pre-compilation) before = 304, fused = 304, percent fused = 100.0%
/home/ubuntu/anaconda3/envs/aws_neuron_pytorch_p36/lib/python3.6/site-packages/torch/jit/_trace.py:779: TracerWarning: Encountering a list at the output of the tracer might cause the trace to be incorrect, this is only valid if the container structure does not change based on the module's inputs. Consider using a constant container instead (e.g. for `list`, use a `tuple` instead. for `dict`, use a `NamedTuple` instead). If you absolutely need this and know the side effects, pass strict=False to trace() to allow this behavior.
  name, func, example_inputs, var_lookup_fn, strict, _force_outplace
INFO:Neuron:Compiler args type is <class 'str'> value is -O2
INFO:Neuron:compiling function _NeuronGraph$1842 with neuron-cc
INFO:Neuron:Compiling with command line: '/home/ubuntu/anaconda3/envs/aws_neuron_pytorch_p36/bin/neuron-cc compile /tmp/tmp274rqrqq/graph_def.pb --framework TENSORFLOW --pipeline compile SaveTemps --output /tmp/tmp274rqrqq/graph_def.neff --io-config {"inputs": {"tensor.1:0": [[1, 3, 608, 608], "float32"]}, "outputs": ["concat_14:0"]} -O2 --verbose 35'
WARNING:Neuron:torch.neuron.trace failed on _NeuronGraph#0(%[790] : torch.float32(1, 3, 608, 608)):
  Focus#50:
    %[./6] : torch.float32(1, 3, 304, 608) = ./aten::slice#5(%[790])
    %[./12] : torch.float32(1, 3, 304, 304) = ./aten::slice#10(%[./6])
    %[./17] : torch.float32(1, 3, 304, 608) = ./aten::slice#15(%[790])
    %[./22] : torch.float32(1, 3, 304, 304) = ./aten::slice#20(%[./17])
    %[./27] : torch.float32(1, 3, 304, 608) = ./aten::slice#25(%[790])
    %[./32] : torch.float32(1, 3, 304, 304) = ./aten::slice#30(%[./27])
    %[./37] : torch.float32(1, 3, 304, 608) = ./aten::slice#35(%[790])
    %[./42] : torch.float32(1, 3, 304, 304) = ./aten::slice#40(%[./37])
    %[./45] : torch.float32(1, 12, 304, 304) = ./aten::cat#43()
  Focus#50/Conv#44/Conv2d#2:
        %[Focus#50/Conv#44/6] : torch.float32(1, 48, 304, 304) = ./aten::_convolution#20(%[Focus#50/45])
  Focus#50/Conv#44/SiLU#3:
        %[4215] : torch.float32(1, 48, 304, 304) = ./aten::silu_#0(%[Focus#50/Conv#44/6])
  Conv#51/Conv2d#2:
      %[Conv#51/6] : torch.float32(1, 96, 152, 152) = ./aten::_convolution#20(%[4215])
  Conv#51/SiLU#3:
      %[4216] : torch.float32(1, 96, 152, 152) = ./aten::silu_#0(%[Conv#51/6])
  C3#52/Conv#4/Conv2d#2:
        %[C3#52/Conv#4/6] : torch.float32(1, 48, 152, 152) = ./aten::_convolution#20(%[4216])
  C3#52/Conv#4/SiLU#3:
        %[C3#52/13] : torch.float32(1, 48, 152, 152) = ./aten::silu_#0(%[C3#52/Conv#4/6])
  C3#52/Sequential#5/Bottleneck#2/Conv#2/Conv2d#2:
            %[C3#52/Sequential#5/Bottleneck#2/Conv#2/6] : torch.float32(1, 48, 152, 152) = ./aten::_convolution#20(%[C3#52/13])
  C3#52/Sequential#5/Bottleneck#2/Conv#2/SiLU#3:
            %[C3#52/Sequential#5/Bottleneck#2/8] : torch.float32(1, 48, 152, 152) = ./aten::silu_#0(%[C3#52/Sequential#5/Bottleneck#2/Conv#2/6])
  C3#52/Sequential#5/Bottleneck#2/Conv#3/Conv2d#2:
            %[C3#52/Sequential#5/Bottleneck#2/Conv#3/6] : torch.float32(1, 48, 152, 152) = ./aten::_convolution#20(%[C3#52/Sequential#5/Bottleneck#2/8])
  C3#52/Sequential#5/Bottleneck#2/Conv#3/SiLU#3:
            %[C3#52/Sequential#5/Bottleneck#2/9] : torch.float32(1, 48, 152, 152) = ./aten::silu_#0(%[C3#52/Sequential#5/Bottleneck#2/Conv#3/6])
  C3#52/Sequential#5/Bottleneck#2:
        %[C3#52/Sequential#5/6] : torch.float32(1, 48, 152, 152) = ./aten::add#5(%[C3#52/13], %[./9])
  C3#52/Sequential#5/Bottleneck#3/Conv#2/Conv2d#2:
            %[C3#52/Sequential#5/Bottleneck#3/Conv#2/6] : torch.float32(1, 48, 152, 152) = ./aten::_convolution#20(%[C3#52/Sequential#5/6])
  C3#52/Sequential#5/Bottleneck#3/Conv#2/SiLU#3:
            %[C3#52/Sequential#5/Bottleneck#3/8] : torch.float32(1, 48, 152, 152) = ./aten::silu_#0(%[C3#52/Sequential#5/Bottleneck#3/Conv#2/6])
  C3#52/Sequential#5/Bottleneck#3/Conv#3/Conv2d#2:
            %[C3#52/Sequential#5/Bottleneck#3/Conv#3/6] : torch.float32(1, 48, 152, 152) = ./aten::_convolution#20(%[C3#52/Sequential#5/Bottleneck#3/8])
  C3#52/Sequential#5/Bottleneck#3/Conv#3/SiLU#3:
            %[C3#52/Sequential#5/Bottleneck#3/9] : torch.float32(1, 48, 152, 152) = ./aten::silu_#0(%[C3#52/Sequential#5/Bottleneck#3/Conv#3/6])
  C3#52/Sequential#5/Bottleneck#3:
        %[C3#52/14] : torch.float32(1, 48, 152, 152) = ./aten::add#5(%[C3#52/Sequential#5/6], %[./9])
  C3#52/Conv#6/Conv2d#2:
        %[C3#52/Conv#6/6] : torch.float32(1, 48, 152, 152) = ./aten::_convolution#20(%[4216])
  C3#52/Conv#6/SiLU#3:
        %[C3#52/15] : torch.float32(1, 48, 152, 152) = ./aten::silu_#0(%[C3#52/Conv#6/6])
  C3#52:
    %[./11] : torch.float32(1, 96, 152, 152) = ./aten::cat#9()
  C3#52/Conv#10/Conv2d#2:
        %[C3#52/Conv#10/6] : torch.float32(1, 96, 152, 152) = ./aten::_convolution#20(%[C3#52/11])
  C3#52/Conv#10/SiLU#3:
        %[4217] : torch.float32(1, 96, 152, 152) = ./aten::silu_#0(%[C3#52/Conv#10/6])
  Conv#53/Conv2d#2:
      %[Conv#53/6] : torch.float32(1, 192, 76, 76) = ./aten::_convolution#20(%[4217])
  Conv#53/SiLU#3:
      %[4218] : torch.float32(1, 192, 76, 76) = ./aten::silu_#0(%[Conv#53/6])
  C3#54/Conv#4/Conv2d#2:
        %[C3#54/Conv#4/6] : torch.float32(1, 96, 76, 76) = ./aten::_convolution#20(%[4218])
  C3#54/Conv#4/SiLU#3:
        %[C3#54/13] : torch.float32(1, 96, 76, 76) = ./aten::silu_#0(%[C3#54/Conv#4/6])
  C3#54/Sequential#5/Bottleneck#6/Conv#2/Conv2d#2:
            %[C3#54/Sequential#5/Bottleneck#6/Conv#2/6] : torch.float32(1, 96, 76, 76) = ./aten::_convolution#20(%[C3#54/13])
  C3#54/Sequential#5/Bottleneck#6/Conv#2/SiLU#3:
            %[C3#54/Sequential#5/Bottleneck#6/8] : torch.float32(1, 96, 76, 76) = ./aten::silu_#0(%[C3#54/Sequential#5/Bottleneck#6/Conv#2/6])
  C3#54/Sequential#5/Bottleneck#6/Conv#3/Conv2d#2:
            %[C3#54/Sequential#5/Bottleneck#6/Conv#3/6] : torch.float32(1, 96, 76, 76) = ./aten::_convolution#20(%[C3#54/Sequential#5/Bottleneck#6/8])
  C3#54/Sequential#5/Bottleneck#6/Conv#3/SiLU#3:
            %[C3#54/Sequential#5/Bottleneck#6/9] : torch.float32(1, 96, 76, 76) = ./aten::silu_#0(%[C3#54/Sequential#5/Bottleneck#6/Conv#3/6])
  C3#54/Sequential#5/Bottleneck#6:
        %[C3#54/Sequential#5/14] : torch.float32(1, 96, 76, 76) = ./aten::add#5(%[C3#54/13], %[./9])
  C3#54/Sequential#5/Bottleneck#7/Conv#2/Conv2d#2:
            %[C3#54/Sequential#5/Bottleneck#7/Conv#2/6] : torch.float32(1, 96, 76, 76) = ./aten::_convolution#20(%[C3#54/Sequential#5/14])
  C3#54/Sequential#5/Bottleneck#7/Conv#2/SiLU#3:
            %[C3#54/Sequential#5/Bottleneck#7/8] : torch.float32(1, 96, 76, 76) = ./aten::silu_#0(%[C3#54/Sequential#5/Bottleneck#7/Conv#2/6])
  C3#54/Sequential#5/Bottleneck#7/Conv#3/Conv2d#2:
            %[C3#54/Sequential#5/Bottleneck#7/Conv#3/6] : torch.float32(1, 96, 76, 76) = ./aten::_convolution#20(%[C3#54/Sequential#5/Bottleneck#7/8])
  C3#54/Sequential#5/Bottleneck#7/Conv#3/SiLU#3:
            %[C3#54/Sequential#5/Bottleneck#7/9] : torch.float32(1, 96, 76, 76) = ./aten::silu_#0(%[C3#54/Sequential#5/Bottleneck#7/Conv#3/6])
  C3#54/Sequential#5/Bottleneck#7:
        %[C3#54/Sequential#5/15] : torch.float32(1, 96, 76, 76) = ./aten::add#5(%[C3#54/Sequential#5/14], %[./9])
  C3#54/Sequential#5/Bottleneck#8/Conv#2/Conv2d#2:
            %[C3#54/Sequential#5/Bottleneck#8/Conv#2/6] : torch.float32(1, 96, 76, 76) = ./aten::_convolution#20(%[C3#54/Sequential#5/15])
  C3#54/Sequential#5/Bottleneck#8/Conv#2/SiLU#3:
            %[C3#54/Sequential#5/Bottleneck#8/8] : torch.float32(1, 96, 76, 76) = ./aten::silu_#0(%[C3#54/Sequential#5/Bottleneck#8/Conv#2/6])
  C3#54/Sequential#5/Bottleneck#8/Conv#3/Conv2d#2:
            %[C3#54/Sequential#5/Bottleneck#8/Conv#3/6] : torch.float32(1, 96, 76, 76) = ./aten::_convolution#20(%[C3#54/Sequential#5/Bottleneck#8/8])
  C3#54/Sequential#5/Bottleneck#8/Conv#3/SiLU#3:
            %[C3#54/Sequential#5/Bottleneck#8/9] : torch.float32(1, 96, 76, 76) = ./aten::silu_#0(%[C3#54/Sequential#5/Bottleneck#8/Conv#3/6])
  C3#54/Sequential#5/Bottleneck#8:
        %[C3#54/Sequential#5/16] : torch.float32(1, 96, 76, 76) = ./aten::add#5(%[C3#54/Sequential#5/15], %[./9])
  C3#54/Sequential#5/Bottleneck#9/Conv#2/Conv2d#2:
            %[C3#54/Sequential#5/Bottleneck#9/Conv#2/6] : torch.float32(1, 96, 76, 76) = ./aten::_convolution#20(%[C3#54/Sequential#5/16])
  C3#54/Sequential#5/Bottleneck#9/Conv#2/SiLU#3:
            %[C3#54/Sequential#5/Bottleneck#9/8] : torch.float32(1, 96, 76, 76) = ./aten::silu_#0(%[C3#54/Sequential#5/Bottleneck#9/Conv#2/6])
  C3#54/Sequential#5/Bottleneck#9/Conv#3/Conv2d#2:
            %[C3#54/Sequential#5/Bottleneck#9/Conv#3/6] : torch.float32(1, 96, 76, 76) = ./aten::_convolution#20(%[C3#54/Sequential#5/Bottleneck#9/8])
  C3#54/Sequential#5/Bottleneck#9/Conv#3/SiLU#3:
            %[C3#54/Sequential#5/Bottleneck#9/9] : torch.float32(1, 96, 76, 76) = ./aten::silu_#0(%[C3#54/Sequential#5/Bottleneck#9/Conv#3/6])
  C3#54/Sequential#5/Bottleneck#9:
        %[C3#54/Sequential#5/17] : torch.float32(1, 96, 76, 76) = ./aten::add#5(%[C3#54/Sequential#5/16], %[./9])
  C3#54/Sequential#5/Bottleneck#10/Conv#2/Conv2d#2:
            %[C3#54/Sequential#5/Bottleneck#10/Conv#2/6] : torch.float32(1, 96, 76, 76) = ./aten::_convolution#20(%[C3#54/Sequential#5/17])
  C3#54/Sequential#5/Bottleneck#10/Conv#2/SiLU#3:
            %[C3#54/Sequential#5/Bottleneck#10/8] : torch.float32(1, 96, 76, 76) = ./aten::silu_#0(%[C3#54/Sequential#5/Bottleneck#10/Conv#2/6])
  C3#54/Sequential#5/Bottleneck#10/Conv#3/Conv2d#2:
            %[C3#54/Sequential#5/Bottleneck#10/Conv#3/6] : torch.float32(1, 96, 76, 76) = ./aten::_convolution#20(%[C3#54/Sequential#5/Bottleneck#10/8])
  C3#54/Sequential#5/Bottleneck#10/Conv#3/SiLU#3:
            %[C3#54/Sequential#5/Bottleneck#10/9] : torch.float32(1, 96, 76, 76) = ./aten::silu_#0(%[C3#54/Sequential#5/Bottleneck#10/Conv#3/6])
  C3#54/Sequential#5/Bottleneck#10:
        %[C3#54/Sequential#5/18] : torch.float32(1, 96, 76, 76) = ./aten::add#5(%[C3#54/Sequential#5/17], %[./9])
  C3#54/Sequential#5/Bottleneck#11/Conv#2/Conv2d#2:
            %[C3#54/Sequential#5/Bottleneck#11/Conv#2/6] : torch.float32(1, 96, 76, 76) = ./aten::_convolution#20(%[C3#54/Sequential#5/18])
  C3#54/Sequential#5/Bottleneck#11/Conv#2/SiLU#3:
            %[C3#54/Sequential#5/Bottleneck#11/8] : torch.float32(1, 96, 76, 76) = ./aten::silu_#0(%[C3#54/Sequential#5/Bottleneck#11/Conv#2/6])
  C3#54/Sequential#5/Bottleneck#11/Conv#3/Conv2d#2:
            %[C3#54/Sequential#5/Bottleneck#11/Conv#3/6] : torch.float32(1, 96, 76, 76) = ./aten::_convolution#20(%[C3#54/Sequential#5/Bottleneck#11/8])
  C3#54/Sequential#5/Bottleneck#11/Conv#3/SiLU#3:
            %[C3#54/Sequential#5/Bottleneck#11/9] : torch.float32(1, 96, 76, 76) = ./aten::silu_#0(%[C3#54/Sequential#5/Bottleneck#11/Conv#3/6])
  C3#54/Sequential#5/Bottleneck#11:
        %[C3#54/14] : torch.float32(1, 96, 76, 76) = ./aten::add#5(%[C3#54/Sequential#5/18], %[./9])
  C3#54/Conv#6/Conv2d#2:
        %[C3#54/Conv#6/6] : torch.float32(1, 96, 76, 76) = ./aten::_convolution#20(%[4218])
  C3#54/Conv#6/SiLU#3:
        %[C3#54/15] : torch.float32(1, 96, 76, 76) = ./aten::silu_#0(%[C3#54/Conv#6/6])
  C3#54:
    %[./11] : torch.float32(1, 192, 76, 76) = ./aten::cat#9()
  C3#54/Conv#10/Conv2d#2:
        %[C3#54/Conv#10/6] : torch.float32(1, 192, 76, 76) = ./aten::_convolution#20(%[C3#54/11])
  C3#54/Conv#10/SiLU#3:
        %[4219] : torch.float32(1, 192, 76, 76) = ./aten::silu_#0(%[C3#54/Conv#10/6])
  Conv#55/Conv2d#2:
      %[Conv#55/6] : torch.float32(1, 384, 38, 38) = ./aten::_convolution#20(%[4219])
  Conv#55/SiLU#3:
      %[4220] : torch.float32(1, 384, 38, 38) = ./aten::silu_#0(%[Conv#55/6])
  C3#56/Conv#4/Conv2d#2:
        %[C3#56/Conv#4/6] : torch.float32(1, 192, 38, 38) = ./aten::_convolution#20(%[4220])
  C3#56/Conv#4/SiLU#3:
        %[C3#56/13] : torch.float32(1, 192, 38, 38) = ./aten::silu_#0(%[C3#56/Conv#4/6])
  C3#56/Sequential#5/Bottleneck#6/Conv#2/Conv2d#2:
            %[C3#56/Sequential#5/Bottleneck#6/Conv#2/6] : torch.float32(1, 192, 38, 38) = ./aten::_convolution#20(%[C3#56/13])
  C3#56/Sequential#5/Bottleneck#6/Conv#2/SiLU#3:
            %[C3#56/Sequential#5/Bottleneck#6/8] : torch.float32(1, 192, 38, 38) = ./aten::silu_#0(%[C3#56/Sequential#5/Bottleneck#6/Conv#2/6])
  C3#56/Sequential#5/Bottleneck#6/Conv#3/Conv2d#2:
            %[C3#56/Sequential#5/Bottleneck#6/Conv#3/6] : torch.float32(1, 192, 38, 38) = ./aten::_convolution#20(%[C3#56/Sequential#5/Bottleneck#6/8])
  C3#56/Sequential#5/Bottleneck#6/Conv#3/SiLU#3:
            %[C3#56/Sequential#5/Bottleneck#6/9] : torch.float32(1, 192, 38, 38) = ./aten::silu_#0(%[C3#56/Sequential#5/Bottleneck#6/Conv#3/6])
  C3#56/Sequential#5/Bottleneck#6:
        %[C3#56/Sequential#5/14] : torch.float32(1, 192, 38, 38) = ./aten::add#5(%[C3#56/13], %[./9])
  C3#56/Sequential#5/Bottleneck#7/Conv#2/Conv2d#2:
            %[C3#56/Sequential#5/Bottleneck#7/Conv#2/6] : torch.float32(1, 192, 38, 38) = ./aten::_convolution#20(%[C3#56/Sequential#5/14])
  C3#56/Sequential#5/Bottleneck#7/Conv#2/SiLU#3:
            %[C3#56/Sequential#5/Bottleneck#7/8] : torch.float32(1, 192, 38, 38) = ./aten::silu_#0(%[C3#56/Sequential#5/Bottleneck#7/Conv#2/6])
  C3#56/Sequential#5/Bottleneck#7/Conv#3/Conv2d#2:
            %[C3#56/Sequential#5/Bottleneck#7/Conv#3/6] : torch.float32(1, 192, 38, 38) = ./aten::_convolution#20(%[C3#56/Sequential#5/Bottleneck#7/8])
  C3#56/Sequential#5/Bottleneck#7/Conv#3/SiLU#3:
            %[C3#56/Sequential#5/Bottleneck#7/9] : torch.float32(1, 192, 38, 38) = ./aten::silu_#0(%[C3#56/Sequential#5/Bottleneck#7/Conv#3/6])
  C3#56/Sequential#5/Bottleneck#7:
        %[C3#56/Sequential#5/15] : torch.float32(1, 192, 38, 38) = ./aten::add#5(%[C3#56/Sequential#5/14], %[./9])
  C3#56/Sequential#5/Bottleneck#8/Conv#2/Conv2d#2:
            %[C3#56/Sequential#5/Bottleneck#8/Conv#2/6] : torch.float32(1, 192, 38, 38) = ./aten::_convolution#20(%[C3#56/Sequential#5/15])
  C3#56/Sequential#5/Bottleneck#8/Conv#2/SiLU#3:
            %[C3#56/Sequential#5/Bottleneck#8/8] : torch.float32(1, 192, 38, 38) = ./aten::silu_#0(%[C3#56/Sequential#5/Bottleneck#8/Conv#2/6])
  C3#56/Sequential#5/Bottleneck#8/Conv#3/Conv2d#2:
            %[C3#56/Sequential#5/Bottleneck#8/Conv#3/6] : torch.float32(1, 192, 38, 38) = ./aten::_convolution#20(%[C3#56/Sequential#5/Bottleneck#8/8])
  C3#56/Sequential#5/Bottleneck#8/Conv#3/SiLU#3:
            %[C3#56/Sequential#5/Bottleneck#8/9] : torch.float32(1, 192, 38, 38) = ./aten::silu_#0(%[C3#56/Sequential#5/Bottleneck#8/Conv#3/6])
  C3#56/Sequential#5/Bottleneck#8:
        %[C3#56/Sequential#5/16] : torch.float32(1, 192, 38, 38) = ./aten::add#5(%[C3#56/Sequential#5/15], %[./9])
  C3#56/Sequential#5/Bottleneck#9/Conv#2/Conv2d#2:
            %[C3#56/Sequential#5/Bottleneck#9/Conv#2/6] : torch.float32(1, 192, 38, 38) = ./aten::_convolution#20(%[C3#56/Sequential#5/16])
  C3#56/Sequential#5/Bottleneck#9/Conv#2/SiLU#3:
            %[C3#56/Sequential#5/Bottleneck#9/8] : torch.float32(1, 192, 38, 38) = ./aten::silu_#0(%[C3#56/Sequential#5/Bottleneck#9/Conv#2/6])
  C3#56/Sequential#5/Bottleneck#9/Conv#3/Conv2d#2:
            %[C3#56/Sequential#5/Bottleneck#9/Conv#3/6] : torch.float32(1, 192, 38, 38) = ./aten::_convolution#20(%[C3#56/Sequential#5/Bottleneck#9/8])
  C3#56/Sequential#5/Bottleneck#9/Conv#3/SiLU#3:
            %[C3#56/Sequential#5/Bottleneck#9/9] : torch.float32(1, 192, 38, 38) = ./aten::silu_#0(%[C3#56/Sequential#5/Bottleneck#9/Conv#3/6])
  C3#56/Sequential#5/Bottleneck#9:
        %[C3#56/Sequential#5/17] : torch.float32(1, 192, 38, 38) = ./aten::add#5(%[C3#56/Sequential#5/16], %[./9])
  C3#56/Sequential#5/Bottleneck#10/Conv#2/Conv2d#2:
            %[C3#56/Sequential#5/Bottleneck#10/Conv#2/6] : torch.float32(1, 192, 38, 38) = ./aten::_convolution#20(%[C3#56/Sequential#5/17])
  C3#56/Sequential#5/Bottleneck#10/Conv#2/SiLU#3:
            %[C3#56/Sequential#5/Bottleneck#10/8] : torch.float32(1, 192, 38, 38) = ./aten::silu_#0(%[C3#56/Sequential#5/Bottleneck#10/Conv#2/6])
  C3#56/Sequential#5/Bottleneck#10/Conv#3/Conv2d#2:
            %[C3#56/Sequential#5/Bottleneck#10/Conv#3/6] : torch.float32(1, 192, 38, 38) = ./aten::_convolution#20(%[C3#56/Sequential#5/Bottleneck#10/8])
  C3#56/Sequential#5/Bottleneck#10/Conv#3/SiLU#3:
            %[C3#56/Sequential#5/Bottleneck#10/9] : torch.float32(1, 192, 38, 38) = ./aten::silu_#0(%[C3#56/Sequential#5/Bottleneck#10/Conv#3/6])
  C3#56/Sequential#5/Bottleneck#10:
        %[C3#56/Sequential#5/18] : torch.float32(1, 192, 38, 38) = ./aten::add#5(%[C3#56/Sequential#5/17], %[./9])
  C3#56/Sequential#5/Bottleneck#11/Conv#2/Conv2d#2:
            %[C3#56/Sequential#5/Bottleneck#11/Conv#2/6] : torch.float32(1, 192, 38, 38) = ./aten::_convolution#20(%[C3#56/Sequential#5/18])
  C3#56/Sequential#5/Bottleneck#11/Conv#2/SiLU#3:
            %[C3#56/Sequential#5/Bottleneck#11/8] : torch.float32(1, 192, 38, 38) = ./aten::silu_#0(%[C3#56/Sequential#5/Bottleneck#11/Conv#2/6])
  C3#56/Sequential#5/Bottleneck#11/Conv#3/Conv2d#2:
            %[C3#56/Sequential#5/Bottleneck#11/Conv#3/6] : torch.float32(1, 192, 38, 38) = ./aten::_convolution#20(%[C3#56/Sequential#5/Bottleneck#11/8])
  C3#56/Sequential#5/Bottleneck#11/Conv#3/SiLU#3:
            %[C3#56/Sequential#5/Bottleneck#11/9] : torch.float32(1, 192, 38, 38) = ./aten::silu_#0(%[C3#56/Sequential#5/Bottleneck#11/Conv#3/6])
  C3#56/Sequential#5/Bottleneck#11:
        %[C3#56/14] : torch.float32(1, 192, 38, 38) = ./aten::add#5(%[C3#56/Sequential#5/18], %[./9])
  C3#56/Conv#6/Conv2d#2:
        %[C3#56/Conv#6/6] : torch.float32(1, 192, 38, 38) = ./aten::_convolution#20(%[4220])
  C3#56/Conv#6/SiLU#3:
        %[C3#56/15] : torch.float32(1, 192, 38, 38) = ./aten::silu_#0(%[C3#56/Conv#6/6])
  C3#56:
    %[./11] : torch.float32(1, 384, 38, 38) = ./aten::cat#9()
  C3#56/Conv#10/Conv2d#2:
        %[C3#56/Conv#10/6] : torch.float32(1, 384, 38, 38) = ./aten::_convolution#20(%[C3#56/11])
  C3#56/Conv#10/SiLU#3:
        %[4221] : torch.float32(1, 384, 38, 38) = ./aten::silu_#0(%[C3#56/Conv#10/6])
  Conv#57/Conv2d#2:
      %[Conv#57/6] : torch.float32(1, 768, 19, 19) = ./aten::_convolution#20(%[4221])
  Conv#57/SiLU#3:
      %[4222] : torch.float32(1, 768, 19, 19) = ./aten::silu_#0(%[Conv#57/6])
  SPP#58/Conv#8/Conv2d#2:
        %[SPP#58/Conv#8/6] : torch.float32(1, 384, 19, 19) = ./aten::_convolution#20(%[4222])
  SPP#58/Conv#8/SiLU#3:
        %[SPP#58/18] : torch.float32(1, 384, 19, 19) = ./aten::silu_#0(%[SPP#58/Conv#8/6])
  SPP#58/MaxPool2d#9:
      %[SPP#58/19] : torch.float32(1, 384, 19, 19) = ./aten::max_pool2d#13(%[SPP#58/18])
  SPP#58/MaxPool2d#10:
      %[SPP#58/20] : torch.float32(1, 384, 19, 19) = ./aten::max_pool2d#13(%[SPP#58/18])
  SPP#58/MaxPool2d#11:
      %[SPP#58/21] : torch.float32(1, 384, 19, 19) = ./aten::max_pool2d#13(%[SPP#58/18])
  SPP#58:
    %[./16] : torch.float32(1, 1536, 19, 19) = ./aten::cat#14()
  SPP#58/Conv#15/Conv2d#2:
        %[SPP#58/Conv#15/6] : torch.float32(1, 768, 19, 19) = ./aten::_convolution#20(%[SPP#58/16])
  SPP#58/Conv#15/SiLU#3:
        %[4223] : torch.float32(1, 768, 19, 19) = ./aten::silu_#0(%[SPP#58/Conv#15/6])
  C3#59/Conv#4/Conv2d#2:
        %[C3#59/Conv#4/6] : torch.float32(1, 384, 19, 19) = ./aten::_convolution#20(%[4223])
  C3#59/Conv#4/SiLU#3:
        %[C3#59/13] : torch.float32(1, 384, 19, 19) = ./aten::silu_#0(%[C3#59/Conv#4/6])
  C3#59/Sequential#5/Bottleneck#2/Conv#2/Conv2d#2:
            %[C3#59/Sequential#5/Bottleneck#2/Conv#2/6] : torch.float32(1, 384, 19, 19) = ./aten::_convolution#20(%[C3#59/13])
  C3#59/Sequential#5/Bottleneck#2/Conv#2/SiLU#3:
            %[C3#59/Sequential#5/Bottleneck#2/6] : torch.float32(1, 384, 19, 19) = ./aten::silu_#0(%[C3#59/Sequential#5/Bottleneck#2/Conv#2/6])
  C3#59/Sequential#5/Bottleneck#2/Conv#3/Conv2d#2:
            %[C3#59/Sequential#5/Bottleneck#2/Conv#3/6] : torch.float32(1, 384, 19, 19) = ./aten::_convolution#20(%[C3#59/Sequential#5/Bottleneck#2/6])
  C3#59/Sequential#5/Bottleneck#2/Conv#3/SiLU#3:
            %[C3#59/Sequential#5/6] : torch.float32(1, 384, 19, 19) = ./aten::silu_#0(%[C3#59/Sequential#5/Bottleneck#2/Conv#3/6])
  C3#59/Sequential#5/Bottleneck#3/Conv#2/Conv2d#2:
            %[C3#59/Sequential#5/Bottleneck#3/Conv#2/6] : torch.float32(1, 384, 19, 19) = ./aten::_convolution#20(%[C3#59/Sequential#5/6])
  C3#59/Sequential#5/Bottleneck#3/Conv#2/SiLU#3:
            %[C3#59/Sequential#5/Bottleneck#3/6] : torch.float32(1, 384, 19, 19) = ./aten::silu_#0(%[C3#59/Sequential#5/Bottleneck#3/Conv#2/6])
  C3#59/Sequential#5/Bottleneck#3/Conv#3/Conv2d#2:
            %[C3#59/Sequential#5/Bottleneck#3/Conv#3/6] : torch.float32(1, 384, 19, 19) = ./aten::_convolution#20(%[C3#59/Sequential#5/Bottleneck#3/6])
  C3#59/Sequential#5/Bottleneck#3/Conv#3/SiLU#3:
            %[C3#59/14] : torch.float32(1, 384, 19, 19) = ./aten::silu_#0(%[C3#59/Sequential#5/Bottleneck#3/Conv#3/6])
  C3#59/Conv#6/Conv2d#2:
        %[C3#59/Conv#6/6] : torch.float32(1, 384, 19, 19) = ./aten::_convolution#20(%[4223])
  C3#59/Conv#6/SiLU#3:
        %[C3#59/15] : torch.float32(1, 384, 19, 19) = ./aten::silu_#0(%[C3#59/Conv#6/6])
  C3#59:
    %[./11] : torch.float32(1, 768, 19, 19) = ./aten::cat#9()
  C3#59/Conv#10/Conv2d#2:
        %[C3#59/Conv#10/6] : torch.float32(1, 768, 19, 19) = ./aten::_convolution#20(%[C3#59/11])
  C3#59/Conv#10/SiLU#3:
        %[4224] : torch.float32(1, 768, 19, 19) = ./aten::silu_#0(%[C3#59/Conv#10/6])
  Conv#60/Conv2d#2:
      %[Conv#60/6] : torch.float32(1, 384, 19, 19) = ./aten::_convolution#20(%[4224])
  Conv#60/SiLU#3:
      %[4225] : torch.float32(1, 384, 19, 19) = ./aten::silu_#0(%[Conv#60/6])
  Upsample#61:
    %[4226] : torch.float32(1, 384, 38, 38) = ./aten::upsample_nearest2d#4(%[4225])
  Concat#62:
    %[4227] : torch.float32(1, 768, 38, 38) = ./aten::cat#2()
  C3#63/Conv#4/Conv2d#2:
        %[C3#63/Conv#4/6] : torch.float32(1, 192, 38, 38) = ./aten::_convolution#20(%[4227])
  C3#63/Conv#4/SiLU#3:
        %[C3#63/13] : torch.float32(1, 192, 38, 38) = ./aten::silu_#0(%[C3#63/Conv#4/6])
  C3#63/Sequential#5/Bottleneck#2/Conv#2/Conv2d#2:
            %[C3#63/Sequential#5/Bottleneck#2/Conv#2/6] : torch.float32(1, 192, 38, 38) = ./aten::_convolution#20(%[C3#63/13])
  C3#63/Sequential#5/Bottleneck#2/Conv#2/SiLU#3:
            %[C3#63/Sequential#5/Bottleneck#2/6] : torch.float32(1, 192, 38, 38) = ./aten::silu_#0(%[C3#63/Sequential#5/Bottleneck#2/Conv#2/6])
  C3#63/Sequential#5/Bottleneck#2/Conv#3/Conv2d#2:
            %[C3#63/Sequential#5/Bottleneck#2/Conv#3/6] : torch.float32(1, 192, 38, 38) = ./aten::_convolution#20(%[C3#63/Sequential#5/Bottleneck#2/6])
  C3#63/Sequential#5/Bottleneck#2/Conv#3/SiLU#3:
            %[C3#63/Sequential#5/6] : torch.float32(1, 192, 38, 38) = ./aten::silu_#0(%[C3#63/Sequential#5/Bottleneck#2/Conv#3/6])
  C3#63/Sequential#5/Bottleneck#3/Conv#2/Conv2d#2:
            %[C3#63/Sequential#5/Bottleneck#3/Conv#2/6] : torch.float32(1, 192, 38, 38) = ./aten::_convolution#20(%[C3#63/Sequential#5/6])
  C3#63/Sequential#5/Bottleneck#3/Conv#2/SiLU#3:
            %[C3#63/Sequential#5/Bottleneck#3/6] : torch.float32(1, 192, 38, 38) = ./aten::silu_#0(%[C3#63/Sequential#5/Bottleneck#3/Conv#2/6])
  C3#63/Sequential#5/Bottleneck#3/Conv#3/Conv2d#2:
            %[C3#63/Sequential#5/Bottleneck#3/Conv#3/6] : torch.float32(1, 192, 38, 38) = ./aten::_convolution#20(%[C3#63/Sequential#5/Bottleneck#3/6])
  C3#63/Sequential#5/Bottleneck#3/Conv#3/SiLU#3:
            %[C3#63/14] : torch.float32(1, 192, 38, 38) = ./aten::silu_#0(%[C3#63/Sequential#5/Bottleneck#3/Conv#3/6])
  C3#63/Conv#6/Conv2d#2:
        %[C3#63/Conv#6/6] : torch.float32(1, 192, 38, 38) = ./aten::_convolution#20(%[4227])
  C3#63/Conv#6/SiLU#3:
        %[C3#63/15] : torch.float32(1, 192, 38, 38) = ./aten::silu_#0(%[C3#63/Conv#6/6])
  C3#63:
    %[./11] : torch.float32(1, 384, 38, 38) = ./aten::cat#9()
  C3#63/Conv#10/Conv2d#2:
        %[C3#63/Conv#10/6] : torch.float32(1, 384, 38, 38) = ./aten::_convolution#20(%[C3#63/11])
  C3#63/Conv#10/SiLU#3:
        %[4228] : torch.float32(1, 384, 38, 38) = ./aten::silu_#0(%[C3#63/Conv#10/6])
  Conv#64/Conv2d#2:
      %[Conv#64/6] : torch.float32(1, 192, 38, 38) = ./aten::_convolution#20(%[4228])
  Conv#64/SiLU#3:
      %[4229] : torch.float32(1, 192, 38, 38) = ./aten::silu_#0(%[Conv#64/6])
  Upsample#65:
    %[4230] : torch.float32(1, 192, 76, 76) = ./aten::upsample_nearest2d#4(%[4229])
  Concat#66:
    %[4231] : torch.float32(1, 384, 76, 76) = ./aten::cat#2()
  C3#67/Conv#4/Conv2d#2:
        %[C3#67/Conv#4/6] : torch.float32(1, 96, 76, 76) = ./aten::_convolution#20(%[4231])
  C3#67/Conv#4/SiLU#3:
        %[C3#67/13] : torch.float32(1, 96, 76, 76) = ./aten::silu_#0(%[C3#67/Conv#4/6])
  C3#67/Sequential#5/Bottleneck#2/Conv#2/Conv2d#2:
            %[C3#67/Sequential#5/Bottleneck#2/Conv#2/6] : torch.float32(1, 96, 76, 76) = ./aten::_convolution#20(%[C3#67/13])
  C3#67/Sequential#5/Bottleneck#2/Conv#2/SiLU#3:
            %[C3#67/Sequential#5/Bottleneck#2/6] : torch.float32(1, 96, 76, 76) = ./aten::silu_#0(%[C3#67/Sequential#5/Bottleneck#2/Conv#2/6])
  C3#67/Sequential#5/Bottleneck#2/Conv#3/Conv2d#2:
            %[C3#67/Sequential#5/Bottleneck#2/Conv#3/6] : torch.float32(1, 96, 76, 76) = ./aten::_convolution#20(%[C3#67/Sequential#5/Bottleneck#2/6])
  C3#67/Sequential#5/Bottleneck#2/Conv#3/SiLU#3:
            %[C3#67/Sequential#5/6] : torch.float32(1, 96, 76, 76) = ./aten::silu_#0(%[C3#67/Sequential#5/Bottleneck#2/Conv#3/6])
  C3#67/Sequential#5/Bottleneck#3/Conv#2/Conv2d#2:
            %[C3#67/Sequential#5/Bottleneck#3/Conv#2/6] : torch.float32(1, 96, 76, 76) = ./aten::_convolution#20(%[C3#67/Sequential#5/6])
  C3#67/Sequential#5/Bottleneck#3/Conv#2/SiLU#3:
            %[C3#67/Sequential#5/Bottleneck#3/6] : torch.float32(1, 96, 76, 76) = ./aten::silu_#0(%[C3#67/Sequential#5/Bottleneck#3/Conv#2/6])
  C3#67/Sequential#5/Bottleneck#3/Conv#3/Conv2d#2:
            %[C3#67/Sequential#5/Bottleneck#3/Conv#3/6] : torch.float32(1, 96, 76, 76) = ./aten::_convolution#20(%[C3#67/Sequential#5/Bottleneck#3/6])
  C3#67/Sequential#5/Bottleneck#3/Conv#3/SiLU#3:
            %[C3#67/14] : torch.float32(1, 96, 76, 76) = ./aten::silu_#0(%[C3#67/Sequential#5/Bottleneck#3/Conv#3/6])
  C3#67/Conv#6/Conv2d#2:
        %[C3#67/Conv#6/6] : torch.float32(1, 96, 76, 76) = ./aten::_convolution#20(%[4231])
  C3#67/Conv#6/SiLU#3:
        %[C3#67/15] : torch.float32(1, 96, 76, 76) = ./aten::silu_#0(%[C3#67/Conv#6/6])
  C3#67:
    %[./11] : torch.float32(1, 192, 76, 76) = ./aten::cat#9()
  C3#67/Conv#10/Conv2d#2:
        %[C3#67/Conv#10/6] : torch.float32(1, 192, 76, 76) = ./aten::_convolution#20(%[C3#67/11])
  C3#67/Conv#10/SiLU#3:
        %[4232] : torch.float32(1, 192, 76, 76) = ./aten::silu_#0(%[C3#67/Conv#10/6])
  Conv#68/Conv2d#2:
      %[Conv#68/6] : torch.float32(1, 192, 38, 38) = ./aten::_convolution#20(%[4232])
  Conv#68/SiLU#3:
      %[4233] : torch.float32(1, 192, 38, 38) = ./aten::silu_#0(%[Conv#68/6])
  Concat#69:
    %[4234] : torch.float32(1, 384, 38, 38) = ./aten::cat#2()
  C3#70/Conv#4/Conv2d#2:
        %[C3#70/Conv#4/6] : torch.float32(1, 192, 38, 38) = ./aten::_convolution#20(%[4234])
  C3#70/Conv#4/SiLU#3:
        %[C3#70/13] : torch.float32(1, 192, 38, 38) = ./aten::silu_#0(%[C3#70/Conv#4/6])
  C3#70/Sequential#5/Bottleneck#2/Conv#2/Conv2d#2:
            %[C3#70/Sequential#5/Bottleneck#2/Conv#2/6] : torch.float32(1, 192, 38, 38) = ./aten::_convolution#20(%[C3#70/13])
  C3#70/Sequential#5/Bottleneck#2/Conv#2/SiLU#3:
            %[C3#70/Sequential#5/Bottleneck#2/6] : torch.float32(1, 192, 38, 38) = ./aten::silu_#0(%[C3#70/Sequential#5/Bottleneck#2/Conv#2/6])
  C3#70/Sequential#5/Bottleneck#2/Conv#3/Conv2d#2:
            %[C3#70/Sequential#5/Bottleneck#2/Conv#3/6] : torch.float32(1, 192, 38, 38) = ./aten::_convolution#20(%[C3#70/Sequential#5/Bottleneck#2/6])
  C3#70/Sequential#5/Bottleneck#2/Conv#3/SiLU#3:
            %[C3#70/Sequential#5/6] : torch.float32(1, 192, 38, 38) = ./aten::silu_#0(%[C3#70/Sequential#5/Bottleneck#2/Conv#3/6])
  C3#70/Sequential#5/Bottleneck#3/Conv#2/Conv2d#2:
            %[C3#70/Sequential#5/Bottleneck#3/Conv#2/6] : torch.float32(1, 192, 38, 38) = ./aten::_convolution#20(%[C3#70/Sequential#5/6])
  C3#70/Sequential#5/Bottleneck#3/Conv#2/SiLU#3:
            %[C3#70/Sequential#5/Bottleneck#3/6] : torch.float32(1, 192, 38, 38) = ./aten::silu_#0(%[C3#70/Sequential#5/Bottleneck#3/Conv#2/6])
  C3#70/Sequential#5/Bottleneck#3/Conv#3/Conv2d#2:
            %[C3#70/Sequential#5/Bottleneck#3/Conv#3/6] : torch.float32(1, 192, 38, 38) = ./aten::_convolution#20(%[C3#70/Sequential#5/Bottleneck#3/6])
  C3#70/Sequential#5/Bottleneck#3/Conv#3/SiLU#3:
            %[C3#70/14] : torch.float32(1, 192, 38, 38) = ./aten::silu_#0(%[C3#70/Sequential#5/Bottleneck#3/Conv#3/6])
  C3#70/Conv#6/Conv2d#2:
        %[C3#70/Conv#6/6] : torch.float32(1, 192, 38, 38) = ./aten::_convolution#20(%[4234])
  C3#70/Conv#6/SiLU#3:
        %[C3#70/15] : torch.float32(1, 192, 38, 38) = ./aten::silu_#0(%[C3#70/Conv#6/6])
  C3#70:
    %[./11] : torch.float32(1, 384, 38, 38) = ./aten::cat#9()
  C3#70/Conv#10/Conv2d#2:
        %[C3#70/Conv#10/6] : torch.float32(1, 384, 38, 38) = ./aten::_convolution#20(%[C3#70/11])
  C3#70/Conv#10/SiLU#3:
        %[4235] : torch.float32(1, 384, 38, 38) = ./aten::silu_#0(%[C3#70/Conv#10/6])
  Conv#71/Conv2d#2:
      %[Conv#71/6] : torch.float32(1, 384, 19, 19) = ./aten::_convolution#20(%[4235])
  Conv#71/SiLU#3:
      %[4236] : torch.float32(1, 384, 19, 19) = ./aten::silu_#0(%[Conv#71/6])
  Concat#72:
    %[4237] : torch.float32(1, 768, 19, 19) = ./aten::cat#2()
  C3#73/Conv#4/Conv2d#2:
        %[C3#73/Conv#4/6] : torch.float32(1, 384, 19, 19) = ./aten::_convolution#20(%[4237])
  C3#73/Conv#4/SiLU#3:
        %[C3#73/13] : torch.float32(1, 384, 19, 19) = ./aten::silu_#0(%[C3#73/Conv#4/6])
  C3#73/Sequential#5/Bottleneck#2/Conv#2/Conv2d#2:
            %[C3#73/Sequential#5/Bottleneck#2/Conv#2/6] : torch.float32(1, 384, 19, 19) = ./aten::_convolution#20(%[C3#73/13])
  C3#73/Sequential#5/Bottleneck#2/Conv#2/SiLU#3:
            %[C3#73/Sequential#5/Bottleneck#2/6] : torch.float32(1, 384, 19, 19) = ./aten::silu_#0(%[C3#73/Sequential#5/Bottleneck#2/Conv#2/6])
  C3#73/Sequential#5/Bottleneck#2/Conv#3/Conv2d#2:
            %[C3#73/Sequential#5/Bottleneck#2/Conv#3/6] : torch.float32(1, 384, 19, 19) = ./aten::_convolution#20(%[C3#73/Sequential#5/Bottleneck#2/6])
  C3#73/Sequential#5/Bottleneck#2/Conv#3/SiLU#3:
            %[C3#73/Sequential#5/6] : torch.float32(1, 384, 19, 19) = ./aten::silu_#0(%[C3#73/Sequential#5/Bottleneck#2/Conv#3/6])
  C3#73/Sequential#5/Bottleneck#3/Conv#2/Conv2d#2:
            %[C3#73/Sequential#5/Bottleneck#3/Conv#2/6] : torch.float32(1, 384, 19, 19) = ./aten::_convolution#20(%[C3#73/Sequential#5/6])
  C3#73/Sequential#5/Bottleneck#3/Conv#2/SiLU#3:
            %[C3#73/Sequential#5/Bottleneck#3/6] : torch.float32(1, 384, 19, 19) = ./aten::silu_#0(%[C3#73/Sequential#5/Bottleneck#3/Conv#2/6])
  C3#73/Sequential#5/Bottleneck#3/Conv#3/Conv2d#2:
            %[C3#73/Sequential#5/Bottleneck#3/Conv#3/6] : torch.float32(1, 384, 19, 19) = ./aten::_convolution#20(%[C3#73/Sequential#5/Bottleneck#3/6])
  C3#73/Sequential#5/Bottleneck#3/Conv#3/SiLU#3:
            %[C3#73/14] : torch.float32(1, 384, 19, 19) = ./aten::silu_#0(%[C3#73/Sequential#5/Bottleneck#3/Conv#3/6])
  C3#73/Conv#6/Conv2d#2:
        %[C3#73/Conv#6/6] : torch.float32(1, 384, 19, 19) = ./aten::_convolution#20(%[4237])
  C3#73/Conv#6/SiLU#3:
        %[C3#73/15] : torch.float32(1, 384, 19, 19) = ./aten::silu_#0(%[C3#73/Conv#6/6])
  C3#73:
    %[./11] : torch.float32(1, 768, 19, 19) = ./aten::cat#9()
  C3#73/Conv#10/Conv2d#2:
        %[C3#73/Conv#10/6] : torch.float32(1, 768, 19, 19) = ./aten::_convolution#20(%[C3#73/11])
  C3#73/Conv#10/SiLU#3:
        %[4238] : torch.float32(1, 768, 19, 19) = ./aten::silu_#0(%[C3#73/Conv#10/6])
  Detect#74/Conv2d#7:
      %[Detect#74/433] : torch.float32(1, 255, 76, 76) = ./aten::_convolution#20(%[4232])
  Detect#74:
    %[./11] : 1 = ./aten::size#9(%[./433])
    %[./13] : torch.int32() = ./aten::Int#11()
    %[./14] : torch.int32() = ./aten::Int#12()
    %[./19] : 76 = ./aten::size#14(%[./433])
    %[./21] : torch.int32() = ./aten::Int#16()
    %[./23] : 76 = ./aten::size#18(%[./433])
    %[./25] : torch.int32() = ./aten::Int#20()
    %[./29] : torch.float32(1, 3, 85, 76, 76) = ./aten::view#24(%[./433])
    %[./36] : torch.float32(1, 3, 76, 76, 85) = ./aten::permute#31(%[./29])
    %[./38] : torch.float32(1, 3, 76, 76, 85) = ./aten::contiguous#33(%[./36])
    %[./72] : torch.float32(1, 3, 76, 76, 85) = ./aten::sigmoid#35(%[./38])
    %[./77] : torch.float32(1, 3, 76, 76, 2) = ./aten::slice#40(%[./72])
    %[./79] : torch.float32(1, 3, 76, 76, 2) = ./aten::mul#42(%[./77])
    %[./82] : torch.float32(1, 3, 76, 76, 2) = ./aten::sub#45(%[./79])
    %[./84] : torch.float32(1, 3, 76, 76, 2) = ./aten::add#47(%[./82])
    %[./88] : torch.float32() = ./aten::select#51()
    %[./89] : torch.float32(1, 3, 76, 76, 2) = ./aten::mul#52(%[./84], %[./88])
    %[./94] : torch.float32(1, 3, 76, 76, 2) = ./aten::slice#57(%[./72])
    %[./100] : torch.float32(3, 76, 76, 2) = ./aten::view#63(%[./89])
    %[./108] : torch.float32(1, 3, 76, 76, 2) = ./aten::expand#71(%[./100])
    %[./110] : torch.float32(1, 3, 76, 76, 2) = ./aten::copy_#73(%[./94], %[./108])
    %[./115] : torch.float32(1, 3, 76, 76, 2) = ./aten::slice#78(%[./72])
    %[./117] : torch.float32(1, 3, 76, 76, 2) = ./aten::mul#80(%[./115])
    %[./119] : torch.float32(1, 3, 76, 76, 2) = ./aten::pow#82(%[./117])
    %[./122] : torch.float32(1, 3, 1, 1, 2) = ./aten::select#85()
    %[./123] : torch.float32(1, 3, 76, 76, 2) = ./aten::mul#86(%[./119], %[./122])
    %[./128] : torch.float32(1, 3, 76, 76, 2) = ./aten::slice#91(%[./72])
    %[./134] : torch.float32(3, 76, 76, 2) = ./aten::view#97(%[./123])
    %[./142] : torch.float32(1, 3, 76, 76, 2) = ./aten::expand#105(%[./134])
    %[./144] : torch.float32(1, 3, 76, 76, 2) = ./aten::copy_#107(%[./128], %[./142])
    %[./148] : torch.float32(1, 17328, 85) = ./aten::view#111(%[./72])
  Detect#74/Conv2d#112:
      %[Detect#74/434] : torch.float32(1, 255, 38, 38) = ./aten::_convolution#20(%[4235])
    %[./152] : 1 = ./aten::size#114(%[./434])
    %[./154] : torch.int32() = ./aten::Int#116()
    %[./155] : torch.int32() = ./aten::Int#117()
    %[./160] : 38 = ./aten::size#119(%[./434])
    %[./162] : torch.int32() = ./aten::Int#121()
    %[./164] : 38 = ./aten::size#123(%[./434])
    %[./166] : torch.int32() = ./aten::Int#125()
    %[./170] : torch.float32(1, 3, 85, 38, 38) = ./aten::view#129(%[./434])
    %[./177] : torch.float32(1, 3, 38, 38, 85) = ./aten::permute#136(%[./170])
    %[./179] : torch.float32(1, 3, 38, 38, 85) = ./aten::contiguous#138(%[./177])
    %[./213] : torch.float32(1, 3, 38, 38, 85) = ./aten::sigmoid#140(%[./179])
    %[./218] : torch.float32(1, 3, 38, 38, 2) = ./aten::slice#145(%[./213])
    %[./220] : torch.float32(1, 3, 38, 38, 2) = ./aten::mul#147(%[./218])
    %[./223] : torch.float32(1, 3, 38, 38, 2) = ./aten::sub#150(%[./220])
    %[./225] : torch.float32(1, 3, 38, 38, 2) = ./aten::add#152(%[./223])
    %[./228] : torch.float32() = ./aten::select#155()
    %[./229] : torch.float32(1, 3, 38, 38, 2) = ./aten::mul#156(%[./225], %[./228])
    %[./234] : torch.float32(1, 3, 38, 38, 2) = ./aten::slice#161(%[./213])
    %[./240] : torch.float32(3, 38, 38, 2) = ./aten::view#167(%[./229])
    %[./248] : torch.float32(1, 3, 38, 38, 2) = ./aten::expand#175(%[./240])
    %[./250] : torch.float32(1, 3, 38, 38, 2) = ./aten::copy_#177(%[./234], %[./248])
    %[./255] : torch.float32(1, 3, 38, 38, 2) = ./aten::slice#182(%[./213])
    %[./257] : torch.float32(1, 3, 38, 38, 2) = ./aten::mul#184(%[./255])
    %[./259] : torch.float32(1, 3, 38, 38, 2) = ./aten::pow#186(%[./257])
    %[./262] : torch.float32(1, 3, 1, 1, 2) = ./aten::select#189()
    %[./263] : torch.float32(1, 3, 38, 38, 2) = ./aten::mul#190(%[./259], %[./262])
    %[./268] : torch.float32(1, 3, 38, 38, 2) = ./aten::slice#195(%[./213])
    %[./274] : torch.float32(3, 38, 38, 2) = ./aten::view#201(%[./263])
    %[./282] : torch.float32(1, 3, 38, 38, 2) = ./aten::expand#209(%[./274])
    %[./284] : torch.float32(1, 3, 38, 38, 2) = ./aten::copy_#211(%[./268], %[./282])
    %[./288] : torch.float32(1, 4332, 85) = ./aten::view#215(%[./213])
  Detect#74/Conv2d#216:
      %[Detect#74/435] : torch.float32(1, 255, 19, 19) = ./aten::_convolution#20(%[4238])
    %[./292] : 1 = ./aten::size#218(%[./435])
    %[./294] : torch.int32() = ./aten::Int#220()
    %[./295] : torch.int32() = ./aten::Int#221()
    %[./300] : 19 = ./aten::size#223(%[./435])
    %[./302] : torch.int32() = ./aten::Int#225()
    %[./304] : 19 = ./aten::size#227(%[./435])
    %[./306] : torch.int32() = ./aten::Int#229()
    %[./310] : torch.float32(1, 3, 85, 19, 19) = ./aten::view#233(%[./435])
    %[./317] : torch.float32(1, 3, 19, 19, 85) = ./aten::permute#240(%[./310])
    %[./319] : torch.float32(1, 3, 19, 19, 85) = ./aten::contiguous#242(%[./317])
    %[./353] : torch.float32(1, 3, 19, 19, 85) = ./aten::sigmoid#244(%[./319])
    %[./358] : torch.float32(1, 3, 19, 19, 2) = ./aten::slice#249(%[./353])
    %[./360] : torch.float32(1, 3, 19, 19, 2) = ./aten::mul#251(%[./358])
    %[./363] : torch.float32(1, 3, 19, 19, 2) = ./aten::sub#254(%[./360])
    %[./365] : torch.float32(1, 3, 19, 19, 2) = ./aten::add#256(%[./363])
    %[./368] : torch.float32() = ./aten::select#259()
    %[./369] : torch.float32(1, 3, 19, 19, 2) = ./aten::mul#260(%[./365], %[./368])
    %[./374] : torch.float32(1, 3, 19, 19, 2) = ./aten::slice#265(%[./353])
    %[./380] : torch.float32(3, 19, 19, 2) = ./aten::view#271(%[./369])
    %[./388] : torch.float32(1, 3, 19, 19, 2) = ./aten::expand#279(%[./380])
    %[./390] : torch.float32(1, 3, 19, 19, 2) = ./aten::copy_#281(%[./374], %[./388])
    %[./395] : torch.float32(1, 3, 19, 19, 2) = ./aten::slice#286(%[./353])
    %[./397] : torch.float32(1, 3, 19, 19, 2) = ./aten::mul#288(%[./395])
    %[./399] : torch.float32(1, 3, 19, 19, 2) = ./aten::pow#290(%[./397])
    %[./402] : torch.float32(1, 3, 1, 1, 2) = ./aten::select#293()
    %[./403] : torch.float32(1, 3, 19, 19, 2) = ./aten::mul#294(%[./399], %[./402])
    %[./408] : torch.float32(1, 3, 19, 19, 2) = ./aten::slice#299(%[./353])
    %[./414] : torch.float32(3, 19, 19, 2) = ./aten::view#305(%[./403])
    %[./422] : torch.float32(1, 3, 19, 19, 2) = ./aten::expand#313(%[./414])
    %[./424] : torch.float32(1, 3, 19, 19, 2) = ./aten::copy_#315(%[./408], %[./422])
    %[./428] : torch.float32(1, 1083, 85) = ./aten::view#319(%[./353])
    %[./431] : torch.float32(1, 22743, 85) = ./aten::cat#322()
    %[4239] : (torch.float32(1, 3, 76, 76, 85), torch.float32(1, 3, 38, 38, 85), torch.float32(1, 3, 19, 19, 85), torch.float32(1, 22743, 85)) = ./prim::TupleConstruct#323(%[./38], %[./179], %[./319], %[./431])
  %[4211] : torch.float32(1, 3, 76, 76, 85), %[4212] : torch.float32(1, 3, 38, 38, 85), %[4213] : torch.float32(1, 3, 19, 19, 85), %[4214] : torch.float32(1, 22743, 85) = prim::TupleUnpack#75(%[4239])
  %[3088] : [torch.float32(1, 3, 76, 76, 85), torch.float32(1, 3, 38, 38, 85), torch.float32(1, 3, 19, 19, 85)] = prim::ListConstruct#76(%[4211], %[4212], %[4213])
  %[3089] : (torch.float32(1, 22743, 85), [torch.float32(1, 3, 76, 76, 85), torch.float32(1, 3, 38, 38, 85), torch.float32(1, 3, 19, 19, 85)]) = prim::TupleConstruct#77(%[4214], %[3088])
  return(%[3089] : (torch.float32(1, 22743, 85), [torch.float32(1, 3, 76, 76, 85), torch.float32(1, 3, 38, 38, 85), torch.float32(1, 3, 19, 19, 85)]))
; falling back to native python function call
ERROR:Neuron:3107
Traceback (most recent call last):
  File "/home/ubuntu/anaconda3/envs/aws_neuron_pytorch_p36/lib/python3.6/site-packages/torch_neuron/convert.py", line 448, in _convert_item
    item, inputs, compiler_workdir=sg_workdir, **kwargs)
  File "/home/ubuntu/anaconda3/envs/aws_neuron_pytorch_p36/lib/python3.6/site-packages/torch_neuron/decorators.py", line 194, in trace
    return create_runnable(metaneff, neff_ts, jit_trace, example_inputs, preprocessor, postprocessor, output_tensors)
  File "/home/ubuntu/anaconda3/envs/aws_neuron_pytorch_p36/lib/python3.6/site-packages/torch_neuron/decorators.py", line 313, in create_runnable
    neuron_trace = torch.jit.trace(neuron_function, example_inputs)
  File "/home/ubuntu/anaconda3/envs/aws_neuron_pytorch_p36/lib/python3.6/site-packages/torch/jit/_trace.py", line 779, in trace
    name, func, example_inputs, var_lookup_fn, strict, _force_outplace
  File "/home/ubuntu/anaconda3/envs/aws_neuron_pytorch_p36/lib/python3.6/site-packages/torch_neuron/decorators.py", line 312, in neuron_function
    return postprocessor(output_tensors)
  File "/home/ubuntu/anaconda3/envs/aws_neuron_pytorch_p36/lib/python3.6/site-packages/torch_neuron/decorators.py", line 1129, in __call__
    for value in node.inputs()]
  File "/home/ubuntu/anaconda3/envs/aws_neuron_pytorch_p36/lib/python3.6/site-packages/torch_neuron/decorators.py", line 1129, in <listcomp>
    for value in node.inputs()]
KeyError: 3107
INFO:Neuron:Number of arithmetic operators (post-compilation) before = 304, compiled = 0, percent compiled = 0.0%
INFO:Neuron:The neuron partitioner created 1 sub-graphs
INFO:Neuron:Neuron successfully compiled 0 sub-graphs, Total fused subgraphs = 1, Percent of model sub-graphs successfully compiled = 0.0%
INFO:Neuron:Compiled these operators (and operator counts) to Neuron:
INFO:Neuron:Not compiled operators (and operator counts) to Neuron:
INFO:Neuron: => aten::Int: 12 [supported]
INFO:Neuron: => aten::_convolution: 86 [supported]
INFO:Neuron: => aten::add: 17 [supported]
INFO:Neuron: => aten::cat: 15 [supported]
INFO:Neuron: => aten::contiguous: 3 [supported]
INFO:Neuron: => aten::copy_: 6 [supported]
INFO:Neuron: => aten::expand: 6 [supported]
INFO:Neuron: => aten::max_pool2d: 3 [supported]
INFO:Neuron: => aten::mul: 12 [supported]
INFO:Neuron: => aten::permute: 3 [supported]
INFO:Neuron: => aten::pow: 3 [supported]
INFO:Neuron: => aten::select: 6 [supported]
INFO:Neuron: => aten::sigmoid: 3 [supported]
INFO:Neuron: => aten::silu: 83 [supported]
INFO:Neuron: => aten::size: 9 [supported]
INFO:Neuron: => aten::slice: 20 [supported]
INFO:Neuron: => aten::sub: 3 [supported]
INFO:Neuron: => aten::upsample_nearest2d: 2 [supported]
INFO:Neuron: => aten::view: 12 [supported]

Ownmarc commented Mar 24, 2021

I managed to get some of it to compile by adding the following code:

def subgraph_builder_function(node):
    # exclude the Detect() layer from the Neuron subgraph; excluded ops keep running on CPU
    return 'Detect' not in node.name

model_neuron = torch.neuron.trace(model, example_inputs=[fake_image],
                                  subgraph_builder_function=subgraph_builder_function)

But I'm not sure if it's the way to go. I tried to figure out what was causing the problem with the whole-model compilation, but couldn't find it.

Here is the full trace of the model I managed to compile:

/home/ubuntu/yolov5/models/yolo.py:48: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  if self.grid[i].shape[2:4] != x[i].shape[2:4]:
/home/ubuntu/anaconda3/envs/aws_neuron_pytorch_p36/lib/python3.6/site-packages/torch/jit/_trace.py:940: TracerWarning: Encountering a list at the output of the tracer might cause the trace to be incorrect, this is only valid if the container structure does not change based on the module's inputs. Consider using a constant container instead (e.g. for `list`, use a `tuple` instead. for `dict`, use a `NamedTuple` instead). If you absolutely need this and know the side effects, pass strict=False to trace() to allow this behavior.
  _force_outplace,
INFO:Neuron:There are 3 ops of 1 different types in the TorchScript that are not compiled by neuron-cc: aten::_convolution, (For more information see https://github.com/aws/aws-neuron-sdk/blob/master/release-notes/neuron-cc-ops/neuron-cc-ops-pytorch.md)
INFO:Neuron:Number of arithmetic operators (pre-compilation) before = 304, fused = 207, percent fused = 68.09%
INFO:Neuron:compiling function _NeuronGraph$12969 with neuron-cc
INFO:Neuron:Compiling with command line: '/home/ubuntu/anaconda3/envs/aws_neuron_pytorch_p36/bin/neuron-cc compile /tmp/tmpygd_jg_7/graph_def.pb --framework TENSORFLOW --pipeline compile SaveTemps --output /tmp/tmpygd_jg_7/graph_def.neff --io-config {"inputs": {"tensor.1:0": [[1, 3, 608, 608], "float32"]}, "outputs": ["mul_66:0", "mul_74:0", "mul_82:0"]} --verbose 35'
/home/ubuntu/anaconda3/envs/aws_neuron_pytorch_p36/lib/python3.6/site-packages/torch/jit/_trace.py:441: UserWarning: Neuron runtime cannot be initialized; falling back to CPU execution
Tensor output are ** NOT CALCULATED ** during CPU execution and only indicate tensor shape (Triggered internally at  /opt/workspace/KaenaPyTorchRuntime/neuron_op/neuron_op_impl.cpp:38.)
  outs = wrap_retval(mod(*_clone_inputs(inputs)))
/home/ubuntu/anaconda3/envs/aws_neuron_pytorch_p36/lib/python3.6/site-packages/torch_neuron/graph.py:383: UserWarning: Neuron runtime cannot be initialized; falling back to CPU execution
Tensor output are ** NOT CALCULATED ** during CPU execution and only indicate tensor shape (Triggered internally at  /opt/workspace/KaenaPyTorchRuntime/neuron_op/neuron_op_impl.cpp:38.)
  return self.func(*inputs)
INFO:Neuron:Number of arithmetic operators (post-compilation) before = 304, compiled = 207, percent compiled = 68.09%
INFO:Neuron:The neuron partitioner created 1 sub-graphs
INFO:Neuron:Neuron successfully compiled 1 sub-graphs, Total fused subgraphs = 1, Percent of model sub-graphs successfully compiled = 100.0%
INFO:Neuron:Compiled these operators (and operator counts) to Neuron:
INFO:Neuron: => aten::_convolution: 83
INFO:Neuron: => aten::add: 14
INFO:Neuron: => aten::cat: 14
INFO:Neuron: => aten::max_pool2d: 3
INFO:Neuron: => aten::silu: 83
INFO:Neuron: => aten::slice: 8
INFO:Neuron: => aten::upsample_nearest2d: 2
INFO:Neuron:Not compiled operators (and operator counts) to Neuron:
INFO:Neuron: => aten::Int: 12 [supported]
INFO:Neuron: => aten::_convolution: 3 [supported]
INFO:Neuron: => aten::add: 3 [supported]
INFO:Neuron: => aten::cat: 1 [supported]
INFO:Neuron: => aten::contiguous: 3 [supported]
INFO:Neuron: => aten::copy_: 6 [supported]
INFO:Neuron: => aten::expand: 6 [supported]
INFO:Neuron: => aten::mul: 12 [supported]
INFO:Neuron: => aten::permute: 3 [supported]
INFO:Neuron: => aten::pow: 3 [supported]
INFO:Neuron: => aten::select: 6 [supported]
INFO:Neuron: => aten::sigmoid: 3 [supported]
INFO:Neuron: => aten::size: 9 [supported]
INFO:Neuron: => aten::slice: 12 [supported]
INFO:Neuron: => aten::sub: 3 [supported]
INFO:Neuron: => aten::view: 12 [supported]


Ownmarc commented Mar 24, 2021

So I got YOLOv5x to run on an inf1 instance and did some speed tests to see how it compares. These are really basic tests with a single image run 10 times; here are my results on CPU (c5.xlarge), a 1080 Ti (my own computer), and an inf1.xlarge instance (screenshots below, with the timing loop I used sketched after them):

[benchmark screenshots: per-image inference times on CPU (c5.xlarge), on a GTX 1080 Ti, and on inf1.xlarge]
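
The timing loop was essentially the following (a rough sketch, not my exact script; preprocessing is omitted and a blank tensor stands in for the real image):

```python
import time
import torch

def benchmark(model, n=10, size=608):
    """Very rough latency check: run the same input through the model n times and average."""
    image = torch.zeros([1, 3, size, size], dtype=torch.float32)  # stand-in for a preprocessed image
    with torch.no_grad():
        model(image)                      # one warm-up call
        start = time.time()
        for _ in range(n):
            model(image)
    return (time.time() - start) / n      # average seconds per inference

# e.g. benchmark(model_neuron) on inf1, benchmark(model) for the CPU / 1080 Ti runs
```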

Here is the trace of the model compilation:

/home/ubuntu/yolov5/models/yolo.py:48: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  if self.grid[i].shape[2:4] != x[i].shape[2:4]:
/home/ubuntu/anaconda3/envs/aws_neuron_pytorch_p36/lib/python3.6/site-packages/torch/jit/_trace.py:940: TracerWarning: Encountering a list at the output of the tracer might cause the trace to be incorrect, this is only valid if the container structure does not change based on the module's inputs. Consider using a constant container instead (e.g. for `list`, use a `tuple` instead. for `dict`, use a `NamedTuple` instead). If you absolutely need this and know the side effects, pass strict=False to trace() to allow this behavior.
  _force_outplace,
INFO:Neuron:There are 3 ops of 1 different types in the TorchScript that are not compiled by neuron-cc: aten::_convolution, (For more information see https://github.com/aws/aws-neuron-sdk/blob/master/release-notes/neuron-cc-ops/neuron-cc-ops-pytorch.md)
INFO:Neuron:Number of arithmetic operators (pre-compilation) before = 414, fused = 317, percent fused = 76.57%
INFO:Neuron:compiling function _NeuronGraph$1426 with neuron-cc
INFO:Neuron:Compiling with command line: '/home/ubuntu/anaconda3/envs/aws_neuron_pytorch_p36/bin/neuron-cc compile /tmp/tmphtcdzsru/graph_def.pb --framework TENSORFLOW --pipeline compile SaveTemps --output /tmp/tmphtcdzsru/graph_def.neff --io-config {"inputs": {"tensor.1:0": [[1, 3, 608, 608], "float32"]}, "outputs": ["mul_106:0", "mul_118:0", "mul_130:0"]} --verbose 35'
/home/ubuntu/anaconda3/envs/aws_neuron_pytorch_p36/lib/python3.6/site-packages/torch/jit/_trace.py:441: UserWarning: Neuron runtime cannot be initialized; falling back to CPU execution
Tensor output are ** NOT CALCULATED ** during CPU execution and only indicate tensor shape (Triggered internally at  /opt/workspace/KaenaPyTorchRuntime/neuron_op/neuron_op_impl.cpp:38.)
  outs = wrap_retval(mod(*_clone_inputs(inputs)))
/home/ubuntu/anaconda3/envs/aws_neuron_pytorch_p36/lib/python3.6/site-packages/torch_neuron/graph.py:383: UserWarning: Neuron runtime cannot be initialized; falling back to CPU execution
Tensor output are ** NOT CALCULATED ** during CPU execution and only indicate tensor shape (Triggered internally at  /opt/workspace/KaenaPyTorchRuntime/neuron_op/neuron_op_impl.cpp:38.)
  return self.func(*inputs)
INFO:Neuron:Number of arithmetic operators (post-compilation) before = 414, compiled = 317, percent compiled = 76.57%
INFO:Neuron:The neuron partitioner created 1 sub-graphs
INFO:Neuron:Neuron successfully compiled 1 sub-graphs, Total fused subgraphs = 1, Percent of model sub-graphs successfully compiled = 100.0%
INFO:Neuron:Compiled these operators (and operator counts) to Neuron:
INFO:Neuron: => aten::_convolution: 131
INFO:Neuron: => aten::add: 28
INFO:Neuron: => aten::cat: 14
INFO:Neuron: => aten::max_pool2d: 3
INFO:Neuron: => aten::silu: 131
INFO:Neuron: => aten::slice: 8
INFO:Neuron: => aten::upsample_nearest2d: 2
INFO:Neuron:Not compiled operators (and operator counts) to Neuron:
INFO:Neuron: => aten::Int: 12 [supported]
INFO:Neuron: => aten::_convolution: 3 [supported]
INFO:Neuron: => aten::add: 3 [supported]
INFO:Neuron: => aten::cat: 1 [supported]
INFO:Neuron: => aten::contiguous: 3 [supported]
INFO:Neuron: => aten::copy_: 6 [supported]
INFO:Neuron: => aten::expand: 6 [supported]
INFO:Neuron: => aten::mul: 12 [supported]
INFO:Neuron: => aten::permute: 3 [supported]
INFO:Neuron: => aten::pow: 3 [supported]
INFO:Neuron: => aten::select: 6 [supported]
INFO:Neuron: => aten::sigmoid: 3 [supported]
INFO:Neuron: => aten::size: 9 [supported]
INFO:Neuron: => aten::slice: 12 [supported]
INFO:Neuron: => aten::sub: 3 [supported]
INFO:Neuron: => aten::view: 12 [supported]
/home/ubuntu/anaconda3/envs/aws_neuron_pytorch_p36/lib/python3.6/site-packages/torch/jit/_trace.py:940: TracerWarning: Encountering a list at the output of the tracer might cause the trace to be incorrect, this is only valid if the container structure does not change based on the module's inputs. Consider using a constant container instead (e.g. for `list`, use a `tuple` instead. for `dict`, use a `NamedTuple` instead). If you absolutely need this and know the side effects, pass strict=False to trace() to allow this behavior.
  _force_outplace,

aws-zejdaj (Contributor) commented:

Hi Marc, thanks for all the good input. As you noted, the model runs successfully when the Detect layer is excluded. We expect our upcoming Neuron release to improve performance, and we will follow up here once it is released so you can run the test again.


Ownmarc commented Mar 26, 2021

@aws-zejdaj thanks! Do you know if this model will 100% compile? Also curious to know what's not compatible right now :)


glenn-jocher commented Mar 29, 2021

@aws-zejdaj @Ownmarc thanks for your efforts on getting YOLOv5 to operate correctly on the AWS Inferentia chips! If I can be of any help let me know; I'm the primary maintainer at https://github.com/ultralytics/yolov5.

If you have any specific feedback about the Detect() layer regarding what is causing incompatibilities, we may be able to feed this info into future YOLOv5 design decisions as well, since part of our goal is ease of YOLOv5 deployment across the largest addressable markets.

To provide additional info, the YOLOv5 Detect() layer is the very last layer which combines multiple heads (P3/8-small, P4/16-medium, P5/32-large and optionally P6/64-xlarge) into a single output. The source is here. Note that this layer behaves differently during training and deployment, and that a YOLOv5 model.fuse() op will fuse batchnorm and nn.Conv2d() layers together typically before compilation.

If we wanted to incorporate torch.neuron export capability @Ownmarc then we could also update export.py for this; a rough sketch of what that could look like follows the Detect() source below.

https://github.com/ultralytics/yolov5/blob/2bf34f50fda2d5997f301364f9a0b196fa57117b/models/yolo.py#L24-L64

class Detect(nn.Module):
    stride = None  # strides computed during build
    export = False  # onnx export

    def __init__(self, nc=80, anchors=(), ch=()):  # detection layer
        super(Detect, self).__init__()
        self.nc = nc  # number of classes
        self.no = nc + 5  # number of outputs per anchor
        self.nl = len(anchors)  # number of detection layers
        self.na = len(anchors[0]) // 2  # number of anchors
        self.grid = [torch.zeros(1)] * self.nl  # init grid
        a = torch.tensor(anchors).float().view(self.nl, -1, 2)
        self.register_buffer('anchors', a)  # shape(nl,na,2)
        self.register_buffer('anchor_grid', a.clone().view(self.nl, 1, -1, 1, 1, 2))  # shape(nl,1,na,1,1,2)
        self.m = nn.ModuleList(nn.Conv2d(x, self.no * self.na, 1) for x in ch)  # output conv

    def forward(self, x):
        # x = x.copy()  # for profiling
        z = []  # inference output
        self.training |= self.export
        for i in range(self.nl):
            x[i] = self.m[i](x[i])  # conv
            bs, _, ny, nx = x[i].shape  # x(bs,255,20,20) to x(bs,3,20,20,85)
            x[i] = x[i].view(bs, self.na, self.no, ny, nx).permute(0, 1, 3, 4, 2).contiguous()

            if not self.training:  # inference
                if self.grid[i].shape[2:4] != x[i].shape[2:4]:
                    self.grid[i] = self._make_grid(nx, ny).to(x[i].device)

                y = x[i].sigmoid()
                y[..., 0:2] = (y[..., 0:2] * 2. - 0.5 + self.grid[i]) * self.stride[i]  # xy
                y[..., 2:4] = (y[..., 2:4] * 2) ** 2 * self.anchor_grid[i]  # wh
                z.append(y.view(bs, -1, self.no))

        return x if self.training else (torch.cat(z, 1), x)

    @staticmethod
    def _make_grid(nx=20, ny=20):
        yv, xv = torch.meshgrid([torch.arange(ny), torch.arange(nx)])
        return torch.stack((xv, yv), 2).view((1, 1, ny, nx, 2)).float()
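
Regarding the export.py idea above, here is a rough, purely hypothetical sketch of what a torch.neuron export helper could look like; the function name, defaults, and file path are assumptions, not an existing Ultralytics or Neuron API.

import torch
import torch.neuron

def export_neuron(model, img_size=(640, 640), file='yolov5s_neuron.pt'):
    # Hypothetical helper, shown only to illustrate how tracing could be wired
    # into export.py. fuse() merges nn.Conv2d + BatchNorm2d pairs as described
    # above; the example input fixes the shape the compiled model will accept.
    model = model.fuse().eval()
    example = torch.zeros(1, 3, *img_size)
    neuron_model = torch.neuron.trace(model, example_inputs=[example])
    neuron_model.save(file)
    return neuron_model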

@aws-zejdaj
Contributor

Glenn, we'll be happy to discuss the best handling of the Detect layer, feel free to reach out to us at [email protected]

@aws-renfu

Hi Glenn, just want to let you know that this is a high priority item and we are actively working on this issue. Thanks!

@glenn-jocher

@aws-zejdaj that's great news! We welcome PRs, so if you discover any useful improvements to the codebase at https://github.com/ultralytics/yolov5 we'd be happy to review and integrate there also.

@glenn-jocher

@Ownmarc @aws-zejdaj @aws-renfu good news 😃! This issue may now be fixed ✅ in PR ultralytics/yolov5#2953. To receive this update you can:

  • git pull from within your yolov5/ directory
  • git clone https://github.com/ultralytics/yolov5 again
  • Force-reload PyTorch Hub: model = torch.hub.load('ultralytics/yolov5', 'yolov5s', force_reload=True)

Thank you for spotting this issue and informing us of the problem. Please let us know if this update resolves the issue for you, and feel free to inform us of any other issues you discover or feature requests that come to mind. Happy trainings with YOLOv5 🚀!

@Ownmarc
Author

Ownmarc commented May 1, 2021

Nice @glenn-jocher ! Thanks

@aws-joshim
Contributor

Closing this ticket since the latest torch-neuron release supports the Ultralytics YOLOv5 model - https://awsdocs-neuron.readthedocs-hosted.com/en/latest/release-notes/torch-neuron/torch-neuron.html#pytorch-neuron-rn

aws-wanhenr added a commit to aws-wanhenr/aws-neuron-sdk that referenced this issue Jun 25, 2021
…ons (aws-neuron#253)

* Add note for aws-neuron-dkms install to address multiple kernel versions

* fix troubleshooting guide reference

* clarify how to check if dkms is installed for the current kernel

* clarify dkms dependency on linux kernel
aws-mesharma pushed a commit that referenced this issue Jun 28, 2021
…ons (#253)

* Add note for aws-neuron-dkms install to address multiple kernel versions

* fix troubleshooting guide reference

* clarify how to check if dkms is installed for the current kernel

* clarify dkms dependency on linux kernel
@diazGT94

diazGT94 commented Sep 23, 2021

Hi, I'm trying to replicate the steps indicated here to convert YoloV5s to neuron in inf1.

I am using Ubuntu 18.04 DLAMI. Activate the aws_neuron_pytorch_p36 python env

  1. Installed this: pip install -r https://raw.githubusercontent.com/ultralytics/yolov5/master/requirements.txt
  2. Then import from Pytorch-Hub: model = torch.hub.load('ultralytics/yolov5', 'yolov5s', force_reload=True)
  3. Create a Fake Image: fake_image = torch.zeros([1, 3, 608, 608], dtype=torch.float32)
  4. Model inspection: model_neuron_for_inspection = torch.neuron.trace(model, fake_image, skip_compiler=True)

But this gives me the following error:

Fusing layers...
Model Summary: 224 layers, 7266973 parameters, 0 gradients
Adding AutoShape...
/home/ubuntu/.cache/torch/hub/ultralytics_yolov5_master/models/yolo.py:60: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  if self.grid[i].shape[2:4] != x[i].shape[2:4] or self.onnx_dynamic:
/home/ubuntu/anaconda3/envs/aws_neuron_pytorch_p36/lib/python3.6/site-packages/torch/jit/_trace.py:940: TracerWarning: Encountering a list at the output of the tracer might cause the trace to be incorrect, this is only valid if the container structure does not change based on the module's inputs. Consider using a constant container instead (e.g. for `list`, use a `tuple` instead. for `dict`, use a `NamedTuple` instead). If you absolutely need this and know the side effects, pass strict=False to trace() to allow this behavior.
  _force_outplace,
/home/ubuntu/.cache/torch/hub/ultralytics_yolov5_master/models/yolo.py:60: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  if self.grid[i].shape[2:4] != x[i].shape[2:4] or self.onnx_dynamic:
/home/ubuntu/anaconda3/envs/aws_neuron_pytorch_p36/lib/python3.6/site-packages/torch/jit/_trace.py:940: TracerWarning: Encountering a list at the output of the tracer might cause the trace to be incorrect, this is only valid if the container structure does not change based on the module's inputs. Consider using a constant container instead (e.g. for `list`, use a `tuple` instead. for `dict`, use a `NamedTuple` instead). If you absolutely need this and know the side effects, pass strict=False to trace() to allow this behavior.
  _force_outplace,
Traceback (most recent call last):
  File "neuron_converter.py", line 11, in <module>
    model_neuron_for_inspection = torch.neuron.trace(model, fake_image, skip_compiler=True)
  File "/home/ubuntu/anaconda3/envs/aws_neuron_pytorch_p36/lib/python3.6/site-packages/torch_neuron/convert.py", line 103, in trace
    neuron_graph, jit_trace = to_graph(func, example_inputs, return_trace=True, **kwargs)
  File "/home/ubuntu/anaconda3/envs/aws_neuron_pytorch_p36/lib/python3.6/site-packages/torch_neuron/convert.py", line 182, in to_graph
    jit_trace = torch.jit.trace(func_or_mod, example_inputs, **kwargs)
  File "/home/ubuntu/anaconda3/envs/aws_neuron_pytorch_p36/lib/python3.6/site-packages/torch/jit/_trace.py", line 742, in trace
    _module_class,
  File "/home/ubuntu/anaconda3/envs/aws_neuron_pytorch_p36/lib/python3.6/site-packages/torch/jit/_trace.py", line 966, in trace_module
    _module_class,
  File "/home/ubuntu/anaconda3/envs/aws_neuron_pytorch_p36/lib/python3.6/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "/home/ubuntu/anaconda3/envs/aws_neuron_pytorch_p36/lib/python3.6/site-packages/torch/jit/_trace.py", line 519, in _check_trace
    raise TracingCheckError(*diag_info)
torch.jit._trace.TracingCheckError: Tracing failed sanity checks!
ERROR: Graphs differed across invocations!

Could you please guide me through how to perform the conversion for deploying it on Inf1? @Ownmarc

Thanks,

@Ownmarc
Author

Ownmarc commented Sep 23, 2021

@diazGT94, I think I hit this bug too when running torch.neuron.analyze_model. The first time I ran it, it raised an exception; the second time it ran fine, and after that I could run torch.neuron.trace without any issue.

Try this:

import torch
import torch.neuron

# model is the YOLOv5 model loaded earlier (e.g. via torch.hub)
fake_image = torch.zeros([1, 3, 608, 608], dtype=torch.float32)
try:
    torch.neuron.analyze_model(model, example_inputs=[fake_image])
except Exception:
    # the first invocation can raise; the retry goes through
    torch.neuron.analyze_model(model, example_inputs=[fake_image])

model_neuron = torch.neuron.trace(model, example_inputs=[fake_image])

@diazGT94

diazGT94 commented Sep 23, 2021

@Ownmarc Thanks, the code you provided helped me. With the latest updates on the YOLOv5 side, is it still required to define this and perform the conversion?

I just noticed that when the function below is not included in the conversion, the detections don't perform well even if the NMS parameters are modified.

def subgraph_builder_function(node): return 'Detect' not in node.name

@diazGT94

diazGT94 commented Dec 9, 2021

Hello, until a couple of weeks ago I was able to use the following structure to convert my custom models to neuron

import torch.neuron

def subgraph_builder_function(node):
    return 'Detect' not in node.name

model = torch.hub.load('ultralytics/yolov5', 'custom', path='best.pt',force_reload=True) 
fake_image = torch.zeros([1, 3, 416, 416], dtype=torch.float32)  #Need to be equal to the input size of the image
try:
    torch.neuron.trace(model, example_inputs=[fake_image],subgraph_builder_function=subgraph_builder_function)
except Exception:
    torch.neuron.trace(model, example_inputs=[fake_image],subgraph_builder_function=subgraph_builder_function)

model_neuron = torch.neuron.trace(model, example_inputs=[fake_image],subgraph_builder_function=subgraph_builder_function)

model_neuron.save('neuron_model.pt')  

However, today I tried to do it again but I got the following error:

INFO:Neuron:Compiling with command line: '/home/ubuntu/anaconda3/envs/aws_neuron_pytorch_p36/bin/neuron-cc compile /tmp/tmp3f_r9pcn/graph_def.pb --framework TENSORFLOW --pipeline compile SaveTemps --output /tmp/tmp3f_r9pcn/graph_def.neff --io-config {"inputs": {"tensor:0": [[1, 3, 416, 416], "float32"]}, "outputs": ["tensor:0"]} --verbose 35'
Compiling with command line: '/home/ubuntu/anaconda3/envs/aws_neuron_pytorch_p36/bin/neuron-cc compile /tmp/tmp3f_r9pcn/graph_def.pb --framework TENSORFLOW --pipeline compile SaveTemps --output /tmp/tmp3f_r9pcn/graph_def.neff --io-config {"inputs": {"tensor:0": [[1, 3, 416, 416], "float32"]}, "outputs": ["tensor:0"]} --verbose 35'
.12/09/2021 04:26:36 PM ERROR [neuron-cc]: tensor tensor:0 appears in both input and output of --io-config

I can work around the error by not passing the subgraph_builder_function argument to the neuron trace function. However, when I ran inference with a model created with this "hack", the predictions performed terribly compared to the ones I used to get when I passed that argument. Should I downgrade my torch-neuron to get the previous results?

@glenn-jocher

@diazGT94 this is likely related to ultralytics/yolov5#5845 from 5 days ago. A temporary workaround might be to drop down a level, i.e. use model.model instead of model, but I'll think about a better long-term solution to restore the original behavior.
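
(A minimal sketch of the workaround described above, dropping down a level with model.model; how many .model hops are needed depends on the hub wrapper version, so treat this as an assumption to verify rather than a tested recipe.)

import torch
import torch.neuron

def subgraph_builder_function(node):
    return 'Detect' not in node.name

hub_model = torch.hub.load('ultralytics/yolov5', 'custom', path='best.pt', force_reload=True)
inner = hub_model.model  # unwrap the hub wrapper, per the suggestion above
fake_image = torch.zeros([1, 3, 416, 416], dtype=torch.float32)
model_neuron = torch.neuron.trace(
    inner,
    example_inputs=[fake_image],
    subgraph_builder_function=subgraph_builder_function,
)
model_neuron.save('neuron_model.pt')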

@jpoberhauser

@diazGT94 I was actually able to convert by running:

import torch
import torch.neuron
model_v5 = torch.hub.load(<path_to_local_yolov5>,
        'custom',
        path=model_path,
        source='local',
        force_reload=True)  # local repo
# Create an example input for compilation
image = torch.zeros([1, 3, 640, 480], dtype=torch.float32)
#get model trace
model_neuron = torch.neuron.trace(model_v5, example_inputs=[image])

@josebenitezg

josebenitezg commented May 11, 2022

Hi!

I was able to convert the model from yolov5 to neuron with the follow code:

import torch
import torch_neuron
from torchvision import models

model = torch.hub.load('yolo5',
        'custom',
        path='yolov5.pt',
        source='local',
        force_reload=True)  # local repo

fake_image = torch.zeros([1, 3, 640, 640], dtype=torch.float32)
#fake_image = (torch.rand(3), torch.rand(3))
try:
    torch.neuron.analyze_model(model, example_inputs=[fake_image])
except Exception:
    torch.neuron.analyze_model(model, example_inputs=[fake_image])

model_neuron = torch.neuron.trace(model, 
                                example_inputs=[fake_image])

## Export to saved model
model_neuron.save("model_converted.pt")
Now that I am testing and comparing, the tensor outputs differ from YOLOv5 as follows:

Neuron Yolov5 Model:

[tensor([[-0.0356,  0.1790,  0.7456,  0.6292,  0.9359, 13.0000],
        [ 0.5830,  0.1404,  1.1279,  0.6628,  0.9359, 13.0000],
        [ 0.0823,  0.6350,  0.6272,  1.1599,  0.9315, 13.0000],
        [-0.1443,  0.1416,  0.2542,  0.5107,  0.9224, 13.0000],
        [ 0.3516,  0.6426,  0.7500,  1.0137,  0.9188, 13.0000],
        [ 0.3555,  0.1436,  0.7539,  0.5127,  0.9147, 13.0000]])]

Yolov5:

[tensor([[334.57495, 176.98302, 407.46155, 213.81169,   0.93721,  13.00000]])]

Inference script:

import cv2
import numpy as np
import torch
import torch_neuron  # needed so torch.jit.load can deserialize the Neuron ops
from utils.general import non_max_suppression  # NMS helper from the YOLOv5 repo, as used in detect.py

im = cv2.imread('test_img.jpg')
img0 = im.copy()
im = cv2.resize(im, (640, 640), interpolation = cv2.INTER_AREA)
# Convert
im = im.transpose((2, 0, 1))[::-1]  # HWC to CHW, BGR to RGB
im = np.ascontiguousarray(im)
# Convert into torch
im = torch.from_numpy(im)
im = im.float()  # uint8 to fp16/32
im /= 255  # 0 - 255 to 0.0 - 1.0
if len(im.shape) == 3:
    im = im[None]  # expand for batch dim

# Load the compiled model
model = torch.jit.load('model_converted.pt')

# Inference
pred = model(im)
pred = non_max_suppression(pred) #nms function used same as yolov5 detect.py

#Process predictions
for i, det in enumerate(pred):  # per image
    im0 = img0.copy()
    color=(30, 30, 30)
    txt_color=(255, 255, 255)
    h_size, w_size = im.shape[-2:]
    print(h_size, w_size)
    lw = max(round(sum(im.shape) / 2 * 0.003), 2) 

    if len(det):
        # Write results
        for *xyxy, conf, cls in reversed(det):
            c = int(cls)  # integer class
            label = f'{CLASSES[c]} {conf:.2f}'
            print(label)
            box = xyxy 
            p1, p2 = (int(box[0]* w_size), int(box[1]* h_size)), (int(box[2]* w_size), int(box[3]* h_size))
            cv2.rectangle(im0, p1, p2, color, thickness=lw, lineType=cv2.LINE_AA)
            tf = max(lw - 1, 1)  # font thickness
            w, h = cv2.getTextSize(label, 0, fontScale=lw / 3, thickness=tf)[0]  # text width, height
            outside = p1[1] - h - 3 >= 0  # label fits outside box
            p2 = p1[0] + w, p1[1] - h - 3 if outside else p1[1] + h + 3
            cv2.rectangle(im0, p1, p2, color, -1, cv2.LINE_AA)  # filled
            cv2.putText(im0,
                        label, (p1[0], p1[1] - 2 if outside else p1[1] + h + 2),
                        0,
                        lw / 3,
                        txt_color,
                        thickness=tf,
                        lineType=cv2.LINE_AA)
    # Save results (image with detections)
    status = cv2.imwrite('out.jpg', im0)

Is there something wrong with how I convert the model or run inference? The label and the confidence seem to match what is expected, but the tensors do not.
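
(One way to narrow this down, as a minimal sketch that assumes both models accept the same fixed-size tensor and reuses the names from the script above, is to run the original PyTorch model and the compiled model on the same input and compare the raw outputs before NMS; that separates a conversion problem from a postprocessing/scaling problem.)

import torch
import torch_neuron  # registers the Neuron ops needed to load the compiled model

# Sketch only: compare raw (pre-NMS) outputs of the original and compiled models.
cpu_model = torch.hub.load('yolo5', 'custom', path='yolov5.pt', source='local')
neuron_model = torch.jit.load('model_converted.pt')

x = torch.zeros([1, 3, 640, 640], dtype=torch.float32)
with torch.no_grad():
    ref = cpu_model(x)       # reference output
    out = neuron_model(x)    # compiled output
print(type(ref), type(out))  # inspect the output structures before comparing values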

@jeffhataws
Contributor

jeffhataws commented Jun 20, 2022

Is there something wrong with how I convert the model or run inference? The label and the confidence seem to match what is expected, but the tensors do not.

Sorry @josebenitezg, we did not notice this new issue as it was posted in a closed issue. I have gone ahead and created a new GitHub issue for you: #435.

awsjoshir pushed a commit that referenced this issue May 2, 2023
…ons (#253)

* Add note for aws-neuron-dkms install to address multiple kernel versions

* fix troubleshooting guide reference

* clarify how to check if dkms is installed for the current kernel

* clarify dkms dependency on linux kernel