
[Bug] a bug about onnx aten::index_put. #13759

Open
july8023 opened this issue Jan 11, 2023 · 16 comments
Labels
needs-triage PRs or issues that need to be investigated by maintainers to find the right assignees to address it type: bug

Comments

@july8023

The root cause of this problem is that the second input, `indices`, cannot be a dynamic-shape tensor.

self._construct_nodes(graph)
File "/home/workspace/tvm/python/frontend/onnx.py", line 6567, in _construct_nodes
op = self._convert_operator(op_name, inputs, attr, self.opset)
File "/home/workspace/tvm/python/frontend/onnx.py", line 6686, in _convert_operator
sym = convert_map[op_name](inputs, attrs, self._params)
File "/home/workspace/tvm/python/frontend/onnx.py", line 4194, in _impl_v1
return cls._op_dispatch(operator, inputs, attr, params)
File "/home/workspace/tvm/python/frontend/onnx.py", line 4042, in _op_dispatch
return op_map[operator](inputs, attr, params)
File "/home/workspace/tvm/python/frontend/onnx.py", line 4140, in _index_put
indices, values = cls._check_index(inputs[1 : len(inputs) - 2], inputs[len(inputs) - 2])
File "/home/workspace/tvm/python/frontend/onnx.py", line 4134, in _check_index
return unfolding_indices(indices, values)
File "/home/workspace/tvm/python//frontend/onnx.py", line 4127, in unfolding_indices
_op.repeat(_op.tile(flatten_indices[i], (tile_size[i],)), repeat_size[i], 0)
File "/home/workspace/tvm/python/tvm/relay/op/transform.py", line 665, in repeat
return _make.repeat(data, repeats, axis)
File "/home/workspace/tvm/python/tvm/_ffi/_ctypes/packed_func.py", line 237, in call
raise get_last_ffi_error()
tvm._ffi.base.TVMError: Traceback (most recent call last):
[bt] (5) /home/workspace/tvm/build/libtvm.so(TVMFuncCall+0x63) [0x7f32be9d7123]
[bt] (4) /home/workspace/tvm/build/libtvm.so(tvm::runtime::PackedFuncObj::Extractor<tvm::runtime::PackedFuncSubObj<tvm::runtime::TypedPackedFunc<tvm::RelayExpr (tvm::RelayExpr, int, int)>::AssignTypedLambda<tvm::RelayExpr (*)(tvm::RelayExpr, int, int)>(tvm::RelayExpr (*)(tvm::RelayExpr, int, int), std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >)::{lambda(tvm::runtime::TVMArgs const&, tvm::runtime::TVMRetValue*)#1}> >::Call(tvm::runtime::PackedFuncObj const*, tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*)+0x1fe) [0x7f32be2fa1be]
[bt] (3) /home/workspace/tvm/build/libtvm.so(tvm::runtime::TVMMovableArgValueWithContext_::operator int() const+0x28) [0x7f32bd0962b8]
[bt] (2) /home/workspace/tvm/build/libtvm.so(tvm::runtime::TVMPODValue_::operator int() const+0x194) [0x7f32bcfc1944]
[bt] (1) /home/workspace/tvm/build/libtvm.so(tvm::runtime::detail::LogFatal::Entry::Finalize()+0x45) [0x7f32bcc5815b]
[bt] (0) /home/workspace/tvm/build/libtvm.so(tvm::runtime::Backtrace[abi:cxx11]+0x22) [0x7f32be9f7892]
File "/home/workspace/tvm/include/tvm/runtime/packed_func.h", line 777
TVMError: In function relay.op._make.repeat(0: RelayExpr, 1: int, 2: int) -> RelayExpr: error while converting argument 1: [17:09:32] /home/workspace/tvm/include/tvm/runtime/packed_func.h:562:

An error occurred during the execution of TVM.
For more information, please see: https://tvm.apache.org/docs/errors.html

Check failed: type_code_ == kDLInt (8 vs. 0) : expected int but got Object

@july8023 july8023 added needs-triage PRs or issues that need to be investigated by maintainers to find the right assignees to address it type: bug labels Jan 11, 2023
@masahi
Member

masahi commented Jan 11, 2023

Please provide a reproducible script. Otherwise it is not clear what the issue is, so I'll close it.

@july8023
Author

Here is a demo extracted from my network. @masahi


import torch
import onnx
from tvm import relay


class TestIndexPut(torch.nn.Module):
    def forward(self, batch_src_corr_points, indices0, indices1, src_corr_points):
        batch_src_corr_points.index_put_([indices0, indices1], src_corr_points)
        return batch_src_corr_points


model = TestIndexPut()

dummy_batch_src_corr_points = torch.rand(136888, 3).float()
dummy_indices0 = torch.rand(50168, 3).long()
dummy_indices1 = torch.rand(50168, 3).long()
dummy_src_corr_points = torch.rand(50168, 3).float()

dummy_output = model(dummy_batch_src_corr_points, dummy_indices0, dummy_indices1, dummy_src_corr_points)

input_names = ["dummy_batch_src_corr_points", "dummy_indices0", "dummy_indices1", "dummy_src_corr_points"]
output_names = ["batch_src_corr_points"]

dynamic_axes = {
    "dummy_batch_src_corr_points": {0: "bscp_shape0", 1: "bscp_shape1"},
    "dummy_indices0": {0: "i0_shape0", 1: "i0_shape1"},
    "dummy_indices1": {0: "i1_shape0", 1: "i1_shape1"},
    "dummy_src_corr_points": {0: "o_shape0", 1: "o_shape1"},
}

with torch.no_grad():
    torch.onnx.export(
        model.eval(),
        (dummy_batch_src_corr_points, dummy_indices0, dummy_indices1, dummy_src_corr_points),
        "TestIndexPut.onnx",
        input_names=input_names,
        output_names=output_names,
        opset_version=11,
        dynamic_axes=dynamic_axes,
        operator_export_type=torch.onnx.OperatorExportTypes.ONNX_ATEN_FALLBACK,
    )

m = onnx.load("TestIndexPut.onnx")
relay.frontend.onnx.from_onnx(m)
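As a side note on the demo inputs: `torch.rand(...).long()` truncates values in [0, 1) to zero, so every index in the repro is 0. The converter still has to reproduce the advanced-indexing-assignment semantics of `index_put_` with `scatter_nd`; a minimal numpy sketch of those semantics, using small hypothetical shapes instead of the demo's:

```python
import numpy as np

# Hypothetical small tensors standing in for the demo's inputs.
data = np.zeros((4, 5), dtype=np.float32)
idx0 = np.array([0, 1, 3])     # row indices
idx1 = np.array([1, 2, 4])     # column indices
vals = np.array([7.0, 8.0, 9.0], dtype=np.float32)

# numpy advanced-indexing assignment matches index_put_ with accumulate=False:
# data[idx0[k], idx1[k]] = vals[k] for each k
data[idx0, idx1] = vals
```

With `accumulate=True`, the equivalent would be `np.add.at(data, (idx0, idx1), vals)`, which is why the TVM converter chooses between `mode="update"` and `mode="add"`.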


@july8023
Author

@liaojianjin Sorry to bother you. I have a problem with index_put: it can't support dynamic shapes. Do you have any ideas to solve this problem?

@liaojianjin
Contributor

@honghuichao Thanks for your report.
I can't reproduce this error in my environment (pytorch == 1.13.1, onnx == 1.12.0, tvm commit d8833bd); the converted IR is shown at the end.
On the other hand, there seems to be no dynamic-shape tensor in the demo.

  %0 = add(%dummy_indices0, %dummy_indices1) /* ty=Tensor[(?, ?), int64] span=/Add:0:0 */;
  %1 = shape_of(%0, dtype="int64") /* ty=Tensor[(2), int64] span=/Shape:0:0 */;
  %2 = equal(%1, meta[relay.Constant][0] /* ty=Tensor[(2), int64] span=/Mul:0:0 */) /* ty=Tensor[(2), bool] span=/Equal:0:0 */;
  %3 = shape_of(%dummy_indices0, dtype="int64") /* ty=Tensor[(2), int64] span=/Expand:0:0 */;
  %4 = where(%2, meta[relay.Constant][1] /* ty=Tensor[(2), int64] span=/ConstantOfShape:0:0 */, %1) /* ty=Tensor[(2), int64] span=/Where:0:0 */;
  %5 = maximum(%3, %4) /* ty=Tensor[(2), int64] span=/Expand:0:0 */;
  %6 = dyn.broadcast_to(%dummy_indices0, %5, shape=None) /* ty=Tensor[(?, ?), int64] span=/Expand:0:0 */;
  %7 = equal(%1, meta[relay.Constant][2] /* ty=Tensor[(2), int64] span=/Mul_1:0:0 */) /* ty=Tensor[(2), bool] span=/Equal_1:0:0 */;
  %8 = shape_of(%dummy_indices1, dtype="int64") /* ty=Tensor[(2), int64] span=/Expand_1:0:0 */;
  %9 = where(%7, meta[relay.Constant][3] /* ty=Tensor[(2), int64] span=/ConstantOfShape_1:0:0 */, %1) /* ty=Tensor[(2), int64] span=/Where_1:0:0 */;
  %10 = maximum(%8, %9) /* ty=Tensor[(2), int64] span=/Expand_1:0:0 */;
  %11 = dyn.broadcast_to(%dummy_indices1, %10, shape=None) /* ty=Tensor[(?, ?), int64] span=/Expand_1:0:0 */;
  %12 = expand_dims(%6, axis=-1) /* ty=Tensor[(?, ?, 1), int64] span=/Unsqueeze:0:0 */;
  %13 = expand_dims(%11, axis=-1) /* ty=Tensor[(?, ?, 1), int64] span=/Unsqueeze_1:0:0 */;
  %14 = (%12, %13) /* ty=(Tensor[(?, ?, 1), int64], Tensor[(?, ?, 1), int64]) span=/Concat:0:0 */;
  %15 = concatenate(%14, axis=-1) /* ty=Tensor[(?, ?, 2), int64] span=/Concat:0:0 */;
  %16 = shape_of(%dummy_batch_src_corr_points, dtype="int64") /* ty=Tensor[(2), int64] span=/Shape_3:0:0 */;
  %17 = strided_slice(%16, begin=[2i64], end=[9223372036854775807i64], strides=[1i64], axes=[0i64]) /* ty=Tensor[(0), int64] span=/Slice:0:0 */;
  %18 = (%1, %17) /* ty=(Tensor[(2), int64], Tensor[(0), int64]) span=/Concat_1:0:0 */;
  %19 = concatenate(%18) /* ty=Tensor[(2), int64] span=/Concat_1:0:0 */;
  %20 = transpose(%15, axes=[2, 0, 1]) /* ty=Tensor[(2, ?, ?), int64] span=/ScatterND:0:0 */;
  %21 = dyn.reshape(%dummy_src_corr_points, %19, newshape=[]) /* ty=Tensor[(?, ?), float32] span=/Reshape:0:0 */;
  scatter_nd(%dummy_batch_src_corr_points, %20, %21, mode="update") /* ty=Tensor[(?, ?), float32] span=/ScatterND:0:0 */
}

, {})

@july8023
Author

@liaojianjin

@classmethod
def _index_put(cls, inputs, attr, params):
    in_tensor = inputs[0]
    indices, values = cls._check_index(inputs[1 : len(inputs) - 2], inputs[len(inputs) - 2])
    accumulate = inputs[len(inputs) - 1].data.asnumpy() != 0
    if not accumulate:
        mode = "update"
    else:
        mode = "add"
    index_tensor = _op.stack(indices, axis=0)
    return _op.transform.scatter_nd(in_tensor, index_tensor, values, mode)

@classmethod
def _check_index(cls, indices, values):
    def unfolding_indices(indices, values):
        n = len(indices)
        flatten_indices = []
        slices_size = []
        for index in indices:
            flatten_indices.append(_op.reshape(index, _op.const([-1])))
            slices_size.append(infer_shape(flatten_indices[-1])[0])
        repeat_size = [1]
        tile_size = [1]
        for i in range(1, n):
            repeat_size.append(slices_size[-i] * repeat_size[-1])
            tile_size.append(slices_size[i - 1] * tile_size[-1])
        repeat_size.reverse()
        unflod_slices = []
        for i in range(n):
            unflod_slices.append(
                fold_constant(
                    _op.repeat(_op.tile(flatten_indices[i], (tile_size[i],)), repeat_size[i], 0)
                )
            )
        return unflod_slices, _op.reshape(values, _op.const([-1]))

In the function `_check_index(cls, indices, values)`, `infer_shape` is called to get the size of each index tensor, but the size cannot be inferred when the shape is dynamic. My environment is based on pytorch == 1.8.0, and I think if `torch.onnx.export` is run with pytorch 1.13.1, `aten::index_put` does not appear in the onnx graph; a `ScatterND` node appears instead.
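For reference, the tile/repeat arithmetic in the quoted `unfolding_indices` can be mimicked in plain numpy; the per-axis sizes it relies on are exactly what `infer_shape` must supply, which is why a dynamic dimension breaks it. A sketch with hypothetical inputs, not the TVM code itself:

```python
import numpy as np

def unfold_indices(indices):
    # Mimics the tile/repeat arithmetic of the quoted unfolding_indices:
    # the result enumerates the cartesian product of the per-axis index
    # lists. `sizes` is the statically-known length of each flattened
    # index, the value infer_shape fails to produce for dynamic shapes.
    n = len(indices)
    flat = [np.asarray(ix).reshape(-1) for ix in indices]
    sizes = [f.shape[0] for f in flat]
    repeat_size, tile_size = [1], [1]
    for i in range(1, n):
        repeat_size.append(sizes[-i] * repeat_size[-1])
        tile_size.append(sizes[i - 1] * tile_size[-1])
    repeat_size.reverse()
    # tile replicates the whole list, repeat duplicates each element,
    # mirroring _op.repeat(_op.tile(...), ...) in the TVM source
    return [np.repeat(np.tile(flat[i], tile_size[i]), repeat_size[i])
            for i in range(n)]

rows, cols = unfold_indices([[0, 1], [2, 3]])
# rows enumerates [0, 0, 1, 1], cols enumerates [2, 3, 2, 3]:
# together they cover all four (row, col) combinations
```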

@liaojianjin
Contributor

liaojianjin commented Feb 24, 2023

@honghuichao You can check the shape of the index tensors before `infer_shape()` and return `(indices, values)` early with flattened index tensors and the value tensor (in pytorch).
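A minimal sketch of such a guard: in relay, unknown dimensions appear as non-integer `tvm.tir.Any` objects in the inferred shape, so a shape is only safe for the size-based unfolding when every dimension is a concrete integer (a symbolic string stands in for `Any` here, to keep the sketch self-contained):

```python
from numbers import Integral

def is_static_shape(shape):
    # Safe for infer_shape-based unfolding only when every dimension
    # is a concrete integer; relay marks unknown dims with Any objects,
    # which are not integers.
    return all(isinstance(dim, Integral) for dim in shape)

ok = is_static_shape((50168, 3))          # fully static -> True
bad = is_static_shape(("i0_shape0", 3))   # symbolic dim stands in for Any -> False
```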

@july8023
Author

Thank you for your suggestion.
But this op appears in too many places in my torch code, so I can't modify each call site one by one.

If you don't have time to solve this problem, I'd like to try to solve it myself.
It seems there are differences between `torch.index_put_` and onnx `aten::index_put`, but I can't find relevant material about onnx `aten::index_put` in the pytorch documentation (website). Do you know where `aten::index_put` is defined, and how can I find it?

@liaojianjin
Contributor

You can find how the index_put node is exported here: @_onnx_symbolic("aten::index_put").

If your index is a slice, it will be converted to a tensor in ReshapeToAdvancedIndexingFormat.

You can provide the demo onnx model so I can help you with this problem; I can't downgrade pytorch to 1.8 at the moment.
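To illustrate what that conversion does, here is the numpy equivalent: a slice in an indexed assignment behaves like an explicit index tensor covering the same positions. This is a sketch of the semantics only, not the exporter's code:

```python
import numpy as np

x = np.zeros(6)
x[1:4] = 7.0                   # slice form, as written in user code

y = np.zeros(6)
y[np.arange(1, 4)] = 7.0       # equivalent advanced-indexing form, roughly
                               # what the slice gets rewritten to
```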

@july8023
Author

@liaojianjin Thank you so much!
This has perplexed me for a long time!

I constructed the following network structure using the onnx helper API:

import onnx
from onnx import helper
from onnx import TensorProto

dummy_batch_src_corr_points = helper.make_tensor_value_info("dummy_batch_src_corr_points", TensorProto.FLOAT, ["bscp_shape0", "bscp_shape1"])
dummy_indices0 = helper.make_tensor_value_info("dummy_indices0", TensorProto.FLOAT, ["i0_shape0", "i0_shape1"])
dummy_indices1 = helper.make_tensor_value_info("dummy_indices1", TensorProto.FLOAT, ["i1_shape0", "i1_shape1"])
dummy_src_corr_points = helper.make_tensor_value_info("dummy_src_corr_points", TensorProto.FLOAT, ["o_shape0", "o_shape1"])

dummy_output = helper.make_tensor_value_info("output", TensorProto.FLOAT, ["o_shape0", "o_shape1"])

c4 = helper.make_node("Constant", [], ["4"], value=helper.make_tensor("4", TensorProto.INT32, (1,), (0,)))
c5 = helper.make_node("Constant", [], ["5"], value=helper.make_tensor("5", TensorProto.INT32, (1,), (-1,)))
c7 = helper.make_node("Constant", [], ["7"], value=helper.make_tensor("7", TensorProto.INT32, (1,), (-1,)))
reshape6 = helper.make_node("Reshape", ["dummy_indices0", "5"], ["6"])
reshape8 = helper.make_node("Reshape", ["dummy_indices1", "7"], ["8"])
out = helper.make_node("ATen", ["dummy_batch_src_corr_points", "6", "8", "dummy_src_corr_points", "4"], ["output"], operator="index_put")
graph_def = helper.make_graph(
    [c4, c5, c7, reshape6, reshape8, out],
    "index_put",
    [dummy_batch_src_corr_points, dummy_indices0, dummy_indices1, dummy_src_corr_points],
    [dummy_output],
)
model_def = helper.make_model(graph_def)
onnx.checker.check_model(model_def)
onnx.save(model_def, "index_put.onnx")

@liaojianjin
Contributor

@honghuichao You can try this commit first; I will create a PR for tvm.
liaojianjin@f8a0dd3

@july8023
Author

@liaojianjin Thank you. But when I tested the whole network, I got "cuda: an illegal memory access was encountered".

@liaojianjin
Contributor

@honghuichao Can you find out why it happened? Or share the input and output of index_put? I only flatten the index tensors for now.

@july8023
Author

july8023 commented Feb 28, 2023

@liaojianjin
The phenomenon above is random in my tests, and I don't know why; I'm still testing and trying to solve it.

But another case fails with the new code:

import onnx
from onnx import helper
from onnx import TensorProto

dummy_batch_src_corr_points = helper.make_tensor_value_info("dummy_batch_src_corr_points", TensorProto.FLOAT, ["bscp_shape0", "bscp_shape1"])

dummy_indices0 = helper.make_tensor_value_info("dummy_indices0", TensorProto.FLOAT, ["i0_shape0"])
dummy_indices1 = helper.make_tensor_value_info("dummy_indices1", TensorProto.FLOAT, [1])

dummy_src_corr_points = helper.make_tensor_value_info("dummy_src_corr_points", TensorProto.FLOAT, ["v_shape0"])

dummy_output = helper.make_tensor_value_info("output", TensorProto.FLOAT, ["o_shape0", "o_shape1"])

c4 = helper.make_node("Constant", [], ["4"], value=helper.make_tensor("4", TensorProto.INT32, (2,), (-1, 1)))
c5 = helper.make_node("Constant", [], ["5"], value=helper.make_tensor("5", TensorProto.INT32, (1,), (-1,)))
reshape_indices0 = helper.make_node("Reshape", ["dummy_indices0", "4"], ["6"])
reshape_indices1 = helper.make_node("Reshape", ["dummy_indices1", "5"], ["7"])
op_type = helper.make_node("Constant", [], ["op_type"], value=helper.make_tensor("op_type", TensorProto.INT32, (1,), (0,)))
out = helper.make_node("ATen", ["dummy_batch_src_corr_points", "6", "7", "dummy_src_corr_points", "op_type"], ["output"], operator="index_put")
graph_def = helper.make_graph(
    [c4, c5, reshape_indices0, reshape_indices1, op_type, out],
    "index_put",
    [dummy_batch_src_corr_points, dummy_indices0, dummy_indices1, dummy_src_corr_points],
    [dummy_output],
)
model_def = helper.make_model(graph_def)

onnx.checker.check_model(model_def)
onnx.save(model_def, "index_put_new.onnx")

@liaojianjin
Contributor

@honghuichao The value of index1 may need to be repeated to match the size of index0, but the size of index0 is unknown.
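The repeat requirement comes from index broadcasting: when one index tensor is shorter, advanced indexing stretches it to match the other. A numpy sketch of this case (a stand-in for the dynamic-length index0 against a length-1 index1):

```python
import numpy as np

data = np.zeros((5, 3))
idx0 = np.array([0, 2, 4])     # stands in for the dynamic-length index0
idx1 = np.array([1])           # length-1 index1, broadcast against idx0
vals = np.array([5.0, 6.0, 7.0])

# idx1 behaves as if repeated to [1, 1, 1] to match idx0's size,
# which is the repeat a converter must materialize explicitly
data[idx0, idx1] = vals
```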

@july8023
Author

@liaojianjin I think we can use relay.shape_of to get index0's shape, but I don't know how to repeat the value of index1.

@liaojianjin
Contributor

@honghuichao You can try relay.take and relay.repeat to repeat the values. The final solution will take time, because it needs to consider different kinds of indices.


3 participants