Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BYOC] Added "include_non_call_ops" parameter to AnnotateTarget pass #6655

Merged
merged 5 commits into from
Dec 17, 2020

Conversation

d-smirnov
Copy link
Contributor

@d-smirnov d-smirnov commented Oct 9, 2020

This PR adds "include_non_call_ops" parameter to AnnotateTarget pass. When set (include_non_call_ops=True) the AnnotateTarget pass will annotate non call ops with default target or with the target of its arguments. This is current behavior of AnnotateTarget pass. When the flag is set to false (include_non_call_ops=False) the AnnotateTarget pass will not annotate non-call. This behavior is useful if you are not running MergeCompilerRegions pass after AnnotateTarget.

The PR related to an issue reported here: https://discuss.tvm.apache.org/t/arm-compute-library-segv-with-inception-v1-squeezenet/7985/8

@comaniac @manupa-arm @mbaret

@comaniac
Copy link
Contributor

comaniac commented Oct 9, 2020

Thanks for the PR. I have a hard deadline this weekend and will review it after next Tuesday.

@tqchen tqchen changed the base branch from master to main October 11, 2020 18:16
Copy link
Contributor

@manupak manupak left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for this! some minor comments.

src/relay/transforms/annotate_target.cc Outdated Show resolved Hide resolved
src/relay/transforms/annotate_target.cc Outdated Show resolved Hide resolved
src/relay/transforms/annotate_target.cc Outdated Show resolved Hide resolved
src/relay/transforms/annotate_target.cc Outdated Show resolved Hide resolved
src/relay/transforms/annotate_target.cc Outdated Show resolved Hide resolved
tests/python/relay/test_pass_annotate_target.py Outdated Show resolved Hide resolved
tests/python/relay/test_pass_annotate_target.py Outdated Show resolved Hide resolved
tests/python/relay/test_pass_annotate_target.py Outdated Show resolved Hide resolved
@comaniac
Copy link
Contributor

This solution looks too ad hoc to me. From the semantic of the changes, I think this PR can be generalized to "improve the annotate target pass to annotate non-call ops to default". Accordingly, the API could be like

AnnotateTarget(targets, skip_non_call_ops=False)(mod) # I am bad at naming but it is the point

cc @zhiics

@d-smirnov
Copy link
Contributor Author

@tqchen Hi, I am trying to compile Mobilnet (after expanded this patch to all non-call ops, not committed yet) and encountered a check on src/relay/backend/compile_engine.cc:170
ICHECK(op->is_scalar());

The ConstantNode it fails is of TensorType: TensorType([1, 1, 1024, 1024], float32), and contains runtime.NDArray(0x74f0370) of corresponding shape (1x1x1024x1024). Could you explain the purpose of the check and should the check be extended to also accomodate ConstantNodes of TensorType. I am also interested to know the techniques to trace when and why this ConstantNode appeared.

Thank you. cc: @comaniac

@trevor-m
Copy link
Contributor

@tqchen Hi, I am trying to compile Mobilnet (after expanded this patch to all non-call ops, not committed yet) and encountered a check on src/relay/backend/compile_engine.cc:170
ICHECK(op->is_scalar());

The ConstantNode it fails is of TensorType: TensorType([1, 1, 1024, 1024], float32), and contains runtime.NDArray(0x74f0370) of corresponding shape (1x1x1024x1024). Could you explain the purpose of the check and should the check be extended to also accomodate ConstantNodes of TensorType. I am also interested to know the techniques to trace when and why this ConstantNode appeared.

Thank you. cc: @comaniac

Hey @d-smirnov have you tried with this change? #6912

@d-smirnov
Copy link
Contributor Author

d-smirnov commented Nov 15, 2020

Hey @d-smirnov have you tried with this change? #6912

@trevor-m Rebased, but it did not make any difference. Still fail on the check src/relay/backend/compile_engine.cc:170

Added annotate_non_call_ops parameter to AnnotateTarget pass to prevent
non-call to be promoted to previously annotated operations
This is useful in case if you are not running MergeCompilerRegions
pass after AnnotateTarget.
@d-smirnov
Copy link
Contributor Author

Updated. Please take a look @comaniac, @mbaret, @manupa-arm

@d-smirnov d-smirnov changed the title [BYOC] Added default_tuples parameter to AnnotateTarget pass [BYOC] Added "include_non_call_ops" parameter to AnnotateTarget pass Nov 20, 2020
Copy link
Contributor

@comaniac comaniac left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I still think we should have separate postprocess logic to deal with the subgraph (a.k.a. region) without call nodes instead of coupling it to AnnotateTarget (similar to the pruning pass in TensorRT integration). The AnnotateTarget pass is already too complicate to maintain, and that's not what we expected. On the other hand, we can maybe let this in first and plan another refactor.

In addition, since this PR may change the output of AnnotateTaret, you need to add corresponding tests to the follow-up passes (i.e., MergeCompilerRegion and GraphPartition).

@rohanmukh @codeislife99 please help check if this change affect any existing workloads.

cc @zhiics @anijain2305

tests/python/relay/test_pass_annotate_target.py Outdated Show resolved Hide resolved
tests/python/relay/test_pass_annotate_target.py Outdated Show resolved Hide resolved
src/relay/transforms/annotate_target.cc Show resolved Hide resolved
src/relay/transforms/annotate_target.cc Show resolved Hide resolved
src/relay/transforms/annotate_target.cc Outdated Show resolved Hide resolved
op_expr_to_target_[new_expr] = op_expr_to_target_[expr];
const CallNode* call = expr.as<CallNode>();
if (op_expr_to_target_.find(expr) != op_expr_to_target_.end()) {
// Check whether expr has args, if not - do not insert compiler_end.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cannot connect this comment to the following logic. Could you elaborate?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The comment related to this part (call && !call->args.empty())) of the condition

Comment on lines +188 to +191
// Already annotated. Recover target
if (op_expr_to_target_.find(input_expr) == op_expr_to_target_.end()) {
op_expr_to_target_[input_expr] = post.as<CallNode>()->attrs.as<CompilerAttrs>()->compiler;
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like you don't need the IF? Even input_expr is already in op_expr_to_target_, you can still override it, as suggested by the comment in L188. Accordingly, if you will override the target, you need InsertAnnotation.

Copy link
Contributor Author

@d-smirnov d-smirnov Nov 23, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this is not the first invocation of the pass this branch supposed to restore targets from already annotated ops.

Comment on lines +203 to +205
if (arg_target != default_target) {
// annotated already
return post;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like you remove the feature that considers the target in existing annotation nodes?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here it is peeking first arg and if it is already annotated with non-default target it returns the node untouched, preserving the target.

supported_targets.push_back(default_target); // Make default as the last option.
// Visit and mutate arguments after the target of this op has been determined.
Call post_call = Downcast<Call>(post);
if (pre->op->IsInstance<VarNode>()) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you elaborate why a CallNode may have a VarNode as its op?

Copy link
Contributor Author

@d-smirnov d-smirnov Nov 23, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The test case is tests/python/relay/test_pass_annotate_target.py::test_while_let

if (op_expr_to_target_.find(expr) != op_expr_to_target_.end()) {
// Check whether expr has args, if not - do not insert compiler_end.
if (expr->IsInstance<RefWriteNode>() || expr->IsInstance<RefCreateNode>() ||
expr->IsInstance<RefReadNode>() || expr->IsInstance<TupleNode>() ||
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There would be more nodes, like constructors. But I am still concerned if this changed is needed. This really makes this already complicated pass more complicated. I still don't see a good point why we don't run mergecompilerregions. That would solve this problem. Without running it, we would have a large number of small segments, which requires frequent data transfer between the host and device as well as frequent kernel launch.

Copy link
Contributor

@manupak manupak Nov 25, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While running merge compiler regions helps cut down the regions, it also makes the external codegen's responsibility to allocate memory for intermediate tensors on those partitions. Thus, in the specific case of ACL, I think there is not much gained by such a merger as ACL would be implementing each ACL primitive operator and let tvm handle the memory allocation of the tensors passed onto external function. Moreover, the kernel launch overhead should also be minimal as it is running on the CPU (so the host and device is almost the same here). Also such a merger will also make the IO tensors live throughout the execution of external function while the space could be re-used if it was not merged.

The problem is the specification of the ACL did not indicate the simple regions (or non-call ops) to be annotated, thus annotate target here is doing something extra than it was asked.

I quite agree that this pass is complicated and needs breakdown. I guess that should be discussed in a RFC as to how it should look like. One direction maybe to take out the annotation of simple regions (non-call ops) as a seperate part ( I believe this was how it looked liked sometime back when it had something called AnnotateRestDefault until it got merged here :) ).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See my comment here : #5277

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah, it would be nice to have the RFC and list the options there

@d-smirnov
Copy link
Contributor Author

@comaniac Ping. How can we make some progress here?

@comaniac
Copy link
Contributor

comaniac commented Dec 9, 2020

@comaniac Ping. How can we make some progress here?

I still don't think we should make AnnotateTarget pass even more complicate, so I proposed to have a separate ACL specific pass to workaround this issue like TensorRT. Later we could have an RFC to aggregate those passes and make a single, general pass for all BOYC targets.

cc @zhiics @mbaret @manupa-arm for comments.

@mbaret
Copy link
Contributor

mbaret commented Dec 10, 2020

I don't agree that this should be separate, because the current behaviour of AnnotateTarget is simply incorrect. ACL does not support tuples, so AnnotateTarget should not mark them as supported. We shouldn't need a 'fix-up' pass after AnnotateTarget to make it correct, it's just a bug in AnnotateTarget that's come about from faulty reasoning about how codegens work. As this is user-facing and a critical error, I think the priority is accepting a fix with a refactor to reduce complexity coming later.

Copy link
Contributor

@comaniac comaniac left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Per offline discussion, we will let this PR in first. Afterwards, @mbaret will send an RFC discussing the desire behavior of AnnotateTarget.

@mbaret please merge this PR if that works for you. Thanks.

Copy link
Contributor

@mbaret mbaret left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, but yes we need to specify the expected behaviour of this pass more formally. We'll follow up with an RFC to this effect in the new year.

@mbaret mbaret merged commit 4060b4f into apache:main Dec 17, 2020
@mbaret
Copy link
Contributor

mbaret commented Dec 17, 2020

Thanks everyone.

TusharKanekiDey pushed a commit to TusharKanekiDey/tvm that referenced this pull request Jan 20, 2021
…pache#6655)

* [BYOC] Added annotate_non_call_ops parameter to AnnotateTarget pass

Added annotate_non_call_ops parameter to AnnotateTarget pass to prevent
non-call to be promoted to previously annotated operations
This is useful in case if you are not running MergeCompilerRegions
pass after AnnotateTarget.

* linter

* Tuple and TupleGetItem handling

* resored transform.py, added missing tests to main

* requested changes
trevor-m pushed a commit to neo-ai/tvm that referenced this pull request Jan 21, 2021
…pache#6655)

* [BYOC] Added annotate_non_call_ops parameter to AnnotateTarget pass

Added annotate_non_call_ops parameter to AnnotateTarget pass to prevent
non-call to be promoted to previously annotated operations
This is useful in case if you are not running MergeCompilerRegions
pass after AnnotateTarget.

* linter

* Tuple and TupleGetItem handling

* resored transform.py, added missing tests to main

* requested changes
electriclilies pushed a commit to electriclilies/tvm that referenced this pull request Feb 18, 2021
…pache#6655)

* [BYOC] Added annotate_non_call_ops parameter to AnnotateTarget pass

Added annotate_non_call_ops parameter to AnnotateTarget pass to prevent
non-call to be promoted to previously annotated operations
This is useful in case if you are not running MergeCompilerRegions
pass after AnnotateTarget.

* linter

* Tuple and TupleGetItem handling

* resored transform.py, added missing tests to main

* requested changes
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants