-
Notifications
You must be signed in to change notification settings - Fork 3.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[BYOC] Added "include_non_call_ops" parameter to AnnotateTarget pass #6655
Conversation
Thanks for the PR. I have a hard deadline this weekend and will review it after next Tuesday. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for this! some minor comments.
This solution looks too ad hoc to me. From the semantic of the changes, I think this PR can be generalized to "improve the annotate target pass to annotate non-call ops to default". Accordingly, the API could be like AnnotateTarget(targets, skip_non_call_ops=False)(mod) # I am bad at naming but it is the point cc @zhiics |
@tqchen Hi, I am trying to compile Mobilnet (after expanded this patch to all non-call ops, not committed yet) and encountered a check on src/relay/backend/compile_engine.cc:170 The ConstantNode it fails is of TensorType: TensorType([1, 1, 1024, 1024], float32), and contains runtime.NDArray(0x74f0370) of corresponding shape (1x1x1024x1024). Could you explain the purpose of the check and should the check be extended to also accomodate ConstantNodes of TensorType. I am also interested to know the techniques to trace when and why this ConstantNode appeared. Thank you. cc: @comaniac |
Hey @d-smirnov have you tried with this change? #6912 |
@trevor-m Rebased, but it did not make any difference. Still fail on the check |
Added annotate_non_call_ops parameter to AnnotateTarget pass to prevent non-call to be promoted to previously annotated operations This is useful in case if you are not running MergeCompilerRegions pass after AnnotateTarget.
f22bf74
to
301bd38
Compare
Updated. Please take a look @comaniac, @mbaret, @manupa-arm |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I still think we should have separate postprocess logic to deal with the subgraph (a.k.a. region) without call nodes instead of coupling it to AnnotateTarget (similar to the pruning pass in TensorRT integration). The AnnotateTarget pass is already too complicate to maintain, and that's not what we expected. On the other hand, we can maybe let this in first and plan another refactor.
In addition, since this PR may change the output of AnnotateTaret, you need to add corresponding tests to the follow-up passes (i.e., MergeCompilerRegion and GraphPartition).
@rohanmukh @codeislife99 please help check if this change affect any existing workloads.
op_expr_to_target_[new_expr] = op_expr_to_target_[expr]; | ||
const CallNode* call = expr.as<CallNode>(); | ||
if (op_expr_to_target_.find(expr) != op_expr_to_target_.end()) { | ||
// Check whether expr has args, if not - do not insert compiler_end. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Cannot connect this comment to the following logic. Could you elaborate?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The comment related to this part (call && !call->args.empty()))
of the condition
// Already annotated. Recover target | ||
if (op_expr_to_target_.find(input_expr) == op_expr_to_target_.end()) { | ||
op_expr_to_target_[input_expr] = post.as<CallNode>()->attrs.as<CompilerAttrs>()->compiler; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like you don't need the IF? Even input_expr
is already in op_expr_to_target_
, you can still override it, as suggested by the comment in L188. Accordingly, if you will override the target, you need InsertAnnotation
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If this is not the first invocation of the pass this branch supposed to restore targets from already annotated ops.
if (arg_target != default_target) { | ||
// annotated already | ||
return post; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like you remove the feature that considers the target in existing annotation nodes?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here it is peeking first arg and if it is already annotated with non-default target it returns the node untouched, preserving the target.
supported_targets.push_back(default_target); // Make default as the last option. | ||
// Visit and mutate arguments after the target of this op has been determined. | ||
Call post_call = Downcast<Call>(post); | ||
if (pre->op->IsInstance<VarNode>()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you elaborate why a CallNode may have a VarNode as its op?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The test case is tests/python/relay/test_pass_annotate_target.py::test_while_let
if (op_expr_to_target_.find(expr) != op_expr_to_target_.end()) { | ||
// Check whether expr has args, if not - do not insert compiler_end. | ||
if (expr->IsInstance<RefWriteNode>() || expr->IsInstance<RefCreateNode>() || | ||
expr->IsInstance<RefReadNode>() || expr->IsInstance<TupleNode>() || |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There would be more nodes, like constructors. But I am still concerned if this changed is needed. This really makes this already complicated pass more complicated. I still don't see a good point why we don't run mergecompilerregions. That would solve this problem. Without running it, we would have a large number of small segments, which requires frequent data transfer between the host and device as well as frequent kernel launch.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
While running merge compiler regions helps cut down the regions, it also makes the external codegen's responsibility to allocate memory for intermediate tensors on those partitions. Thus, in the specific case of ACL, I think there is not much gained by such a merger as ACL would be implementing each ACL primitive operator and let tvm handle the memory allocation of the tensors passed onto external function. Moreover, the kernel launch overhead should also be minimal as it is running on the CPU (so the host and device is almost the same here). Also such a merger will also make the IO tensors live throughout the execution of external function while the space could be re-used if it was not merged.
The problem is the specification of the ACL did not indicate the simple regions (or non-call ops) to be annotated, thus annotate target here is doing something extra than it was asked.
I quite agree that this pass is complicated and needs breakdown. I guess that should be discussed in a RFC as to how it should look like. One direction maybe to take out the annotation of simple regions (non-call ops) as a seperate part ( I believe this was how it looked liked sometime back when it had something called AnnotateRestDefault until it got merged here :) ).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See my comment here : #5277
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah, it would be nice to have the RFC and list the options there
@comaniac Ping. How can we make some progress here? |
I still don't think we should make AnnotateTarget pass even more complicate, so I proposed to have a separate ACL specific pass to workaround this issue like TensorRT. Later we could have an RFC to aggregate those passes and make a single, general pass for all BOYC targets. cc @zhiics @mbaret @manupa-arm for comments. |
I don't agree that this should be separate, because the current behaviour of AnnotateTarget is simply incorrect. ACL does not support tuples, so AnnotateTarget should not mark them as supported. We shouldn't need a 'fix-up' pass after AnnotateTarget to make it correct, it's just a bug in AnnotateTarget that's come about from faulty reasoning about how codegens work. As this is user-facing and a critical error, I think the priority is accepting a fix with a refactor to reduce complexity coming later. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, but yes we need to specify the expected behaviour of this pass more formally. We'll follow up with an RFC to this effect in the new year.
Thanks everyone. |
…pache#6655) * [BYOC] Added annotate_non_call_ops parameter to AnnotateTarget pass Added annotate_non_call_ops parameter to AnnotateTarget pass to prevent non-call to be promoted to previously annotated operations This is useful in case if you are not running MergeCompilerRegions pass after AnnotateTarget. * linter * Tuple and TupleGetItem handling * resored transform.py, added missing tests to main * requested changes
…pache#6655) * [BYOC] Added annotate_non_call_ops parameter to AnnotateTarget pass Added annotate_non_call_ops parameter to AnnotateTarget pass to prevent non-call to be promoted to previously annotated operations This is useful in case if you are not running MergeCompilerRegions pass after AnnotateTarget. * linter * Tuple and TupleGetItem handling * resored transform.py, added missing tests to main * requested changes
…pache#6655) * [BYOC] Added annotate_non_call_ops parameter to AnnotateTarget pass Added annotate_non_call_ops parameter to AnnotateTarget pass to prevent non-call to be promoted to previously annotated operations This is useful in case if you are not running MergeCompilerRegions pass after AnnotateTarget. * linter * Tuple and TupleGetItem handling * resored transform.py, added missing tests to main * requested changes
This PR adds "include_non_call_ops" parameter to AnnotateTarget pass. When set (include_non_call_ops=True) the AnnotateTarget pass will annotate non call ops with default target or with the target of its arguments. This is current behavior of AnnotateTarget pass. When the flag is set to false (include_non_call_ops=False) the AnnotateTarget pass will not annotate non-call. This behavior is useful if you are not running MergeCompilerRegions pass after AnnotateTarget.
The PR related to an issue reported here: https://discuss.tvm.apache.org/t/arm-compute-library-segv-with-inception-v1-squeezenet/7985/8
@comaniac @manupa-arm @mbaret