[QNN] Conv2D operator #3580
Conversation
Force-pushed from 4f6e6bf to 52a3c2b
While the Requantize and Legalize passes are going through final details, it will be useful to prefetch and look at QNN conv2d lowering. Please review and let me know your comments.
Force-pushed from c35d314 to 020c123
Thank you for the ping. Suggesting to have if {NHWC} elif {NCHW} else {assert} when handling different memory layouts. Also, I am not sure dividing the compute into 4 terms is optimization-friendly, but that's out of this PR's scope I think.
ps. Only looked into part of this PR, will recheck once the requantize op is merged. Feel free to ping please.
python/tvm/relay/qnn/op/qnn.py
Outdated
input_scale of the input quantized tensors. The zero point of the output
quantized tensor is 0. By default, the dtype of output is int32. Please also
refer to Requantize operator to understand how to scale back the int32
ouptut to (u)int8.
output?
Thanks :)
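The docstring's scaling rule (the int32 accumulator carries scale input_scale × kernel_scale with zero point 0, and Requantize rescales it into (u)int8) can be sketched numerically. This is a hand-rolled illustration, not TVM's requantize API; the function name, argument list, and the uint8 clamp range are assumptions for the sketch:

```python
def requantize(acc_int32, input_scale, kernel_scale,
               output_scale, output_zero_point):
    # The int32 conv accumulator carries scale input_scale * kernel_scale
    # and zero point 0; rescale it into the (u)int8 output domain.
    real_value = acc_int32 * (input_scale * kernel_scale)
    q = round(real_value / output_scale) + output_zero_point
    return max(0, min(255, q))  # clamp to the uint8 range

print(requantize(5000, 0.02, 0.01, 0.05, 10))  # 30
```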
python/tvm/relay/qnn/op/qnn.py
Outdated
out_dtype="int32"):
r"""Quantized 2D convolution.

This operator convolves quantized weight with quantized data. The scale of
Can it be convolves quantized data with quantized weight? I know they are basically the same though...
Yup, that sounds better.
python/tvm/relay/qnn/op/qnn.py
Outdated
r"""Quantized 2D convolution.

This operator convolves quantized weight with quantized data. The scale of
the output quantized tensor is the product of the weight_scale and
Do we need to align the terms weight and kernel to be one of them rather than mixed?
Done, aligned to kernel.
src/relay/op/tensor/reduce.cc
Outdated
@@ -33,36 +34,6 @@

namespace tvm {
namespace relay {
If we are moving code like this to headers, would it be better to have a dedicated PR which involves no extra functionality?
src/relay/qnn/op/convolution.cc
Outdated
const auto in_shape = get_shape(0);
int batch_size, in_channels;
// NCHW layout
Do we need to check it? Maybe simply if NCHW else if NHWC else assert?
L354-L358
Personally, I'd like to discuss the layout handling a bit (won't block the PR). To me, I always prefer code like if condition 1: path 1; elif condition 2: path 2; else (condition 3): path 3 or assert, even if the input guarantees that condition 3 won't happen. I think having path 1 inside an if section is easier to read, and the last assert handles unexpected code typos. Anyway, that is out of the scope of this PR :).
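As a sketch of the suggested if/elif/assert dispatch style (Python for brevity; the PR's actual check is C++, and the helper name here is illustrative):

```python
def get_batch_and_channels(in_shape, layout):
    # Handle each supported layout explicitly and fail loudly on
    # anything unexpected, instead of relying on a fallthrough case.
    if layout == "NCHW":
        return in_shape[0], in_shape[1]
    elif layout == "NHWC":
        return in_shape[0], in_shape[3]
    else:
        raise ValueError("unsupported data layout: " + layout)

print(get_batch_and_channels((1, 3, 224, 224), "NCHW"))  # (1, 3)
```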
src/relay/qnn/op/convolution.cc
Outdated
const auto kernel_shape = get_shape(1);
int out_channels, kernel_h, kernel_w;
// OIHW layout
similar to input layout handling.
L354-L358
src/relay/qnn/op/convolution.cc
Outdated
Array<IndexExpr> pad_w({param->padding[1], param->padding[1]});

Array<Array<IndexExpr>> pad_width;
pad_width = {pad_n, pad_c, pad_h, pad_w};
Layout check and handling?
L354-L358
Force-pushed from bc5b780 to 699e34c
@u99127 @FrozenGene @jackwish @tqchen This is ready for review.
@tqchen
Pinging again in case this was missed @u99127 @FrozenGene @jackwish @tqchen
Force-pushed from 03cc5a1 to cadb1e5
Force-pushed from cadb1e5 to 8baa53e
CI reports that there are merge conflicts, so I am still requesting changes.
I am very glad that we are reaching this; the conditional optimization is really beneficial. Thank you for the impressive work @anijain2305!
src/relay/qnn/op/convolution.cc
Outdated
}
auto reduced_t3 = Sum(Cast(weight, Int(32)), axes_t3, false, false);

// Find the newshape depenging on NCHW/NHWC layout.
is it depending?
Force-pushed from f5cedbd to 434f40d
@jackwish Thanks for the good words :) I incorporated your comments. Can you please review again?
LGTM. Glad to participate in this, thank you for the great work!
Force-pushed from 434f40d to 7773c32
Overall LGTM, only left a nit comment.
@@ -415,6 +415,71 @@ static inline Expr Full(Expr fill_value,
return CallNode::make(op, {fill_value}, Attrs(attrs), {});
}

static inline Expr Conv2D(Expr data, Expr weight, Array<IndexExpr> strides,
It looks like this is the same as MakeConv2d, right? If this is true, should we just keep one signature instead of having duplication? I am not strongly against this because it's obviously used by other cases as well.
Yeah, I kept it to follow other use cases. I guess this repetition might be because TVM typically wants to avoid header/implementation linking problems. Will keep it as Conv2D for now.
I don't have a strong feeling about this; actually I'm not sure why we prefer copying this (and the others) here instead of adding declarations.
I think linking should be fine. We can put TVM_DLL if needed. But anyway, we can keep it this way for now.
src/relay/qnn/op/convolution.cc
Outdated
/*!
* Copyright (c) 2019 by Contributors
* \file nn.cc
wrong file name.
src/relay/qnn/op/convolution.cc
Outdated
*/
WorkloadType GetWorkload(const Array<tvm::relay::Type>& arg_types, const QnnConv2DAttrs* param) {
// Get conv parameters.
auto get_shape = [&](const Type& type) {
we don't actually need to capture anything here, right?
src/relay/qnn/op/convolution.cc
Outdated
// Since, this is integer division (floor), we can first multiply the data by the pool_size and
// then perform avg_pool2d. Reversing this causes inaccuracy due to floor division.
auto scaled_hw_t2 = Multiply(casted_t2, MakeConstantScalar(Int(32), kernel_h * kernel_w));
Array<IndexExpr> padding;
Can we just use Array<IndexExpr> padding({0, 0})?
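The code comment under review relies on an ordering property of integer (floor) division that is easy to verify: averaging first and scaling after loses the remainder, while scaling first keeps the exact sum. A minimal sketch with one 2x2 pooling window (pool_size = 4):

```python
vals = [3, 4, 4, 4]  # one 2x2 pooling window, pool_size = 4

# Average first (floor division), then scale back: the remainder is lost.
avg_then_mul = (sum(vals) // 4) * 4           # floor(15 / 4) * 4 = 12

# Scale first, then average: the exact sum survives the floor division.
mul_then_avg = sum(v * 4 for v in vals) // 4  # 60 // 4 = 15

print(avg_then_mul, mul_then_avg)  # 12 15
```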
from tvm.relay.testing import create_workload
from tvm.contrib import graph_runtime

def run_infer_type(expr):
run_infer_type could be obtained from tvm.relay.testing as well.
Force-pushed from 7773c32 to f2e7ec4
Rebasing. Empty commit. Clang-format styling.
Force-pushed from f2e7ec4 to d3a54c2
LGTM
Lowering of QNN Conv2D operation. We break the convolution into 4 terms as described in Option 1 here. Other relevant discussion is present at #2351
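The four-term breakdown follows from distributing the convolution over the zero-point subtraction: sum((Q_d - z_d) * (Q_w - z_w)) = sum(Q_d * Q_w) - z_w * sum(Q_d) - z_d * sum(Q_w) + k * z_d * z_w. A small numeric check over a single 1-D window (pure Python; the PR implements the 2-D case in C++, and the variable names here are illustrative):

```python
data = [12, 7, 3]    # quantized activations in one window
weight = [2, 5, 1]   # quantized weights
zp_d, zp_w = 10, 4   # data and weight zero points
k = len(data)        # number of elements in the window

# Direct computation: subtract zero points, then multiply-accumulate.
direct = sum((d - zp_d) * (w - zp_w) for d, w in zip(data, weight))

# Four-term expansion used by the lowering:
term1 = sum(d * w for d, w in zip(data, weight))  # conv of raw quantized values
term2 = zp_w * sum(data)                          # z_w * windowed sum of data
term3 = zp_d * sum(weight)                        # z_d * reduced weight
term4 = k * zp_d * zp_w                           # constant term
assert direct == term1 - term2 - term3 + term4
print(direct)  # 14
```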