[Relay] [Quantization] WIP - Prototyping the quantized convolution op #3367
Conversation
@tqchen @FrozenGene @jackwish @yzhliu @eqy @ZihengJiang @vinx13
Thank you for the ping @anijain2305, and glad to see this draft.
Thanks @jackwish
Hey @anijain2305, currently we use existing …
Hi @ZihengJiang, thanks for replying. Having this new op does not prevent the ability to perform AutoTVM. In fact, this new op acts as a wrapper and gets lowered to existing Relay ops like conv, cast, etc. The reason we need new ops is …
Once these ops are lowered to existing Relay ops, all the Relay and TVM optimizations are still applicable. Please find more discussion at …
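To illustrate the idea, here is a minimal sketch (not the PR's actual code; the `lower_quantized_conv2d` helper is hypothetical) of how such a wrapper can disappear during lowering, leaving only stock Relay ops so AutoTVM and the existing passes still apply:

```python
from tvm import relay

def lower_quantized_conv2d(quantized_data, quantized_kernel, out_dtype="int32"):
    """Hypothetical rewrite: the quantized conv2d wrapper turns into a
    plain int8 conv2d accumulating in int32, followed by a cast."""
    conv = relay.nn.conv2d(quantized_data, quantized_kernel, out_dtype="int32")
    # The full requantize step (scales, zero points) would go here; it is
    # simplified to a cast, matching the WIP state of the pass.
    return relay.cast(conv, out_dtype)

data = relay.var("data", shape=(1, 3, 32, 32), dtype="int8")
kernel = relay.var("kernel", shape=(8, 3, 3, 3), dtype="int8")
print(lower_quantized_conv2d(data, kernel))
```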
src/relay/pass/quantize_rewrite.cc
Outdated
param->out_layout,
Int(32));
// TODO(janimesh) - The out_dtype should come from outside..
int8_conv = Cast(int8_conv, param->out_dtype);
Maybe naming it q_conv2d_requant or something similar would be better. This is not just a simple cast: we will need the input scale / kernel scale / output scale / output min / output max and so on to compute the correct uint8 value. Using just Cast will mislead readers into thinking it is a simple cast.
Yes, I agree. This computation is not complete yet (I added a comment earlier, but I don't think it is visible enough).
I am looking into converting the scale computations into integer computations. I am having difficulty understanding the rounding computations. Can you help with that?
You may take this part of TFLite's conv (all the stuff in that function), which is semantically usually called Requantize, as a reference.
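For readers following along, below is a rough Python sketch of the integer-only requantize math used by TFLite-style kernels (assumed from the gemmlowp scheme; this is not code from this PR, and the function names are made up):

```python
import math

def quantize_multiplier(real_multiplier):
    """Represent real_multiplier = input_scale * kernel_scale / output_scale
    as a 32-bit fixed-point multiplier plus a right shift."""
    assert 0.0 < real_multiplier < 1.0
    mantissa, exponent = math.frexp(real_multiplier)   # mantissa in [0.5, 1)
    quantized = int(round(mantissa * (1 << 31)))
    if quantized == (1 << 31):                         # rounded up to 2^31
        quantized //= 2
        exponent += 1
    return quantized, -exponent                        # multiplier, right shift

def requantize(acc, multiplier, right_shift, output_zero_point):
    """Scale an int32 accumulator into the output quantized range using
    integer ops only: rounding-doubling high mul, then a rounding shift."""
    prod = acc * multiplier
    nudge = (1 << 30) if prod >= 0 else 1 - (1 << 30)
    high = (prod + nudge) >> 31                        # ~ round(acc * M / 2^31)
    if right_shift > 0:
        rounding = 1 << (right_shift - 1)
        high = (high + rounding) >> right_shift        # rounding right shift
    return high + output_zero_point

# e.g. real multiplier 0.3: an accumulator of 100 requantizes to 30
m, shift = quantize_multiplier(0.3)
print(requantize(100, m, shift, output_zero_point=0))  # 30
```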
Thanks, I will have a closer look this week and update the patch with the correct computation.
src/relay/pass/quantize_rewrite.cc
Outdated
CHECK_EQ(param->kernel_zero_point, 0) << "Only symmetric support yet";
CHECK_EQ(param->output_zero_point, 0) << "Only symmetric support yet";
// TODO(janimesh) - The out_dtype should be something else, like "int32".
Expr int8_conv = Conv2D(quantized_data,
Where is our `data - input_zero_point` / `kernel - kernel_zero_point` computation? Is it folded into `quantized_data` / `quantized_kernel`? Because we will need to cast data / kernel to int16 / int32 to subtract the zero point, I care about this.
There are a couple of points that can answer your question here:
- Currently, the rewrite of quantized_conv2d to a sequence of Relay operations is not complete. I was hoping to start the discussion of the quantization flow and namespaces before we start discussing the details of the operations. I will work on this asynchronously as I understand the computation. Please let me know more of your thoughts on the quantization flow, like namespaces and the infra usage for lowering the quantized ops into a series of Relay ops.
- The first PR would only support symmetric quantization. The asymmetric computation that includes the `data - input_zero_point` term will come later, once people agree on the quantization flow; see lines 44 - 46, which limit the scope to symmetric quantization. It is highly possible that I will need your help in lowering the asymmetric quantized convolution (a reference sketch follows below). But we should not complicate this PR by including asymmetric quantization.
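For completeness, here is an illustrative NumPy-only reference of what the asymmetric version will eventually have to compute (not part of this patch; the helper name is made up):

```python
import numpy as np

def asymmetric_conv_reference(data, kernel, input_zero_point, kernel_zero_point):
    """Illustrative reference: zero points are subtracted *after* widening
    to int32, so int8 inputs cannot overflow. NCHW data, OIHW kernel,
    stride 1, no padding."""
    d = data.astype(np.int32) - input_zero_point
    k = kernel.astype(np.int32) - kernel_zero_point
    n, c, h, w = d.shape
    o, _, kh, kw = k.shape
    out = np.zeros((n, o, h - kh + 1, w - kw + 1), dtype=np.int32)
    for i in range(out.shape[2]):
        for j in range(out.shape[3]):
            patch = d[:, :, i:i + kh, j:j + kw]              # (n, c, kh, kw)
            out[:, :, i, j] = np.tensordot(patch, k, axes=([1, 2, 3], [1, 2, 3]))
    return out
```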
OK. I hadn't noticed the line 44 - 46 comment before.
}
Minor nit, unnecessary new lines.
/*!
 * \file nnvm/compiler/quantize_util.h
 * \brief Utility methods needs for quantized ops that can be shared
Minor nits:
- s/needs/needed
- shared between frontends?
    || is_Int16(dtype) || is_UInt16(dtype);
}

enum class QuantizeOpType : uint8_t {
Some notes here on Quantize, Requantize, and Quantize_Requantize might be appropriate. Alternatively, a pointer to the Python documentation might also be useful for the reader.
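As a starting point for such notes, the semantics could be summarized roughly as follows (a NumPy sketch of the usual affine quantization scheme, not this header's code):

```python
import numpy as np

def quantize(x, scale, zero_point, qmin=-128, qmax=127):
    """FP32 -> int8: q = clamp(round(x / scale) + zero_point, qmin, qmax)."""
    q = np.round(x / scale) + zero_point
    return np.clip(q, qmin, qmax).astype(np.int8)

def dequantize(q, scale, zero_point):
    """int8 -> FP32: x = scale * (q - zero_point)."""
    return scale * (q.astype(np.float32) - zero_point)

def requantize(q, in_scale, in_zp, out_scale, out_zp):
    """Re-express q in a new scale / zero point. Shown in float for
    clarity; real kernels do this in fixed point (see the TFLite
    discussion above)."""
    return quantize(dequantize(q, in_scale, in_zp), out_scale, out_zp)

x = np.array([-1.0, 0.0, 0.5, 1.0], dtype=np.float32)
q = quantize(x, scale=0.01, zero_point=0)
print(q, dequantize(q, 0.01, 0))
```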
  return -1;
}
Minor nit, unnecessary new line.
This operator takes the quantized_weight as the convolution kernel
and convolves it with quantized_data to produce an output quantized tensor.
The scale of the output quantized tensor is the prodcut of the weight_scale
s/prodcut/product.
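For context on why the accumulator scale is the product of the two input scales, a tiny worked check (symmetric case, zero points 0):

```python
import numpy as np

# Every real-valued product factors the two scales out:
#   (s_d * q_d) * (s_w * q_w) = (s_d * s_w) * (q_d * q_w)
# so the int32 accumulator is itself a quantized value with
# scale data_scale * weight_scale.
s_d, s_w = 0.5, 0.25
q_d, q_w = np.int32(10), np.int32(-3)
real = (s_d * q_d) * (s_w * q_w)   # -3.75
acc = q_d * q_w                    # -30, the int32 accumulator
assert real == (s_d * s_w) * acc   # accumulator scale is 0.125
```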
data_layout="NCHW", | ||
kernel_layout="OIHW", | ||
out_layout="", | ||
out_dtype="int32"): |
Note to self - Add documentation for out_dtype (accumulation accuracy etc.).
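A quick sketch of why int32 accumulation matters here (NumPy, illustrative only):

```python
import numpy as np

# Worst-ish case for an int8 dot product: 127 * 127 = 16129 per product,
# and 1024 of them sum to 16,516,096 -- far beyond int16's 32,767.
a = np.full(1024, 127, dtype=np.int8)
b = np.full(1024, 127, dtype=np.int8)
acc16 = np.sum(a.astype(np.int16) * b.astype(np.int16), dtype=np.int16)
acc32 = np.sum(a.astype(np.int32) * b.astype(np.int32), dtype=np.int32)
print(acc16, acc32)  # 1024 (wrapped around) vs 16516096 (correct)
```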
<< " Only symmetric quantization supported for now."; | ||
|
||
if (param->input_zero_point == 0 && param->kernel_zero_point == 0) { | ||
Expr int8_conv = Conv2D(quantized_data, |
Change name.
… (apache#3367): In this PR I want to discuss the design and implementation of:
- Quantize op -> FP32 to i8/u8
- Dequantize op -> i8/u8 to FP32
I have added test cases to verify the correctness of the ops.
…ntized op

Features:
- New quantized conv2D and requantize op in Relay
- Python API interface to instantiate the Relay op
- Infer Type implemented
- Lowering of quantized op to low-level Relay ops
Closing. Moving to #3580
Goal - Act as a medium of discussion for pull request #2351.
The patch only supports symmetric quantization for now. The goal is to focus on infrastructure and not on the low-level details of what operations should be used for the computations. Once we finalize the infra, we can tackle the low-level details.
Features
Discussion points
Missing implementation
- Lowering of quantized conv into conv+cast is incomplete.
- Will work on it async. This is orthogonal to the discussion.