[Relay] [Quantization] WIP - Prototyping the quantized convolution op #3367
Conversation
@tqchen @FrozenGene @jackwish @yzhliu @eqy @ZihengJiang @vinx13
Thank you for the ping @anijain2305, and glad to see this draft.
Thanks @jackwish
Hey @anijain2305, currently we use existing …
Hi @ZihengJiang, thanks for replying. Having this new op does not prevent the ability to perform AutoTVM. In fact, this new op acts as a wrapper and gets lowered to existing Relay ops like conv, cast, etc. The reason we need new ops is …
Once these ops are lowered to existing Relay ops, all the Relay and TVM optimizations are still applicable. Please find more discussion at …
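To illustrate the idea, here is a minimal sketch (not the PR's actual code; the `lower_quantized_conv2d` helper is hypothetical) of how such a wrapper can disappear during lowering, leaving only stock Relay ops so AutoTVM and the existing passes still apply:

```python
from tvm import relay

def lower_quantized_conv2d(quantized_data, quantized_kernel, out_dtype="int32"):
    """Hypothetical rewrite: the quantized conv2d wrapper turns into a
    plain int8 conv2d accumulating in int32, followed by a cast."""
    conv = relay.nn.conv2d(quantized_data, quantized_kernel, out_dtype="int32")
    # The full requantize step (scales, zero points) would go here; it is
    # simplified to a cast, matching the WIP state of the pass.
    return relay.cast(conv, out_dtype)

data = relay.var("data", shape=(1, 3, 32, 32), dtype="int8")
kernel = relay.var("kernel", shape=(8, 3, 3, 3), dtype="int8")
print(lower_quantized_conv2d(data, kernel))
```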
src/relay/pass/quantize_rewrite.cc
Outdated
param->out_layout,
Int(32));
// TODO(janimesh) - The out_dtype should come from outside..
int8_conv = Cast(int8_conv, param->out_dtype);
Maybe naming it q_conv2d_requant or something similar would be better. This is not just a simple cast: we will need the input scale / kernel scale / output scale / output min / output max and so on to compute the correct uint8 value. Using just Cast will mislead readers into thinking it is a simple cast.
Yes, I agree. This computation is not complete yet (I added a comment earlier, but I don't think it is visible enough).
I am looking into converting the scale computations into integer computations. I am having difficulty understanding the rounding computations. Can you help with that?
You may take this part of TFLite's conv (all the stuff in that function), which is semantically usually called Requantize, as a reference.
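For readers following along, below is a rough Python sketch of the integer-only requantize math used by TFLite-style kernels (assumed from the gemmlowp scheme; this is not code from this PR, and the function names are made up):

```python
import math

def quantize_multiplier(real_multiplier):
    """Represent real_multiplier = input_scale * kernel_scale / output_scale
    as a 32-bit fixed-point multiplier plus a right shift."""
    assert 0.0 < real_multiplier < 1.0
    mantissa, exponent = math.frexp(real_multiplier)   # mantissa in [0.5, 1)
    quantized = int(round(mantissa * (1 << 31)))
    if quantized == (1 << 31):                         # rounded up to 2^31
        quantized //= 2
        exponent += 1
    return quantized, -exponent                        # multiplier, right shift

def requantize(acc, multiplier, right_shift, output_zero_point):
    """Scale an int32 accumulator into the output quantized range using
    integer ops only: rounding-doubling high mul, then a rounding shift."""
    prod = acc * multiplier
    nudge = (1 << 30) if prod >= 0 else 1 - (1 << 30)
    high = (prod + nudge) >> 31                        # ~ round(acc * M / 2^31)
    if right_shift > 0:
        rounding = 1 << (right_shift - 1)
        high = (high + rounding) >> right_shift        # rounding right shift
    return high + output_zero_point

# e.g. real multiplier 0.3: an accumulator of 100 requantizes to 30
m, shift = quantize_multiplier(0.3)
print(requantize(100, m, shift, output_zero_point=0))  # 30
```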
Thanks, I will have a closer look this week and update the patch with the correct computation.
src/relay/pass/quantize_rewrite.cc
Outdated
CHECK_EQ(param->kernel_zero_point, 0) << "Only symmetric support yet";
CHECK_EQ(param->output_zero_point, 0) << "Only symmetric support yet";
// TODO(janimesh) - The out_dtype should be something else, like "int32".
Expr int8_conv = Conv2D(quantized_data,
Where is our `data - input_zero_point` / `kernel - kernel_zero_point` computation? Is it folded into `quantized_data` / `quantized_kernel`? Because we will need to cast data / kernel to int16 / int32 to subtract the zero point, I care about this.
There are a couple of points that can answer your question here:
- Currently, the rewrite of quantized_conv2d to a sequence of Relay operations is not complete. I was hoping to start the discussion of the quantization flow and namespaces before we start discussing the details of the operations. I will work on this asynchronously as I understand the computation. Please let me know more of your thoughts on the quantization flow, like namespaces and the infra usage for lowering the quantized ops into a series of Relay ops.
- The first PR would only support symmetric quantization. The asymmetric computation that includes the `data - input_zero_point` term will come later, once people agree on the quantization flow; see lines 44 - 46, which limit the scope to symmetric quantization. It is highly possible that I will need your help in lowering the asymmetric quantized convolution (a reference sketch follows below). But we should not complicate this PR by including asymmetric quantization.
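For completeness, here is an illustrative NumPy-only reference of what the asymmetric version will eventually have to compute (not part of this patch; the helper name is made up):

```python
import numpy as np

def asymmetric_conv_reference(data, kernel, input_zero_point, kernel_zero_point):
    """Illustrative reference: zero points are subtracted *after* widening
    to int32, so int8 inputs cannot overflow. NCHW data, OIHW kernel,
    stride 1, no padding."""
    d = data.astype(np.int32) - input_zero_point
    k = kernel.astype(np.int32) - kernel_zero_point
    n, c, h, w = d.shape
    o, _, kh, kw = k.shape
    out = np.zeros((n, o, h - kh + 1, w - kw + 1), dtype=np.int32)
    for i in range(out.shape[2]):
        for j in range(out.shape[3]):
            patch = d[:, :, i:i + kh, j:j + kw]              # (n, c, kh, kw)
            out[:, :, i, j] = np.tensordot(patch, k, axes=([1, 2, 3], [1, 2, 3]))
    return out
```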
OK. I hadn't noticed the line 44 - 46 comment before.
}
Minor nit, unnecessary new lines.
/*!
 * \file nnvm/compiler/quantize_util.h
 * \brief Utility methods needs for quantized ops that can be shared
Minor nits:
- s/needs/needed
- shared between frontends?
    || is_Int16(dtype) || is_UInt16(dtype);
}

enum class QuantizeOpType : uint8_t {
Some notes here on Quantize, Requantize, and Quantize_Requantize might be appropriate. Alternatively, a pointer to the Python documentation might also be useful for the reader.
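As a starting point for such notes, the semantics could be summarized roughly as follows (a NumPy sketch of the usual affine quantization scheme, not this header's code):

```python
import numpy as np

def quantize(x, scale, zero_point, qmin=-128, qmax=127):
    """FP32 -> int8: q = clamp(round(x / scale) + zero_point, qmin, qmax)."""
    q = np.round(x / scale) + zero_point
    return np.clip(q, qmin, qmax).astype(np.int8)

def dequantize(q, scale, zero_point):
    """int8 -> FP32: x = scale * (q - zero_point)."""
    return scale * (q.astype(np.float32) - zero_point)

def requantize(q, in_scale, in_zp, out_scale, out_zp):
    """Re-express q in a new scale / zero point. Shown in float for
    clarity; real kernels do this in fixed point (see the TFLite
    discussion above)."""
    return quantize(dequantize(q, in_scale, in_zp), out_scale, out_zp)

x = np.array([-1.0, 0.0, 0.5, 1.0], dtype=np.float32)
q = quantize(x, scale=0.01, zero_point=0)
print(q, dequantize(q, 0.01, 0))
```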
  return -1;
}
Minor nit, unnecessary new line.
This operator takes the quantized_weight as the convolution kernel
and convolves it with quantized_data to produce an output quantized tensor.
The scale of the output quantized tensor is the prodcut of the weight_scale
s/prodcut/product.
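For context on why the accumulator scale is the product of the two input scales, a tiny worked check (symmetric case, zero points 0):

```python
import numpy as np

# Every real-valued product factors the two scales out:
#   (s_d * q_d) * (s_w * q_w) = (s_d * s_w) * (q_d * q_w)
# so the int32 accumulator is itself a quantized value with
# scale data_scale * weight_scale.
s_d, s_w = 0.5, 0.25
q_d, q_w = np.int32(10), np.int32(-3)
real = (s_d * q_d) * (s_w * q_w)   # -3.75
acc = q_d * q_w                    # -30, the int32 accumulator
assert real == (s_d * s_w) * acc   # accumulator scale is 0.125
```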
data_layout="NCHW", | ||
kernel_layout="OIHW", | ||
out_layout="", | ||
out_dtype="int32"): |
Note to self - Add documentation for out_dtype (accumulation accuracy etc.).
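A quick sketch of why int32 accumulation matters here (NumPy, illustrative only):

```python
import numpy as np

# Worst-ish case for an int8 dot product: 127 * 127 = 16129 per product,
# and 1024 of them sum to 16,516,096 -- far beyond int16's 32,767.
a = np.full(1024, 127, dtype=np.int8)
b = np.full(1024, 127, dtype=np.int8)
acc16 = np.sum(a.astype(np.int16) * b.astype(np.int16), dtype=np.int16)
acc32 = np.sum(a.astype(np.int32) * b.astype(np.int32), dtype=np.int32)
print(acc16, acc32)  # 1024 (wrapped around) vs 16516096 (correct)
```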
<< " Only symmetric quantization supported for now."; | ||
|
||
if (param->input_zero_point == 0 && param->kernel_zero_point == 0) { | ||
Expr int8_conv = Conv2D(quantized_data, |
Change name.
… (apache#3367): In this PR I want to discuss the design and implementation of:
- Quantize op -> FP32 to i8/u8
- Dequantize op -> i8/u8 to FP32
I have added test cases to verify the correctness of the ops.
…ntized op

Features:
- New quantized conv2D and requantize op in Relay
- Python API interface to instantiate the Relay op
- Infer Type implemented
- Lowering of quantized op to low-level Relay ops
Closing. Moving to #3580
Goal - Act as a medium of discussion for pull request #2351.
The patch only supports symmetric quantization for now. The goal is to focus on infrastructure and not on the low-level details of what operations should be used for the computations. Once we finalize the infra, we can tackle the low-level details.
Features
Discussion points
Missing implementation
- Lowering of quantized conv into conv+cast is incomplete.
- Will work on it async. This is orthogonal to the discussion.