
[Relay] [Quantization] WIP - Prototyping the quantized convolution op #3367

Closed

Conversation

anijain2305
Contributor

@anijain2305 anijain2305 commented Jun 13, 2019

Goal - Act as a medium of discussion for pull request #2351

The patch only supports symmetric quantization for now. The goal is to focus on the infrastructure and not on the low-level details of what operations should be used for the computations. Once we finalize the infra, we can tackle the low-level details.
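
As a point of reference, here is a minimal NumPy sketch of what "symmetric" means in this context: every tensor is represented as scale * integer_value with the zero point fixed at 0, so an int32 accumulation can stand in for the real-valued conv and the output scale is simply the product of the input scales. The names, scales, and shapes below are illustrative only, not part of the patch.

```python
import numpy as np

# Symmetric quantization: real_value ≈ scale * q, with zero_point fixed at 0.
data_scale, kernel_scale = 0.05, 0.02
q_data = np.random.randint(-127, 128, size=(3, 3)).astype(np.int8)
q_kernel = np.random.randint(-127, 128, size=(3, 3)).astype(np.int8)

# Stand-in for the conv: accumulate the int8 products in int32.
int32_acc = q_data.astype(np.int32) * q_kernel.astype(np.int32)

# The real-valued result is recovered with the product of the two scales.
real_result = (data_scale * kernel_scale) * int32_acc
```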

Features

  • New quantized conv2D op in Relay
  • Python API interface to instantiate the Relay op
  • Infer Type implemented
  • Lowering of quantized_conv op to low-level Relay ops

Discussion points

  • Does the namespace look correct?
    • Relay op is called 'relay.op.nn._quantize.quantized_conv2d'
    • Idea is that any op under '_quantize' namespace will go through rewrite.
  • How to tackle redundant code? Should we reuse Conv2DRel and Conv2DAttrs?
    • Tried prototyping. Found it hard to derive from the Conv2DAttrs struct.
    • Infer Type has a param field. This needs to come from the right datatype.

Missing implementation
- Lowering of quantized conv into conv+cast is incomplete.
- Will work on it asynchronously. This is orthogonal to the discussion.

Thanks for contributing to TVM! Please refer to guideline https://docs.tvm.ai/contribute/ for useful information and tips. After the pull request is submitted, please request code reviews from Reviewers.

@anijain2305
Contributor Author

anijain2305 commented Jun 13, 2019

@tqchen @FrozenGene @jackwish @yzhliu @eqy @ZihengJiang @vinx13
Your reviews will be very helpful :)

@zhenhuaw-me
Contributor

@tqchen @FrozenGene @jackwish @yzhliu @eqy @ZihengJiang @vinx13
Your reviews will be very helpful :)

Thank you for the ping @anijain2305, and glad to see this draft.
Personally, I'd like to comment only on generic quantization stuff such as algorithms and arithmetic (and ecosystems maybe), but not TVM stack things, so not much from me currently. :)

@anijain2305
Contributor Author

Personally, I'd like to comment only on generic quantization stuff such as algorithms and arithmetic (and ecosystems maybe), but not TVM stack things, so not much from me currently. :)

Thanks @jackwish
I am pretty sure that we will need your expertise as soon as we have the TVM stack set up in a decent manner :) I will ping you once we have that.

@ZihengJiang
Contributor

Hey @anijain2305, currently we use the existing conv2d op with out_dtype for quantization so that we can utilize AutoTVM etc. Is there any benefit to proposing a new op?

@anijain2305
Contributor Author

Hey @anijain2305, currently we use the existing conv2d op with out_dtype for quantization so that we can utilize AutoTVM etc. Is there any benefit to proposing a new op?

Hi @ZihengJiang, thanks for replying. Having this new op does not prevent the use of AutoTVM. In fact, this new op acts as a wrapper and gets lowered to existing Relay ops like conv, cast, etc.

The reasons we need new ops are:

  1. The compute of quantized conv2d is different from normal conv2d. The difference becomes quite large in the asymmetric case. The plan is to have a _quantize namespace with these new quantized ops, which will be lowered to existing Relay ops.
  2. One can write a converter from framework quantized graphs to Relay using these new wrappers.

Once these ops are lowered to existing Relay ops, all the relay and TVM optimizations are still applicable.
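
To make the wrapper idea concrete, here is a rough sketch of how the symmetric case could lower to existing ops, i.e. a plain conv2d accumulating in int32 followed by a cast. It uses today's Relay Python API; the helper name, shapes, and attributes are made up for illustration and are not what the PR implements.

```python
from tvm import relay

def lower_symmetric_qconv2d(q_data, q_kernel, out_dtype="int32"):
    # Symmetric case (all zero points are 0): reuse the existing Relay
    # conv2d with an int32 accumulator, then cast to the requested dtype.
    conv = relay.nn.conv2d(q_data, q_kernel,
                           kernel_size=(3, 3), channels=16,
                           padding=(1, 1), out_dtype="int32")
    return relay.cast(conv, out_dtype)

data = relay.var("data", shape=(1, 3, 224, 224), dtype="int8")
kernel = relay.var("kernel", shape=(16, 3, 3, 3), dtype="int8")
func = relay.Function([data, kernel], lower_symmetric_qconv2d(data, kernel))
```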

Please find more discussion at

  1. [RFC][Quantization] Support quantized models from TensorflowLite #2351 (comment)
  2. https://discuss.tvm.ai/t/tf-lite-quantized-conv2d-operator-conversion/2651/8?u=janimesh
  3. https://discuss.tvm.ai/t/tf-lite-quantized-conv2d-operator-conversion/2651/34?u=janimesh

param->out_layout,
Int(32));
// TODO(janimesh) - The out_dtype should come from outside..
int8_conv = Cast(int8_conv, param->out_dtype);
Member

Maybe we should name it q_conv2d_requant or something similar. This is not just a simple cast: we will need input scale / kernel scale / output scale / output min / output max and so on to compute the correct uint8 value. Just using Cast will make it look like a simple cast.
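
For readers following along, a float reference of what such a requantize step computes might look like the sketch below. The function and parameter names are illustrative; a real kernel would use integer-only arithmetic, as discussed in the follow-up comments.

```python
import numpy as np

def requantize_reference(acc_int32, input_scale, kernel_scale,
                         output_scale, output_zero_point,
                         output_min=0, output_max=255):
    # Float reference: rescale the int32 accumulator into the output
    # quantized domain, add the output zero point, clamp to the uint8 range.
    multiplier = (input_scale * kernel_scale) / output_scale
    q = np.round(acc_int32 * multiplier) + output_zero_point
    return np.clip(q, output_min, output_max).astype(np.uint8)
```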

Contributor Author

Yes, I agree. This computation is not complete yet (I added a comment earlier, but I don't think it is visible enough).

I am looking into converting the scale computations into integer computations. I am having difficulty understanding the rounding computations. Can you help with that?

Contributor

@zhenhuaw-me zhenhuaw-me Jun 16, 2019

You may take this part of TFLite's conv (all the stuff in that function), which is semantically usually called Requantize, as a reference.
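
For context, the TFLite-style requantization referenced here pre-computes a fixed-point multiplier and shift from the float scale ratio and then does the rescale with integer arithmetic only. Below is a rough Python sketch of the idea; the function names are made up, and it only approximates the rounding (it skips TFLite's exact saturation and tie-breaking behaviour).

```python
import math

def quantize_multiplier(real_multiplier):
    # Decompose the float multiplier into a Q31 fixed-point significand
    # and an exponent, the way TFLite pre-computes its requantize params.
    significand, exponent = math.frexp(real_multiplier)  # significand in [0.5, 1)
    return int(round(significand * (1 << 31))), exponent

def rescale(acc, quantized_multiplier, exponent):
    # Integer-only rescale: multiply by the Q31 multiplier and shift back,
    # adding half of the divisor first to get round-to-nearest behaviour.
    total_shift = 31 - exponent
    rounding = 1 << (total_shift - 1)
    return (acc * quantized_multiplier + rounding) >> total_shift

# e.g. real_multiplier = input_scale * kernel_scale / output_scale
qm, exp = quantize_multiplier(0.0005)
print(rescale(12345, qm, exp))  # ≈ round(12345 * 0.0005) = 6
```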

Contributor Author

Thanks, I will have a closer look this week and update the patch with the correct computation.

CHECK_EQ(param->kernel_zero_point, 0) << "Only symmetric support yet";
CHECK_EQ(param->output_zero_point, 0) << "Only symmetric support yet";
// TODO(janimesh) - The out_dtype should be something else, like "int32".
Expr int8_conv = Conv2D(quantized_data,
Member

Where is our data - input_zero_point / kernel - kernel_zero_point computation? Is it folded into quantized_data / quantized_kernel? Because we will need to cast data / kernel to int16 / int32 to subtract the zero point, I care about this.

Contributor Author

@anijain2305 anijain2305 Jun 15, 2019

There are a couple of points that can answer your question here:

  • Currently, the rewrite of quantized_conv2d to a sequence of Relay operations is not complete. I was hoping to start the discussion of the quantization flow and namespaces before we start discussing the details of the operations. I will work on this asynchronously as I understand the computation. Please let me know more of your thoughts on the quantization flow, like the namespaces and the infra usage for lowering the quantized ops into a series of Relay ops.

  • The first PR would only support symmetric quantization. The asymmetric computation that has the data - input_zero_point term will come later, once people agree on the quantization flow (a rough sketch of that pre-processing is below). See lines 44 - 46, which limit the scope to symmetric quantization. It is highly possible that I will need your help in lowering the asymmetric quantized convolution. But we should not complicate this PR by including asymmetric quantization.
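
For what it's worth, here is a hypothetical sketch of that asymmetric pre-processing (not part of this patch, which is symmetric-only): widen the tensor before subtracting the zero point so the subtraction cannot overflow the int8/uint8 range, then feed the widened result to the regular conv2d. The helper name and dtype choice are assumptions for illustration.

```python
from tvm import relay

def subtract_zero_point(q_tensor, zero_point, upcast_dtype="int32"):
    # Hypothetical asymmetric pre-processing: upcast, then subtract the
    # zero point; the widened tensor is what the conv2d would consume.
    widened = relay.cast(q_tensor, upcast_dtype)
    return relay.subtract(widened, relay.const(zero_point, upcast_dtype))
```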

Member

Ok. I hadn't noticed the line 44 - 46 comment before.

}



Contributor

Minor nit, unnecessary new lines.


/*!
* \file nnvm/compiler/quantize_util.h
* \brief Utility methods needs for quantized ops that can be shared
Contributor

Minor nits -

s/needs/needed.

shared between frontends ?

|| is_Int16(dtype) || is_UInt16(dtype);
}

enum class QuantizeOpType : uint8_t {
Contributor

Some notes here on Quantize, Requantize and Quantize_Requantize might be appropriate. Alternatively a pointer to the python documentation might also be useful for the reader.

return -1;
}


Contributor

Minor nit, unnecessary new line.


This operator takes the quantized_weight as the convolution kernel
and convolves it with quantized_data to produce an output quantized tensor.
The scale of the output quantized tensor is the prodcut of the weight_scale
Contributor

s/prodcut/product.

data_layout="NCHW",
kernel_layout="OIHW",
out_layout="",
out_dtype="int32"):
Contributor Author

To self - Add documentation for out_dtype (accumulation accuracy etc.)

<< " Only symmetric quantization supported for now.";

if (param->input_zero_point == 0 && param->kernel_zero_point == 0) {
Expr int8_conv = Conv2D(quantized_data,
Contributor Author

Change name.

shoubhik pushed a commit to anijain2305/tvm that referenced this pull request Jul 8, 2019
…apache#3367

In this PR I want to discuss the design and implementation of the following ops:
- Quantize op -> FP32 to i8/u8
- Dequantize op -> i8/u8 to FP32

I have added test cases to verify the correctness of the ops.
…ntized op

Features
- New quantized conv2D and requantize op in Relay
- Python API interface to instantiate the Relay op
- Infer Type implemented
- Lowering of quantized op to low-level Relay ops
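
The first referenced commit above introduces standalone Quantize (FP32 to i8/u8) and Dequantize (i8/u8 to FP32) ops. As a generic reference, here is a sketch of the affine mapping such ops usually implement; this is the textbook formulation, not necessarily the exact TVM code from that commit.

```python
import numpy as np

def quantize(x, scale, zero_point, dtype=np.uint8):
    # FP32 -> i8/u8: q = clamp(round(x / scale) + zero_point, qmin, qmax)
    info = np.iinfo(dtype)
    q = np.round(x / scale) + zero_point
    return np.clip(q, info.min, info.max).astype(dtype)

def dequantize(q, scale, zero_point):
    # i8/u8 -> FP32: x ≈ scale * (q - zero_point)
    return scale * (q.astype(np.float32) - zero_point)
```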
@anijain2305
Contributor Author

Closing. Moving to #3580

@anijain2305 anijain2305 deleted the qconv2d branch November 13, 2019 00:30