PT2E conversion creates Transpose op for each conv2d weight set #179
Comments
This is a good idea to improve performance and sounds like it would be quite common (pretty much all CV models).
Hi @edupuis-psee, thanks for the issue report. The issue in the example you provided is that transposes on quantized weights are not properly folded. We will improve this in our converter. Besides, instead of PT2E quant, we suggest using AI Edge Quantizer. For the general NCHW -> NHWC transformation, our converter has a dedicated optimization that minimizes the number of transposes while preserving the model's input and output signatures; this all happens automatically. We also have a utility to help you transform model inputs and outputs to NHWC. If you run into other issues where transposes are not properly eliminated (like this one), feel free to report them and we will improve our optimization algorithm. Thanks!
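For context on what "folding" means here: PyTorch stores conv2d weights in OIHW layout, while TFLite's CONV_2D expects OHWI. Since the weights are constants, the layout permute can be applied once to the stored tensor at conversion time instead of becoming a runtime Transpose op. A small illustration (shapes are arbitrary):

```python
import torch

# PyTorch conv2d weight layout: (out_channels, in_channels, H, W)
w_oihw = torch.randn(8, 3, 5, 5)

# TFLite conv2d weight layout: (out_channels, H, W, in_channels).
# Because the weight is a constant, this permute can be "folded" into
# the stored tensor at conversion time, avoiding a runtime Transpose op.
w_ohwi = w_oihw.permute(0, 2, 3, 1).contiguous()

print(tuple(w_ohwi.shape))  # (8, 5, 5, 3)
```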
Marking this issue as stale since it has been open for 7 days with no activity. This issue will be closed if no further activity occurs. |
Thank you for your answer, do you have more info on AI Edge Quantizer?
Hello, the repo is now public here: https://github.com/google-ai-edge/ai-edge-quantizer/tree/main. QAT is not currently supported, though, so our best bet today is still converting pre-QAT'd models. If you don't strictly require QAT, converting with AI Edge Torch and then quantizing with AI Edge Quantizer will give you the cleanest (and hence most optimal) graph. Otherwise I'd defer to @chunnienc on future plans to support NHWC weights.
This issue was closed because it has been inactive for 14 days. Please post a new issue if you need further assistance. Thanks! |
Description of the bug:
The current implementation of the PT2E flow creates numerous transpose operations (NCHW -> NHWC) for the weights, which slows down inference. Is there a way to have the weights stored in NHWC format directly?
To reproduce:
Actual vs expected behavior:
Currently, after a PT2E -> TFLite conversion, weights are stored in NCHW and a transpose op is inserted before each conv layer. The expected behavior is to store the weights in NHWC.
Any other information you'd like to share?