
[QNN][Relay][Topi] Add qnn.dense with weight layout #13854

Merged
merged 2 commits into apache:main from wo-qnn-perf-improvement on Feb 2, 2023

Conversation

ibsidorenko
Contributor

@ibsidorenko ibsidorenko commented Jan 27, 2023

This commit adds a new Relay operation, qnn.contrib_dense_pack, which supports a configurable weight layout (nn.dense and qnn.dense do not support this attribute). The new operation is a full analog of the nn.contrib_dense_pack operation, but in QNN space.

With this PR, QNN Dense achieves a ~10x performance gain on the Hexagon target without QNN canonicalization (through the use of the vrmpy intrinsic).

This PR also includes a slight performance improvement for qnn.mul (without QNN canonicalization).
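As an illustrative sketch only: the new op is expected to take the same arguments as nn.contrib_dense_pack plus the usual QNN quantization parameters. The argument names, the "NC32n4c" layout string, and the packed weight shape below are assumptions for illustration, not necessarily the exact API added by this PR.

import tvm
from tvm import relay

# Hypothetical usage sketch (argument names/layout are assumed).
data = relay.var("data", shape=(128, 64), dtype="uint8")
# Weight pre-packed into an assumed "NC32n4c" layout: (64/32, 64/4, 32, 4).
weight = relay.var("weight", shape=(2, 16, 32, 4), dtype="uint8")

out = relay.qnn.op.contrib_dense_pack(
    data,
    weight,
    input_zero_point=relay.const(0, "int32"),
    kernel_zero_point=relay.const(0, "int32"),
    input_scale=relay.const(0.1, "float32"),
    kernel_scale=relay.const(0.05, "float32"),
    kernel_layout="NC32n4c",
    units=64,
    out_dtype="int32",
)
mod = tvm.IRModule.from_expr(out)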

@tvm-bot
Collaborator

tvm-bot commented Jan 27, 2023

Thanks for contributing to TVM! Please refer to the contributing guidelines https://tvm.apache.org/docs/contribute/ for useful information and tips. Please request code reviews from Reviewers by @-ing them in a comment.

Generated by tvm-bot

This commit improves performance of the qnn.mul operation without QNN
canonicalization.
@ibsidorenko ibsidorenko force-pushed the wo-qnn-perf-improvement branch 2 times, most recently from 116a60d to f55a0c9 on January 30, 2023 at 08:57
@TejashShah

TejashShah commented Jan 30, 2023

cc @quic-sanirudh @jverma-quic @farshidsp

@quic-sanirudh
Contributor

This is cool, thanks a lot. I went through this a little already earlier, but I'll go through the PR more closely.

Contributor

@quic-sanirudh quic-sanirudh left a comment


Looks good to me apart from the one question about schedule.

Also, one small thought and I might be wrong here, but I see that the nn version of this op is called nn.contrib_dense_pack, so should we name this qnn.contrib_dense_pack?

    sch: Schedule
        The computation schedule for the op.
    """
    return default_schedule(outs)
Contributor


Is there a plan to add a vrmpy tensorized schedule separately, or is that going to be dependent on metaschedule?

Contributor Author


I have such plans, but with low priority. The most interesting work for now is the MetaScheduler.

@ibsidorenko
Contributor Author

Looks good to me apart from the one question about schedule.

Also, one small thought and I might be wrong here, but I see that the nn version of this op is called nn.contrib_dense_pack, so should we name this qnn.contrib_dense_pack?

Yes, I can rename this operator to align it with its analog, nn.contrib_dense_pack.

Comment on lines +387 to +391
# Shift kernel if necessary.
if kernel_dtype == "int8":
    # Compute (QA + 128) and (zp_a + 128)
    kernel, kernel_zero_point = _shift(kernel, kernel_zero_point, "uint8")

Contributor

@jverma-quic jverma-quic Jan 31, 2023


Why do we want to convert the kernel to uint8? This would result in a non-zero zero point and introduce additional computation, which could have been avoided if the kernel had stayed in int8.

Contributor Author


The main goal of the int8 --> uint8 conversion is to enable the faster vrmpy u8u8i32 intrinsic instead of vrmpy u8i8i32 for qnn.dense/qnn.conv2d.
Yes, you're absolutely right, there is some overhead from this conversion. That's why I have commented out this code and skip the i8 -> u8 conversion for weights: right now I do not see any performance improvement because of the overhead.
I will enable this code if I manage to get better performance.
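For context, the shift preserves values because an affine-quantized number is invariant under adding the same offset to both the stored integer and its zero point: scale * ((q + 128) - (zp + 128)) = scale * (q - zp). A minimal NumPy sketch of that identity (the PR's actual _shift helper may be implemented differently):

import numpy as np

def shift_i8_to_u8(q_i8, zero_point):
    # Reinterpret int8 quantized data as uint8 by adding 128 to both
    # the values and the zero point; the represented real values are unchanged.
    q_u8 = (q_i8.astype(np.int16) + 128).astype(np.uint8)
    return q_u8, zero_point + 128

scale = 0.05
q_i8 = np.array([-128, -1, 0, 5, 127], dtype=np.int8)
zp = 3

q_u8, zp_u8 = shift_i8_to_u8(q_i8, zp)

# Dequantized results are identical before and after the shift.
real_i8 = scale * (q_i8.astype(np.int32) - zp)
real_u8 = scale * (q_u8.astype(np.int32) - zp_u8)
assert np.allclose(real_i8, real_u8)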

@@ -296,7 +297,98 @@ def _get_mod(data_dtype, kernel_dtype):
assert "cast" in legalized_mod.astext() and "qnn" in legalized_mod.astext()


def test_qnn_legalize_qnn_conv2d_non_scalar_qnn_params():
"""
Test QNN legalization for qnn.dense op for Hexagon target when kernel zero point and kernel
Contributor


typo?: qnn.dense->qnn.conv2d

Contributor Author


Done, fixed

def test_qnn_legalize_qnn_conv2d_non_scalar_qnn_params():
"""
Test QNN legalization for qnn.dense op for Hexagon target when kernel zero point and kernel
scale are not scalars.
Contributor


Could you please elaborate what you mean by kernel zero-point and scale not being scalars?

Contributor Author


So, by "not scalar" I mean a constant vector of scalars, for example:
relay.const([val1, val2 ... valN])
This is a more interesting case for testing because during legalization we pad the weights, and the dimensions of the weights, the scale/zero-point vectors, and the 'channels' attribute must stay aligned.
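For example, a sketch of what such non-scalar QNN parameters look like when passed to qnn.dense; the shapes and values here are made up for illustration and are not the PR's actual test data:

from tvm import relay

units = 4  # hypothetical number of output channels
data = relay.var("data", shape=(2, 8), dtype="uint8")
kernel = relay.var("kernel", shape=(units, 8), dtype="int8")

out = relay.qnn.op.dense(
    data,
    kernel,
    input_zero_point=relay.const(2, "int32"),
    # Per-channel ("non-scalar") kernel params: one entry per output channel.
    kernel_zero_point=relay.const([1, 0, 2, 1], dtype="int32"),
    input_scale=relay.const(0.15, "float32"),
    kernel_scale=relay.const([0.1, 0.2, 0.05, 0.3], dtype="float32"),
    units=units,
    out_dtype="int32",
)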

This commit adds a new Relay operation, "qnn.contrib_dense_pack", that supports
a configurable weight layout (nn.dense and qnn.dense do not support this
attribute). The new operation is a full analog of the "nn.contrib_dense_pack"
operation, but in QNN space.
@echuraev echuraev merged commit 37e1a68 into apache:main Feb 2, 2023
csullivan pushed a commit to csullivan/tvm that referenced this pull request Feb 7, 2023
* [Hexagon][QNN] Improve performance of qnn.mul

This commit improves performance of the qnn.mul operation without QNN
canonicalization.

* [QNN][Relay][Topi] Add qnn.dense with weight layout

This commit adds a new Relay operation, "qnn.contrib_dense_pack", that supports
a configurable weight layout (nn.dense and qnn.dense do not support this
attribute). The new operation is a full analog of the "nn.contrib_dense_pack"
operation, but in QNN space.
@ibsidorenko ibsidorenko deleted the wo-qnn-perf-improvement branch March 29, 2023 06:23