
Remove empty (DQ -> Q -> graph output) sequence in TransposeOptimizer #22172

Merged · 11 commits · Sep 25, 2024

Conversation

@adrianlizarraga (Contributor) commented Sep 22, 2024

Description

Updates the TransposeOptimizer to also remove empty (DQ -> Q) sequences that occur at a graph output. An empty DQ->Q sequence results from a Transpose being optimized out.

Consider the following example model:
[image: example model]

The TransposeOptimizer removes the final Transpose and leaves an empty DQ->Q->output_0 sequence. This PR ensures that the final DQ->Q is also removed.
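To illustrate the idea, here is a minimal Python sketch of the cleanup pass (this is a hypothetical standalone model, not the actual ONNX Runtime C++ implementation; the `Node`/`Graph` classes and `remove_empty_dq_q_at_output` helper are invented for illustration). It finds a QuantizeLinear node that produces a graph output, checks that it is fed directly by a DequantizeLinear with identical quantization parameters (so the pair is a no-op), and removes both nodes, rewiring the upstream quantized tensor to the graph output:

```python
# Hypothetical sketch: remove an empty DequantizeLinear -> QuantizeLinear
# pair that feeds a graph output. Assumes per-tensor quantization and that
# the DQ's input is not consumed by any other node.

from dataclasses import dataclass


@dataclass
class Node:
    op_type: str      # e.g. "DequantizeLinear" or "QuantizeLinear"
    inputs: list      # input tensor names
    outputs: list     # output tensor names
    scale: float = 1.0
    zero_point: int = 0


@dataclass
class Graph:
    nodes: list
    outputs: list     # names of graph outputs


def remove_empty_dq_q_at_output(graph: Graph) -> bool:
    """Drop a DQ -> Q pair whose Q output is a graph output and whose
    quantization parameters match on both sides (i.e. the pair is a no-op).
    Returns True if a pair was removed."""
    producers = {out: n for n in graph.nodes for out in n.outputs}
    for out_name in graph.outputs:
        q = producers.get(out_name)
        if q is None or q.op_type != "QuantizeLinear":
            continue
        dq = producers.get(q.inputs[0])
        if dq is None or dq.op_type != "DequantizeLinear":
            continue
        if (dq.scale, dq.zero_point) != (q.scale, q.zero_point):
            continue  # not a no-op: re-quantization would change values
        # Rewire: the DQ's quantized input now feeds the graph output directly.
        src = dq.inputs[0]
        for n in graph.nodes:
            n.outputs = [out_name if o == src else o for o in n.outputs]
        graph.nodes = [n for n in graph.nodes if n not in (dq, q)]
        return True
    return False
```

For example, after removing a trailing Transpose, a graph ending in `Conv -> DQ -> Q -> output_0` (with matching scale/zero-point) would be rewired so the Conv's quantized output becomes `output_0` and both the DQ and Q nodes are dropped.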

Motivation and Context

Models with quantized outputs can run on QNN EP. The inference latency of a customer model was degraded by the unnecessary DQ->Q sequence at the output.

@adrianlizarraga changed the title from [DRAFT] Remove empty (DQ -> Q -> graph output) sequence in TransposeOptimizer to Remove empty (DQ -> Q -> graph output) sequence in TransposeOptimizer on Sep 24, 2024
@adrianlizarraga added the ep:QNN (issues related to QNN execution provider) label on Sep 24, 2024
@adrianlizarraga marked this pull request as ready for review on September 24, 2024 18:03
HectorSVC
HectorSVC previously approved these changes Sep 24, 2024
@adrianlizarraga merged commit a47254e into main on Sep 25, 2024
83 of 85 checks passed
@adrianlizarraga deleted the adrianl/handle-dq-q-at-graph-output branch on September 25, 2024 04:02
3 participants