
Remove empty (DQ -> Q -> graph output) sequence in TransposeOptimizer #22172

Merged · 11 commits · Sep 25, 2024

Conversation

@adrianlizarraga (Contributor) commented Sep 22, 2024

Description

Updates the TransposeOptimizer to also remove empty (DQ -> Q) sequences that occur at a graph output. An empty DQ->Q sequence results from a Transpose being optimized out.

Consider the following example model:
[image: example model]

The TransposeOptimizer removes the final Transpose and leaves an empty DQ->Q->output_0 sequence. This PR ensures that the final DQ->Q is also removed.
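To illustrate the idea, here is a minimal Python sketch of the cleanup pass (this is a hypothetical standalone model, not the actual ONNX Runtime C++ implementation; the `Node`/`Graph` classes and `remove_empty_dq_q_at_output` helper are invented for illustration). It finds a QuantizeLinear node that produces a graph output, checks that it is fed directly by a DequantizeLinear with identical quantization parameters (so the pair is a no-op), and removes both nodes, rewiring the upstream quantized tensor to the graph output:

```python
# Hypothetical sketch: remove an empty DequantizeLinear -> QuantizeLinear
# pair that feeds a graph output. Assumes per-tensor quantization and that
# the DQ's input is not consumed by any other node.

from dataclasses import dataclass


@dataclass
class Node:
    op_type: str      # e.g. "DequantizeLinear" or "QuantizeLinear"
    inputs: list      # input tensor names
    outputs: list     # output tensor names
    scale: float = 1.0
    zero_point: int = 0


@dataclass
class Graph:
    nodes: list
    outputs: list     # names of graph outputs


def remove_empty_dq_q_at_output(graph: Graph) -> bool:
    """Drop a DQ -> Q pair whose Q output is a graph output and whose
    quantization parameters match on both sides (i.e. the pair is a no-op).
    Returns True if a pair was removed."""
    producers = {out: n for n in graph.nodes for out in n.outputs}
    for out_name in graph.outputs:
        q = producers.get(out_name)
        if q is None or q.op_type != "QuantizeLinear":
            continue
        dq = producers.get(q.inputs[0])
        if dq is None or dq.op_type != "DequantizeLinear":
            continue
        if (dq.scale, dq.zero_point) != (q.scale, q.zero_point):
            continue  # not a no-op: re-quantization would change values
        # Rewire: the DQ's quantized input now feeds the graph output directly.
        src = dq.inputs[0]
        for n in graph.nodes:
            n.outputs = [out_name if o == src else o for o in n.outputs]
        graph.nodes = [n for n in graph.nodes if n not in (dq, q)]
        return True
    return False
```

For example, after removing a trailing Transpose, a graph ending in `Conv -> DQ -> Q -> output_0` (with matching scale/zero-point) would be rewired so the Conv's quantized output becomes `output_0` and both the DQ and Q nodes are dropped.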

Motivation and Context

Models with quantized outputs can run on QNN EP. The inference latency of a customer model was degraded by the unnecessary DQ->Q sequence at the output.

@adrianlizarraga changed the title from [DRAFT] Remove empty (DQ -> Q -> graph output) sequence in TransposeOptimizer to Remove empty (DQ -> Q -> graph output) sequence in TransposeOptimizer on Sep 24, 2024
@adrianlizarraga added the ep:QNN (issues related to QNN execution provider) label on Sep 24, 2024
@adrianlizarraga marked this pull request as ready for review on September 24, 2024 18:03
HectorSVC
HectorSVC previously approved these changes Sep 24, 2024
@adrianlizarraga merged commit a47254e into main on Sep 25, 2024
83 of 85 checks passed
@adrianlizarraga deleted the adrianl/handle-dq-q-at-graph-output branch on September 25, 2024 04:02
3 participants