Qualcomm AI Engine Direct - add program validation #4297
Conversation
Summary:
- update graph signature so get_fake_program works properly
- make sure the program is valid after capture_program
- retire capture_pre_autograd_graph
- fix release build error & make cross-compiled flatcc deterministic
- some minor fixes
🔗 Helpful Links: 🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/4297
Note: Links to docs will display an error until the docs builds have been completed.
✅ You can merge normally! (2 unrelated failures) As of commit 77e4060 with merge base 4a88318: FLAKY - the following jobs failed but were likely due to flakiness present on trunk.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
Hi @cccclai, this PR relates to #3860. Thank you.
Thank you for these changes, @haowhsu-quic. To help focus our reviews, please create separate PRs for:
Thank you for putting up the PR!
exir/backend/backend_api.py (outdated)
        f"Error in get_fake_program for graph {edge_program.graph_module}, fallback to deepcopy: {e}"
    )
    fake_edge_program = copy.deepcopy(edge_program)
    fake_edge_program = get_fake_program(edge_program)
This change can be separated out - we may need to run broader tests in case some paths still rely on the fallback. Glad to see the QNN path can work with the fake edge program! I believe the RAM usage will go down quite a bit now.
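For context, here is a minimal sketch of the fallback pattern being discussed. The wrapper name `get_program_for_lowering` is hypothetical, and `get_fake_program` is assumed to be in scope as in the diff above; this is not the actual ExecuTorch code path, just an illustration of why some callers may still depend on the deepcopy fallback.

```python
import copy
import logging

def get_program_for_lowering(edge_program):
    # Assumes get_fake_program is importable, as in the diff above.
    try:
        # get_fake_program swaps real weights for FakeTensors, avoiding a full
        # copy of the parameters and keeping peak RAM low during lowering.
        return get_fake_program(edge_program)
    except Exception as e:
        logging.warning(
            "Error in get_fake_program for graph %s, fallback to deepcopy: %s",
            edge_program.graph_module,
            e,
        )
        # Fallback path that some lowering flows may still rely on.
        return copy.deepcopy(edge_program)
```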
sdk/CMakeLists.txt (outdated)
@@ -84,25 +84,16 @@ option(EXECUTORCH_SEPARATE_FLATCC_HOST_PROJECT
)

if(EXECUTORCH_SEPARATE_FLATCC_HOST_PROJECT)
  # Add the host project. We build this separately so that we can generate
Also thanks for fixing the sdk build! cc: @Olivia-liu @tarun292
# param nodes will be FakeTensor when doing partition
# fill in random numeric for validation
if isinstance(coeff, torch._subclasses.fake_tensor.FakeTensor):
    coeff = torch.ones(coeff.shape)
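For readers unfamiliar with FakeTensor, a minimal standalone sketch (plain PyTorch, not the ExecuTorch partitioner) of why the replacement above is needed: a FakeTensor carries shape and dtype metadata but no real storage, so any validation that needs concrete values must first swap in a real tensor of the same shape.

```python
import torch
from torch._subclasses.fake_tensor import FakeTensorMode

# Tensors created under FakeTensorMode are FakeTensors: metadata only, no data.
with FakeTensorMode():
    coeff = torch.empty(3, 4)

print(type(coeff))  # a FakeTensor with .shape/.dtype but no real values

# A concrete stand-in of the same shape/dtype, usable for numeric validation,
# as the partition code above does with torch.ones.
real_coeff = torch.ones(coeff.shape, dtype=coeff.dtype)
```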
Is coeff inserted into the graph, or is it just an intermediate tensor? If it's inserted, we may need to lift it to the I/O?
It's inserted, but we make it static inside the graph so it can be identified when building the operator. This can reduce extra memory copies, I think.
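A small torch.fx sketch (a hypothetical example, not the QNN op builder) of the two options being discussed: a value lifted to graph I/O arrives as a placeholder node, while a static value kept inside the graph lives on the module and is read through a get_attr node, so nothing extra crosses the I/O boundary.

```python
import torch
import torch.fx

class Lifted(torch.nn.Module):
    def forward(self, x, coeff):  # coeff lifted to a graph input (placeholder)
        return x * coeff

class Static(torch.nn.Module):
    def __init__(self):
        super().__init__()
        # Static tensor stored inside the graph module.
        self.register_buffer("coeff", torch.ones(4))

    def forward(self, x):
        return x * self.coeff  # appears as a get_attr node

print([n.op for n in torch.fx.symbolic_trace(Lifted()).graph.nodes])
# ['placeholder', 'placeholder', 'call_function', 'output']
print([n.op for n in torch.fx.symbolic_trace(Static()).graph.nodes])
# ['placeholder', 'get_attr', 'call_function', 'output']
```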
Thank you, and sorry for the inconvenience:
Thank you so much for splitting up the PR! This is very helpful for us.
@@ -223,7 +224,12 @@ def capture_program(
    core_ep.transform(ConvertBinaryOpsWithScalar())
    edge_ep = core_ep.to_edge(qnn_edge_config())
    _transform(edge_ep.exported_program)

    # Since QDQ nodes are stripped, update graph signature again to validate program
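A minimal sketch (hypothetical helper, assuming only the torch.export data structures) of the kind of consistency that "validate program" implies here: after Q/DQ nodes are stripped, every tensor input recorded in the graph signature should still map to a placeholder node in the graph, which is why the signature has to be updated again.

```python
from torch.export.graph_signature import TensorArgument

def signature_matches_graph(exported_program) -> bool:
    # Names of all placeholder nodes currently in the graph.
    placeholder_names = {
        node.name
        for node in exported_program.graph_module.graph.nodes
        if node.op == "placeholder"
    }
    # Tensor input names recorded in the graph signature.
    signature_input_names = {
        spec.arg.name
        for spec in exported_program.graph_signature.input_specs
        if isinstance(spec.arg, TensorArgument)
    }
    # If Q/DQ removal renamed or dropped nodes without updating the signature,
    # this check fails and program validation would reject the program.
    return signature_input_names.issubset(placeholder_names)
```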
Not related to this diff - is it possible to run a mix of fp ops and quantized ops? Is that well supported? The reason I'm asking is that we're removing all q/dq ops and then inserting them back at the I/O. It may limit our ability to do mixed dtypes.
Currently, mixed precision is only supported for quantized ops, since the compiler spec for HTP precision (quantized or fp16) is at graph-level granularity.
We'll have a multi-graph change in the near future; hopefully mechanisms like weight sharing / fp mixed precision can be well supported at that time.
To do so, it would be great if the framework interface could provide a runtime option (like an argument in method::execute()) for the backend to react to, e.g. change the performance config or select which graph in the context should be executed.
Yeah, we've been discussing how to pass the runtime option through the interface; ideally it would be passed via the backend context. Question: do you need it at method::init time or method::execute time?
method::execute looks more flexible for QNN. Looking forward to the change!
Looks great! Thank you for adding the validation and reducing the deep copy.
@cccclai has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.
Summary: As mentioned in #4297, the original flow makes the host and cross builds happen concurrently. This change moves the host build process into the CMake configuration stage and refines the related dependencies.

Pull Request resolved: #4312

Test Plan:
- cross-compile
  > Run `backends/qualcomm/script/build.sh --release` and check that the compile process finishes successfully.
- native-compile
  > Run the following to check:
```shell
cmake \
  -DCMAKE_BUILD_TYPE=RelWithDebInfo \
  -DQNN_SDK_ROOT=${QNN_SDK_ROOT} \
  -DEXECUTORCH_BUILD_QNN=ON \
  -DEXECUTORCH_BUILD_SDK=ON \
  -DPYTHON_EXECUTABLE=$PYTHON_EXECUTABLE \
  -S $EXECUTORCH_ROOT \
  -B $EXECUTORCH_ROOT/build_x86_64
```

Reviewed By: tarun292

Differential Revision: D60243701

Pulled By: dbort

fbshipit-source-id: ff8d8cb06f0cc296c7ef465596e7e3df367dd059
Summary: