
Get error while using Dml EP #20742

Open
klin2024 opened this issue May 20, 2024 · 7 comments
Labels
ep:DML - issues related to the DirectML execution provider
platform:windows - issues related to the Windows platform
stale - issues that have not been addressed in a while; categorized by a bot

Comments

@klin2024

Describe the issue

Based on https://github.com/instant-high/deoldify-onnx, we tried to deploy the model using the DML EP. The fp32 ONNX model runs well. We converted it to an fp16 ONNX model using float16.convert_float_to_float16. There is no problem with the CPU EP, but we get a runtime error when we use the DML EP.

Error Message:
(D:\conda_envs\deoldify) D:\deoldify-onnx>python video.py --source "video.mp4" --result "video_colorized.mp4"
2024-05-20 16:17:13.7289822 [E:onnxruntime:, inference_session.cc:2045 onnxruntime::InferenceSession::Initialize::<lambda_de9340899c8cfefde68f4d8c5936aa80>::operator ()] Exception during initialization: C:\a_work\1\s\onnxruntime\core\providers\dml\DmlExecutionProvider\src\DmlGraphFusionHelper.cpp(576)\onnxruntime_pybind11_state.pyd!00007FFE6F0124D1: (caller: 00007FFE6EFE0CF5) Exception(1) tid(3d0c) 80070057 The parameter is incorrect.

Traceback (most recent call last):
File "D:\deoldify-onnx\video.py", line 43, in <module>
colorizer = DEOLDIFY(model_path="color/DeoldifyVideo_dyn_fp16.onnx", device="dml")
File "D:\deoldify-onnx\color\deoldify_fp16.py", line 16, in __init__
self.session = onnxruntime.InferenceSession(model_path, sess_options=session_options, providers=providers)
File "D:\conda_envs\deoldify\lib\site-packages\onnxruntime\capi\onnxruntime_inference_collection.py", line 419, in __init__
self._create_inference_session(providers, provider_options, disabled_optimizers)
File "D:\conda_envs\deoldify\lib\site-packages\onnxruntime\capi\onnxruntime_inference_collection.py", line 483, in _create_inference_session
sess.initialize_session(providers, provider_options, disabled_optimizers)
onnxruntime.capi.onnxruntime_pybind11_state.RuntimeException: [ONNXRuntimeError] : 6 : RUNTIME_EXCEPTION : Exception during initialization: C:\a_work\1\s\onnxruntime\core\providers\dml\DmlExecutionProvider\src\DmlGraphFusionHelper.cpp(576)\onnxruntime_pybind11_state.pyd!00007FFE6F0124D1: (caller: 00007FFE6EFE0CF5) Exception(1) tid(3d0c) 80070057 The parameter is incorrect.

To reproduce

  1. git clone https://github.com/instant-high/deoldify-onnx.git
  2. Download the model from https://drive.google.com/drive/folders/1bU9Zj7zGVEujIzvDTb1b9cyWU3s__WQR?usp=sharing
  3. Modify color/deoldify.py and color/deoldify_fp16.py to make them use the DML EP
  4. Convert the model to fp16 by using float16.convert_float_to_float16.
  5. Command:
    python image.py --source_image "image.jpg"
    or
    python video.py --source "video.mp4" --result "video_colorized.mp4"

The fp32 model runs well with both the DML EP and the CPU EP. After converting it to fp16, it still runs well with the CPU EP, but we get the runtime error above when using the DML EP.
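For reference, step 3 amounts to choosing the provider list passed to onnxruntime.InferenceSession. A minimal sketch (the helper name is ours, not part of the repo; the provider strings are the standard ONNX Runtime identifiers):

```python
# Sketch of the EP selection for step 3. The function name is
# hypothetical; "DmlExecutionProvider" and "CPUExecutionProvider" are
# the standard ONNX Runtime provider identifiers.
def select_providers(device):
    """Return the provider list for onnxruntime.InferenceSession."""
    if device == "dml":
        # Keep the CPU EP as a fallback behind the DirectML EP.
        return ["DmlExecutionProvider", "CPUExecutionProvider"]
    return ["CPUExecutionProvider"]

print(select_providers("dml"))
```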

Urgency

No response

Platform

Windows

OS Version

22631.3593

ONNX Runtime Installation

Released Package

ONNX Runtime Version or Commit ID

1.17.3

ONNX Runtime API

Python

Architecture

X64

Execution Provider

DirectML

Execution Provider Library Version

onnxruntime-directml 1.18.0

@github-actions github-actions bot added ep:DML issues related to the DirectML execution provider platform:windows issues related to the Windows platform labels May 20, 2024
@siegelaaron94

siegelaaron94 commented May 21, 2024

We are also seeing this with some of our fp16 models. Our models run fine on RTX 20xx, RTX 30xx, and RTX 40xx cards, but they seem to fail on all AMD cards and some GTX 10xx cards. For reference, these models all ran fine under onnxruntime 1.13.1, but when we switched to 1.17.1 they started failing.


This issue has been automatically marked as stale due to inactivity and will be closed in 30 days if no further activity occurs. If further support is needed, please provide an update and/or more details.

@github-actions github-actions bot added the stale issues that have not been addressed in a while; categorized by a bot label Jun 21, 2024
@siegelaaron94

siegelaaron94 commented Jul 12, 2024

This is still a genuine bug; I have worked around it by doing this during session configuration.

status = sOrtAPI->AddSessionConfigEntry(inOptions, kOrtSessionOptionsConfigDisableDmlGraphFusion, "1");

But this causes severe performance degradation: at least one of our models runs three times slower, even on good hardware (an NVIDIA RTX 3070 mobile GPU). The same or similar issues have been reported more than once, by more than one person, with more than one model.

#21205
#20742
#20575

This regression seems to have been introduced by pull request #13131, or possibly #18160.
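For Python users, a sketch of the equivalent workaround. The key string "ep.dml.disable_graph_fusion" is what kOrtSessionOptionsConfigDisableDmlGraphFusion maps to in the session options config keys header, and the model path is a placeholder:

```python
# Python equivalent of the C-API workaround above: disable DML graph
# fusion before creating the session. The key string is assumed to be
# "ep.dml.disable_graph_fusion"
# (kOrtSessionOptionsConfigDisableDmlGraphFusion); verify against your
# onnxruntime version.
import onnxruntime as ort

opts = ort.SessionOptions()
opts.add_session_config_entry("ep.dml.disable_graph_fusion", "1")
session = ort.InferenceSession(
    "model_fp16.onnx",  # placeholder path
    sess_options=opts,
    providers=["DmlExecutionProvider", "CPUExecutionProvider"],
)
```

Expect the same slowdown described above, since fused DML graphs are the fast path.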

@thewh1teagle

status = sOrtAPI->AddSessionConfigEntry(inOptions, kOrtSessionOptionsConfigDisableDmlGraphFusion, "1");

I experience the same issue with DirectML (running inference with Whisper).
Adding the session config entry quoted above didn't help me.

@thewh1teagle

I found the root cause of the issue in my case:
onnxruntime-directml depends on DirectML.dll and was picking up an old copy from the system directory.
The fix is to download a newer DirectML.dll from nuget.org/packages/Microsoft.AI.DirectML
(use the "Download package" button).
Unzip the package and move the DirectML.dll from its bin/x64-win folder into your executable's folder.
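A sketch of that copy step in Python. Both paths are placeholders based on the NuGet package layout, not exact paths on your machine:

```python
# Copy the newer DirectML.dll from the extracted NuGet package next to
# the executable, so the Windows loader prefers it over the older
# system-directory copy. Both paths below are placeholders.
import shutil
from pathlib import Path

nuget_dll = Path("Microsoft.AI.DirectML/bin/x64-win/DirectML.dll")
app_dir = Path(".")  # folder containing your script or executable

if nuget_dll.exists():
    shutil.copy2(nuget_dll, app_dir / "DirectML.dll")
else:
    print(f"not found: {nuget_dll}")
```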

@siegelaaron94

@thewh1teagle, I think we are seeing a different problem. I create the DirectML device directly: I use LoadLibraryEx to load the DirectML.dll (1.13.1) associated with onnxruntime (1.17.1), use GetProcAddress to get the DMLCreateDevice1 function, and then call SessionOptionsAppendExecutionProvider_DML1 with the device I create.

@hedecai

hedecai commented Oct 2, 2024

SessionOptionsAppendExecutionProvider_DML1

@siegelaaron94
Yes, we get the same exception ("DmlGraphFusionHelper.cpp ... 80070057 The parameter is incorrect"). After debugging, we found that IDMLDevice::CreateOperator, called with the DML_OPERATOR_GEMM operator type, returns E_INVALIDARG when the channel count of the ATensor shape is bigger than 65535 (we tested the input shape {65536, 1, 256}).

Does anyone know whether the 65535 channel count is a documented limitation?
Thanks.
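To make the observation concrete, a hypothetical pure-Python check for the suspected limit. The 65535 threshold (UINT16_MAX) is inferred from the failing shape above, not from documented DirectML behavior:

```python
# Hypothetical check for the suspected DirectML GEMM dimension limit.
# 65535 (UINT16_MAX) is inferred from the failing {65536, 1, 256}
# shape above; it is not a documented DirectML constant.
SUSPECTED_DML_DIM_LIMIT = 65535

def exceeds_suspected_limit(shape):
    """Return True if any dimension exceeds the suspected 65535 limit."""
    return any(dim > SUSPECTED_DML_DIM_LIMIT for dim in shape)

print(exceeds_suspected_limit((65536, 1, 256)))  # the failing shape
print(exceeds_suspected_limit((65535, 1, 256)))
```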


4 participants