
Get error while using Dml EP #20742

Open
klin2024 opened this issue May 20, 2024 · 7 comments
Labels
ep:DML - issues related to the DirectML execution provider
platform:windows - issues related to the Windows platform
stale - issues that have not been addressed in a while; categorized by a bot

Comments

@klin2024

Describe the issue

Based on https://github.com/instant-high/deoldify-onnx, we tried to deploy the model using the DML EP. The fp32 ONNX model runs well. We converted it to an fp16 ONNX model using float16.convert_float_to_float16. There is no problem with the CPU EP, but we get a runtime error when we use the DML EP.

Error Message:
(D:\conda_envs\deoldify) D:\deoldify-onnx>python video.py --source "video.mp4" --result "video_colorized.mp4"
2024-05-20 16:17:13.7289822 [E:onnxruntime:, inference_session.cc:2045 onnxruntime::InferenceSession::Initialize::<lambda_de9340899c8cfefde68f4d8c5936aa80>::operator ()] Exception during initialization: C:\a_work\1\s\onnxruntime\core\providers\dml\DmlExecutionProvider\src\DmlGraphFusionHelper.cpp(576)\onnxruntime_pybind11_state.pyd!00007FFE6F0124D1: (caller: 00007FFE6EFE0CF5) Exception(1) tid(3d0c) 80070057 The parameter is incorrect.

Traceback (most recent call last):
File "D:\deoldify-onnx\video.py", line 43, in <module>
colorizer = DEOLDIFY(model_path="color/DeoldifyVideo_dyn_fp16.onnx", device="dml")
File "D:\deoldify-onnx\color\deoldify_fp16.py", line 16, in __init__
self.session = onnxruntime.InferenceSession(model_path, sess_options=session_options, providers=providers)
File "D:\conda_envs\deoldify\lib\site-packages\onnxruntime\capi\onnxruntime_inference_collection.py", line 419, in __init__
self._create_inference_session(providers, provider_options, disabled_optimizers)
File "D:\conda_envs\deoldify\lib\site-packages\onnxruntime\capi\onnxruntime_inference_collection.py", line 483, in _create_inference_session
sess.initialize_session(providers, provider_options, disabled_optimizers)
onnxruntime.capi.onnxruntime_pybind11_state.RuntimeException: [ONNXRuntimeError] : 6 : RUNTIME_EXCEPTION : Exception during initialization: C:\a_work\1\s\onnxruntime\core\providers\dml\DmlExecutionProvider\src\DmlGraphFusionHelper.cpp(576)\onnxruntime_pybind11_state.pyd!00007FFE6F0124D1: (caller: 00007FFE6EFE0CF5) Exception(1) tid(3d0c) 80070057 The parameter is incorrect.

To reproduce

  1. git clone https://github.com/instant-high/deoldify-onnx.git
  2. Download the model from https://drive.google.com/drive/folders/1bU9Zj7zGVEujIzvDTb1b9cyWU3s__WQR?usp=sharing
  3. Modify color/deoldify.py and color/deoldify_fp16.py to make them use the DML EP
  4. Convert the model to fp16 by using float16.convert_float_to_float16.
  5. Command:
    python image.py --source_image "image.jpg"
    or
    python video.py --source "video.mp4" --result "video_colorized.mp4"

The fp32 model runs well with both the DML EP and the CPU EP. After converting it to fp16, it still runs well with the CPU EP, but we get the runtime error above when using the DML EP.
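For reference, step 3 amounts to choosing the provider list passed to onnxruntime.InferenceSession. A minimal sketch (the helper name is ours, not part of the repo; the provider strings are the standard ONNX Runtime identifiers):

```python
# Sketch of the EP selection for step 3. The function name is
# hypothetical; "DmlExecutionProvider" and "CPUExecutionProvider" are
# the standard ONNX Runtime provider identifiers.
def select_providers(device):
    """Return the provider list for onnxruntime.InferenceSession."""
    if device == "dml":
        # Keep the CPU EP as a fallback behind the DirectML EP.
        return ["DmlExecutionProvider", "CPUExecutionProvider"]
    return ["CPUExecutionProvider"]

print(select_providers("dml"))
```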

Urgency

No response

Platform

Windows

OS Version

22631.3593

ONNX Runtime Installation

Released Package

ONNX Runtime Version or Commit ID

1.17.3

ONNX Runtime API

Python

Architecture

X64

Execution Provider

DirectML

Execution Provider Library Version

onnxruntime-directml 1.18.0

@github-actions github-actions bot added ep:DML issues related to the DirectML execution provider platform:windows issues related to the Windows platform labels May 20, 2024
@siegelaaron94

siegelaaron94 commented May 21, 2024

We are also seeing this with some of our fp16 models. Our models run fine on RTX 20xx, RTX 30xx, and RTX 40xx cards, but they seem to fail on all AMD cards and some GTX 10xx cards. For reference, these models all ran fine under onnxruntime 1.13.1, but when we switched to 1.17.1 they started failing.


This issue has been automatically marked as stale due to inactivity and will be closed in 30 days if no further activity occurs. If further support is needed, please provide an update and/or more details.

@github-actions github-actions bot added the stale issues that have not been addressed in a while; categorized by a bot label Jun 21, 2024
@siegelaaron94

siegelaaron94 commented Jul 12, 2024

This is still a genuine bug; I have worked around it by doing this during session configuration.

status = sOrtAPI->AddSessionConfigEntry(inOptions, kOrtSessionOptionsConfigDisableDmlGraphFusion, "1");

But this causes severe performance degradation: at least one of our models runs three times slower, even on good hardware (an NVIDIA RTX 3070 mobile GPU). The same or similar issues have been reported more than once, by more than one person, with more than one model.

#21205
#20742
#20575

This regression seems to have been introduced by pull request #13131, or possibly #18160.
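For Python users, a sketch of the equivalent workaround. The key string "ep.dml.disable_graph_fusion" is what kOrtSessionOptionsConfigDisableDmlGraphFusion maps to in the session options config keys header, and the model path is a placeholder:

```python
# Python equivalent of the C-API workaround above: disable DML graph
# fusion before creating the session. The key string is assumed to be
# "ep.dml.disable_graph_fusion"
# (kOrtSessionOptionsConfigDisableDmlGraphFusion); verify against your
# onnxruntime version.
import onnxruntime as ort

opts = ort.SessionOptions()
opts.add_session_config_entry("ep.dml.disable_graph_fusion", "1")
session = ort.InferenceSession(
    "model_fp16.onnx",  # placeholder path
    sess_options=opts,
    providers=["DmlExecutionProvider", "CPUExecutionProvider"],
)
```

Expect the same slowdown described above, since fused DML graphs are the fast path.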

@thewh1teagle

status = sOrtAPI->AddSessionConfigEntry(inOptions, kOrtSessionOptionsConfigDisableDmlGraphFusion, "1");

I experience the same issue with DirectML (running inference with Whisper).
Adding the session config entry quoted above didn't help me.

@thewh1teagle

I found the root cause of the issue in my case:
onnxruntime-directml depends on DirectML.dll and was picking up an old copy from the system directory.
The fix is to download a newer DirectML.dll from nuget.org/packages/Microsoft.AI.DirectML
(use the "Download package" button).
Unzip the package and move the DirectML.dll from its bin/x64-win folder into your executable's folder.
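A sketch of that copy step in Python. Both paths are placeholders based on the NuGet package layout, not exact paths on your machine:

```python
# Copy the newer DirectML.dll from the extracted NuGet package next to
# the executable, so the Windows loader prefers it over the older
# system-directory copy. Both paths below are placeholders.
import shutil
from pathlib import Path

nuget_dll = Path("Microsoft.AI.DirectML/bin/x64-win/DirectML.dll")
app_dir = Path(".")  # folder containing your script or executable

if nuget_dll.exists():
    shutil.copy2(nuget_dll, app_dir / "DirectML.dll")
else:
    print(f"not found: {nuget_dll}")
```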

@siegelaaron94

@thewh1teagle, I think we are seeing a different problem. I create the DirectML device directly: I use LoadLibraryEx to load the DirectML.dll (1.13.1) associated with onnxruntime (1.17.1), use GetProcAddress to get the DMLCreateDevice1 function, and then call SessionOptionsAppendExecutionProvider_DML1 with the device I create.

@hedecai

hedecai commented Oct 2, 2024

SessionOptionsAppendExecutionProvider_DML1

@siegelaaron94
Yes, we get the same exception ("DmlGraphFusionHelper.cpp ... 80070057 The parameter is incorrect"). After debugging, we found that IDMLDevice::CreateOperator, called with the DML_OPERATOR_GEMM operator type, returns E_INVALIDARG when the channel count of the ATensor shape is bigger than 65535 (we tested the input shape {65536, 1, 256}).

Does anyone know whether the 65535 channel count is a documented limitation?
Thanks.
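To make the observation concrete, a hypothetical pure-Python check for the suspected limit. The 65535 threshold (UINT16_MAX) is inferred from the failing shape above, not from documented DirectML behavior:

```python
# Hypothetical check for the suspected DirectML GEMM dimension limit.
# 65535 (UINT16_MAX) is inferred from the failing {65536, 1, 256}
# shape above; it is not a documented DirectML constant.
SUSPECTED_DML_DIM_LIMIT = 65535

def exceeds_suspected_limit(shape):
    """Return True if any dimension exceeds the suspected 65535 limit."""
    return any(dim > SUSPECTED_DML_DIM_LIMIT for dim in shape)

print(exceeds_suspected_limit((65536, 1, 256)))  # the failing shape
print(exceeds_suspected_limit((65535, 1, 256)))
```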


4 participants