working model with Resize node becomes invalid after using convert_float_to_float16 #14827
Comments
Try the nightly package: https://aiinfra.visualstudio.com/PublicPackages/_artifacts/feed/ORT-Nightly/PyPI/ort-nightly-gpu/. BTW, you can try our stable diffusion optimizations.
I'm hoping to eventually use all of the optimizations that ORT is providing; they look pretty helpful. For now, I'm still getting the same error with the nightly package and with:

optimized = convert_float_to_float16(
    model,
    keep_io_types=False,
    force_fp16_initializers=False,
    disable_shape_infer=True,
    op_block_list=[
        "RandomNormalLike",
        "Resize",
    ],
)

(onnxruntime/onnxruntime/python/tools/transformers/models/stable_diffusion/optimize_pipeline.py, line 151 at e097e4e)
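For context, a sketch of the load/convert/save flow that call sits in (paths are placeholders, not the actual ones from my setup):

import onnx
from onnxruntime.transformers.float16 import convert_float_to_float16

model = onnx.load_model("unet/model.onnx")  # placeholder path
optimized = convert_float_to_float16(
    model,
    keep_io_types=False,
    force_fp16_initializers=False,
    disable_shape_infer=True,
    op_block_list=["RandomNormalLike", "Resize"],
)
onnx.save_model(optimized, "unet-fp16/model.onnx")  # models over 2 GB need external data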
The optimize command shown in the docs runs to completion, and the resulting model loads on the CUDA provider but not the CPU provider:

> python3 -m onnxruntime.transformers.models.stable_diffusion.optimize_pipeline -i ~/onnx-web/models/stable-diffusion-onnx-v1-5 -o ./sd-v1-5-fp16 --float16
...
optimize_sd_pipeline: Convert unet to float16 ...
get_operator_statistics: Operators:{'Constant': 192, 'Transpose': 294, 'MatMul': 112, 'Shape': 66, 'Reshape': 64, 'Gather': 65, 'NhwcConv': 98, 'Unsqueeze': 158, 'GroupNorm': 61, 'Concat': 47, 'ConstantOfShape': 1, 'Mul': 20, 'Equal': 1, 'Where': 1, 'Expand': 1, 'Sin': 1, 'Cos': 1, 'Slice': 2, 'Gemm': 24, 'Sigmoid': 2, 'Add': 60, 'LayerNormalization': 16, 'MultiHeadAttention': 32, 'SkipLayerNormalization': 32, 'Resize': 3, 'BiasSplitGelu': 16, 'BiasAdd': 16, 'Cast': 3}
get_fused_operator_statistics: Optimized operators:{'Attention': 0, 'MultiHeadAttention': 32, 'LayerNormalization': 16, 'SkipLayerNormalization': 32, 'BiasSplitGelu': 16, 'GroupNorm': 61, 'NhwcConv': 98}
save_model_to_file: Sort graphs in topological order
save_model_to_file: Model saved to sd-v1-5-fp16/unet/model.onnx
optimize_sd_pipeline: unet is optimized
...
> python3
>>> import onnxruntime
>>> sess = onnxruntime.InferenceSession("./sd-v1-5-fp16/vae_decoder/model.onnx", providers=['CPUExecutionProvider'])
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/home/ssube/onnx-repro/ort_env/lib/python3.10/site-packages/onnxruntime/capi/onnxruntime_inference_collection.py", line 366, in __init__
self._create_inference_session(providers, provider_options, disabled_optimizers)
File "/home/ssube/onnx-repro/ort_env/lib/python3.10/site-packages/onnxruntime/capi/onnxruntime_inference_collection.py", line 414, in _create_inference_session
sess.initialize_session(providers, provider_options, disabled_optimizers)
onnxruntime.capi.onnxruntime_pybind11_state.NotImplemented: [ONNXRuntimeError] : 9 : NOT_IMPLEMENTED : Failed to find kernel for NhwcConv(1) (node NhwcConv_0-/post_quant_conv/Conv). Kernel not found
>>> sess = onnxruntime.InferenceSession("./sd-v1-5-fp16/vae_decoder/model.onnx", providers=['CUDAExecutionProvider'])
2023-02-25 23:11:07.349772352 [W:onnxruntime:, session_state.cc:1136 VerifyEachNodeIsAssignedToAnEp] Some nodes were not assigned to the preferred execution providers which may or may not have an negative impact on performance. e.g. ORT explicitly assigns shape related ops to CPU to improve perf.
2023-02-25 23:11:07.349789765 [W:onnxruntime:, session_state.cc:1138 VerifyEachNodeIsAssignedToAnEp] Rerunning with verbose output on a non-minimal build will show node assignments.
>>>

The fp16 optimization is not really meant for CPU, so that's probably ok, but using …
@ssube, I can reproduce the issue. Workaround is either change …

You are right. The optimization for stable diffusion works for CUDA only. CPUExecutionProvider does not support float16 operators. If an operator has a float32 implementation in CPUExecutionProvider, ORT will actually add Cast nodes to convert tensors back to float32 so that it can run the operator in float32. That might cause the float16 model to be slower than float32 in CPUExecutionProvider.
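As the second warning in the transcript above notes, a verbose session log on a non-minimal build will show the per-node EP assignments (including any inserted Cast nodes). A minimal sketch, assuming a CUDA build of ORT and a placeholder model path:

import onnxruntime as ort

so = ort.SessionOptions()
so.log_severity_level = 0  # 0 = verbose; node-to-EP assignments appear in the log
sess = ort.InferenceSession(
    "sd-v1-5-fp16/unet/model.onnx",  # placeholder path
    sess_options=so,
    providers=["CUDAExecutionProvider", "CPUExecutionProvider"],
)
print(sess.get_providers())  # providers actually enabled for this session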
Thanks for testing that. I had tried setting …
Describe the issue
After converting a model to FP16 internally using onnxruntime.transformers.float16.convert_float_to_float16, the model can be loaded with onnx.load_model but fails to load with onnxruntime.InferenceSession. The model was valid before conversion, and both ONNX and ORT could load it.

This appears to be related to #8327 and #2848, both of which were closed as errors in third-party models, but this model was valid before calling convert_float_to_float16.

The error seems to come from the FP16 type incorrectly being applied to Resize nodes:

This is an invalid model. Type Error: Type 'tensor(float16)' of input parameter (onnx::Resize_9300) of operator (Resize) in node (Resize_4928) is invalid.
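Per the ONNX spec, Resize's scales input is constrained to tensor(float), so a scales tensor converted to float16 would explain the type error. A quick sketch to list the Resize nodes and their inputs in the converted model (the file name is a placeholder):

import onnx

model = onnx.load_model("model-fp16.onnx")  # placeholder: the converted model
for node in model.graph.node:
    if node.op_type == "Resize":
        # Resize inputs are (X, roi, scales, sizes); scales must stay tensor(float),
        # so a float16 tensor feeding it is what the type checker rejects
        print(node.name, list(node.input))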
I've tested with the CPUExecutionProvider (which should be the most compatible?) as well as the CUDAExecutionProvider, and they both report the same error.

onnx/onnxmltools#361 (comment) suggests there might be an overly-aggressive conversion happening and recommends raising an issue here.

I don't see any way to exclude the Resize nodes during conversion.

To reproduce
Download https://huggingface.co/runwayml/stable-diffusion-v1-5/resolve/onnx/vae_decoder/model.onnx ahead of time or within the notebook/script. This repro uses what appear to be the latest packages (fresh venv created for it).
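A minimal sketch of the conversion step (file names are placeholders; keyword arguments left at their defaults):

import onnx
import onnxruntime
from onnxruntime.transformers.float16 import convert_float_to_float16

model = onnx.load_model("model.onnx")  # the vae_decoder model downloaded above
fp16_model = convert_float_to_float16(model)
onnx.save_model(fp16_model, "model-fp16.onnx")

# onnx loads the converted model without complaint...
onnx.load_model("model-fp16.onnx")

# ...but creating an InferenceSession raises the Resize type error:
onnxruntime.InferenceSession("model-fp16.onnx", providers=["CPUExecutionProvider"])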
Forcing FP16 initializers with force_fp16_initializers=True does not seem to make a difference.

Converting a model that does not use the Resize node, like the corresponding encoder https://huggingface.co/runwayml/stable-diffusion-v1-5/resolve/onnx/vae_encoder/model.onnx, does work correctly and can be loaded by both ONNX and ORT.
node, like the corresponding encoder https://huggingface.co/runwayml/stable-diffusion-v1-5/resolve/onnx/vae_encoder/model.onnx, does work correctly and can be loaded by both ONNX and ORT.Urgency
Not urgent, but it is limiting usage on cards with low VRAM (ssube/onnx-web#121 (comment)).
I'm happy to start a PR if this is the correct place and not user error.
Platform
Linux
OS Version
Linux compute-infer-1 5.15.0-60-generic #66-Ubuntu SMP Fri Jan 20 14:29:49 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
Ubuntu 22.04.1 LTS
ONNX Runtime Installation
Released Package
ONNX Runtime Version or Commit ID
1.14.0
ONNX Runtime API
Python
Architecture
X64
Execution Provider
Default CPU
Execution Provider Library Version
ORT 1.14.0, PyTorch 1.13.1