
error using LoRA with torch-fp16 model #274

Closed
ssube opened this issue Mar 21, 2023 · 1 comment
ssube (Owner) commented Mar 21, 2023

[2023-03-21 04:37:29,749] ERROR: 507752 140259377041408 onnx_web.worker.worker: error while running job
Traceback (most recent call last):
  File "/opt/onnx-web/api/onnx_web/worker/worker.py", line 52, in worker_main
    job.fn(context, *job.args, **job.kwargs)
  File "/opt/onnx-web/api/onnx_web/diffusers/run.py", line 37, in run_txt2img_pipeline
    pipe = load_pipeline(
  File "/opt/onnx-web/api/onnx_web/diffusers/load.py", line 254, in load_pipeline
    OnnxRuntimeModel.load_model(
  File "/opt/onnx-web/api/onnx_env/lib/python3.10/site-packages/diffusers/pipelines/onnx_utils.py", line 77, in load_model
    return ort.InferenceSession(path, providers=[provider], sess_options=sess_options)
  File "/opt/onnx-web/api/onnx_env/lib/python3.10/site-packages/onnxruntime/capi/onnxruntime_inference_collection.py", line 360, in __init__
    self._create_inference_session(providers, provider_options, disabled_optimizers)
  File "/opt/onnx-web/api/onnx_env/lib/python3.10/site-packages/onnxruntime/capi/onnxruntime_inference_collection.py", line 399, in _create_inference_session
    sess = C.InferenceSession(session_options, self._model_bytes, False, self._read_config_from_model)
onnxruntime.capi.onnxruntime_pybind11_state.Fail: [ONNXRuntimeError] : 1 : FAIL : Type Error: Type parameter (T) of Optype (MatMul) bound to different types (tensor(float16) and tensor(float) in node (/text_model/encoder/layers.0/self_attn/k_proj/MatMul).
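The mismatch comes from mixing dtypes in a single graph: the base model was converted to fp16, but the blended LoRA weights were left as fp32, so the MatMul node's type parameter T binds to both tensor(float16) and tensor(float). A minimal sketch of the constraint, using NumPy arrays to stand in for the ONNX initializers (shapes and names are illustrative, not the project's actual API):

```python
import numpy as np

# Hypothetical shapes; the real initializers come from the converted ONNX graph.
base_weight = np.zeros((768, 768), dtype=np.float16)       # fp16 graph initializer
lora_delta = np.random.randn(768, 768).astype(np.float32)  # LoRA product, blended in fp32

# Blend in fp32 for accuracy, then cast back to the graph's dtype so every
# input feeding the MatMul node carries the same type parameter T.
blended = (base_weight.astype(np.float32) + lora_delta).astype(base_weight.dtype)
assert blended.dtype == base_weight.dtype
```

Without the final cast, re-inserting `blended` as an initializer leaves one MatMul input as tensor(float) while the other stays tensor(float16), which is exactly the Type Error ONNX Runtime raises above.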
@ssube ssube added status/new issues that have not been confirmed yet type/bug broken features scope/api model/diffusion provider/cuda model/lora labels Mar 21, 2023
@ssube ssube added this to the v0.9 milestone Mar 21, 2023
@ssube ssube added status/fixed issues that have been fixed and released and removed status/new issues that have not been confirmed yet labels Mar 22, 2023
ssube (Owner) commented Mar 22, 2023

Since conversion happens primarily on the CPU, where torch does not implement many fp16 ops, blending with the optimized dtype will often fail:

[2023-03-22 02:54:29,929] ERROR: 634418 139916390940672 onnx_web.convert.diffusion.lora: error blending weights for key up_blocks_3_attentions_0_transformer_blocks_0_ff_net_2
Traceback (most recent call last):
  File "/opt/onnx-web/api/onnx_web/convert/diffusion/lora.py", line 107, in blend_loras
    weights = up_weight @ down_weight
RuntimeError: "addmm_impl_cpu_" not implemented for 'Half'
[2023-03-22 02:54:29,929] DEBUG: 634418 139916390940672 onnx_web.convert.diffusion.lora: blending weights for keys: lora_unet_up_blocks_3_attentions_1_proj_in.lora_down.weight, lora_unet_up_blocks_3_attentions_1_proj_in.lora_up.weight, lora_unet_up_blocks_3_attentions_1_proj_in.alpha
[2023-03-22 02:54:29,934] ERROR: 634418 139916390940672 onnx_web.convert.diffusion.lora: error blending weights for key up_blocks_3_attentions_1_proj_in
Traceback (most recent call last):
  File "/opt/onnx-web/api/onnx_web/convert/diffusion/lora.py", line 119, in blend_loras
    up_weight.squeeze(3).squeeze(2)
RuntimeError: "addmm_impl_cpu_" not implemented for 'Half'
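This second failure is the same dtype problem from the other direction: torch's CPU backend has no Half implementation of matmul (`addmm`), so the `up_weight @ down_weight` product has to be computed in fp32 and cast back afterwards. A sketch of that workaround, using NumPy to mirror the tensor math (`blend_weights` is an illustrative name, not the project's actual function):

```python
import numpy as np

def blend_weights(up_weight, down_weight, alpha, rank):
    # Upcast to fp32 before the matmul: CPU backends often lack fp16
    # kernels (torch raises '"addmm_impl_cpu_" not implemented for Half').
    product = up_weight.astype(np.float32) @ down_weight.astype(np.float32)
    # Scale by alpha/rank, then cast back to the original dtype.
    return (product * (alpha / rank)).astype(up_weight.dtype)

up = np.random.randn(320, 4).astype(np.float16)
down = np.random.randn(4, 320).astype(np.float16)
blended = blend_weights(up, down, alpha=4.0, rank=4)
assert blended.dtype == np.float16
```

The same shape applies to the conv branch at line 119: squeeze, upcast, matmul, cast back.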

@ssube ssube closed this as completed Mar 23, 2023