Feature request
I have some code for loading LoRA (and LyCORIS, LoHA, etc) weights into ONNX models at runtime, without converting them to/from PyTorch again or writing anything to disk. It doesn't look like that is currently supported, so I would like to contribute that feature if possible, but I'm not sure the best way to go about that.
This code is working and has been tested with many models and LoRA files from Civitai. There are a few more exotic LyCORIS variants that are not fully supported yet, but LoHA seems to work pretty well. I have support for SDXL on a branch, almost ready to merge, which is why I am opening this issue now.
The LoRA safetensors do not need to be converted to ONNX ahead of time. Support is limited to LoRA weights trained by the kohya-ss scripts or some descendant of those; the cloneofsimo scripts strip the node names from the LoRA, making it very difficult to match the weights up with the correct ONNX nodes.
Motivation
This is supported for the other non-ONNX pipelines, and would be cool to have as part of the ONNX pipeline as well. diffusers has a LoRA loader mixin, which seems like the right way to structure this code? That's the part I'm not really sure about.

Blending the base model and LoRA weights in-memory without writing anything back to disk was super important for a few users with HDDs, and by externalizing the weight initializers in memory, it's possible to do regular numpy maths on them before loading the model into ORT: https://github.com/ssube/onnx-web/blob/main/api/onnx_web/diffusers/load.py#L258-L276
The state dicts are loaded from the LoRA normally, but then I use the node names from both the LoRA and ONNX models to find the correct MatMul initializer, which is always an input to that node: https://github.com/ssube/onnx-web/blob/main/api/onnx_web/convert/diffusion/lora.py#L358-L364

The protobuf-based structure of ONNX model graphs makes this a little bit difficult, but since the number of nodes does not change, they can be replaced in-place, which works just fine: https://github.com/ssube/onnx-web/blob/main/api/onnx_web/convert/diffusion/lora.py#L429-L430
See ssube/onnx-web#213 and microsoft/onnxruntime#15024 for some more background and context. If you have any questions, I'm happy to chat about this on Discord as well.
Your contribution
I have the code and it works; it just needs some cleanup, and I'm not sure where you would want it.
The code is in here: https://github.com/ssube/onnx-web/blob/main/api/onnx_web/convert/diffusion/lora.py (SDXL is still on a branch)
It currently looks something like:
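Roughly, the first step is parsing the kohya-style keys out of the state dict and mapping them back to graph names. This is a simplified, hypothetical sketch of that step (not the actual blend_loras code; `group_lora_pairs` and `key_to_node_name` are made-up helper names, and plain numpy arrays stand in for the real tensors):

```python
import numpy as np

def group_lora_pairs(state_dict):
    """Collect {base_key: {"up": ..., "down": ..., "alpha": ...}} from
    kohya-style keys like "lora_unet_<path>.lora_up.weight"."""
    grouped = {}
    for key, value in state_dict.items():
        base, _, rest = key.partition(".")  # kohya bases use underscores, not dots
        entry = grouped.setdefault(base, {})
        if rest == "lora_up.weight":
            entry["up"] = value
        elif rest == "lora_down.weight":
            entry["down"] = value
        elif rest == "alpha":
            entry["alpha"] = float(value)
    return grouped

def key_to_node_name(base):
    # Illustrative only: the real base-key -> ONNX node mapping in onnx-web
    # is more involved than a simple underscore swap.
    return base.replace("lora_unet_", "").replace("_", "/")

state_dict = {
    "lora_unet_mid_block_attn_to_q.lora_up.weight": np.ones((8, 4), dtype=np.float32),
    "lora_unet_mid_block_attn_to_q.lora_down.weight": np.ones((4, 8), dtype=np.float32),
    "lora_unet_mid_block_attn_to_q.alpha": 4.0,
}
grouped = group_lora_pairs(state_dict)
print(sorted(grouped["lora_unet_mid_block_attn_to_q"]))  # ['alpha', 'down', 'up']
```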
There are a few different steps that I can break down into individual functions. Looking at the LoraLoaderMixin from https://github.com/huggingface/diffusers/blob/main/src/diffusers/loaders.py#L855, blend_loras is roughly equivalent to load_lora_into_unet and load_lora_into_text_encoder, although ORT doesn't differentiate between the two models. Once you locate the correct MatMul node and its initializer, the math is all normal and follows the algo described in https://github.com/KohakuBlueleaf/LyCORIS/blob/main/Algo.md.