
Adding support for safetensors and LoRa. #2448

Merged 2 commits into huggingface:main on Mar 3, 2023
Conversation

@Narsil (Contributor) commented Feb 21, 2023

Enabling safetensors support for the LoRA files:

Asked here: huggingface/safetensors#180

Same approach as for the regular model weights. If:

  • safetensors is installed
  • the repo has the safetensors LoRA file

then it is the default.

Adding safe_serialization on LoRA save so users can default to saving in the safetensors format.

What's technically missing is the option to choose the format on load, along with the weights_name. I didn't want to add it here for simplicity (since most users should be using the default anyway), but we could add that.
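
A minimal usage sketch of the behaviour above (assuming a StableDiffusionPipeline pipe whose UNet already carries trained LoRA attention processors; the directory name is illustrative):

pipe.unet.save_attn_procs("./my_lora", safe_serialization=True)  # writes a safetensors file instead of a torch .bin
pipe.unet.load_attn_procs("./my_lora")  # the safetensors file is preferred automatically when the library is installed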

@HuggingFaceDocBuilderDev commented Feb 21, 2023

The documentation is not available anymore as the PR was closed or merged.

@Narsil (Contributor, Author) commented Feb 21, 2023

I don't think the failing test is related to this PR, is it?

@pcuenca (Member) commented Feb 21, 2023

Hi @Narsil! The code looks good to me, and the failing tests have nothing to do with it.

For additional context on top of the safetensors issue you mentioned, people are interested in converting LoRA weights generated with other tools. See for example: #2363, #2403. And also the other way around (being able to use diffusers LoRA weights in other tools): #2326.

Focusing on the first task (converting from other tools to diffusers), I'm not sure how hard the problem is. Besides changes in key names, some tools seem to do pivotal tuning or textual inversion on top of the cross-attention layers in our implementation. This PR seems to save the full pipeline instead of attempting to convert the incremental weights.

TL;DR: I think this PR is quite useful and necessary, but I'm not sure it will help with the issue you mentioned :) (But I may be wrong; I still have to find the time to test this sort of interoperability.)

@pcuenca (Member) left a comment

Looks good to me!

I think the current approach is reasonable – if the user supplies a custom name on save, they have to provide it on load too, and we'll try safetensors first.

Maybe @sayakpaul or @patil-suraj would like to take a look too.

@sayakpaul (Member) left a comment

Thanks.

@Narsil (Contributor, Author) commented Feb 24, 2023

Shall I merge?

@jochemstoel

Please merge.

@sayakpaul (Member)

I would like @patil-suraj to also review it, but he is currently on leave and should be back early next week. If it can wait, I'd prefer to wait until then.

@@ -219,7 +250,10 @@ def save_attn_procs(
            return

        if save_function is None:
            save_function = torch.save
            if safe_serialization:
                save_function = safetensors.torch.save_file
Inline review comment (Contributor):

Can we save the format here as well, as this is needed in some loading code?

E.g. see updated code here:

safetensors.torch.save_file(

@patrickvonplaten (Contributor) left a comment

Looks good! Can we also save:

metadata={"format": "pt"} 

as well here?
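
A minimal sketch of what that could look like (an assumed shape, not necessarily the exact merged code):

import safetensors.torch

def save_function(weights, filename):
    # Attach the "pt" marker so loading code can detect the framework.
    return safetensors.torch.save_file(weights, filename, metadata={"format": "pt"})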

@Narsil (Contributor, Author) commented Mar 3, 2023

Done.

@patrickvonplaten merged commit 1f4deb6 into huggingface:main on Mar 3, 2023
@gadicc commented Mar 3, 2023

Hey all, thanks for all the amazing work here.

I need to spend a bit more time on this, but I think this introduces a regression. I have an integration test that calls:

# https://huggingface.co/patrickvonplaten/lora_dreambooth_dog_example/resolve/main/pytorch_lora_weights.bin
pipe.unet.load_attn_procs("pytorch_lora_weights.bin")

and is now failing with:

  File "/api/diffusers/src/diffusers/loaders.py", line 170, in load_attn_procs
    state_dict = safetensors.torch.load_file(model_file, device="cpu")
  File "/opt/conda/envs/xformers/lib/python3.9/site-packages/safetensors/torch.py", line 98, in load_file
    with safe_open(filename, framework="pt", device=device) as f:
Exception: Error while deserializing header: HeaderTooLarge

I guess because it's trying to load a non-safetensors file with safetensors.torch.load_file? Here's the relevant code (the last line is the failing line):

if is_safetensors_available():
    if weight_name is None:
        weight_name = LORA_WEIGHT_NAME_SAFE
    try:
        model_file = _get_model_file(
            pretrained_model_name_or_path_or_dict,
            weights_name=weight_name,
            cache_dir=cache_dir,
            force_download=force_download,
            resume_download=resume_download,
            proxies=proxies,
            local_files_only=local_files_only,
            use_auth_token=use_auth_token,
            revision=revision,
            subfolder=subfolder,
            user_agent=user_agent,
        )
        state_dict = safetensors.torch.load_file(model_file, device="cpu")

There's no exception raised because the file indeed exists; it's just not in safetensors format. So I guess we need a safe_serialization or from_safetensors or some such kwarg, maybe?

Looking at the surrounding code, the easy way around this is to call torch.load() myself and pass the state_dict (sketched below), but I don't think that's a good approach. Let me know if I'm doing anything wrong, but I think I should be able to specify an exact file, right?
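
For reference, a sketch of that workaround (load_attn_procs also accepts a state dict as its first argument, pretrained_model_name_or_path_or_dict, so the safetensors path can be bypassed entirely):

import torch

# Load the torch-pickled LoRA weights ourselves and hand over the dict.
state_dict = torch.load("pytorch_lora_weights.bin", map_location="cpu")
pipe.unet.load_attn_procs(state_dict)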

Looking back at the original PR intro I guess this is exactly what @Narsil says:

What's technically missing is the option to choose the format on load, along with the weights_name. I didn't want to add it here for simplicity (since most users should be using the default anyway), but we could add that.

So should I open a new issue for this?

Thanks!

gadicc added a commit to kiri-art/docker-diffusers-api that referenced this pull request Mar 3, 2023
@Narsil deleted the lora_safetensors branch on March 3, 2023 at 22:45
@Narsil (Contributor, Author) commented Mar 4, 2023

Oops! Thanks for flagging it. I created a fix here: #2551

@gadicc commented Mar 4, 2023

So fast! Thanks, @Narsil! 🙏

@Ir1d commented Mar 20, 2023

Exception: Error while deserializing header: HeaderTooLarge still appears in the LoRA workflow (train + test).

@Narsil (Contributor, Author) commented Mar 20, 2023

@Ir1d can you provide a reproducible workflow (ideally fast to execute)?

@kilimchoi commented Mar 21, 2023

@Narsil would something like this work now?

model = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16)
model.scheduler = DPMSolverMultistepScheduler.from_config(model.scheduler.config)
model.unet.load_attn_procs(model_path, use_safetensors=True)  # model_path = "xxx.safetensors"

@teaguexiao

@Narsil would something like this work now?

model = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16)
model.scheduler = DPMSolverMultistepScheduler.from_config(model.scheduler.config)
model.unet.load_attn_procs(model_path, use_safetensors=True)  # model_path = "xxx.safetensors"

I tried this in my env but it's not working.
I'm also wondering about a way to load .safetensors instead of pytorch_lora_weights.bin; any ideas?

@Narsil (Contributor, Author) commented Mar 22, 2023

Do you have links to the model_path you're referring to?

Here is a modified version of your script that creates a proper LoRA safetensors file:

from diffusers import StableDiffusionPipeline, DPMSolverMultistepScheduler
import torch
from diffusers.models.attention_processor import LoRAAttnProcessor

pipe = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16)
pipe.scheduler = DPMSolverMultistepScheduler.from_config(pipe.scheduler.config)

model = pipe.unet
lora_attn_procs = {}
for name in model.attn_processors.keys():
    cross_attention_dim = None if name.endswith("attn1.processor") else model.config.cross_attention_dim
    if name.startswith("mid_block"):
        hidden_size = model.config.block_out_channels[-1]
    elif name.startswith("up_blocks"):
        block_id = int(name[len("up_blocks.")])
        hidden_size = list(reversed(model.config.block_out_channels))[block_id]
    elif name.startswith("down_blocks"):
        block_id = int(name[len("down_blocks.")])
        hidden_size = model.config.block_out_channels[block_id]

    lora_attn_procs[name] = LoRAAttnProcessor(
        hidden_size=hidden_size, cross_attention_dim=cross_attention_dim
    )
    lora_attn_procs[name] = lora_attn_procs[name].to(model.device)

    # add 1 to weights to mock trained weights
    with torch.no_grad():
        lora_attn_procs[name].to_q_lora.up.weight += 1
        lora_attn_procs[name].to_k_lora.up.weight += 1
        lora_attn_procs[name].to_v_lora.up.weight += 1
        lora_attn_procs[name].to_out_lora.up.weight += 1

model.set_attn_processor(lora_attn_procs)
model.save_attn_procs("./out", safe_serialization=True)
model.load_attn_procs("./out", use_safetensors=True)

This should work. The HeaderTooLarge error seems to indicate that the file you have is corrupted in some way, or isn't a safetensors file to begin with. A quick way to check is sketched below.
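
A quick sanity check, as a sketch (assumes safetensors is installed; a torch-pickled .bin fails right here, which is exactly what HeaderTooLarge signals):

from safetensors import safe_open

try:
    with safe_open("pytorch_lora_weights.safetensors", framework="pt") as f:
        print(list(f.keys())[:5])  # a few tensor names from a valid file
except Exception as e:
    print("Not a valid safetensors file:", e)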

@fecet (Contributor) commented Mar 23, 2023

So our current workflow is to use convert_lora_safetensor_to_diffusers.py to merge a LoRA into its base model; then, if we want to separate it and use it like a native LoRA in diffusers, we use the script above? @Narsil


@Narsil (Contributor, Author) commented Mar 23, 2023

So our current workflow is to use convert_lora_safetensor_to_diffusers.py to merge a LoRA into its base model; then, if we want to separate it and use it like a native LoRA in diffusers, we use the script above? @Narsil

Sorry, I'm not familiar with this workflow or this particular script. Do you have a script + workflow I could run so I can try to reproduce the faulty file? My guess is that something went wrong during conversion, leading to a bad file, since everything looks relatively straightforward in that script. That, or safe_serialization=True wasn't working properly somehow.

@fecet (Contributor) commented Mar 23, 2023

Sorry, I'm not familiar with this workflow or this particular script. Do you have a script + workflow I could run so I can try to reproduce the faulty file? My guess is that something went wrong during conversion, leading to a bad file, since everything looks relatively straightforward in that script. That, or safe_serialization=True wasn't working properly somehow.

Sorry for the misunderstanding. I'm trying to use a LoRA/ckpt from civitai with diffusers, and I wonder what the correct way is.
My attempt was to download a civitai weight and convert it with this script. That works perfectly, and I can load the result with StableDiffusionPipeline.from_pretrained.

Then I would like to use it with a LoRA; the suggested way seems to be

pipe.unet.load_attn_procs("lora_path", use_safetensors=True)

but that raised a KeyError: 'to_k_lora.down.weight'.

https://github.com/huggingface/diffusers/blob/main/scripts/convert_lora_safetensor_to_diffusers.py can merge it with its base model. This works but gave me a huge model, which cannot easily be used with other base models. I wonder if we need to fuse it with the base model first and save it in the diffusers format using the script above in order to obtain a lightweight LoRA replica (as it originally was).

@Narsil (Contributor, Author) commented Mar 23, 2023

but that raised a KeyError: 'to_k_lora.down.weight'.

This means the LoRA is still in SD format, and you need to convert it to the diffusers format, I guess (a quick way to tell the two apart is sketched below).
@pcuenca might know more?
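
A rough way to tell the two formats apart (a sketch; the key patterns are assumptions from the two ecosystems: diffusers attn-processor LoRAs use suffixes like to_k_lora.down.weight, while SD/kohya-style files common on civitai use lora_unet_ / lora_te_ prefixes):

import safetensors.torch

state_dict = safetensors.torch.load_file("lora.safetensors")  # hypothetical file name
print("diffusers-style:", any("to_k_lora" in k for k in state_dict))
print("kohya/SD-style:", any(k.startswith(("lora_unet_", "lora_te_")) for k in state_dict))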

@Youngboy12

I want to know how to do this.

@patrickvonplaten (Contributor)

Duplicate of #2551 (comment) => let's make this a feature request

gadicc pushed a commit to kiri-art/docker-diffusers-api that referenced this pull request May 24, 2023