
SD3 missing support from-single-file #8546

Closed · vladmandic opened this issue Jun 13, 2024 · 9 comments · Fixed by #8631
Labels
bug Something isn't working

Comments

@vladmandic (Contributor) commented Jun 13, 2024

Describe the bug

StableDiffusion3Pipeline does implement from_single_file, which correctly loads the DiT and VAE.
However, it fails to handle any of the text encoders: TE1, TE2 and TE3.

  • When loading sd3_medium.safetensors this is understandable, as that model does not have any TEs baked in.
  • When loading sd3_medium_incl_clips.safetensors, the expectation is that TE1 and TE2 would be loaded correctly and TE3 would be skipped. The load indeed does not fail, but nothing actually works; see the reproduction below.
  • When loading sd3_medium_incl_clips_t5xxlfp8.safetensors, the expectation is the same as above, plus that the FP8 version of TE3 would be loaded correctly. Right now that does not happen and TE3 must be loaded separately.

Reproduction

import warnings
import torch
import diffusers
import transformers
import rich.traceback

rich.traceback.install()
warnings.filterwarnings(action="ignore", category=FutureWarning)
repo_id = 'stabilityai/stable-diffusion-3-medium-diffusers'
cache_dir = '/mnt/models/Diffusers'

pipe = diffusers.StableDiffusion3Pipeline.from_single_file(
    '/mnt/models/stable-diffusion/sd3/sd3_medium_incl_clips.safetensors',
    torch_dtype = torch.float16,
    text_encoder_3 = None,  # intentionally skip TE3; this checkpoint variant has no T5 baked in
    tokenizer_3 = None,
    cache_dir = cache_dir,
)

# workaround: uncommenting these two lines, which load TE1/TE2 via from_pretrained, makes the pipeline work
# pipe.text_encoder = transformers.CLIPTextModelWithProjection.from_pretrained(repo_id, subfolder='text_encoder', cache_dir=cache_dir, torch_dtype=torch.float16)
# pipe.text_encoder_2 = transformers.CLIPTextModelWithProjection.from_pretrained(repo_id, subfolder='text_encoder_2', cache_dir=cache_dir, torch_dtype=torch.float16)

pipe.to('cuda')

result = pipe(
    prompt='A photo of a cat',
    width=1024,
    height=1024,
)
image = result.images[0]
image.save('test.png')

This results in a runtime error on pipe.to('cuda'):

Cannot copy out of meta tensor;

or, if the model is not moved:

Tensor on device cpu is not on the expected device meta

Uncommenting the two lines that load TE1 and TE2 makes the model work without issues.

For TE3, attempting to load sd3_medium_incl_clips_t5xxlfp8 results in the same error, so loading it manually is the only way:

pipe.text_encoder_3 = transformers.T5EncoderModel.from_pretrained(
    repo_id,
    subfolder='text_encoder_3',
    quantization_config=transformers.BitsAndBytesConfig(load_in_8bit=True),  # 8-bit quantization via bitsandbytes, since the checkpoint's FP8 TE3 cannot be loaded directly
    cache_dir=cache_dir,
)
pipe.tokenizer_3 = transformers.T5TokenizerFast.from_pretrained(
    repo_id,
    subfolder='tokenizer_3',
    cache_dir=cache_dir,
)
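
As a quick sanity check (a sketch added for illustration, not part of the original report), one can confirm the manual attach took effect:

# TE3 should now be attached to the pipeline instead of being None
assert pipe.text_encoder_3 is not None
print(type(pipe.text_encoder_3).__name__)  # -> T5EncoderModel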

All in all, this defeats the point of using from_single_file, since TE1, TE2 and TE3 all have to be added to the pipeline manually via from_pretrained.

Logs

No response

System Info

diffusers==0.29.0
torch==2.3.1
cuda==12.1
ubuntu==24.04

Who can help?

@yiyixuxu @sayakpaul @DN6

@vladmandic added the bug (Something isn't working) label on Jun 13, 2024
@yiyixuxu (Collaborator)

Hi @vladmandic,
we merged PR #8517 today, which expanded support for from_single_file.
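
Concretely (a sketch for illustration, not from the thread; the local path is a placeholder), with that PR the CLIP encoders should come straight from the checkpoint:

import torch
import diffusers

# with the expanded single-file support, TE1/TE2 load from the checkpoint itself;
# TE3 is still skipped explicitly here
pipe = diffusers.StableDiffusion3Pipeline.from_single_file(
    'sd3_medium_incl_clips.safetensors',
    torch_dtype=torch.float16,
    text_encoder_3=None,
    tokenizer_3=None,
)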

@vladmandic (Contributor, Author)

Ahh, sorry, I was looking at the commit log before opening an issue and somehow missed that.
Tested with TE1 and TE2 loaded from sd3_medium_incl_clips.safetensors; that now works.

Loading TE3 from sd3_medium_incl_clips_t5xxlfp8.safetensors still fails:

/home/vlado/dev/sdnext/venv/lib/python3.12/site-packages/diffusers/loaders/single_file_utils.py:892 in convert_ldm_unet_checkpoint
892 new_checkpoint[diffusers_key] = unet_state_dict[ldm_key]
KeyError: 'label_emb.0.0.weight'

Also, do you have a plan to release a diffusers==0.29.1 patch with all the extra work that has gone in since the release?

@yiyixuxu (Collaborator)

Yep, we will do a patch soon, once our single-file support is "vlad-approved" 😁, and this one is in #8506.

@yiyixuxu (Collaborator)

cc @DN6 for the fp8 failure

@DN6 (Collaborator) commented Jun 17, 2024

@vladmandic Could you try installing from main? I'm able to load the FP8 checkpoint on my end.

@vladmandic (Contributor, Author)

@DN6 I can load it now, but it does not load TE3 at all.

pipe = diffusers.StableDiffusion3Pipeline.from_single_file('sd3_medium_incl_clips_t5xxlfp8.safetensors')
print('TE1', pipe.text_encoder)
print('TE2', pipe.text_encoder_2)
print('TE3', pipe.text_encoder_3)

You can see that from_single_file does not complain about a missing TE3; it simply does not load it, so text_encoder_3 is None.

@yiyixuxu (Collaborator)

@DN6 I can reproduce this too

@yiyixuxu (Collaborator)

@vladmandic can you check if this works now? #8631

@vladmandic (Contributor, Author)

Confirmed as working with that fix.
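
For reference, a minimal verification sketch (added for illustration; it assumes a diffusers build that includes the fix from #8631, and the path is a placeholder):

import diffusers

# with the fix, TE3 is populated from the checkpoint instead of being left as None
pipe = diffusers.StableDiffusion3Pipeline.from_single_file('sd3_medium_incl_clips_t5xxlfp8.safetensors')
assert pipe.text_encoder_3 is not None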
