StableDiffusion3Img2ImgPipeline.__call__() is missing width and height parameters #9933

Open
chie2727 opened this issue Nov 15, 2024 · 9 comments
Labels
bug Something isn't working

Comments

@chie2727

Describe the bug

The docstring for StableDiffusion3Img2ImgPipeline.__call__() documents width and height parameters, but the function signature does not accept them.
Is this a typo, or are width and height supposed to be handled by the function?

Source file:
diffusers/src/diffusers/pipelines/stable_diffusion_3/pipeline_stable_diffusion_3_img2img.py

Reproduction

import torch
from diffusers import StableDiffusion3Img2ImgPipeline

# hf_token, cache_dir and input_image are placeholders defined elsewhere.
pipe = StableDiffusion3Img2ImgPipeline.from_pretrained(
    "stabilityai/stable-diffusion-3.5-medium",
    torch_dtype=torch.float16,
    cache_dir=cache_dir,
    token=hf_token,
)

# Passing width/height raises the TypeError shown under Logs.
image = pipe(
    prompt="Resize the input image",
    image=input_image,
    width=1024,
    height=512,
    strength=1.0,
).images[0]

Logs

TypeError: StableDiffusion3Img2ImgPipeline.__call__() got an unexpected keyword argument 'width'

System Info

  • 🤗 Diffusers version: 0.31.0
  • Platform: Linux-6.1.79-99.167.amzn2023.x86_64-x86_64-with-glibc2.34
  • Running on Google Colab?: No
  • Python version: 3.11.6
  • PyTorch version (GPU?): 2.5.1+cu118 (True)
  • Flax version (CPU?/GPU?/TPU?): not installed (NA)
  • Jax version: not installed
  • JaxLib version: not installed
  • Huggingface_hub version: 0.26.2
  • Transformers version: 4.46.2
  • Accelerate version: 1.1.1
  • PEFT version: not installed
  • Bitsandbytes version: not installed
  • Safetensors version: 0.4.5
  • xFormers version: not installed
  • Accelerator: NVIDIA A10G, 23028 MiB
  • Using GPU in script?: yes
  • Using distributed or parallel set-up in script?: distributed

Who can help?

@yiyixuxu @sayakpaul

chie2727 added the bug label Nov 15, 2024
@ghunkins
Contributor

As far as I understand it, height and width are inferred from the input image. Their presence in the docstring appears to be a copy-paste error.

Resizing the image as needed before sending it to the pipeline should give you what you're looking for!
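For anyone landing here, a minimal sketch of picking a target size before the resize, so the aspect ratio of the source is kept. The helper name `aspect_preserving_size` is hypothetical, and snapping to a multiple of 16 is an assumption about what the SD3 pipelines accept; check the divisibility requirement for your diffusers version.

```python
def aspect_preserving_size(src_width, src_height, long_side, multiple=16):
    """Scale a size so its longer side is about `long_side`.

    Keeps the aspect ratio and snaps both sides to `multiple`.
    The default of 16 is an assumption, not a documented constant.
    """
    if src_width <= 0 or src_height <= 0:
        raise ValueError("source dimensions must be positive")
    scale = long_side / max(src_width, src_height)

    def snap(value):
        # Round to the nearest multiple, but never below one multiple.
        return max(multiple, int(round(value * scale / multiple)) * multiple)

    return snap(src_width), snap(src_height)


# A 1920x1080 source scaled so the long side is 1024.
print(aspect_preserving_size(1920, 1080, 1024))
```

The resulting tuple can then be passed to PIL's `Image.resize` on the input image before calling the pipeline.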

@chie2727
Author

@ghunkins
So it's a copy-paste error in the docstring then. Thank you for clarifying!

I was hoping to be able to specify an output image size that differs from the input image size, but I'll do a bit more research into how to achieve this.

@sayakpaul
Member

@ghunkins thanks for helping out. @chie2727 feel free to close the issue if you think it's resolved.

@ukaprch

ukaprch commented Nov 15, 2024

Interestingly, FLUX has no such limitation in their FluxImg2ImgPipeline.

@asomoza
Member

asomoza commented Nov 15, 2024

Hi, what would be the use case for using a different width and height? If they don't match the source image, that will only result in a distorted image, so why would people want that?

If the other img2img pipelines have it and there's a genuine use case, maybe we can add it.

@ukaprch

ukaprch commented Nov 15, 2024 via email

@ghunkins
Contributor

@chie2727 Here is some documentation from PIL on the various ways to resize an image to a specific target size. Best of luck!

https://pillow.readthedocs.io/en/stable/reference/ImageOps.html#resize-relative-to-a-given-size

from PIL import ImageOps

required_size = (1024, 512)
resized_input_image = ImageOps.fit(input_image, required_size)

image = pipe(
    prompt="Resize the input image",
    image=resized_input_image,
    strength=0.5,
).images[0]
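Note that `ImageOps.fit` crops the source to the requested aspect ratio before resizing, so part of the image is discarded when the ratios differ. The sketch below illustrates the arithmetic of a centered crop; it is not Pillow's implementation, and `center_crop_box` is a hypothetical helper name.

```python
def center_crop_box(src_width, src_height, target_width, target_height):
    """Return (left, top, right, bottom) of the largest centered region
    of the source that has the target aspect ratio."""
    target_ratio = target_width / target_height
    src_ratio = src_width / src_height
    if src_ratio > target_ratio:
        # Source is too wide: keep full height, trim left and right.
        crop_w = round(src_height * target_ratio)
        left = (src_width - crop_w) // 2
        return (left, 0, left + crop_w, src_height)
    # Source is too tall (or already matches): keep full width.
    crop_h = round(src_width / target_ratio)
    top = (src_height - crop_h) // 2
    return (0, top, src_width, top + crop_h)


# A square 1024x1024 source fitted to 1024x512 loses the top and bottom quarters.
print(center_crop_box(1024, 1024, 1024, 512))
```

If cropping is not acceptable, Pillow also offers `ImageOps.pad` (letterboxing) and `ImageOps.contain` (shrink to fit inside the box), both described on the documentation page linked above.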

@asomoza
Member

asomoza commented Nov 15, 2024

To your point, resizing the image for the best sizes that FLUX supports would be the main reason taking into account aspect ratios. Based on a post I saw on Reddit which made sense, these are the best sizes to use for FLUX ...

@ukaprch but this is for SD3 and not Flux. I also agree that some resolutions work best for the models, but that only makes sense with the txt2img pipelines. With img2img, as @ghunkins pointed out, people should resize the source image before feeding it to the pipeline; otherwise the generated image will be distorted.
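To make the distortion point concrete, here is a small sketch that compares the aspect ratio of a requested size with that of the source. `stretch_factor` is a hypothetical helper, not part of diffusers or Pillow; a value of 1.0 means a plain resize to the requested size would not distort the image.

```python
def stretch_factor(src_size, requested_size):
    """Ratio of the requested aspect ratio to the source aspect ratio.

    Both sizes are (width, height) tuples. 1.0 means no distortion;
    values above 1.0 mean the image would be stretched horizontally.
    """
    src_w, src_h = src_size
    req_w, req_h = requested_size
    return (req_w / req_h) / (src_w / src_h)


# The size from the original report applied to a square source image.
print(stretch_factor((1024, 1024), (1024, 512)))
```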

@ukaprch

ukaprch commented Nov 16, 2024

I agree with you, the sizes I showed are for Flux.

5 participants