
Add pixart-sigma test to image example #247

Merged: 5 commits merged into main from sigma-xl on Jul 18, 2024
Conversation

@dacorvo (Collaborator) commented Jul 18, 2024

What does this PR do?

This adds a simple example of quantizing a PixArt-Sigma diffusers pipeline.

Both the text_encoder and transformer models of the pipeline are quantized.

This pull request also fixes #231.
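
For reference, a minimal sketch of what such a quantization looks like with optimum-quanto (the checkpoint id and prompt below are illustrative; the actual script lives at examples/vision/text-to-image/quantize_pixart_sigma.py):

```python
# Minimal sketch (not the exact example script): quantize the text encoder and
# transformer of a PixArt-Sigma pipeline with optimum-quanto, then run it.
import torch
from diffusers import PixArtSigmaPipeline
from optimum.quanto import freeze, qint4, quantize

pipeline = PixArtSigmaPipeline.from_pretrained(
    "PixArt-alpha/PixArt-Sigma-XL-2-1024-MS",  # illustrative checkpoint id
    torch_dtype=torch.float16,
).to("cuda")

# Quantize the weights of both sub-models in place, then freeze them so the
# float weights are replaced by their quantized counterparts.
for model in (pipeline.text_encoder, pipeline.transformer):
    quantize(model, weights=qint4)
    freeze(model)

image = pipeline("A capybara reading a book under a tree").images[0]
image.save("pixart-sigma-int4.png")
```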

@dacorvo merged commit 28df7f1 into main on Jul 18, 2024 (12 checks passed)
@dacorvo deleted the sigma-xl branch on July 18, 2024 at 16:04
@dacorvo (Collaborator, Author) commented Jul 18, 2024

Here is an example image with int4 weights on a T4 (device memory usage is only 3 GB instead of 12 GB).

[example image: pixart-sigma-dtype@fp16-qtype@int4]
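
As a side note, a hypothetical way to reproduce the memory comparison (reusing the pipeline from the sketch above):

```python
# Hypothetical check of the device-memory claim: report peak CUDA memory
# after a single generation with the quantized (or unquantized) pipeline.
import torch

torch.cuda.reset_peak_memory_stats()
_ = pipeline("A capybara reading a book under a tree").images[0]
print(f"peak device memory: {torch.cuda.max_memory_allocated() / 1024**3:.1f} GB")
```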

@sayakpaul (Member)

Ah nice. With int4, there's a drastic performance drop.

Can we serialize and deserialize the weights too?

@dacorvo (Collaborator, Author) commented Jul 19, 2024

I have helpers for transformers LLM models that can help, but I haven't written the equivalent for pipelines:

https://github.com/huggingface/optimum-quanto/blob/main/README.md#llm-models

Ideally, if we could load each quantized submodel individually and pass them to a pipeline creation method, it would work.
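
A hedged sketch of what that could look like for a single submodel, using the quantization_map / requantize helpers described in the README (file names, the checkpoint id and the reload path are assumptions, not an existing pipeline helper):

```python
# Hedged sketch: serialize one frozen, quantized submodel and reload it later
# with quanto's quantization_map / requantize utilities.
import json

import torch
from diffusers import PixArtTransformer2DModel  # assumed model class
from optimum.quanto import quantization_map, requantize
from safetensors.torch import load_file, save_file

# --- save ---
save_file(pipeline.transformer.state_dict(), "transformer.safetensors")
with open("transformer_qmap.json", "w") as f:
    json.dump(quantization_map(pipeline.transformer), f)

# --- reload into an empty model of the same architecture ---
with torch.device("meta"):
    transformer = PixArtTransformer2DModel.from_config(
        PixArtTransformer2DModel.load_config(
            "PixArt-alpha/PixArt-Sigma-XL-2-1024-MS", subfolder="transformer"
        )
    )
state_dict = load_file("transformer.safetensors")
with open("transformer_qmap.json") as f:
    qmap = json.load(f)
requantize(transformer, state_dict, qmap, device=torch.device("cuda"))
```

The reloaded submodel could then be passed back to the pipeline constructor, e.g. PixArtSigmaPipeline.from_pretrained(..., transformer=transformer, torch_dtype=torch.float16).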

@sayakpaul (Member)

Yes, loading individual modules should be more than enough. Thanks!

@sayakpaul (Member)

@dacorvo the example states that int4 won't work on CUDA. Does that still apply?

@sayakpaul (Member) commented Jul 22, 2024

Also, the int4 option in the example is failing on the DGX:

Traceback (most recent call last):
  File "/home/sayak/optimum-quanto/examples/vision/text-to-image/quantize_pixart_sigma.py", line 78, in <module>
    image = pipeline(
  File "/home/sayak/.pyenv/versions/diffusers/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/home/sayak/diffusers/src/diffusers/pipelines/pixart_alpha/pipeline_pixart_sigma.py", line 834, in __call__
    noise_pred = self.transformer(
  File "/home/sayak/.pyenv/versions/diffusers/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/sayak/.pyenv/versions/diffusers/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/sayak/diffusers/src/diffusers/models/transformers/pixart_transformer_2d.py", line 321, in forward
    hidden_states = self.proj_out(hidden_states)
  File "/home/sayak/.pyenv/versions/diffusers/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/sayak/.pyenv/versions/diffusers/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/sayak/optimum-quanto/optimum/quanto/nn/qlinear.py", line 45, in forward
    return torch.nn.functional.linear(input, self.qweight, bias=self.bias)
  File "/home/sayak/optimum-quanto/optimum/quanto/tensor/qtensor.py", line 90, in __torch_function__
    return qfunc(*args, **kwargs)
  File "/home/sayak/optimum-quanto/optimum/quanto/tensor/qtensor_func.py", line 152, in linear
    return QTensorLinear.apply(input, other, bias)
  File "/home/sayak/.pyenv/versions/diffusers/lib/python3.10/site-packages/torch/autograd/function.py", line 598, in apply
    return super().apply(*args, **kwargs)  # type: ignore[misc]
  File "/home/sayak/optimum-quanto/optimum/quanto/tensor/qtensor_func.py", line 130, in forward
    output = output + bias
RuntimeError: CUDA error: invalid configuration argument
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.

I am on PyTorch 2.3.1; the CUDA version is 12.2.

Am I missing anything here? It doesn't seem to be a problem on Colab, though.

Could it be because of the driver versions?

DGX: Driver Version: 535.129.03
Colab: Driver Version: 535.104.05

@dacorvo (Collaborator, Author) commented Jul 22, 2024

Yes, I am tracking this in #248. As you can see in the issue, the workaround is to exclude the final projection (proj_out) from quantization.
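
A hedged sketch of that workaround, assuming the installed optimum-quanto version supports the exclude argument of quantize():

```python
# Sketch of the workaround from #248: keep the final projection layer in float
# precision by excluding it from quantization.
from optimum.quanto import freeze, qint4, quantize

quantize(pipeline.transformer, weights=qint4, exclude="proj_out")
freeze(pipeline.transformer)
```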

@sayakpaul (Member)

Ah thank you!

Successfully merging this pull request may close these issues.

fp8 leads to black images (numerical instabilities) for transformer diffusion models