Add pixart-sigma test to image example #247
Conversation
Ah nice. With int4, there's a drastic performance drop. Can we serialize and deserialize the weights too?
I have helpers for transformers LLM models that can help, but I haven't written the equivalent for pipelines: https://github.com/huggingface/optimum-quanto/blob/main/README.md#llm-models Ideally, if we were able to load each quantized submodel individually and pass the list to some pipeline creation method, it would work.
Yes, loading individual modules should be more than enough. Thanks!
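A minimal sketch of that save-and-reload flow for one submodel, adapted from the serialization helpers documented in the quanto README (`quantization_map` / `requantize`). The model ID, file names, and qint8 choice are illustrative assumptions, not part of this PR:

```python
import json

import torch
from diffusers import PixArtSigmaPipeline, PixArtTransformer2DModel
from optimum.quanto import freeze, qint8, quantization_map, quantize, requantize
from safetensors.torch import load_file, save_file

repo = "PixArt-alpha/PixArt-Sigma-XL-2-1024-MS"  # assumed model ID
pipeline = PixArtSigmaPipeline.from_pretrained(repo, torch_dtype=torch.float16)

# Quantize and freeze the transformer, then persist both its weights and its
# quantization map so it can be restored later without re-quantizing.
quantize(pipeline.transformer, weights=qint8)
freeze(pipeline.transformer)
save_file(pipeline.transformer.state_dict(), "transformer.safetensors")
with open("transformer_qmap.json", "w") as f:
    json.dump(quantization_map(pipeline.transformer), f)

# Later: rebuild the submodel on the meta device, restore the quantized
# weights, and pass it to the pipeline constructor as an override.
state_dict = load_file("transformer.safetensors")
with open("transformer_qmap.json") as f:
    qmap = json.load(f)
config = PixArtTransformer2DModel.load_config(repo, subfolder="transformer")
with torch.device("meta"):
    transformer = PixArtTransformer2DModel.from_config(config)
requantize(transformer, state_dict, qmap, device=torch.device("cuda"))
pipeline = PixArtSigmaPipeline.from_pretrained(
    repo, transformer=transformer, torch_dtype=torch.float16
)
```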
@dacorvo the example states that int4 won't work on CUDA. Does that still apply?
Also, I am running into the following traceback:

```
Traceback (most recent call last):
  File "/home/sayak/optimum-quanto/examples/vision/text-to-image/quantize_pixart_sigma.py", line 78, in <module>
    image = pipeline(
  File "/home/sayak/.pyenv/versions/diffusers/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/home/sayak/diffusers/src/diffusers/pipelines/pixart_alpha/pipeline_pixart_sigma.py", line 834, in __call__
    noise_pred = self.transformer(
  File "/home/sayak/.pyenv/versions/diffusers/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/sayak/.pyenv/versions/diffusers/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/sayak/diffusers/src/diffusers/models/transformers/pixart_transformer_2d.py", line 321, in forward
    hidden_states = self.proj_out(hidden_states)
  File "/home/sayak/.pyenv/versions/diffusers/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/sayak/.pyenv/versions/diffusers/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/sayak/optimum-quanto/optimum/quanto/nn/qlinear.py", line 45, in forward
    return torch.nn.functional.linear(input, self.qweight, bias=self.bias)
  File "/home/sayak/optimum-quanto/optimum/quanto/tensor/qtensor.py", line 90, in __torch_function__
    return qfunc(*args, **kwargs)
  File "/home/sayak/optimum-quanto/optimum/quanto/tensor/qtensor_func.py", line 152, in linear
    return QTensorLinear.apply(input, other, bias)
  File "/home/sayak/.pyenv/versions/diffusers/lib/python3.10/site-packages/torch/autograd/function.py", line 598, in apply
    return super().apply(*args, **kwargs)  # type: ignore[misc]
  File "/home/sayak/optimum-quanto/optimum/quanto/tensor/qtensor_func.py", line 130, in forward
    output = output + bias
RuntimeError: CUDA error: invalid configuration argument
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
```

I am on PyTorch 2.3.1 and CUDA 12.2. Is there anything I am missing here? It doesn't seem to be a problem on Colab, though. Could it be because of the driver versions? DGX: Driver Version: 535.129.03
Yes, I am tracking this in #248. As you can see in the issue, the workaround is to exclude the final projection (`proj_out`) from quantization.
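For reference, a minimal sketch of that workaround, assuming `quantize` accepts an `exclude` pattern (as in recent quanto versions) and taking the `proj_out` module name from the traceback above:

```python
from optimum.quanto import freeze, qint4, quantize

# Leave the final projection unquantized to avoid the int4 CUDA error above.
quantize(pipeline.transformer, weights=qint4, exclude="proj_out")
freeze(pipeline.transformer)
```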
Ah, thank you!
What does this PR do?
This adds a simple example of quantization of a PixArt-Sigma diffusers pipeline. Both the `text_encoder` and `transformer` models of the pipeline are quantized. This pull-request also fixes #231.
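In outline, the added example does something like the following sketch; the model ID, dtype, and prompt here are illustrative assumptions rather than a verbatim copy of the script:

```python
import torch
from diffusers import PixArtSigmaPipeline
from optimum.quanto import freeze, qint8, quantize

pipeline = PixArtSigmaPipeline.from_pretrained(
    "PixArt-alpha/PixArt-Sigma-XL-2-1024-MS", torch_dtype=torch.float16
).to("cuda")

# Quantize both submodels in place, then freeze to materialize integer weights.
quantize(pipeline.text_encoder, weights=qint8)
freeze(pipeline.text_encoder)
quantize(pipeline.transformer, weights=qint8)
freeze(pipeline.transformer)

image = pipeline(prompt="A small cactus with a happy face in the Sahara desert").images[0]
image.save("pixart_sigma_qint8.png")
```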