GPTQ Env vars: catch correct type of error #596
Can you provide an example of a model where this error is triggered instead? I'm very surprised that this error could be raised.
This is the line the exception is being thrown from: https://github.com/huggingface/text-generation-inference/blob/f2f0289fb99c7caab0c3749fdf211e4d5ab2938b/server/text_generation_server/utils/weights.py#L49C22-L49C22 Following through the code, this is what I've pieced together: Happy to be proven wrong here! ... I'll find a model in a sec that triggers this and put the link in here ...
100%! Catching and reraising at its best...
Merging, tests are red because you don't have access to our secrets.
@olivier FYI
Quick update, I've got a model in a private repo that I quantized with GPTQ-for-Llama that triggers this reliably. That's here: https://huggingface.co/ssmi153/student-feedback-llama-30b-guanaco-2023-07-03-final-gptq and I can DM over a HuggingFace token to give you access to it (probably shouldn't put that out in public though!). I've been trying to find other public GPTQ model files that do it too, but for all of TheBloke's conversions I run into another issue:
This is from this model: https://huggingface.co/TheBloke/orca_mini_v2_7B-GPTQ It's also a conversion using GPTQ-for-Llama, so it should be broadly compatible. Here's the quantization config: { "bits": 4, ... Is this incompatibility due to desc_act = false, or something else?
Annnnd this is why I don't particularly enjoy maintaining external models... I'm not sure I have the bandwidth to really investigate an escape hatch.
What does this PR do?
When passing in environment variables like gptq_bits, we still get errors thrown from TGI because the try/except block is catching the wrong type of exception. This PR aims to fix that.
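A minimal sketch of the failure mode, for illustration only. `FakeWeights`, `read_gptq_bits`, and the `GPTQ_BITS` variable name are hypothetical stand-ins, not TGI's actual code, and the sketch assumes a missing tensor surfaces as a `RuntimeError`. The point is that the environment-variable fallback only runs when the `except` clause names the exception class that the lookup really raises.

```python
import os


class FakeWeights:
    """Hypothetical stand-in for a weights loader (not TGI's real class)."""

    def __init__(self, tensors):
        self._tensors = tensors

    def get_tensor(self, name):
        if name not in self._tensors:
            # Assumption: a missing tensor surfaces as RuntimeError.
            raise RuntimeError(f"weight {name} does not exist")
        return self._tensors[name]


def read_gptq_bits(weights, env=None):
    """Read gptq_bits from the weights, else from the GPTQ_BITS variable."""
    env = os.environ if env is None else env
    try:
        return int(weights.get_tensor("gptq_bits"))
    except RuntimeError as err:  # must match what get_tensor raises
        value = env.get("GPTQ_BITS")
        if value is None:
            # No fallback available: re-raise the original error.
            raise err
        return int(value)
```

If the `except` clause named a different class (say `KeyError`), the `RuntimeError` would propagate straight past it and the environment variable would never be consulted, which matches the symptom described above.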
@Narsil - let me know if this is how you want this formatted. My Python is a little shaky, so I hope this syntax is correct.