Inference throwing: TypeError: forward() got an unexpected keyword argument 'position_ids' #6
Good catch, thank you. I'll fix things up for the latest transformers head. In the meantime, you could try transformers @ commit a92e0ad2e20ef4ce28410b5e05c5d63a5a304e65
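For anyone following along, pinning transformers to that commit can be done with pip's git support (a one-liner sketch; adjust for your own environment):

```bash
pip install git+https://github.com/huggingface/transformers.git@a92e0ad2e20ef4ce28410b5e05c5d63a5a304e65
```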
@fpgaminer Your latest triton updates are really fast. Can't believe it. GPTQ-for-LLaMa ported your new code over, and it finally made the triton branch not only useful but the fastest in all my real-world tests. Btw, not sure if it is transformers related or triton related, but beam search doesn't appear to work. I expected a slowdown as num_beams goes up, but I get the same tokens/s back, which doesn't make much sense. Does triton need to implement beam search, or should that be handled by the higher-level transformers API? I am trying to isolate why beams are not functioning. Thanks.
That should be handled in the transformers library.
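For reference, a minimal sketch of how beam search is driven entirely from the transformers side, so the Triton kernels never see `num_beams` (gpt2 here is a stand-in model, not the quantized LLaMA setup discussed above):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")  # placeholder model
model = AutoModelForCausalLM.from_pretrained("gpt2")

inputs = tokenizer("The quick brown fox", return_tensors="pt")
outputs = model.generate(
    **inputs,
    max_new_tokens=32,
    num_beams=4,      # beam width; per-token cost should roughly scale with this
    do_sample=False,  # beam search is deterministic
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```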
FYI, there's a 10% performance regression in the latest `transformers`.
Update: As of the latest GPTQ-triton commit (3daf413), the latest `transformers` head is supported.
Currently quantizing 30b 4bit using the repo's new quantize script and will do some testing later. Will post findings here.
@fpgaminer Confirmed the transformers[head] compat issue is fixed, using a 30b 4bit model quantized with your repo's quantize script. However, I found a beam-search issue at #11
Env:
Ubuntu 22.04
pytorch 2.1 nightly, cuda 11.8
transformers[head]
peft[head]

Reproduction steps:

Result:
TypeError: forward() got an unexpected keyword argument 'position_ids'

Is the quantized code not compatible with transformers[head]? Or am I doing something wrong?

Generation code:
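The collapsed "Generation code" block did not survive in this copy of the issue. The failure mode itself is generic, though: on transformers head, generate() passes position_ids into the model's forward(), so any forward() signature that does not accept it (and has no **kwargs) raises exactly this error. A self-contained sketch of the mechanism (the class and tensors are hypothetical, not the quantized model's actual code):

```python
import torch
import torch.nn as nn

# Hypothetical stand-in for a quantized model whose forward() predates
# the position_ids kwarg that newer transformers passes during generation.
class OldStyleModel(nn.Module):
    def forward(self, input_ids, attention_mask=None):  # no position_ids, no **kwargs
        return input_ids

model = OldStyleModel()
ids = torch.zeros(1, 8, dtype=torch.long)

model(input_ids=ids)  # fine
model(input_ids=ids, position_ids=torch.arange(8).unsqueeze(0))
# -> TypeError: forward() got an unexpected keyword argument 'position_ids'
```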