Beam Search Fails for Llama 70b #26332
Comments
I also tried installing from GitHub and received the same error. See the updated environment below (same script/error).
It's also worth mentioning that for lower numbers of tokens (10 tokens generated) this error did not occur. It only happened for longer generations, such as the up-to-512-token runs above.
cc @Rocketknight1 an example of how to reproduce the error
Got it! I'll see if I can reproduce this and push a fix to LLaMA (which might also help bring the code into line with the InternLM code).
I made a much shorter reproduction script for the issue that doesn't need the full 70B model.
The issue occurs on GPU and CPU, in float16/bfloat16/float32. It is only triggered by beam search, and doesn't occur with standard generation. Working on it!
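The short reproduction script itself wasn't captured in this copy of the issue. A minimal sketch of what such a beam-sample reproduction could look like is shown below; the tiny test checkpoint and the generation arguments are assumptions, not the original script:

```python
# Hypothetical reproduction sketch (not the original script, which was not
# captured here). A small public checkpoint stands in for the 70B model;
# num_beams > 1 together with do_sample=True selects the beam-sample path
# in which the error was observed.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "hf-internal-testing/tiny-random-LlamaForCausalLM"  # assumed stand-in model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

inputs = tokenizer("Hello, my name is", return_tensors="pt")

outputs = model.generate(
    **inputs,
    do_sample=True,      # beam sample rather than plain beam search
    num_beams=4,
    temperature=0.6,
    top_p=0.9,
    max_new_tokens=256,  # the error was reported for longer generations
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```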
Further update: this issue only occurs in 'beam sample' decoding, not 'beam search'. As a temporary workaround, @jconley-deloitte, you can add …
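The exact workaround argument was stripped from the comment above; given the distinction drawn between beam sample and beam search, it presumably amounts to disabling sampling. A sketch, assuming the missing argument is `do_sample=False`:

```python
# Workaround sketch: with do_sample=False, generate() runs plain beam search
# instead of beam sample, which does not trigger the crash.
outputs = model.generate(
    **inputs,
    num_beams=4,
    do_sample=False,
    max_new_tokens=512,
)
```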
Got it: This is nothing to do with LLaMA's code at all! The cause is that LLaMA's … The reason seems to be that … Possible solutions include tweaking the … In the meantime @jconley-deloitte, you can either use …
@Rocketknight1 thank you for diving in!
We already do this. The root of the issue is that in …

There has been a similar issue in the past, and, regardless of being a bug that causes crashes, I think that it makes more sense to apply the logits warpers before adding the scores: …

All this to say that I'm going to open a PR to break the legacy behavior, as it is a recurrent issue that takes up significant time every time it pops up :) I've tested locally, and changing this detail fixes the crashing snippets!
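A schematic toy of the two orderings being discussed, purely for illustration (this is not the actual transformers source; the `warp` helper and the tensor shapes are invented for the example):

```python
import torch
import torch.nn.functional as F

def warp(scores, temperature=0.6):
    # stand-in for the logits warpers (temperature scaling only, for illustration)
    return scores / temperature

next_token_logits = torch.randn(2, 10)        # (num_beams, vocab_size) toy values
beam_scores = torch.tensor([-250.0, -300.0])  # accumulated log-probs of long beams

# Legacy beam-sample ordering: the warpers are applied to scores that already
# include the large negative accumulated beam scores.
legacy = warp(F.log_softmax(next_token_logits, dim=-1) + beam_scores[:, None])

# Proposed ordering: warp the per-step log-probabilities first, then add the
# beam scores, so the warpers only ever see a well-behaved per-step distribution.
proposed = warp(F.log_softmax(next_token_logits, dim=-1)) + beam_scores[:, None]
```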
➕ on breaking this, as we have had quite a lot of issues. Having a `self.legacy` flag might be OK to have a deprecation cycle / just keep both for …
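A hypothetical sketch of what keeping both behaviours behind a legacy flag could look like (names and structure are invented for illustration, not the actual transformers code):

```python
import torch.nn.functional as F

def combine_scores(next_token_logits, beam_scores, logits_warper, input_ids, legacy=False):
    # Gate the old and new orderings behind a flag so the old one can go
    # through a deprecation cycle before being removed.
    scores = F.log_softmax(next_token_logits, dim=-1)
    if legacy:
        # old behaviour: warp after adding the accumulated beam scores
        return logits_warper(input_ids, scores + beam_scores[:, None])
    # new behaviour: warp the per-step distribution first, then add the scores
    return logits_warper(input_ids, scores) + beam_scores[:, None]
```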
System Info
transformers version: 4.33.2

Who can help?
@gante appears to be the relevant developer, because this is an issue with `model.generate`.
Information
Tasks
An officially supported task in the examples folder (such as GLUE/SQuAD, ...)

Reproduction
The script below only needs TOKEN and CACHE_DIR filled in and can be run to generate the error.
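The actual script was not preserved in this copy of the issue. A minimal sketch matching the description (a 70B Llama checkpoint loaded with a user token and cache directory, then a long beam-sample generation) could look like the following; the model id, prompt, and generation arguments are all assumptions:

```python
# Hypothetical sketch of the described reproduction (the original script was
# stripped). TOKEN, CACHE_DIR and the exact generation arguments are placeholders.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

TOKEN = "hf_..."                 # Hugging Face access token (placeholder)
CACHE_DIR = "/path/to/cache"     # local cache directory (placeholder)
MODEL_ID = "meta-llama/Llama-2-70b-chat-hf"  # assumed 70B checkpoint

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID, token=TOKEN, cache_dir=CACHE_DIR)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID,
    token=TOKEN,
    cache_dir=CACHE_DIR,
    torch_dtype=torch.float16,
    device_map="auto",
)

inputs = tokenizer("Tell me about beam search.", return_tensors="pt").to(model.device)

# Long beam-sample generation fails; greedy decoding works as expected.
outputs = model.generate(**inputs, num_beams=4, do_sample=True, max_new_tokens=512)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```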
I have tried with and without autocast, and it does not affect this.
I have also verified that the GPUs/Machine are not memory constrained.
Greedy generation works as expected; only beam search fails.
The resulting error: …
Expected behavior
The model generates tokens using beam search.