
Avoid nan during sampling in generate() #17937

Closed
wants to merge 2 commits

Conversation

@ydshieh ydshieh (Collaborator) commented Jun 29, 2022

What does this PR do?

Fix the following CI test error:

            # sample
            probs = nn.functional.softmax(next_token_scores, dim=-1)
>           next_tokens = torch.multinomial(probs, num_samples=1).squeeze(1)
E           RuntimeError: probability tensor contains either `inf`, `nan` or element < 0

in https://github.com/huggingface/transformers/runs/6959698965?check_suite_focus=true
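For context, a minimal standalone sketch (not from the PR) that reproduces this failure mode, assuming the next-token scores end up all `-inf` along the vocab dimension:

import torch
from torch import nn

# All candidate tokens masked out: softmax over an all-(-inf) row is nan everywhere.
next_token_scores = torch.full((1, 10), float("-inf"))
probs = nn.functional.softmax(next_token_scores, dim=-1)  # tensor([[nan, nan, ...]])

# torch.multinomial then raises:
# RuntimeError: probability tensor contains either `inf`, `nan` or element < 0
next_tokens = torch.multinomial(probs, num_samples=1).squeeze(1)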

The test test_sample_generate may still fail at

self.assertListEqual(output_sample.tolist(), output_generate.tolist())

for some unknown reason. I think it is better to investigate this in another PR.

@ydshieh ydshieh changed the title fix nan during sampling fix nan during sampling in generate() Jun 29, 2022
@ydshieh ydshieh changed the title fix nan during sampling in generate() Avoid nan during sampling in generate() Jun 29, 2022
@ydshieh ydshieh (Collaborator, Author) commented Jun 29, 2022

I have some doubts here, as this will make all tokens equally likely to be sampled. But with all -inf, nothing can be sampled, which leads to the error. I feel there is no well-defined expected result in such edge cases.
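To make that concern concrete, here is a hypothetical sketch (the floor value and variable names are assumptions; the actual diff further down is truncated): clamping an all-`-inf` score row against any finite floor makes the row constant, and softmax of a constant row is uniform over the vocab.

import torch
from torch import nn

vocab_size = 10
next_token_scores = torch.full((1, vocab_size), float("-inf"))

# Hypothetical clamp in the spirit of the fix: replace -inf with a finite floor.
floor = torch.tensor(torch.finfo(next_token_scores.dtype).min)
_next_token_scores = torch.max(next_token_scores, floor)

probs = nn.functional.softmax(_next_token_scores, dim=-1)
print(probs)  # every entry is 1 / vocab_size, i.e. sampling becomes uniform over the vocab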

@HuggingFaceDocBuilderDev commented Jun 29, 2022

The documentation is not available anymore as the PR was closed or merged.

@@ -1970,8 +1970,19 @@ def sample(
else (outputs.hidden_states,)
)

# To avoid all `-inf` along the vocab dimension (dim -1), which gives `nan` after `softmax` and error
# in `torch.multinomial`.
_next_token_scores = torch.max(
Contributor commented on the diff:

@ydshieh, softmax should actually be able to handle -inf correctly.
You can try:

torch.nn.functional.softmax(torch.tensor([0.0, float("-inf")]), dim=-1)

which works as mathematically expected.

It's only when all values are -inf that it doesn't work, in which case this fix won't help because the generation is broken anyway.
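A small illustrative sketch of that distinction (not part of the PR):

import torch
import torch.nn.functional as F

# One finite score among -inf: softmax puts all probability mass on it.
print(F.softmax(torch.tensor([0.0, float("-inf")]), dim=-1))  # tensor([1., 0.])

# Every score -inf: softmax returns nan, and downstream sampling breaks.
print(F.softmax(torch.tensor([float("-inf"), float("-inf")]), dim=-1))  # tensor([nan, nan])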

@ydshieh ydshieh (Collaborator, Author) commented Jun 30, 2022

This will actually fix the nan issue. The concern is that it doesn't really make sense, as it changes the probabilities to a uniform distribution along the vocab dim, while in the broken case nothing can be sampled (all probabilities are 0, mathematically).

@patrickvonplaten patrickvonplaten (Contributor) left a comment

As explained here: https://github.com/huggingface/transformers/pull/17937/files#r910424357
this won't fix the problem. Also note that generation is used a lot, so every additional operation (torch.max(...)) leads to a tiny slowdown.

Usually, if you get nans after the softmax, it means the generation is broken anyway, which can happen, and I think there is little we can do against it.

@ydshieh ydshieh (Collaborator, Author) commented Jun 30, 2022

Yes, that happens only when all values along the vocab dim are -inf. I will close this PR, and maybe we should create a doc listing all possible flaky tests :-)
