
"Dynamic" Issue in LlamaDynamicNTKScalingRotaryEmbedding - Long context inference will impact short context inference. #25306

Closed
LetianLee opened this issue Aug 4, 2023 · 7 comments

Comments

@LetianLee

System Info

  • transformers version: 4.32.0.dev0
  • Platform: Linux-5.15.109+-x86_64-with-glibc2.35
  • Python version: 3.10.12
  • Huggingface_hub version: 0.16.4
  • Safetensors version: 0.3.1
  • Accelerate version: 0.22.0.dev0
  • Accelerate config: not found
  • PyTorch version (GPU?): 2.0.1+cu118 (True)
  • Tensorflow version (GPU?): 2.12.0 (True)
  • Flax version (CPU?/GPU?/TPU?): 0.7.0 (gpu)
  • Jax version: 0.4.13
  • JaxLib version: 0.4.13
  • Using GPU in script?: No
  • Using distributed or parallel set-up in script?: No

Who can help?

@sgugger

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

Please see my colab code:
https://colab.research.google.com/drive/1SnQQxW7WMHgSOvAwF_HIlIDrAuXZ4IKp?usp=sharing

I ran the same prompt twice, with a long-context prompt in between. That intermediate long-context inference caused the model to give different answers to the same question before and after it.
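
A minimal reproduction of this pattern might look like the following (the model name, prompts, and generation settings are illustrative placeholders, not the exact contents of the Colab):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Illustrative checkpoint; any Llama checkpoint works, as long as dynamic NTK scaling is enabled.
model_id = "meta-llama/Llama-2-7b-hf"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    rope_scaling={"type": "dynamic", "factor": 2.0},  # selects LlamaDynamicNTKScalingRotaryEmbedding
    torch_dtype=torch.float16,
    device_map="auto",
)

def generate(prompt: str) -> str:
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    output = model.generate(**inputs, max_new_tokens=64, do_sample=False)  # greedy, so decoding is deterministic
    return tokenizer.decode(output[0], skip_special_tokens=True)

short_prompt = "What is the capital of France?"  # placeholder short question
long_prompt = "..."                              # placeholder: any prompt longer than max_position_embeddings

answer_before = generate(short_prompt)
generate(long_prompt)                 # long-context inference in between
answer_after = generate(short_prompt)

# With the behaviour reported here, the two answers differ even though the prompt is identical.
print(answer_before == answer_after)
```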

Expected behavior

Since the tested prompts are within the maximum input token capacity the model can handle, the point of "Dynamic" scaling is that the rotary embeddings for those inputs should remain the same before and after the long-context inference, and consequently the outputs should also be identical.

I reviewed the code of the LlamaDynamicNTKScalingRotaryEmbedding class, and I think that, due to caching, when the model runs inference on a long context the cached cos_cached and sin_cached values are updated to adapt to the longer sequence. Because the cache is never reset, subsequent inference on shorter contexts still uses the rescaled values, which causes the issue. A sketch of this caching path is below.
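
The following is a stripped-down sketch of that caching path (a simplified illustration of the behaviour, not the actual transformers source; attribute names such as cos_cached and sin_cached mirror the class above, but the details are reduced):

```python
import torch

class DynamicNTKRotarySketch(torch.nn.Module):
    """Simplified illustration of the grow-only cos/sin caching described above."""

    def __init__(self, dim, max_position_embeddings=2048, base=10000.0, scaling_factor=1.0):
        super().__init__()
        self.dim = dim
        self.max_position_embeddings = max_position_embeddings
        self.base = base
        self.scaling_factor = scaling_factor
        self._build_cache(max_position_embeddings)

    def _build_cache(self, seq_len):
        base = self.base
        if seq_len > self.max_position_embeddings:
            # "Dynamic" NTK: rescale the rotary base when the sequence exceeds the original window.
            base = base * (
                self.scaling_factor * seq_len / self.max_position_embeddings - (self.scaling_factor - 1)
            ) ** (self.dim / (self.dim - 2))
        inv_freq = 1.0 / (base ** (torch.arange(0, self.dim, 2).float() / self.dim))
        t = torch.arange(seq_len, dtype=inv_freq.dtype)
        freqs = torch.outer(t, inv_freq)
        emb = torch.cat((freqs, freqs), dim=-1)
        self.cos_cached = emb.cos()
        self.sin_cached = emb.sin()
        self.max_seq_len_cached = seq_len

    def forward(self, seq_len):
        # The cache is only rebuilt when the sequence grows, never when it shrinks,
        # so one long prompt permanently changes the embeddings served to short prompts.
        if seq_len > self.max_seq_len_cached:
            self._build_cache(seq_len)
        return self.cos_cached[:seq_len], self.sin_cached[:seq_len]


rope = DynamicNTKRotarySketch(dim=128, max_position_embeddings=2048)
cos_before, _ = rope(100)   # short prompt: original, unscaled embeddings
rope(4096)                  # long prompt: cache rebuilt with a rescaled base
cos_after, _ = rope(100)    # same short prompt again
print(torch.allclose(cos_before, cos_after))  # False -- the short prompt now gets different embeddings
```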

@amyeroberts
Collaborator

cc @gante @ArthurZucker

@ArthurZucker
Collaborator

Hey! Thanks for reporting, this is a duplicate of #25104. Will link it in the PR as well

@github-actions

github-actions bot commented Sep 3, 2023

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

@i4never

i4never commented Oct 16, 2023

> Hey! Thanks for reporting, this is a duplicate of #25104. Will link it in the PR as well

No, they're not the same. I understand that #25104 is about the trade-off between using the KV cache and rotary embedding inconsistency. But when you freeze everything during generation, including random seeds, the same input should give the same output sequence.

The dynamic NTK rotary embedding only recalculates when the input sequence is longer than the cached length. What if the longest sequence is processed first? The cached embeddings will never change again. PR #25308 is a correct fix without extra computation; I think it should be merged.
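
With the simplified sketch from the issue description above, the "longest sequence first" case looks like this (illustrative only):

```python
rope = DynamicNTKRotarySketch(dim=128, max_position_embeddings=2048)
rope(4096)                       # the longest sequence arrives first; the cache is built with a rescaled base
cos_short, _ = rope(100)         # every later short prompt ...
cos_short_again, _ = rope(100)   # ... keeps reading from the rescaled cache
# Because `seq_len > max_seq_len_cached` never becomes true again, the cache is never rebuilt,
# and short prompts always see the long-context embeddings instead of the original ones.
```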
@gante

@ArthurZucker
Collaborator

I see, makes sense to me. @gante, could you have a look? 🤗

@gante
Member

gante commented Oct 23, 2023

@i4never I agree, it is a limitation of the technique when implemented as the authors suggest. #25308 is not the correct fix either -- we should only resize the sin and cos caches down to the original size, as smaller values will likely have a negative impact.

Would you like to open a PR to fix it? :)
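
One possible reading of that suggestion, built on the simplified class from the issue description (a sketch only, not the patch actually adopted in transformers): shrink the cache back to the original max_position_embeddings when a shorter sequence arrives, but never below it.

```python
class DynamicNTKRotaryWithReset(DynamicNTKRotarySketch):
    """Variant of the earlier sketch that shrinks the cache back to the original window."""

    def forward(self, seq_len):
        if seq_len > self.max_seq_len_cached:
            # Grow the cache for sequences longer than anything cached so far.
            self._build_cache(seq_len)
        elif (
            self.max_seq_len_cached > self.max_position_embeddings
            and seq_len <= self.max_position_embeddings
        ):
            # An earlier long prompt enlarged the cache, but this sequence fits in the
            # original window: rebuild the original, unscaled cache. Resize only down to
            # max_position_embeddings, never smaller.
            self._build_cache(self.max_position_embeddings)
        return self.cos_cached[:seq_len], self.sin_cached[:seq_len]


rope = DynamicNTKRotaryWithReset(dim=128, max_position_embeddings=2048)
cos_before, _ = rope(100)
rope(4096)                   # a long prompt rescales the cache ...
cos_after, _ = rope(100)     # ... but the next short prompt restores the original cache
print(torch.allclose(cos_before, cos_after))  # True with this variant
```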

@i4never

i4never commented Oct 24, 2023

> @i4never I agree, it is a limitation of the technique when implemented as the authors suggest. #25308 is not the correct fix either -- we should only resize the sin and cos caches down to the original size, as smaller values will likely have a negative impact.
>
> Would you like to open a PR to fix it? :)

#27033
