
Fixed "Dynamic" issue in LlamaDynamicNTKScalingRotaryEmbedding #25308

Closed
LetianLee wants to merge 1 commit

Conversation

LetianLee

What does this PR do?

Fixes #25306

In LlamaDynamicNTKScalingRotaryEmbedding, when the Llama model runs inference on a long context, the cached cos_cached and sin_cached values are rebuilt with NTK-scaled frequencies to adapt to the longer context. Because the cache is only refreshed when the sequence length grows, those scaled values are reused when the model later infers a shorter context, so short contexts are served with the wrong, scaled frequencies.
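For illustration, here is a minimal sketch of how the stale cache shows up with the unpatched class (the import path matches transformers.models.llama.modeling_llama; the dim, scaling_factor, and sequence-length values are only illustrative):

import torch
from transformers.models.llama.modeling_llama import LlamaDynamicNTKScalingRotaryEmbedding

rope = LlamaDynamicNTKScalingRotaryEmbedding(dim=128, max_position_embeddings=2048, scaling_factor=2.0)
x = torch.zeros(1, 1, 16, 128)  # [bs, num_attention_heads, seq_len, head_size]

cos_short, _ = rope(x, seq_len=16)    # served from the unscaled cache
rope(x, seq_len=4096)                 # long context: cache is rebuilt with an NTK-scaled base
cos_again, _ = rope(x, seq_len=16)    # without this fix, still the scaled cache

print(torch.allclose(cos_short, cos_again))  # False without the fix, True with it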

This PR rewrites the forward function of the LlamaDynamicNTKScalingRotaryEmbedding class so that _set_cos_sin_cache runs whenever the requested sequence length differs from the cached length, not only when it grows. In addition, the NTK-scaled inv_freq is kept as a local variable rather than written back to the module, so the original inv_freq is preserved for later recomputations. Here is my code for this class:

class LlamaDynamicNTKScalingRotaryEmbedding(LlamaRotaryEmbedding):
    """LlamaRotaryEmbedding extended with Dynamic NTK scaling. Credits to the Reddit users /u/bloc97 and /u/emozilla"""

    def __init__(self, dim, max_position_embeddings=2048, base=10000, device=None, scaling_factor=1.0):
        self.scaling_factor = scaling_factor
        super().__init__(dim, max_position_embeddings, base, device)

    def _set_cos_sin_cache(self, seq_len, device, dtype):
        self.max_seq_len_cached = seq_len

        # Work on a local copy so the inv_freq stored on the module keeps its original, unscaled values
        inv_freq = self.inv_freq.to(device)
        if seq_len > self.max_position_embeddings:
            base = self.base * (
                (self.scaling_factor * seq_len / self.max_position_embeddings) - (self.scaling_factor - 1)
            ) ** (self.dim / (self.dim - 2))
            inv_freq = 1.0 / (base ** (torch.arange(0, self.dim, 2).float().to(device) / self.dim))

        t = torch.arange(self.max_seq_len_cached, device=device, dtype=self.inv_freq.dtype)

        freqs = torch.einsum("i,j->ij", t, inv_freq)
        # Different from paper, but it uses a different permutation in order to obtain the same calculation
        emb = torch.cat((freqs, freqs), dim=-1)
        self.register_buffer("cos_cached", emb.cos()[None, None, :, :].to(dtype), persistent=False)
        self.register_buffer("sin_cached", emb.sin()[None, None, :, :].to(dtype), persistent=False)
    
    def forward(self, x, seq_len=None):
        # x: [bs, num_attention_heads, seq_len, head_size]
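        # Recompute the cache whenever the requested length differs from the cached one,
        # so shrinking back to a short context restores the unscaled frequencies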
        if seq_len != self.max_seq_len_cached:
            self._set_cos_sin_cache(seq_len=seq_len, device=x.device, dtype=x.dtype)

        return (
            self.cos_cached[:, :, :seq_len, ...].to(dtype=x.dtype),
            self.sin_cached[:, :, :seq_len, ...].to(dtype=x.dtype),
        )
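With this change the cache is rebuilt both when the context grows and when it shrinks back, so the small amount of extra recomputation buys correct, unscaled frequencies for short contexts. And because the scaled inv_freq stays local to _set_cos_sin_cache, the inv_freq stored on the module keeps its original values.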


Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

Hi @sgugger, would you please help review this PR? Thanks!

sgugger (Collaborator) commented Aug 4, 2023

cc @ArthurZucker and @gante

HuggingFaceDocBuilderDev commented

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint.


github-actions bot commented Sep 3, 2023

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.
