
[Proposal] Add frequency-based RoPE support for Llama 3.1 models #719

Closed
1 task done
frances720 opened this issue Sep 9, 2024 · 3 comments
Proposal

Add support for frequency-based RoPE (Rotary Position Embedding) smoothing in the TransformerLens library to match Llama 3.1’s architecture.

Motivation

Llama 3.1 uses frequency-based smoothing in its positional embeddings to handle long-range dependencies more effectively. However, the current version of TransformerLens does not support this feature, limiting the ability to properly analyze Llama 3.1 models.

Pitch

Implement frequency-based RoPE smoothing to enhance positional encoding in Llama 3.1 models. This would improve TransformerLens’s compatibility with Llama 3.1 and provide a better tool for analyzing long-sequence tasks.
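For context, below is a minimal sketch of the frequency smoothing Llama 3.1 applies to its RoPE inverse frequencies. The constants (`factor=8`, `low_freq_factor=1`, `high_freq_factor=4`, original context length 8192) come from Meta's published `rope_scaling` config for Llama 3.1; the function name and structure are illustrative and not part of the TransformerLens API.

```python
import math

def llama31_scale_rope_freqs(
    inv_freqs,
    factor=8.0,
    low_freq_factor=1.0,
    high_freq_factor=4.0,
    original_context_len=8192,
):
    """Apply Llama 3.1-style frequency smoothing to RoPE inverse frequencies.

    High-frequency (short-wavelength) components are kept as-is,
    low-frequency (long-wavelength) components are scaled down by `factor`
    to stretch the effective context window, and components in between are
    linearly interpolated between the two regimes.
    """
    low_freq_wavelen = original_context_len / low_freq_factor
    high_freq_wavelen = original_context_len / high_freq_factor
    scaled = []
    for freq in inv_freqs:
        wavelen = 2 * math.pi / freq
        if wavelen < high_freq_wavelen:
            # Short wavelengths: leave the frequency untouched.
            scaled.append(freq)
        elif wavelen > low_freq_wavelen:
            # Long wavelengths: scale the frequency down by `factor`.
            scaled.append(freq / factor)
        else:
            # Transition band: smooth interpolation between the two regimes.
            smooth = (original_context_len / wavelen - low_freq_factor) / (
                high_freq_factor - low_freq_factor
            )
            scaled.append((1 - smooth) * freq / factor + smooth * freq)
    return scaled
```

Standard RoPE would use the raw inverse frequencies directly, which is why the current TransformerLens implementation diverges from Llama 3.1's positional encoding.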

Alternatives

Continue using TransformerLens with standard RoPE, but this would not fully support Llama 3.1’s unique architecture.

Checklist

  • I have checked that there is no similar issue in the repo (required)
@frances720 (Author)

I have a PR for it, but when I ran `git push --set-upstream origin frances/llama31_rope`, it returned a 403 error.

@bryce13950 (Collaborator)

@frances720 Sorry for the late reply! It looks like you may be trying to push your branch directly to the TransformerLens repo; you need to open your PR from your own fork. If you need help with this, you can reach me on the Slack channel. Let me know if you need an invite!

@bryce13950 (Collaborator)

This has been resolved in a recent release.
