Proposal
Add support for frequency-based RoPE (Rotary Position Embedding) smoothing in the TransformerLens library to match Llama 3.1’s architecture.
Motivation
Llama 3.1 uses frequency-based smoothing in its positional embeddings to handle long-range dependencies more effectively. However, the current version of TransformerLens does not support this feature, limiting the ability to properly analyze Llama 3.1 models.
Pitch
Implement frequency-based RoPE smoothing to enhance positional encoding in Llama 3.1 models. This would improve TransformerLens’s compatibility with Llama 3.1 and provide a better tool for analyzing long-sequence tasks.
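For reference, the smoothing Llama 3.1 applies is a piecewise rescaling of the RoPE inverse frequencies: low-frequency components are divided by a scale factor, high-frequency components pass through unchanged, and the band in between is linearly interpolated. Below is a minimal sketch of that transformation; the function name `smooth_llama3_inv_freq` is illustrative rather than an existing TransformerLens API, and the keyword defaults follow the published Llama 3.1 rope_scaling config (factor 8, low_freq_factor 1, high_freq_factor 4, original context 8192).

```python
# Sketch of Llama 3.1's frequency-based RoPE smoothing, mirroring the
# "llama3" rope_scaling used by the reference checkpoints. Not an existing
# TransformerLens function; the name and signature are hypothetical.
import math
import torch

def smooth_llama3_inv_freq(
    inv_freq: torch.Tensor,
    factor: float = 8.0,
    low_freq_factor: float = 1.0,
    high_freq_factor: float = 4.0,
    original_context_len: int = 8192,
) -> torch.Tensor:
    """Rescale RoPE inverse frequencies with Llama 3.1's piecewise smoothing."""
    low_freq_wavelen = original_context_len / low_freq_factor
    high_freq_wavelen = original_context_len / high_freq_factor
    wavelen = 2 * math.pi / inv_freq  # wavelength of each rotary component

    # Long wavelengths (low frequencies) are slowed down by `factor`;
    # short wavelengths (high frequencies) pass through unchanged.
    scaled = torch.where(wavelen > low_freq_wavelen, inv_freq / factor, inv_freq)

    # The middle band interpolates smoothly between scaled and unscaled.
    smooth = (original_context_len / wavelen - low_freq_factor) / (
        high_freq_factor - low_freq_factor
    )
    mid = (1 - smooth) * inv_freq / factor + smooth * inv_freq
    is_mid = (wavelen <= low_freq_wavelen) & (wavelen >= high_freq_wavelen)
    return torch.where(is_mid, mid, scaled)

# Example: base inverse frequencies for head dimension 128 and Llama 3.1's
# rope theta of 500,000, then smoothed before building the sin/cos tables.
dim = 128
inv_freq = 1.0 / (500_000.0 ** (torch.arange(0, dim, 2).float() / dim))
smoothed = smooth_llama3_inv_freq(inv_freq)
```

In TransformerLens this would slot in wherever the rotary frequencies are computed, so the resulting sin/cos tables match the official Llama 3.1 weights at long context lengths.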
Alternatives
Continue using standard RoPE in TransformerLens, but this would not faithfully reproduce Llama 3.1’s positional-encoding scheme.
Checklist
I have checked that there is no similar issue in the repo (required)
@frances720 Sorry for the late reply! It looks like you may be trying to push your branch directly to the TransformerLens repo. You need to open your PR from your own fork. If you need help with this, you can reach me on the Slack channel. Let me know if you need an invite!