Cohere command r #42

Open
flaviusburca opened this issue Jun 4, 2024 · 1 comment

Comments

@flaviusburca

Is it possible to adapt this to Cohere Command R models?

@Mooler0410
Collaborator

Hi! If the model you mean is CohereForAI/c4ai-command-r-v01, we believe it's possible. It uses standard RoPE. A quick look at its implementation in Hugging Face's Transformers library shows that it is quite similar to Llama's, so you can use our Llama implementation as a reference when modifying Cohere's modeling code.
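For reference, "standard RoPE" means rotary position embeddings: each pair of channels in a query or key vector is rotated by an angle proportional to the token's position, with per-pair frequencies derived from a base `theta`. The minimal NumPy sketch below illustrates this. The function names are hypothetical and not taken from Transformers or from this repository, and it assumes the Llama-style "rotate half" channel pairing. It checks the property that makes remapping positions possible at all: the query-key score depends only on the relative distance between the two tokens.

```python
import numpy as np

def rope_inv_freq(head_dim: int, theta: float) -> np.ndarray:
    """Inverse frequency for each channel pair: theta ** (-2i / head_dim)."""
    return 1.0 / (theta ** (np.arange(0, head_dim, 2) / head_dim))

def apply_rope(x: np.ndarray, position: float, theta: float) -> np.ndarray:
    """Rotate a single vector x (length head_dim) as if it sat at `position`.

    Llama-style "rotate half" layout (an assumption for this sketch):
    channel i is paired with channel i + head_dim // 2.
    """
    head_dim = x.shape[-1]
    half = head_dim // 2
    angles = position * rope_inv_freq(head_dim, theta)  # shape (half,)
    cos = np.concatenate([np.cos(angles), np.cos(angles)])
    sin = np.concatenate([np.sin(angles), np.sin(angles)])
    rotated_half = np.concatenate([-x[half:], x[:half]])
    return x * cos + rotated_half * sin

rng = np.random.default_rng(0)
q = rng.standard_normal(128)
k = rng.standard_normal(128)

# Same relative distance (7) at two different absolute offsets.
score_a = apply_rope(q, 10, 10_000.0) @ apply_rope(k, 3, 10_000.0)
score_b = apply_rope(q, 1010, 10_000.0) @ apply_rope(k, 1003, 10_000.0)
print(np.isclose(score_a, score_b))  # True
```

Because only relative distances matter, a method can change which distances the model "sees" without retraining, and `theta` controls how quickly each channel pair rotates as that distance grows.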

One thing that may matter: CohereForAI/c4ai-command-r-v01 uses a very large RoPE theta (8,000,000.0), much larger than that of most other models (Llama 2, for example, uses 10,000). This may cause the empirical rule for choosing good hyperparameters (group size and neighbor window) to fail, so you may need to try several combinations to find a good one.
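To make the hyperparameter search more concrete, the sketch below assumes that "group size" and "neighbor window" refer to a Self-Extend-style relative-position mapping. Under that mapping, distances inside the neighbor window are kept as-is, while larger distances are floor-divided by the group size and shifted so that the two regimes join up at the window boundary. This mapping, the `max_context` formula, the hypothetical 4,096-token pretraining length, and the assumed head dimension of 128 are all illustrative choices, not values confirmed in this thread. The sketch only shows one way to enumerate candidate (group size, neighbor window) pairs, and how much more slowly the slowest RoPE channel pair rotates at theta = 8,000,000 than at 10,000.

```python
import math

def grouped_relative_position(distance: int, group_size: int, window: int) -> int:
    """Map a query-key distance to the distance actually fed to RoPE.

    Neighbors (distance < window) keep their exact distance. Farther tokens
    share grouped positions, shifted so the mapping is continuous at the
    window boundary.
    """
    if distance < window:
        return distance
    return distance // group_size + window - window // group_size

def max_context(pretrain_len: int, group_size: int, window: int) -> int:
    """Tokens whose mapped distances all stay below pretrain_len.

    Exact when window is a multiple of group_size.
    """
    return (pretrain_len - window) * group_size + window

def longest_wavelength(head_dim: int, theta: float) -> float:
    """Tokens needed for the slowest RoPE channel pair to complete one turn."""
    slowest_inv_freq = theta ** (-(head_dim - 2) / head_dim)
    return 2 * math.pi / slowest_inv_freq

PRETRAIN_LEN = 4096  # hypothetical pretraining context, for illustration only
HEAD_DIM = 128       # assumed per-head dimension

for theta in (10_000.0, 8_000_000.0):
    print(f"theta={theta:,.0f}: slowest pair turns once every "
          f"{longest_wavelength(HEAD_DIM, theta):,.0f} tokens")

# A small grid of candidates; each one still has to be evaluated on a
# long-context task of your choice.
for group_size in (4, 8, 16, 32):
    for window in (512, 1024, 2048):
        print(group_size, window, max_context(PRETRAIN_LEN, group_size, window))
```

The grid only tells you which candidates are long enough in principle. Which pair actually works best for a model with such a large theta still has to be measured empirically.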
