Add LongLora for both full and lora fine-tuning #1350
Conversation
@rasbt to answer your previous question: LongLora is not enabled by default as both…
Thanks! I think `LongLoraArgs` might be better, especially if it can be used in multiple approaches, e.g.…
I've just trained a model with

```bash
python litgpt/finetune/lora.py \
  --config=/teamspace/studios/this_studio/litgpt/config_hub/finetune/mistral-7b/longlora.yaml \
  --checkpoint_dir=/teamspace/studios/this_studio/litgpt/checkpoints/mistralai/Mistral-7B-Instruct-v0.1
```

One generation that I've obtained with

```bash
python litgpt/generate/base.py \
  --checkpoint_dir ../out/finetune/lora-mistral-7b/final \
  --prompt="Recommend a movie for me to watch during the weekend and explain the reason." \
  --max_new_tokens=128
```

is the following:

…
Nice, this is a good sign that things work!
Thanks for the PR so far. This looks great. I'll probably have to make multiple passes over it because it's a lot of changes, but thanks for doing this so far!
Regarding the formatting, I see there are a lot of stylistic changes, which are a bit distracting as they show up in the file diffs. I would say maybe don't apply those here, just so we can focus on the core changes.
"""Whether to enable LongLora.""" | ||
n_groups: int = 4 | ||
"""Number of groups to divide the sequence length into.""" | ||
context_length: int = 8192 |
I wonder here what happens if the model has a longer context already. A good test case could be LongChat (supported in LitGPT).
I wonder if this should be a factor (2x the original context length), or `None` by default with 2x the original context length inferred.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've put in a double check: one in `validate_longlora_args`, where a warning is raised and LongLora is disabled if LongLora is used but `longlora_context_length <= model.block_size`; the other right before model creation, where I increase the model's block size and RoPE condense ratio only if LongLora is enabled and `longlora_context_length > model.block_size`. I can remove the second check and fall back to `None` as the default, inferring the 2x context length in that case.
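For reference, a minimal sketch of the two checks described above (function and attribute names are assumptions based on the comment, not the PR's exact code):

```python
import warnings

def validate_longlora_args(config, longlora_args) -> None:
    # First check: warn and disable LongLora when it would not extend the context.
    if longlora_args.use_longlora and longlora_args.context_length <= config.block_size:
        warnings.warn(
            f"LongLora context length ({longlora_args.context_length}) does not exceed "
            f"the model's block size ({config.block_size}); disabling LongLora."
        )
        longlora_args.use_longlora = False

def maybe_enlarge_context(config, longlora_args) -> None:
    # Second check, before model creation: enlarge the block size and condense
    # RoPE only when LongLora actually extends the context.
    if longlora_args.use_longlora and longlora_args.context_length > config.block_size:
        config.rope_condense_ratio = longlora_args.context_length / config.block_size
        config.block_size = longlora_args.context_length
```

With a `None` default, `maybe_enlarge_context` could simply set `longlora_args.context_length = 2 * config.block_size` before the comparison.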
```python
context_length: int = 8192
"""Length of the enlarged context window."""
trainable_params: str = "wte,norm,ln"
"""List of comma-separated parameters to train in LongLora."""
```
What are the other options? Are `"wte,norm,ln"` the only allowed ones, or are there more?
Oh sorry, I wasn't clear. I meant more like: what are the supported options here? What values can a user typically put in? ...
But you probably can't use `Literal` here because of the various combinations within that string. But in the comments, could you maybe mention which of the terms within that comma-separated string are supported?
Sorry, I didn't get it. I was looking at the model, and if I'm not missing something, I think those are the only ones left other than the LoRA layers (controlled by the arguments in the finetune/lora.py script). I can add a check to prevent the user from passing anything other than a combination of those three layers, as sketched below?
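Something along these lines would work (a sketch; the allowed set follows the three layer families discussed above):

```python
ALLOWED_LONGLORA_PARAMS = {"wte", "norm", "ln"}

def validate_trainable_params(trainable_params: str) -> None:
    # Reject any entry outside the three supported layer families.
    requested = {p.strip() for p in trainable_params.split(",") if p.strip()}
    unknown = requested - ALLOWED_LONGLORA_PARAMS
    if unknown:
        raise ValueError(
            f"Unsupported LongLora trainable params {sorted(unknown)}; "
            f"supported values are {sorted(ALLOWED_LONGLORA_PARAMS)}."
        )
```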
litgpt/deploy/serve.py
```python
temperature: float = 0.8,
top_k: int = 50,
max_new_tokens: int = 50) -> None:
def __init__(
```
Thanks for the cleanup here!
```python
if resume is True:
    resume = max(out_dir.rglob("step-*/*.pth"), key=(lambda p: int(p.parent.name.split("-")[1])))
if resume:
```
`if resume is True` and `if resume` are equivalent, right?
Those lines were at lines 142-143 before; I put them there because I need to resume the previous `LongLoraArgs` and overwrite the current ones. I think you were already doing that before, since `resume` can be both a boolean (in which case the checkpoint file is loaded from the folder with the greatest step) and a path to a checkpoint file.
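To make the distinction concrete: the two conditions differ exactly when `resume` is a `Path`, since any `Path` is truthy but is not the literal `True`. A small illustration (the checkpoint paths are made up for the example):

```python
from pathlib import Path

for resume in (True, Path("out/finetune/step-001000/lit_model.pth")):
    if resume is True:
        # Only the literal boolean takes this branch; a user-supplied Path
        # must not be overwritten by the "latest checkpoint" lookup.
        resume = Path("out/finetune/step-002000/lit_model.pth")  # stand-in for the rglob result
    if resume:
        # Both the looked-up checkpoint and an explicit Path are truthy,
        # so loading (and restoring LongLoraArgs) happens in either case.
        print(f"resuming from {resume}")
```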
> What are the other options? Are `"wte,norm,ln"` the only allowed ones, or are there more?
In the paper, the authors specify that to effectively increase the context length while using LoRA, you also need to fine-tune the embedding layer and every norm layer (see Table 2), without mentioning anything else. I put those in as the defaults for LoRA fine-tuning and leave experimentation with other values to the user.
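For illustration, the string could be mapped onto the model roughly like this (a sketch; it assumes litgpt's usual parameter names, where the embedding is `wte` and the normalization layers contain `norm` or `ln` in their names):

```python
def mark_longlora_trainable(model, trainable_params: str = "wte,norm,ln") -> None:
    # Make the embedding and normalization layers trainable, per Table 2 of
    # the LongLoRA paper; all other non-LoRA parameters stay frozen.
    keywords = [p.strip() for p in trainable_params.split(",")]
    for name, param in model.named_parameters():
        if any(kw in name for kw in keywords):
            param.requires_grad = True
```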
> Oh sorry, I wasn't clear. I meant more like: what are the supported options here? What values can a user typically put in? E.g., analogous to Line 46 in b9ddd8b
>
> But you probably can't use…
Sorry for the long silence, and thanks again for this great PR! I have just been a bit swamped with work lately but hopefully can circle back to it some time.
Absolutely no worries! Thank you for taking the time to look at this!
Follow-up of #1346.
This PR introduces LongLora, as in #1237, for both LoRA and full fine-tuning, while also enabling it during generation.
cc @rasbt
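For background, a minimal sketch of the shifted sparse attention (S²-Attn) idea from the LongLoRA paper that `n_groups` controls — an illustration of the technique, not this PR's implementation:

```python
import torch

def shift_and_group(x: torch.Tensor, n_groups: int) -> torch.Tensor:
    # x: (batch, n_heads, seq_len, head_dim), applied to q, k, v alike.
    bsz, n_heads, seq_len, head_dim = x.shape
    group_size = seq_len // n_groups
    # Shift half of the heads by half a group so information can flow
    # between neighbouring groups.
    x = torch.cat(
        (x[:, : n_heads // 2], x[:, n_heads // 2 :].roll(-group_size // 2, dims=2)),
        dim=1,
    )
    # Fold each group into the batch dimension so attention is computed
    # within groups only, keeping the cost linear in the number of groups.
    return (
        x.reshape(bsz, n_heads, n_groups, group_size, head_dim)
        .transpose(1, 2)
        .reshape(bsz * n_groups, n_heads, group_size, head_dim)
    )
```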