
Add LongLora for both full and lora fine-tuning #1350

Open

belerico wants to merge 13 commits into base: main
Conversation

belerico (Contributor)

Follow-up of #1346.

This PR introduces LongLoRA, as in #1237, for both the LoRA and full fine-tuning, while also enabling it during generation.
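For background, the core trick in LongLoRA is shifted sparse attention (S2-Attn): the sequence is split into local attention groups, and half of the attention heads are shifted by half a group size so information still flows across group boundaries. A minimal sketch of the shift, assuming a (batch, seq_len, n_heads, head_dim) layout; this is illustrative, not the PR's actual implementation:

```python
import torch

def s2_shift(qkv: torch.Tensor, n_groups: int) -> torch.Tensor:
    # qkv is assumed to be (batch, seq_len, n_heads, head_dim).
    _, seq_len, n_heads, _ = qkv.shape
    group_size = seq_len // n_groups
    shifted = qkv.clone()
    # Roll the second half of the heads back by half a group so each local
    # attention group overlaps its neighbour.
    shifted[:, :, n_heads // 2:] = qkv[:, :, n_heads // 2:].roll(-(group_size // 2), dims=1)
    return shifted
```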

cc @rasbt

belerico changed the title from "Add LongLora" to "Add LongLora for both full and lora fine-tuning" on Apr 24, 2024
belerico (Contributor Author) commented Apr 24, 2024

@rasbt to answer your previous question: LongLoRA is not enabled by default, since both longlora_context_length and longlora_n_groups are None, but I agree with you that there should be a simpler way to enable it. As you suggested, I can add a LongLoraArgs, as you have done in the galore branch: that way I can also validate those args in a separate function (like validate_train_args).
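A rough sketch of what such a LongLoraArgs could look like, mirroring the fields discussed in this thread (the field names follow the snippet under review below; use_longlora is my guess for the field whose docstring appears in the diff):

```python
from dataclasses import dataclass

@dataclass
class LongLoraArgs:
    """LongLoRA-related arguments, shared by the full and LoRA fine-tuning scripts."""

    use_longlora: bool = False
    """Whether to enable LongLora."""
    n_groups: int = 4
    """Number of groups to divide the sequence length into."""
    context_length: int = 8192
    """Length of the enlarged context window."""
    trainable_params: str = "wte,norm,ln"
    """Comma-separated list of extra parameters to train in LongLora."""
```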

rasbt (Collaborator) commented Apr 24, 2024

Thanks! I think LongLoraArgs might be better, especially if it can be used in multiple approaches, e.g., full and LoRA fine-tuning.

belerico (Contributor Author) commented:

I've just trained a model with

python litgpt/finetune/lora.py \
--config=/teamspace/studios/this_studio/litgpt/config_hub/finetune/mistral-7b/longlora.yaml \
--checkpoint_dir=/teamspace/studios/this_studio/litgpt/checkpoints/mistralai/Mistral-7B-Instruct-v0.1

One generation that I've obtained with

python litgpt/generate/base.py \
--checkpoint_dir ../out/finetune/lora-mistral-7b/final \
--prompt="Recommend a movie for me to watch during the weekend and explain the reason." \
--max_new_tokens=128

is the following:

Below is an instruction that describes a task. Write a response that appropriately completes the request.

### Instruction:
Recommend a movie for me to watch during the weekend and explain the reason.

### Response:
I recommend the movie "Inception" directed by Christopher Nolan. This is an excellent sci-fi thriller that will keep you on the edge of your seat throughout the entire film. The story follows a professional thief, Dom (played by Leonardo DiCaprio), who is able to steal information from someone's subconscious while they dream. Dom is offered a chance at clemance in exchange for performing the near-impossible task of planting an idea into someone's mind, an act known as Inception.

This movie is a fantastic choice for the weekend because it's not only entertaining, but it also challenges you to think critically about the concepts presented within the film. The plot is twisting and turning, keeping you engaged from beginning to end. Additionally, the special effects and visuals are stunning, making for a truly immersive viewing experience. Moreover, with its all-star cast, including Joseph Gordon-Levitt, Ellen Page, and Tom Hardy, you know you're in for a treat.

Overall, "Inception" is an outstanding choice for the weekend because it provides an exciting and thought-provoking movie experience that is sure to leave a lasting
Time for inference 2: 15.43 sec total, 16.59 tokens/sec
Memory used: 14.67 GB

rasbt (Collaborator) commented Apr 25, 2024

Nice, this is a good sign that things work!

rasbt (Collaborator) left a comment:

Thanks for the PR, this looks great. I'll probably have to make multiple passes over it because there are a lot of changes, but thanks for the work so far!

Regarding the formatting, I see there are a lot of stylistic changes, which are a bit distracting as they show up in the file diffs. I would say maybe don't apply those here, just so we can focus on the core changes.

litgpt/args.py (outdated)
"""Whether to enable LongLora."""
n_groups: int = 4
"""Number of groups to divide the sequence length into."""
context_length: int = 8192
rasbt (Collaborator):

I wonder here what happens if the model has a longer context already. A good test case could be LongChat (supported in LitGPT).

I wonder if this should be a factor (2x the original context length), or None by default, with 2x the original context length inferred.

belerico (Contributor Author):

I've added a double check: one in validate_longlora_args, where if LongLora is used and longlora_context_length <= model.block_size a warning is raised and LongLora is disabled; the other just before the model creation, where I increase the model block size and RoPE condense ratio only if LongLora is enabled and longlora_context_length > model.block_size. I can remove the second check and we can fall back to None as the default, inferring the 2x in that case.
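As a sketch, the second check could look roughly like this (a hypothetical helper; the attribute names follow litgpt's Config, but treat this as illustrative rather than the PR's exact code):

```python
import warnings
from typing import Optional

def maybe_apply_longlora(config, longlora_context_length: Optional[int]) -> None:
    # Hypothetical helper: enlarge the context window and condense RoPE only
    # when LongLora actually extends the model's native block size.
    if longlora_context_length is None:
        return
    if longlora_context_length <= config.block_size:
        warnings.warn(
            f"LongLora context length ({longlora_context_length}) does not exceed "
            f"the model block size ({config.block_size}); LongLora is disabled."
        )
        return
    # Compute the condense ratio before overwriting the block size.
    config.rope_condense_ratio = longlora_context_length / config.block_size
    config.block_size = longlora_context_length
```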

context_length: int = 8192
"""Length of the enlarged context window."""
trainable_params: str = "wte,norm,ln"
"""List of comma-separated parameters to train in LongLora."""
rasbt (Collaborator):

What are the other options? Are "wte,norm,ln" the only allowed ones or are there more?

belerico (Contributor Author) commented Apr 25, 2024:

> Oh sorry, I wasn't clear. I meant more like: what are the supported options here? What values can a user typically put in? ...
>
> But you probably can't use Literal here because of the various combinations within that string. But in the comments, maybe you could mention which of the terms within that comma-separated string are supported?

Sorry, I didn't get it at first. I was looking at the model and, if I'm not missing something, I think those are the only layers left other than the LoRA layers (which are controlled by the arguments in the finetune/lora.py script). Should I add a check to prevent the user from passing anything other than a combination of those three layers?
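Something like this could do it (a hypothetical helper, not part of the PR):

```python
ALLOWED_TERMS = {"wte", "norm", "ln"}

def validate_trainable_params(trainable_params: str) -> list:
    # Split the comma-separated string and reject anything outside the three
    # supported layer-name terms.
    params = [p.strip() for p in trainable_params.split(",") if p.strip()]
    unknown = set(params) - ALLOWED_TERMS
    if unknown:
        raise ValueError(
            f"Unsupported LongLora trainable params {sorted(unknown)}; "
            f"supported terms are {sorted(ALLOWED_TERMS)}."
        )
    return params
```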

temperature: float = 0.8,
top_k: int = 50,
max_new_tokens: int = 50) -> None:
def __init__(
rasbt (Collaborator):

Thanks for the cleanup here!

Comment on lines +125 to +127
if resume is True:
    resume = max(out_dir.rglob("step-*/*.pth"), key=(lambda p: int(p.parent.name.split("-")[1])))
if resume:
rasbt (Collaborator):

`if resume is True` and `if resume` are equivalent, right?

belerico (Contributor Author):

Those lines were at lines 142-143 before; I moved them up because I need to resume the previous LongLoraArgs and overwrite the current ones. I think you were already doing that, because resume can be both a boolean (in which case the checkpoint file is loaded from the folder with the greatest step) and a path to a checkpoint file. So the two checks aren't equivalent: `if resume is True` matches only the literal boolean, while `if resume` is also truthy for a path.
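To make the distinction concrete (an illustrative sketch around the quoted lines, not the PR's exact code):

```python
from pathlib import Path
from typing import Union

def resolve_resume(resume: Union[bool, Path], out_dir: Path) -> Union[bool, Path]:
    if resume is True:
        # Only the literal boolean triggers auto-discovery: pick the checkpoint
        # from the "step-*" folder with the highest step number.
        resume = max(out_dir.rglob("step-*/*.pth"), key=lambda p: int(p.parent.name.split("-")[1]))
    if resume:
        # Truthy for an explicit Path as well as the auto-discovered one, so
        # the two conditions are not interchangeable.
        print(f"Resuming from {resume}")
    return resume
```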

belerico (Contributor Author) left a comment:

> What are the other options? Are "wte,norm,ln" the only allowed ones or are there more?

In the paper the authors specify that, to effectively increase the context length while using LoRA, you also need to fine-tune the embedding layer and every norm layer (ref. Table 2), without specifying anything else. I put the defaults for LoRA fine-tuning there and left experimentation with other values to the user.
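Concretely, the default amounts to unfreezing every parameter whose name matches one of those terms, roughly like this (a hedged sketch; the actual parameter names in litgpt's model may differ):

```python
import torch.nn as nn

def mark_longlora_trainable(model: nn.Module, trainable_params: str = "wte,norm,ln") -> None:
    # In addition to the LoRA layers, unfreeze the embedding (wte) and the
    # normalization (norm/ln) parameters, per Table 2 of the LongLoRA paper.
    terms = [t.strip() for t in trainable_params.split(",") if t.strip()]
    for name, param in model.named_parameters():
        if any(term in name for term in terms):
            param.requires_grad = True
```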

rasbt (Collaborator) commented Apr 25, 2024

> What are the other options? Are "wte,norm,ln" the only allowed ones or are there more?

> In the paper the authors specify that, to effectively increase the context length while using LoRA, you also need to fine-tune the embedding layer and every norm layer (ref. Table 2), without specifying anything else. I put the defaults for LoRA fine-tuning there and left experimentation with other values to the user.

Oh sorry, I wasn't clear. I meant more like what are the supported options here? What values can a user typically put in? E.g., analogous to

precision: Literal["bf16-true", "bf16-mixed", "32-true", None] = None,

But you probably can't use Literal here because of the various combinations within that string. But in the comments, maybe you could mention which of the terms within that comma-separated string are supported?

belerico requested a review from rasbt on April 25, 2024
rasbt (Collaborator) commented May 29, 2024

Sorry for the long silence, and thanks again for this great PR! I have just been a bit swamped with work lately, but hopefully I can circle back to it some time.

belerico (Contributor Author) commented:

> Sorry for the long silence, and thanks again for this great PR! I have just been a bit swamped with work lately, but hopefully I can circle back to it some time.

Absolutely no worries! Thank you for taking the time to look at this!
