[LoRA] Adds support for bias in LoRA #5733
base: main
Conversation
Could we add an argument to the engine, enable_lora_bias, and avoid initializing the bias tensors if it's false? If the user knows none of their LoRAs will have bias, we can save memory.
@Yard1 Thanks for reviewing the PR. I have added the enable_lora_bias flag (default set to false), which prevents the allocation of the LoRA bias tensors when it is false.
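For illustration, a minimal sketch of how the flag might be used from the offline entrypoint, assuming enable_lora_bias is plumbed through the engine arguments the same way as enable_lora (the model name and adapter path below are placeholders):

```python
from vllm import LLM
from vllm.lora.request import LoRARequest

# enable_lora_bias defaults to False, so no bias tensors are allocated
# unless the user opts in (flag name taken from this thread).
llm = LLM(
    model="meta-llama/Llama-2-7b-hf",  # assumption: any LoRA-capable base model
    enable_lora=True,
    enable_lora_bias=True,
)
outputs = llm.generate(
    "Hello, my name is",
    lora_request=LoRARequest("bias-adapter", 1, "/path/to/bias_adapter"),
)
print(outputs[0].outputs[0].text)
```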
Related: #5930
Looks good, can we also add an e2e test?
@Yard1 Thanks for reviewing. I've added an e2e test for the lora_bias support.
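A hedged sketch of what such an e2e test might look like; lora_bias_files is a hypothetical fixture pointing at an adapter tuned with bias, and enable_lora_bias is the flag discussed above:

```python
import vllm
from vllm.lora.request import LoRARequest

MODEL_PATH = "meta-llama/Llama-2-7b-hf"  # assumption: base model the adapter fits

def test_lora_bias(lora_bias_files):  # hypothetical fixture: path to a bias-tuned adapter
    llm = vllm.LLM(
        MODEL_PATH,
        enable_lora=True,
        enable_lora_bias=True,  # flag introduced by this PR
        max_lora_rank=8,
    )
    prompts = ["What is the capital of France?"]
    base_out = llm.generate(prompts)
    lora_out = llm.generate(
        prompts,
        lora_request=LoRARequest("bias-adapter", 1, lora_bias_files),
    )
    # With bias tensors applied, the adapter should change the generation
    # relative to the base model.
    assert base_out[0].outputs[0].text != lora_out[0].outputs[0].text
```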
@followumesh you need to run
@followumesh apologies, this needs another conflict resolution!
@@ -64,6 +64,64 @@ def dec(*args, **kwargs):
    return dec


def apply_bias(
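For context, a hedged reconstruction of what the new apply_bias helper might do; the signature and body below are illustrative guesses based on this thread, not the PR's actual code:

```python
import torch

def apply_bias(
    indices: torch.Tensor,       # hypothetical: per-token LoRA slot index
    output: torch.Tensor,        # hypothetical: (num_tokens, hidden) activations
    bias_stacked: torch.Tensor,  # hypothetical: (max_loras, hidden) stacked biases
) -> torch.Tensor:
    # Tokens without an active adapter are conventionally marked with -1;
    # mask them out so only adapter-served tokens receive a bias.
    valid = indices != -1
    output[valid] += bias_stacked[indices[valid]]
    return output
```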
Would it be possible to add this inside PunicaWrapper.add_lora()? There could be an optional bias argument in add_lora(), and then the logic of testing whether the bias is None and doing the index computation could be moved inside this function. It seems to me that it would eliminate repeated code lines in this file, but I don't know all the details. A sketch of the suggested shape is below.
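The signatures here are illustrative (the real add_lora takes more arguments, and the index tensor name is an assumption):

```python
from typing import Optional
import torch

class PunicaWrapper:  # sketch only; the real class lives in vllm/lora/punica.py
    def add_lora(
        self,
        y: torch.Tensor,             # output activations, updated in place
        x: torch.Tensor,             # input activations
        wa_t_all: torch.Tensor,      # stacked lora_A weights
        wb_t_all: torch.Tensor,      # stacked lora_B weights
        scale: float,
        bias_all: Optional[torch.Tensor] = None,  # proposed optional bias stack
    ) -> None:
        # ... existing shrink/expand LoRA matmuls ...
        if bias_all is not None:
            # The None check and index computation live here once, instead of
            # being repeated at every call site.
            indices = self.token_lora_indices   # assumed per-token index tensor
            mask = indices != -1
            y[mask] += bias_all[indices[mask]]
```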
@maxdebayser I don't have a strong opinion, but this was to keep the changes out of the punica wrapper, since they are not directly related to punica.
vllm/lora/utils.py (Outdated)
assert parts[0] == "base_model"
assert parts[1] == "model"
if parts[-1] == "weight":
    assert parts[-2] == "lora_A" or parts[-2] == "lora_B"
This assertion is failing for a couple of lora_modules that we have:
>>> lora_modules[0].split(".")
['base_model', 'model', 'lm_head', 'weight']
>>> lora_modules[1].split(".")
['base_model', 'model', 'model', 'embed_tokens', 'weight']
Still investigating if this would have thrown an error with the previous code as well...
@followumesh actually it looks like this was a bad merge; it's reverting a recent change that was made to improve the error message.
@prashantgupta24 Can you point to a lora module I can test with?
Let me try to find one. Also, reverting this change gives me the error:
ValueError: base_model.model.lm_head.weight is unsupported LoRA weight
To summarize: since vLLM expects only LoRA weights in the safetensors file, it was actually an error in our LoRA adapter that it contained lm_head. But I think the original error message is being reverted by this change. Ideally, the error should be
ValueError: base_model.model.lm_head.weight is unsupported LoRA weight
Instead, an AssertionError from assert parts[-2] == "lora_A" or parts[-2] == "lora_B" is now being raised, which doesn't provide the right detail.
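For reference, a hedged sketch of the error-reporting pattern being restored; parse_fine_tuned_lora_name is the real helper in vllm/lora/utils.py, but the body below is illustrative rather than the exact upstream code:

```python
def parse_fine_tuned_lora_name(name: str) -> tuple[str, bool]:
    """Map a checkpoint tensor name to (module_name, is_lora_a). Sketch only."""
    parts = name.split(".")
    if parts[-1] == "weight" and parts[-2] in ("lora_A", "lora_B"):
        # e.g. base_model.model.layers.0.self_attn.q_proj.lora_A.weight
        return ".".join(parts[2:-2]), parts[-2] == "lora_A"
    if parts[-1] in ("lora_embedding_A", "lora_embedding_B"):
        return ".".join(parts[2:-1]), parts[-1] == "lora_embedding_A"
    # Unsupported tensors (e.g. base_model.model.lm_head.weight) get a
    # descriptive error instead of a bare AssertionError.
    raise ValueError(f"{name} is unsupported LoRA weight")
```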
Yes, @followumesh, I think the changes here are not actually related to the main PR changes; could you revert them (and any other changes in the same category)?
@prashantgupta24 Can you check now?
Yeah this looks better. I'll let you know when I get a chance to test it!
Motivation
PEFT-based tooling such as https://github.com/foundation-model-stack/fms-hf-tuning includes support for tuning a LoRA bias. This PR enables bias for LoRA, so adapters trained with bias will work with vLLM.
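Concretely, a hedged sketch of the forward pass this enables; shapes and names are illustrative, not vLLM's actual kernels:

```python
import torch

def lora_forward(x, base_weight, lora_a, lora_b, lora_bias, scale=1.0):
    y = x @ base_weight.T                        # frozen base projection
    y = y + scale * ((x @ lora_a.T) @ lora_b.T)  # low-rank LoRA update
    if lora_bias is not None:
        y = y + lora_bias                        # per-adapter bias (this PR)
    return y
```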
Changes Included