LoRA weight underflows #49
Comments
Thank you for the detailed report! This issue is important. I found that some LoRA modules are not trained well at higher rank with fp16, but I did not realize it was due to underflow. The solution seems good. I am not very familiar with PyTorch and would appreciate any suggestions you may have. For backward compatibility, I think your suggestion is best.
Putting alpha in a module parameter is indeed better.
Thank you for your comment. I will update the training script and the extension as soon as possible.
I've implemented the scaling feature like this. If you have any comments, I would appreciate them.
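Roughly, the idea is the following (a minimal PyTorch sketch with illustrative names, not the exact code from the commit): alpha is registered on the module so it serializes into the state_dict, and the forward pass multiplies the LoRA branch by alpha / rank.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Wrap a frozen Linear layer with a LoRA branch scaled by alpha / rank."""

    def __init__(self, base: nn.Linear, rank: int = 4, alpha: float = 1.0):
        super().__init__()
        self.base = base
        self.lora_down = nn.Linear(base.in_features, rank, bias=False)
        self.lora_up = nn.Linear(rank, base.out_features, bias=False)
        nn.init.zeros_(self.lora_up.weight)  # LoRA branch starts as a no-op
        # A buffer is serialized into the state_dict but never touched by the
        # optimizer, so alpha travels with the saved weights.
        self.register_buffer("alpha", torch.tensor(float(alpha)))
        self.rank = rank

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        scale = self.alpha / self.rank
        return self.base(x) + self.lora_up(self.lora_down(x)) * scale
```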
I see no problem.
Thank you for your comment!
I think this issue has been resolved. Thank you again for your work! Please re-open if you have any questions :)
Does anyone have an example of an SD1 LoRA with this for me to test and add support for?
Hi @AUTOMATIC1111, thank you for supporting LoRA in the web UI! I've uploaded a LoRA model to my blog. The post is written in Japanese, but you will find the download link there. The LoRA is trained with SD 1.5, and the activation word is given in the post. The state_dict now has alpha keys for each LoRA module.
And the LoRA weights are scaled by `sd-webui-additional-networks/scripts/lora_compvis.py`, lines 44 to 67 (commit 4f0b17d).
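For reference, a rough reconstruction of that loading logic (illustrative only, assuming linear layers and kohya-ss-style key names; the file path is a placeholder):

```python
import torch
from safetensors.torch import load_file

sd = load_file("my_lora.safetensors")
for key in [k for k in sd if k.endswith(".lora_down.weight")]:
    prefix = key[: -len(".lora_down.weight")]
    down = sd[key].float()                       # [rank, in_features]
    up = sd[prefix + ".lora_up.weight"].float()  # [out_features, rank]
    rank = down.shape[0]
    # Old files have no alpha key; treating alpha as rank gives scale 1.0,
    # which keeps existing LoRAs working unchanged.
    alpha = float(sd.get(prefix + ".alpha", torch.tensor(float(rank))))
    delta = up @ down * (alpha / rank)           # weight delta to apply
    print(prefix, tuple(delta.shape))
```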
I hope this helps!
Thanks! Great work on your training repos!
Looks good!
OpenCLIP (the text encoder used in SD2) uses torch.nn.MultiheadAttention in the ResidualAttentionBlock of its Transformer, instead of processing the Q/K/V/out projections independently. It would be possible to override the forward of ResidualAttentionBlock, but I'm not sure how to do that, so I merge the weights into MultiheadAttention instead. Btw, I found a bug where alpha was not used when merging weights; I fixed that.
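Merging into the fused attention might look roughly like this (a sketch with assumed names, not the extension's actual code): the q/k/v projections live in a single `in_proj_weight`, so the LoRA delta has to cover all three at once.

```python
import torch
import torch.nn as nn

def merge_lora_into_mha(mha: nn.MultiheadAttention,
                        lora_up: torch.Tensor,    # [3*embed_dim, rank]
                        lora_down: torch.Tensor,  # [rank, embed_dim]
                        alpha: float,
                        multiplier: float = 1.0) -> None:
    rank = lora_down.shape[0]
    scale = alpha / rank  # loralib-style scaling
    delta = (lora_up.float() @ lora_down.float()) * scale * multiplier
    # in_proj_weight is the fused [3*embed_dim, embed_dim] q/k/v matrix,
    # so the merged delta modifies all three projections in one step.
    with torch.no_grad():
        mha.in_proj_weight += delta.to(mha.in_proj_weight.dtype)
```

If the projections were separate Linear layers (as in SD1's text encoder), overriding forward per layer would work and no merging would be needed.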
I encountered the same problem as #41, which states "LoRAs have no effects".
Background
I'm using SSDT to train LoRAs. The LoRA layer implementation comes from loralib, which applies a weight scaling of alpha / rank.
So if I want to use an alpha = 1, rank-16 LoRA produced by SSDT in AddNet, the scale should be set to 1/16.
Some users found this additional scaling inconvenient, so I added an "unscale weight" option that scales the weights by alpha / rank when converting an SSDT checkpoint to AddNet format, as sketched below.
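The conversion step is essentially this (a hypothetical helper, not SSDT's actual code):

```python
import torch

def unscale_weights(state_dict: dict, alpha: float) -> dict:
    """Bake (alpha / rank) into the up weights so AddNet can use scale = 1."""
    out = {}
    for key, w in state_dict.items():
        if key.endswith("lora_up.weight"):
            rank = w.shape[1]                # up weight is [out_features, rank]
            out[key] = w * (alpha / rank)    # in fp16, this is where underflow hits
        else:
            out[key] = w
    return out
```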
Investigation
I looked at the state dict after unscaling.
All of the tensors have very small values, which hurts numerical stability. About 20% of them contain zero values; within that 20%, 15% are in the text encoder and 85% are in the UNet.
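A small script in the spirit of this check (the file name is a placeholder) measures how many elements collapsed to exactly zero:

```python
import torch
from safetensors.torch import load_file

sd = load_file("unscaled_lora.safetensors")
for name, t in sd.items():
    if not name.endswith("weight"):
        continue
    zero_frac = (t == 0).float().mean().item()
    print(f"{name}: |max|={t.abs().max():.2e}, zeros={zero_frac:.1%}")
```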
Experiment
In one LoRA (rank=16, alpha=1) that I trained:
Conclusion and Solution
I suspect those zeros are the product of underflow, which is probably the cause of #41.
Those underflows happen more often when the rank is high: unscaling multiplies every weight by alpha / rank, so the larger the rank, the smaller the results, and anything below fp16's representable range is flushed to zero.
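The mechanism is easy to reproduce: fp16 cannot represent magnitudes much below 6e-8 (the smallest subnormal is about 5.96e-8), so dividing already-tiny weights by the rank flushes them to zero.

```python
import torch

w = torch.tensor([1e-4, 5e-6, 3e-7], dtype=torch.float16)
print(w / 16)  # the 3e-7 entry underflows to exactly 0.0 in fp16
```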
At training time, add an option "alpha" to scale the LoRA like loralib, and save alpha to the LoRA metadata.
At inference time, add an option "scale weight" to scale the LoRA weight by alpha / rank.
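On the saving side, the proposal amounts to something like this (key and file names are assumptions for illustration):

```python
import torch
from safetensors.torch import save_file

rank, alpha = 16, 1.0
sd = {
    "lora_unet_block.lora_down.weight": torch.randn(rank, 320) * 0.01,
    "lora_unet_block.lora_up.weight":   torch.zeros(320, rank),
    # Store alpha next to the weights so loaders can compute alpha / rank.
    "lora_unet_block.alpha":            torch.tensor(alpha),
}
save_file(sd, "lora_with_alpha.safetensors")
```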
Backward Compatibility
Unfortunately, as you can imagine, almost all existing LoRAs have already underflowed.
To keep old LoRAs usable when "scale weight" is enabled: if a LoRA has no alpha in its metadata, do not scale it (equivalently, treat alpha as equal to the rank, so the scale is 1).
Additional: NaNs
After AUTOMATIC1111/stable-diffusion-webui@9991967, those underflowed LoRAs sometimes produce NaN errors when generating images.
Some users reported loss=NaN when using https://github.com/Linaqruf/kohya-trainer/ and https://github.com/Mikubill/naifu-diffusion/, especially at high rank. I suspect that's related to this issue.