[Clarification] LORA layer scaling lr #363

Open
SavvaI opened this issue Apr 1, 2023 · 1 comment

Comments

SavvaI commented Apr 1, 2023

I apologise in advance if I have misunderstood something.
My issue comes from the observation that when I raise the LoRA rank (4 -> 512), the effective learning rate (judged by the difference between sampled images over the course of training) drops drastically.
So I went to the source code and found https://github.com/kohya-ss/sd-scripts/blob/c93cbbc373daff7827395b6ca5bde91733890722/networks/lora.py#L52
self.scale = alpha / self.lora_dim
In my understanding, the right way to implement equalised learning rate (https://arxiv.org/abs/1812.04948) would be the following:
self.scale = alpha / (in_dim**0.5) / (self.lora_dim**0.5)
i.e. an (in_dim**0.5) divisor for the down_sample layer and a (self.lora_dim**0.5) divisor for the up_sample layer.
Thank you.
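
For concreteness, here is a minimal sketch of where this scale enters a LoRA layer, showing both the current alpha / lora_dim convention and the equalised-learning-rate variant proposed above. The module and variable names are illustrative only, not the actual sd-scripts code.

import torch
import torch.nn as nn

class LoRALinearSketch(nn.Module):
    # Illustrative LoRA wrapper around a frozen nn.Linear (not the sd-scripts implementation).
    def __init__(self, base: nn.Linear, lora_dim: int = 4, alpha: float = 1.0,
                 equalized_lr: bool = False):
        super().__init__()
        self.base = base
        in_dim, out_dim = base.in_features, base.out_features
        self.lora_down = nn.Linear(in_dim, lora_dim, bias=False)
        self.lora_up = nn.Linear(lora_dim, out_dim, bias=False)
        nn.init.normal_(self.lora_down.weight, std=1.0 / lora_dim)
        nn.init.zeros_(self.lora_up.weight)  # adapter starts as a no-op
        if equalized_lr:
            # Scaling proposed in this issue (equalised learning rate, arXiv:1812.04948):
            # split the divisor between the down- and up-projections.
            self.scale = alpha / (in_dim ** 0.5) / (lora_dim ** 0.5)
        else:
            # Scaling currently used in networks/lora.py: alpha / rank.
            self.scale = alpha / lora_dim

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.base(x) + self.lora_up(self.lora_down(x)) * self.scale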

kohya-ss (Owner) commented Apr 9, 2023

The alpha is based on the following paper and repository:
https://arxiv.org/abs/2106.09685
https://github.com/microsoft/LoRA

The introduction of alpha was discussed in this issue:
kohya-ss/sd-webui-additional-networks#49

I am not good at math, so I'm not sure whether these are better than your description, but I hope they help clarify things.
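
For context, under the alpha / rank convention quoted above from networks/lora.py (and used in microsoft/LoRA), keeping alpha fixed while raising the rank shrinks the scale proportionally, so going from rank 4 to 512 shrinks it by a factor of 128, which is consistent with the drop in effective learning rate observed in the opening comment. A minimal illustration in plain Python (alpha = 1 chosen only for the example):

# Illustrative arithmetic only: effect of the alpha / rank convention with alpha held fixed.
alpha = 1.0
for rank in (4, 64, 512):
    scale = alpha / rank  # convention quoted above from networks/lora.py
    print(f"rank={rank:4d}  scale={scale:.6f}")
# rank=   4  scale=0.250000
# rank=  64  scale=0.015625
# rank= 512  scale=0.001953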

wkpark pushed a commit to wkpark/sd-scripts that referenced this issue Feb 27, 2024