No module 'DAdaptAdan' #770

Closed
SileNTViP opened this issue May 11, 2023 · 7 comments
Labels
enhancement New feature or request

Comments

@SileNTViP

Replace 'DAdaptAdan' with 'DAdaptAdam'

@bmaltais
Owner

Were you trying to train using the new DAdaptAdan?

@SileNTViP
Author

Yes. AttributeError: module 'dadaptation' has no attribute 'DAdaptAdan'. Did you mean: 'DAdaptAdam'?
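For context, a minimal sketch of what is likely going on (assuming the installed dadaptation release simply predates the Adan optimizer): the attribute lookup fails, so the code either has to fall back to DAdaptAdam or the package needs upgrading.

```python
# Minimal sketch of the failure, assuming the installed dadaptation release
# predates the Adan optimizer (only DAdaptAdam and friends exist).
import dadaptation

optimizer_class = getattr(dadaptation, "DAdaptAdan", None)
if optimizer_class is None:
    # Old releases raise:
    #   AttributeError: module 'dadaptation' has no attribute 'DAdaptAdan'
    # Either upgrade the package or fall back to the Adam variant that exists.
    optimizer_class = dadaptation.DAdaptAdam
```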

@bmaltais
Owner

Looks like the version of the dadaptation module installed does not support the new optimizers... I will test upgrading to the newer one and bring it in with the next release if there are no issues with it.

@bmaltais bmaltais added the enhancement New feature or request label May 11, 2023
@bmaltais
Owner

Looks like the new dadaptation module is causing issues: RuntimeError: Setting different lr values in different parameter groups is only supported for values of 0

I will wait for kohya to sort out the issues with dadaptation before upgrading. Until then those new dadaptation methods will not work.
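For reference, a rough sketch of the failure mode (assuming the dadaptation v2.0 behavior described above; the parameters below are placeholders, not the real LoRA modules):

```python
# Rough sketch of the reported error, assuming dadaptation v2.0 behavior:
# D-Adaptation estimates one global step size, so non-zero per-group lr
# values that differ between groups are rejected.
import torch
import dadaptation

text_encoder_params = [torch.nn.Parameter(torch.randn(8, 8))]
unet_params = [torch.nn.Parameter(torch.randn(8, 8))]

param_groups = [
    {"params": text_encoder_params, "lr": 0.5},  # TE LR
    {"params": unet_params, "lr": 1.0},          # UNet LR differs -> error
]

# Per the report, this path ends with:
#   RuntimeError: Setting different lr values in different parameter groups
#   is only supported for values of 0
optimizer = dadaptation.DAdaptAdam(param_groups)
```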

@idlebg

idlebg commented May 11, 2023

pip(3) install -U dadaptation

or force-reinstall it... here it did not update for some reason.
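A quick sanity check that the upgrade actually took effect (a small sketch; importlib.metadata just reports whatever wheel pip installed):

```python
# Sanity check after upgrading: confirm the installed version and that the
# Adan variant is actually present in the module.
from importlib.metadata import version
import dadaptation

print("dadaptation", version("dadaptation"))
print("has DAdaptAdan:", hasattr(dadaptation, "DAdaptAdan"))
```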

testing it with my upcoming Fusion AI model
[images]

Impressive difference in some situations... it figures out people and faces with Adan.

Will try to summarize; still investigating, as there is a huge difference when additionally playing with the scheduler, weight decay, and the three betas for Adan.
[image]
From left to right: normal D-Adaptation, then Adan only, with different betas and scheduler tuning.

@bmaltais
Owner

The v2.0 version no longer allows setting the LoRA Text Encoder LR and UNet LR to different values... this is why I felt I should not upgrade it... Have you been training LoRA using DAdaptation with different LRs for the TE and UNet?

@idlebg

idlebg commented May 11, 2023

> Have you been training LoRA using DAdaptation with different LRs for the TE and UNet?

Currently trying all at LR 1, to figure out how it handles compared to regular v2 if LR is 1 for all.
Also comparing "weight_decay=0.01" "betas=0.99,0.9,0.99" vs
"weight_decay=0.02" "betas=0.98,0.92,0.99" "eps=1e-8",
and testing how Adan works with all schedulers and offset noise variants.
So far, without much knowledge, it easily gets to 0.092 with the default installation.
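For anyone wanting to reproduce those runs directly, a sketch of how the two argument sets would map onto the optimizer (assuming DAdaptAdan accepts betas as a 3-tuple plus eps/weight_decay keyword arguments, which is how optimizer_args strings are typically parsed into kwargs):

```python
# Sketch of the two hyperparameter sets being compared, passed straight to
# the optimizer; assumes DAdaptAdan takes a 3-tuple of betas plus
# eps/weight_decay. The params list is a placeholder, not a real network.
import torch
import dadaptation

params = [torch.nn.Parameter(torch.randn(8, 8))]

run_a = dadaptation.DAdaptAdan(params, lr=1.0,
                               betas=(0.99, 0.9, 0.99), weight_decay=0.01)
run_b = dadaptation.DAdaptAdan(params, lr=1.0,
                               betas=(0.98, 0.92, 0.99), weight_decay=0.02,
                               eps=1e-8)
```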

Give me an hour to have more results on how it actually handles small and big sets and rates in the long run.

I started experimenting because v1 did NOT support multi-GPU sharding out of the box without DeepSpeed and additional code changes... while Adafactor worked out of the box with a few A100s ("relative_step=True" "scale_parameter=True").
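For comparison, the Adafactor settings mentioned map to roughly this (a sketch using the transformers implementation; with relative_step=True the lr is left as None so the step size is derived internally):

```python
# Sketch of the Adafactor setup mentioned above, using the transformers
# implementation; lr must be None when relative_step=True.
import torch
from transformers.optimization import Adafactor

params = [torch.nn.Parameter(torch.randn(8, 8))]
optimizer = Adafactor(params, lr=None,
                      relative_step=True, scale_parameter=True)
```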

Now eager to test v2's newly added support for PyTorch's Fully Sharded Data Parallel.

But prior to that, testing on a normal 4090 to see what it does.

Thanks for the heads up about the LR.
Will move and update on all findings later on.
So far it does not handle the LR well with more epoch repeats...
(above is more)
[image]

Need some time to summarize and test whether the sharded GPU training works now.

bmaltais pushed a commit that referenced this issue Aug 20, 2023
update doc and minor fix