Long Pauses Between Epochs #36
Comments
This is something related to the latest accelerate version... I can't do much about it until kohya upgrades accelerate to another version as part of his supported code base... but glad you raised it so others are not taken aback by it ;-)
So I'm a bit confused. The latest version of …
It is using the CPU far too much: memory usage ramps up slowly, then it dumps to CUDA, then the epoch runs and the cycle starts over. Considering Automatic1111 works fine and uses the GPU (1 GB more of it), I can say it is something in Kohya's code.
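One way to check the ramp-up-then-dump pattern described above is to log host and GPU memory once per epoch. A minimal sketch, assuming `psutil` is installed; the `log_memory` helper is hypothetical and not part of kohya's code:

```python
import os

import psutil  # assumption: installed via `pip install psutil`
import torch


def log_memory(epoch: int) -> None:
    """Print host RSS and CUDA-allocated memory to make the per-epoch cycle visible."""
    rss_gib = psutil.Process(os.getpid()).memory_info().rss / 1024**3
    cuda_gib = (
        torch.cuda.memory_allocated() / 1024**3 if torch.cuda.is_available() else 0.0
    )
    print(f"epoch {epoch}: host RSS {rss_gib:.2f} GiB, CUDA allocated {cuda_gib:.2f} GiB")


log_memory(0)  # would be called at the top of each epoch in the training loop
```

If the RSS figure climbs through the pause while the CUDA figure stays flat, that would support the CPU-side explanation above.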
I definitely agree. I also do not want to use repeats, as epochs are a better measure of progress, and repeats produce different results than just using epochs. It almost seems like a workaround for whatever is wrong with the code.
I noticed that there are long pauses between epochs, around 35-40 seconds. What is the reasoning behind this, and is there a way to lower it significantly or disable it entirely?
I looked through the code and was not able to find anything specific to pausing between epochs. The other DreamBooth extension (the one for Automatic1111's UI) has an option to pause between epochs (default 60 s), which can be lowered to 1 s if desired.
This also applies to native training.
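For anyone hitting similar pauses in their own PyTorch loops: a stall at every epoch boundary is commonly the DataLoader tearing down its worker processes and respawning them. Whether that is what happens here is not confirmed, but as a point of reference, here is a minimal sketch of the standard mitigation, `persistent_workers` (available since PyTorch 1.7); the toy dataset is just a stand-in:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Toy dataset standing in for the real training set.
dataset = TensorDataset(torch.randn(256, 3, 64, 64))

# persistent_workers=True keeps worker processes alive across epochs,
# so the spawn cost is paid once instead of at every epoch boundary.
loader = DataLoader(
    dataset,
    batch_size=8,
    num_workers=2,
    persistent_workers=True,  # requires num_workers > 0
)

for epoch in range(3):
    for (batch,) in loader:
        pass  # training step would go here
```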
Edit: Looks like it hangs on line 206 in train_db.py every epoch. Thanks.
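To confirm where the 35-40 seconds goes, the suspect spot can be bracketed with timers. A hedged sketch of generic instrumentation; the loop and the stand-in dataloader are illustrative, not copied from train_db.py:

```python
import time

num_epochs = 3
train_dataloader = [object()] * 10  # stand-in for the real dataloader

for epoch in range(num_epochs):
    t0 = time.perf_counter()
    for step, batch in enumerate(train_dataloader):
        if step == 0:
            # Time from epoch start to the first batch captures any
            # inter-epoch stall in the dataloader machinery.
            print(f"epoch {epoch}: {time.perf_counter() - t0:.2f}s to first batch")
        # training step would run here
```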