I found the cause of the behavior you are asking about.
It is in the run_loop function:
def run_loop(self):
    saved = False
    while (
        not self.lr_anneal_steps or self.step < self.lr_anneal_steps or self.global_step < self.total_training_steps
    ):
        batch, cond = next(self.data)
        self.run_step(batch, cond)

        saved = False
        if (
            self.global_step and self.save_interval != -1 and self.global_step % self.save_interval == 0
        ):
            self.save()
            saved = True
            th.cuda.empty_cache()
            # Run for a finite amount of time in integration tests.
            if os.environ.get("DIFFUSION_TRAINING_TEST", "") and self.step > 0:
                return

        if self.global_step % self.log_interval == 0:
            logger.dumpkvs()
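The condition not self.lr_anneal_steps always evaluates to True if lr_anneal_steps is left at its default value of 0. Because the three conditions in the while statement are joined with or, that single True keeps the loop running no matter what total_training_steps is set to.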
You can temporarily fix the issue by removing not self.lr_anneal_steps or self.step < self.lr_anneal_steps.
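To make the effect concrete, here is a minimal standalone sketch of the loop condition, not the repository's code. The function names `original_condition` and `workaround_condition` are hypothetical, the value 100 for total_training_steps is purely illustrative, and the workaround shown is one reading of the suggestion above (dropping both lr_anneal_steps checks so only the global_step check remains).

```python
def original_condition(step, global_step, lr_anneal_steps, total_training_steps):
    """Mirror of the while condition in run_loop (hypothetical helper)."""
    return (
        not lr_anneal_steps
        or step < lr_anneal_steps
        or global_step < total_training_steps
    )


def workaround_condition(global_step, total_training_steps):
    """Condition with both lr_anneal_steps checks removed (assumed workaround)."""
    return global_step < total_training_steps


# With lr_anneal_steps == 0, `not lr_anneal_steps` is True, so the loop
# would continue even far beyond total_training_steps.
still_running = original_condition(
    step=10_000, global_step=10_000, lr_anneal_steps=0, total_training_steps=100
)

# With the workaround, the loop stops as soon as global_step reaches the limit.
stops_at_limit = not workaround_condition(global_step=100, total_training_steps=100)

print(still_running, stops_at_limit)
```

Note that setting lr_anneal_steps to a nonzero value also makes the first check False, but then the loop still runs until both step reaches lr_anneal_steps and global_step reaches total_training_steps, since the remaining checks are joined with or.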
I reduced total_training_steps. Why does the training not stop when the number of steps is reduced?