k-diffusion/k_diffusion/models/image_transformer_v2.py, line 111 at commit 21d12c9:
I know the paper mentions that this version of scaling directly parametrizes the scale rather than its exponent; however, an unintended side effect is that when the scale gets close to zero, a larger random gradient update can push it negative, which produces a NaN.

The fix is simple: in our adaptation for our in-house models we changed it to torch.sqrt(torch.abs(scale) + eps). The eps is added to preserve gradients (so the argument never reaches zero). I'd guess a biased ReLU would probably also work, along with other nonlinear functions.
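For reference, here's a minimal sketch of the proposed change. The function and variable names are illustrative, not the exact ones at the linked line:

```python
import torch

def safe_sqrt_scale(scale: torch.Tensor, eps: float = 1e-6) -> torch.Tensor:
    # torch.sqrt(scale) produces NaN if a gradient update pushes the learned
    # scale slightly negative; abs() keeps the argument non-negative and eps
    # keeps it strictly positive so the sqrt gradient stays finite.
    return torch.sqrt(torch.abs(scale) + eps)

# Hypothetical usage with a learned per-head scale (illustrative only):
scale = torch.nn.Parameter(torch.full((8,), 10.0))
q_scale = safe_sqrt_scale(scale)  # stays finite even if scale dips below zero
```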