In the ResNet9_Barlow_Twins.ipynb notebook, the model and the training function compiled and trained successfully for ~13 epochs, but after that the loss abruptly becomes `nan`. This in turn was caused by the gradients becoming `nan`.
Debugging:
`torch.autograd.set_detect_anomaly(True)` was used to trace which part of the code was producing the `nan` values.
The error was traced to: `RuntimeError: Function 'PowBackward0' returned nan values in its 0th output.` (A `PowBackward0` `nan` typically arises when a fractional power, e.g. the square root inside a standard-deviation or normalization step, is backpropagated through a non-positive base.)
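For illustration, here is a minimal self-contained reproduction of this failure mode (not the notebook's code): a fractional power of a negative base has a `nan` gradient, and anomaly mode attributes it to `PowBackward0`.

```python
import torch

# Anomaly mode makes backward() raise at the first op whose gradient
# contains nan, naming the op (here, PowBackward0).
torch.autograd.set_detect_anomaly(True)

# d/dx x**1.5 = 1.5 * x**0.5, which is nan for x < 0.
x = torch.tensor([-1.0], requires_grad=True)
y = x.pow(1.5).sum()
y.backward()  # RuntimeError: Function 'PowBackward0' returned nan values in its 0th output.
```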
Gradient clipping was used to ensure that the gradients do not explode, and division by zero was prevented at all stages by adding a small positive constant wherever required (a sketch of both mitigations follows).
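A minimal sketch of both mitigations, assuming a placeholder model and optimizer and an assumed clipping threshold of 1.0 (the notebook's actual values may differ):

```python
import torch

model = torch.nn.Linear(8, 8)                     # stand-in for the actual encoder
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
EPS = 1e-8                                        # small positive constant guarding divisions

z = model(torch.randn(16, 8))
z_norm = (z - z.mean(dim=0)) / (z.std(dim=0) + EPS)  # EPS keeps the division finite
loss = z_norm.pow(2).mean()

optimizer.zero_grad()
loss.backward()
torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)  # cap the gradient norm
optimizer.step()
```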
We also tried using Facebook Research's implementation of the Barlow Twins loss function and the LARS optimizer.
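For context, here is a sketch of the Barlow Twins objective with the epsilon-stabilized normalization described above. It paraphrases the loss from the Barlow Twins paper (Zbontar et al., 2021); it is not the notebook's or Facebook Research's exact code, and the `lambda_coeff` and `eps` defaults are assumptions.

```python
import torch

def barlow_twins_loss(z1, z2, lambda_coeff=5e-3, eps=1e-8):
    """Cross-correlation loss on two (batch, dim) embedding views (sketch)."""
    n, _ = z1.shape
    # Standardize each feature over the batch; eps guards the division.
    # An unstabilized std() here is a plausible source of PowBackward0 nans.
    z1 = (z1 - z1.mean(0)) / (z1.std(0) + eps)
    z2 = (z2 - z2.mean(0)) / (z2.std(0) + eps)
    c = (z1.T @ z2) / n                                   # (dim, dim) cross-correlation
    on_diag = (torch.diagonal(c) - 1).pow(2).sum()        # drive diagonal toward 1
    off_diag = (c - torch.diag(torch.diagonal(c))).pow(2).sum()  # drive the rest toward 0
    return on_diag + lambda_coeff * off_diag

# Usage with random stand-in embeddings:
z1, z2 = torch.randn(32, 64), torch.randn(32, 64)
print(barlow_twins_loss(z1, z2))
```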
A simpler model (AlexNet) was also tried, in the AlexNet_Barlow_Twins.ipynb notebook, but the error persisted.