Replies: 1 comment
- As far as I know, D-Adaptation and Adafactor are optimizers that automatically adjust the learning rate. You can turn off 8-bit Adam and use Unet: 1.0, Te: 0.5 (a rough configuration sketch follows below the thread).
- Does anyone have a reference explaining what the different optimizers do, and what the differences between them are?
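
As a rough illustration of the reply above, here is a minimal sketch of using D-Adaptation with separate relative learning rates for the UNet and text encoder (Unet: 1.0, Te: 0.5). It assumes the `dadaptation` package from facebookresearch is installed and uses small placeholder modules instead of the real models; the actual flags or option names in a given training script may differ.

```python
# Minimal sketch, not a specific training script's configuration.
# Assumes: pip install dadaptation torch
import torch
import dadaptation

# Placeholder modules standing in for the real UNet and text encoder.
unet = torch.nn.Linear(8, 8)
text_encoder = torch.nn.Linear(8, 8)

# With D-Adaptation the optimizer estimates the step size itself, so the
# per-group "lr" acts as a relative multiplier: 1.0 for the UNet, 0.5 for
# the text encoder, matching the "Unet: 1.0, Te: 0.5" suggestion above.
optimizer = dadaptation.DAdaptAdam(
    [
        {"params": unet.parameters(), "lr": 1.0},
        {"params": text_encoder.parameters(), "lr": 0.5},
    ]
)

# One dummy optimization step to show the usual PyTorch loop shape.
x = torch.randn(4, 8)
loss = (text_encoder(unet(x)) ** 2).mean()
loss.backward()
optimizer.step()
optimizer.zero_grad()
```

The same param-group idea applies to Adafactor or any other PyTorch optimizer; only the optimizer class and its hyperparameters change.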