LR Scheduler and Optimizer #11
lukas-blecher started this conversation in Ideas
Replies: 2 comments 4 replies
I am currently training as well. I got BLEU=0.89 once, but then it fluctuated between 0.7 and 0.8x the whole time. I guess it has something to do with the LR, as you mentioned. I am planning to make it faster for deployment if possible.
Ideas for speeding up training by choosing optimal optimization algorithms.
As optimizers I have only ever tried Adam and AdamW. Both seem to perform quite well, but I had the feeling that AdamW is a better fit.
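For concreteness, a minimal sketch of what I mean by the AdamW setup (the model, learning rate, and weight decay here are placeholders, not tuned values from my runs):

```python
import torch

# Placeholder model; in practice this is whatever nn.Module is being trained.
model = torch.nn.Linear(10, 10)

# AdamW decouples weight decay from the adaptive gradient update, which is the
# main difference from Adam with the same weight_decay setting.
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4, weight_decay=1e-2)
```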
Now for the LR Scheduler.
It has quite a big effect on training progress. I've mostly used OneCycleLR so far, but the loss either stagnated or even got worse after some time. That's why I continued training after a couple of epochs with a "fresh" OneCycle.
Maybe using a cyclic scheduler from the start would be the way to go, something like CosineAnnealingLR.
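A rough sketch of the "cyclic from the start" idea, using CosineAnnealingWarmRestarts (the warm-restart variant of CosineAnnealingLR) instead of manually re-creating a fresh OneCycleLR; the model, learning rate, steps_per_epoch, epochs, and cycle length are placeholder assumptions:

```python
import torch

# Placeholder model and hyperparameters, not values from my runs.
model = torch.nn.Linear(10, 10)
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4, weight_decay=1e-2)
steps_per_epoch, epochs = 1000, 30

# Cyclic from the start: cosine annealing with warm restarts. The LR is reset
# every T_0 steps and each cycle is twice as long as the previous one, instead
# of manually restarting a fresh OneCycleLR whenever the loss stagnates.
scheduler = torch.optim.lr_scheduler.CosineAnnealingWarmRestarts(
    optimizer, T_0=5 * steps_per_epoch, T_mult=2, eta_min=1e-6)

for step in range(epochs * steps_per_epoch):
    # ... forward pass, loss.backward(), optimizer.step() ...
    scheduler.step()  # stepped once per batch, so T_0 is measured in steps
```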
Does anybody have experience with other schedulers/optimizers?