Solver switching support & implementation of Nesterov's accelerated grad... #741
Conversation
- virtual void SnapshotSolverState(SolverState * state);
- virtual void RestoreSolverState(const SolverState& state);
+ void SnapshotSolverState(SolverState * state);
+ void RestoreSolverState(const SolverState& state);
Why make any of these methods non-virtual? Other solvers might have more or different state to store, or other PreSolve steps.
In fact this change does nothing; these methods are all implicitly virtual, and the virtual keyword should be restored (see #590).
@jeffdonahue @longjon Thanks for the helpful comments! I've restored the virtual keywords accordingly.
Hey @qipeng, thanks for this PR -- this looks like really great work! While I'll be excited to see NAG & AdaGrad in Caffe, I'm inclined to hold any new code to a higher standard than existing code in terms of testing. Would you be willing to write some unit tests for your new solvers, e.g. checking that a known correct update value is computed for some small problems? You could refer to the existing
Hi @jeffdonahue, thanks for the comments! Since I'm just starting to use and contribute to this code base, I may not be familiar with writing proper test cases for solvers, but I'll definitely try. That said, it would be really helpful if you could spare a small amount of time to write a test for SGDSolver. And yes, I intend to keep my solvers branch public and alive so others in the community can feel free to use them.
Hi @kloudkl, I read both issues and they are really insightful. As for AdaGrad, my implementation follows Duchi's original paper on AdaGrad, without the fancy variations. But I don't imagine it would be difficult if the variants are not radically different -- if you could point me to their respective references, I'll try to read them and extend this solver further on my branch. As for NAG, my implementation follows Ilya Sutskever's PhD thesis, which should be the same as the equations you listed in #53 with only one exception: I'm working within the
Let's be agile. Until the original AdaGrad works correctly, implementing any other variant is a low priority.
}

template Solver<float>* GetSolver(const SolverParameter& param);
template Solver<double>* GetSolver(const SolverParameter& param);
Note that if a template function is defined in the header file, you won't need the above two explicit instantiation lines.
Thanks for the comments! It's been addressed in a recent commit.
@qipeng this is really nice work that I look forward to merging. Can you rebase and add tests now that @jeffdonahue has given an example by the SGDSolver tests in #855? Thanks!
@shelhamer I just made a commit including solver unit tests for Nesterov and AdaGrad. Some code is commented out; here are some explanations:
Also, it should be noted that, given Caffe's framework, where only one gradient evaluation and one update can be performed in each iteration, I implemented Nesterov in a slightly different way (I call it "step back and over-step" in the code). Basically, the only problem is that after the solver finishes solving, the parameter is at
Great work @qipeng! I consolidated the tests into one file to avoid duplicating so much code -- I made a PR to your branch. I'd like to see a couple more things before merging this:
Oops, sorry about my first TODO bullet point; I just reread comment (1) from your previous post -- I'll think about it a bit.
@jeffdonahue Thanks for the comments! I've added sample scripts for MNIST, as well as a sanity check of momentum for AdaGradSolver in the last commit. |
Thanks @qipeng! Made a quick pass through your added code and made a comment, but will do a more thorough (and quite possibly final!) review later.
@jeffdonahue Thanks for the careful review! It turns out that after Caffe's main executable was unified, I forgot to add solver switching to that code, so I'd been testing SGDSolver on that .prototxt file for a while. Now everything should be fixed. :)
Thanks for the fixes! This is merged into dev.

I did a bit of grooming on the MNIST autoencoder example, including the existing mnist_autoencoder.prototxt:
- use the new MNIST LMDBs rather than LevelDBs

I also changed the existing MNIST autoencoder SGD solver to work much better: starting at a learning rate of 1e-2 with a step policy instead of fixed at 1e-4, making the symmetric change to the Nesterov solver (the AdaGrad solver already uses base_lr: 0.01 and adjusts its own learning rate), and stopping at iteration 65K instead of 4 million or whatever it was before. I also updated the training scripts to run from the root directory per the new convention.

After all those fixes, I also added AdaGrad:

SGD:

Nesterov:
@qipeng thanks for the solvers, and @jeffdonahue thanks for the grooming. @jeffdonahue Ideally you should keep the same format for merge commit messages, i.e. "Merge pull request #741 from qipeng/solvers", to make it easier to go back and find the thread for a merge.
train_net.cpp is reverted (re-deprecated) in dev. Sorry about the merge commit message; I'll try to keep the GitHub format during manual merges in the future.