Optimizers support for parameter groups #523
Conversation
Support complex types in Adam.
Updated release notes.
@NiklasGustafsson before jumping into this, what is the motivation for moving the optimizer implementations into C# code? Every implementation copied from or modeled after the original PyTorch code is a maintenance burden and a huge potential for future divergence from PyTorch in subtle but potentially very important details. It is also error-prone. That approach and its consequences are one of the major reasons I am not using the TensorFlow.NET project.
Yes, excellent question. I completely agree that the fact that we're just interfacing with native code is attractive. Without question, it is why we have come as far as we have as quickly as we have. There are a couple of areas, such as determining whether types are float, complex, etc., that don't seem worth going to C++ for, but otherwise TorchSharp is, and should remain, a .NET wrapper of LibTorch. There were (even before this PR) only three areas with significant managed code:
The situation with the optimizers is that only about half of the ones available in 1.10 are available in native code, so there were already a number of managed-code implementations. In fact, I dragged my feet for a long time, some months back, because I didn't want to tackle the managed-code optimizers -- this was before the DisposeScope addition, when expressing tensor computations in .NET was nasty. So, faced with a disruptive change, i.e. adding parameter groups, and the challenge of representing / encoding parameter groups and their options in two different ways, and having to continue to maintain two very different approaches to implementation, I felt that it would be better to have just one, like PyTorch does. So, that's the rationale, for better or worse.
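For context, the parameter-group feature being discussed mirrors PyTorch's design: each group carries its own options (learning rate, momentum, etc.) that override the optimizer-wide defaults. A rough sketch of what usage might look like; the `ParamGroup` constructor and `SGD` overload shown here are illustrative assumptions modeled on the PyTorch API, not necessarily the exact TorchSharp signatures:

```csharp
using TorchSharp;
using static TorchSharp.torch;

// Two parameter groups with different learning rates, PyTorch-style.
// Exact type and parameter names are assumptions for illustration.
var backbone = nn.Linear(10, 10);
var head = nn.Linear(10, 2);

var optimizer = optim.SGD(new[] {
    new Modules.SGD.ParamGroup(backbone.parameters(), lr: 1e-4),
    new Modules.SGD.ParamGroup(head.parameters(), lr: 1e-2),
}, learningRate: 1e-3); // default for groups that don't set their own lr
```

The point of the design is that the optimizer and the LR schedulers can then adjust each group's options independently, which is what makes a shared managed-code representation attractive.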
src/TorchSharp/NN/Optimizer.cs
Outdated
public override Tensor step(Func<Tensor> closure = null)
{
    return _step<ParamGroup>(group => {
I can't really check the correctness of the optimizer implementation.
A few things:
- It needs a link to either the paper or to the PyTorch implementation
- It needs a reproducible test for every valid options combination.
> It needs a link to either the paper or to the PyTorch implementation

That's a good point. The PyTorch implementation has all the paper links, so that's easy.

> It needs a reproducible test for every valid options combination.

I'll see how much of that I can pull off. The combinatorics of that are forbidding. It's certainly worth having more unit tests with non-defaults.
Writing tests could be pretty easy with a TestOptimizer(Model, Optimizer, float targetLoss) helper function that would do a reproducible run. The problem would be knowing what the target loss should be.
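The proposed helper might be sketched roughly as follows. Everything here is hypothetical: `TestOptimizer` itself is the name suggested above, the deterministic data avoids RNG entirely, and the TorchSharp module/functional signatures shown are assumptions that may differ from the actual API of the version under review:

```csharp
using System;
using TorchSharp;
using static TorchSharp.torch;
using Xunit;

// Hypothetical helper: run a fixed number of deterministic training steps
// and compare the final loss to a recorded target within a tolerance.
static void TestOptimizer(nn.Module model, optim.Optimizer optimizer,
                          float targetLoss, float tolerance = 1e-4f)
{
    // Deterministic inputs: constants, so no RNG state is involved at all.
    var input = ones(32, 10);
    var expected = ones(32, 1);

    float finalLoss = 0f;
    for (int i = 0; i < 10; i++) {
        optimizer.zero_grad();
        var loss = nn.functional.mse_loss(model.forward(input), expected);
        loss.backward();
        optimizer.step();
        finalLoss = loss.item<float>();
    }
    Assert.True(Math.Abs(finalLoss - targetLoss) < tolerance);
}
```

With a helper like this, each options combination becomes a one-line test case once its target loss has been recorded, which is exactly the "run once, capture the value" workflow discussed below in this thread.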
> The problem would be to know what the target loss should be.
Yes. I can think of doing it two ways:
- Write all the unit tests in both Python and .NET and use the Python values as 'expected.' Tons of work.
- Run the unit tests once, look at the loss and use that as the target loss.
optimizer.step();
}
Assert.True(finalLoss < initialLoss);
As mentioned above, we should do a reproducible training run here and check the loss exactly (within an error margin).
I have to think about that. In the past, we've struggled with reproducible RNG sequences in the unit tests, since the tests are run in parallel. As long as there aren't any hidden random number API calls that use the global RNG, it should be doable by using Generator objects.
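A sketch of that idea, assuming TorchSharp exposes per-call `Generator` objects the way PyTorch does; the constructor and the `generator:` parameter on the factory methods are assumptions about the API surface, not confirmed signatures:

```csharp
using TorchSharp;
using static TorchSharp.torch;

// Each test owns its own Generator with a fixed seed, so tests running
// in parallel cannot perturb each other through the global RNG state.
var gen = new Generator(42);
var input = randn(new long[] { 32, 10 }, generator: gen);
var noise = randn(new long[] { 32, 10 }, generator: gen);
// As long as no call silently falls back to the global RNG, the drawn
// sequence -- and therefore the final loss -- is identical on every run.
```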
For example, I don't know how Dropout layers generate the mask, but we can simply avoid Dropout in unit tests.
@lostmsu -- your review and comments are much appreciated!
@lostmsu You make a fair observation. One critical difference between TorchSharp and Tensorflow.NET is that the unit-test coverage of the LibTorch API in TorchSharp is (in my impression) significantly higher than the corresponding coverage in Tensorflow.NET. The transparency of code coverage and the regular alignment with the latest PyTorch in TorchSharp are also key advantages.
The PR comments, except for the request to revert the optimizers to native code, have been taken into account. The optimizer-testing request will be tracked in its own issue (#531) and addressed, as long as the long-term cost is not found to be prohibitive.
This is a massive PR.
The main thrust is to support parameter groups in the optimizers and LR schedulers, which is a significant breaking change.
The change also involves moving all the optimizer implementations to managed code, except for LBFGS, which does not allow more than a single parameter group and therefore does not require the move.
On the way to implementing this support, a few things were discovered missing and/or incorrect, and were therefore fixed.
@lostmsu, @oluwandabira -- if you have a chance to take a look, too, that would be wonderful. I still haven't heard back about the right way to include individuals in the 'Request Review' UI.