
Optimizers support for parameter groups #523

Merged: 29 commits merged into dotnet:main on Feb 22, 2022

Conversation

NiklasGustafsson (Contributor)

This is a massive PR.

The main thrust is to support parameter groups in the optimizers and LR schedulers, which is a breaking change.
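To make that concrete, here is a rough sketch of what parameter-group usage could look like from C#. The type and parameter names below (SGD.ParamGroup, lr, momentum, learningRate) are assumptions modeled on PyTorch's param_groups rather than a definitive statement of the merged API:

    using TorchSharp;
    using TorchSharp.Modules;
    using static TorchSharp.torch;

    var lin1 = nn.Linear(10, 10);
    var lin2 = nn.Linear(10, 1);

    // One optimizer instance managing two parameter groups, each with its own
    // hyperparameters (this mirrors PyTorch's param_groups).
    var optimizer = torch.optim.SGD(new SGD.ParamGroup[] {
        new (lin1.parameters(), lr: 0.01, momentum: 0.9),
        new (lin2.parameters(), lr: 0.001)
    }, learningRate: 0.01);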

The change also involves moving all the optimizer implementations to managed code, except for LBFGS, which does not allow more than a single parameter group, and therefore does not require the move.

While implementing this support, a few things were discovered to be missing or incorrect, and were fixed along the way.

@lostmsu, @oluwandabira -- if you have a chance to take a look, too, that would be wonderful. I still haven't heard about the right way to include individuals in the 'Request Review' UI.

This was linked to issues on Feb 18, 2022.
lostmsu (Contributor) commented Feb 18, 2022

@NiklasGustafsson before jumping into this, what is the motivation to move optimizer implementations into C# code? Every implementation copied from/modeled after the original PyTorch code is a maintenance burden and a huge potential for future divergence from PyTorch in subtle but potentially very important details. It is also error prone. That approach and its consequences are one of the major reasons I am not using the TensorFlow.NET project.

NiklasGustafsson (Contributor, Author)

@lostmsu:

Yes, excellent question.

I completely agree that just interfacing with native code is attractive. Without question, it is why we have come as far as we have as quickly as we have. There are a couple of areas, like determining whether types are float, complex, etc., that don't seem worth going to C++ for, but otherwise TorchSharp is, and should remain, a .NET wrapper of LibTorch.

There are only (even before this PR) three areas where there is significant managed code:

  1. TorchVision (but I moved code to C++ to avoid the dispose problem)
  2. Optimizers (half of them were already in managed code)
  3. LR Schedulers.

The situation with the optimizers is that only about half of the ones available in 1.10 are available in native code, so there were already a number of managed-code implementations. In fact, I dragged my feet for a long time some months back because I didn't want to tackle the managed-code optimizers -- this was before the DisposeScope addition, when expressing tensor computations in .NET was nasty.

So, faced with a disruptive change (adding parameter groups), the challenge of representing and encoding parameter groups and their options in two different ways, and the prospect of continuing to maintain two very different implementation approaches, I felt it would be better to have just one, like PyTorch does.

So, that's the rationale, for better or worse.

Resolved review threads: src/TorchSharp/NN/Module.cs, src/TorchSharp/NN/LRScheduler.cs, and src/TorchSharp/NN/Optimizer.cs (three outdated threads).
Comment on lines 903 to 905
public override Tensor step(Func<Tensor> closure = null)
{
    return _step<ParamGroup>(group => {
lostmsu (Contributor)

I can't really check the correctness of the optimizer implementation.

A few things:

  1. It needs a link to either the paper or to the PyTorch implementation
  2. It needs a reproducible test for every valid options combination.

NiklasGustafsson (Contributor, Author)

> It needs a link to either the paper or to the PyTorch implementation

That's a good point. The PyTorch implementation has all the paper links, so that's easy.

> It needs a reproducible test for every valid options combination.

I'll see how much of that I can pull off. The combinatorics are forbidding, but it's certainly worth having more unit tests with non-default options.

lostmsu (Contributor)

Writing tests could be pretty easy with a TestOptimizer(Model, Optimizer, float targetLoss) helper function that would do a reproducible run.
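Roughly something like this (a sketch only, not existing test code; the Xunit assertion, the manual MSE, and the fixed step count are assumptions):

    using System;
    using Xunit;
    using TorchSharp;
    using static TorchSharp.torch;

    // Hypothetical helper: run a fixed number of reproducible steps and check that
    // the final loss matches a previously recorded target within a small margin.
    static void TestOptimizer(nn.Module model, optim.Optimizer optimizer,
                              Tensor x, Tensor y, float targetLoss, float margin = 1e-3f)
    {
        float finalLoss = 0.0f;

        for (int i = 0; i < 10; i++) {
            optimizer.zero_grad();
            // Plain MSE from tensor ops, to avoid depending on a particular loss API.
            var loss = (model.forward(x) - y).pow(2).mean();
            loss.backward();
            optimizer.step();
            finalLoss = loss.ToSingle();
        }

        Assert.True(Math.Abs(finalLoss - targetLoss) < margin);
    }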

The problem would be to know what the target loss should be.

NiklasGustafsson (Contributor, Author)

> The problem would be to know what the target loss should be.

Yes. I can think of two ways of doing it:

  1. Write all the unit tests in both Python and .NET and use the Python values as 'expected.' Tons of work.
  2. Run the unit tests once, look at the loss, and use that as the target loss (sketched below).
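For approach 2, with the TestOptimizer sketch above, the workflow might look roughly like this (hypothetical code and numbers; the 0.832f target is a made-up placeholder for whatever the first seeded run actually reports, and the seeding/factory signatures are assumptions):

    using TorchSharp;
    using static TorchSharp.torch;

    // Seed the global RNG so the run is repeatable (assuming no hidden RNG use).
    torch.random.manual_seed(4711);

    var lin = nn.Linear(10, 1);
    var x = torch.randn(64, 10);
    var y = torch.randn(64, 1);

    // Run once, note the reported final loss, then pin it here as the target.
    var optimizer = torch.optim.SGD(lin.parameters(), learningRate: 0.01);
    TestOptimizer(lin, optimizer, x, y, targetLoss: 0.832f);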

Two more resolved (outdated) review threads on src/TorchSharp/NN/Optimizer.cs.

    optimizer.step();
}
Assert.True(finalLoss < initialLoss);
lostmsu (Contributor)

As mentioned above, we should do a reproducible training run here and check the loss exactly (± an error margin).

NiklasGustafsson (Contributor, Author)

I have to think about that. In the past, we've struggled with reproducible RNG sequences in the unit tests, since the tests are run in parallel. As long as there aren't any hidden random number API calls that use the global RNG, it should be doable by using Generator objects.
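For example, something along these lines (a sketch under the assumption that a Generator can be seeded per test and passed to the random factories; the exact signatures may differ):

    using TorchSharp;
    using static TorchSharp.torch;

    // A per-test Generator with a fixed seed, so parallel tests don't interfere
    // with each other through the global RNG state.
    var gen = new Generator(4711);

    var x = torch.randn(new long[] { 64, 10 }, generator: gen);
    var y = torch.randn(new long[] { 64, 1 }, generator: gen);

    // Weight initialization would also need to be deterministic (e.g. by copying
    // fixed values into the parameters) for the final loss to be exactly reproducible.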

NiklasGustafsson (Contributor, Author)

For example, I don't know how Dropout layers generate the mask, but we can just avoid Dropout in unit tests.

Resolved (outdated) review thread on test/TorchSharpTest/TestTraining.cs.
NiklasGustafsson (Contributor, Author)

@lostmsu -- your review and comments are much appreciated!

GeorgeS2019

> @NiklasGustafsson before jumping into this, what is the motivation to move optimizer implementations into C# code? Every implementation copied from/modeled after the original PyTorch code is a maintenance burden […] one of the major reasons I am not using the TensorFlow.NET project.

@lostmsu You make a fair observation. One critical difference between TorchSharp and TensorFlow.NET is that the unit-test coverage of the LibTorch API surface is, in my impression, significantly higher in TorchSharp than in TensorFlow.NET.

The transparency of that code coverage and the regular alignment with the latest PyTorch are also key advantages of TorchSharp.

NiklasGustafsson (Contributor, Author)

The PR comments, except for the request to revert optimizers to native code, have been taken into account. The optimizer implementation request will be tracked in its own issue (#531) and addressed as long as the long-term cost is not found to be prohibitive.

NiklasGustafsson merged commit 68a8e01 into dotnet:main on Feb 22, 2022.
NiklasGustafsson deleted the optimizers branch on August 5, 2022.

Successfully merging this pull request may close these issues:

  AdamW bug on v0.96.0
  Add support for OptimizerParamGroup

3 participants