
Optimizers support for parameter groups #523

Merged: 29 commits merged into dotnet:main on Feb 22, 2022

Conversation

NiklasGustafsson (Contributor)

This is a massive PR.

The main thrust is to support parameter groups in the optimizers and LR schedulers, which is a breaking change.
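To make that concrete, here is a rough sketch of what parameter-group usage could look like from C#. The type and parameter names below (SGD.ParamGroup, lr, momentum, learningRate) are assumptions modeled on PyTorch's param_groups rather than a definitive statement of the merged API:

    using TorchSharp;
    using TorchSharp.Modules;
    using static TorchSharp.torch;

    var lin1 = nn.Linear(10, 10);
    var lin2 = nn.Linear(10, 1);

    // One optimizer instance managing two parameter groups, each with its own
    // hyperparameters (this mirrors PyTorch's param_groups).
    var optimizer = torch.optim.SGD(new SGD.ParamGroup[] {
        new (lin1.parameters(), lr: 0.01, momentum: 0.9),
        new (lin2.parameters(), lr: 0.001)
    }, learningRate: 0.01);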

The change also involves moving all the optimizer implementations to managed code, except for LBFGS, which does not allow more than a single parameter group, and therefore does not require the move.

While implementing this support, a few things were discovered to be missing or incorrect, and were fixed along the way.

@lostmsu, @oluwandabira -- if you have a chance to take a look, too, that would be wonderful. I still haven't heard about the right way to include individuals in the 'Request Review' UI.

This was linked to issues on Feb 18, 2022.
lostmsu (Contributor) commented Feb 18, 2022

@NiklasGustafsson before jumping into this, what is the motivation to move optimizer implementations into C# code? Every implementation copied from/modeled after the original PyTorch code is a maintenance burden and a huge potential for future divergence from PyTorch in subtle but potentially very important details. It is also error prone. That approach and its consequences are one of the major reasons I am not using the TensorFlow.NET project.

NiklasGustafsson (Contributor, Author)

@lostmsu:

Yes, excellent question.

I completely agree that just interfacing with native code is attractive. Without question, it is why we have come as far as we have as quickly as we have. There are a couple of areas, like determining whether types are float, complex, etc., that don't seem worth going to C++ for, but otherwise TorchSharp is, and should remain, a .NET wrapper of LibTorch.

There are only (even before this PR) three areas where there is significant managed code:

  1. TorchVision (but I moved code to C++ to avoid the dispose problem)
  2. Optimizers (half of them were already in managed code)
  3. LR Schedulers.

The situation with the optimizers is that only about half of the ones available in 1.10 are available in native code, so there were already a number of managed-code implementations. In fact, I dragged my feet for a long time some months back because I didn't want to tackle the managed-code optimizers -- this was before the DisposeScope addition, when expressing tensor computations in .NET was nasty.

So, faced with a disruptive change (adding parameter groups), the challenge of representing and encoding parameter groups and their options in two different ways, and the prospect of continuing to maintain two very different implementation approaches, I felt it would be better to have just one, like PyTorch does.

So, that's the rationale, for better or worse.

Resolved review threads: src/TorchSharp/NN/Module.cs, src/TorchSharp/NN/LRScheduler.cs, and src/TorchSharp/NN/Optimizer.cs (three outdated threads).
Comment on lines 903 to 905
public override Tensor step(Func<Tensor> closure = null)
{
    return _step<ParamGroup>(group => {
lostmsu (Contributor)

I can't really check the correctness of the optimizer implementation.

A few things:

  1. It needs a link to either the paper or to the PyTorch implementation
  2. It needs a reproducible test for every valid options combination.

NiklasGustafsson (Contributor, Author)

> It needs a link to either the paper or to the PyTorch implementation

That's a good point. The PyTorch implementation has all the paper links, so that's easy.

> It needs a reproducible test for every valid options combination.

I'll see how much of that I can pull off. The combinatorics are forbidding, but it's certainly worth having more unit tests with non-default options.

lostmsu (Contributor)

Writing tests could be pretty easy with a TestOptimizer(Model, Optimizer, float targetLoss) helper function that would do a reproducible run.
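Roughly something like this (a sketch only, not existing test code; the Xunit assertion, the manual MSE, and the fixed step count are assumptions):

    using System;
    using Xunit;
    using TorchSharp;
    using static TorchSharp.torch;

    // Hypothetical helper: run a fixed number of reproducible steps and check that
    // the final loss matches a previously recorded target within a small margin.
    static void TestOptimizer(nn.Module model, optim.Optimizer optimizer,
                              Tensor x, Tensor y, float targetLoss, float margin = 1e-3f)
    {
        float finalLoss = 0.0f;

        for (int i = 0; i < 10; i++) {
            optimizer.zero_grad();
            // Plain MSE from tensor ops, to avoid depending on a particular loss API.
            var loss = (model.forward(x) - y).pow(2).mean();
            loss.backward();
            optimizer.step();
            finalLoss = loss.ToSingle();
        }

        Assert.True(Math.Abs(finalLoss - targetLoss) < margin);
    }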

The problem would be to know what the target loss should be.

NiklasGustafsson (Contributor, Author)

> The problem would be to know what the target loss should be.

Yes. I can think of two ways of doing it:

  1. Write all the unit tests in both Python and .NET and use the Python values as 'expected.' Tons of work.
  2. Run the unit tests once, look at the loss, and use that as the target loss (sketched below).
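For approach 2, with the TestOptimizer sketch above, the workflow might look roughly like this (hypothetical code and numbers; the 0.832f target is a made-up placeholder for whatever the first seeded run actually reports, and the seeding/factory signatures are assumptions):

    using TorchSharp;
    using static TorchSharp.torch;

    // Seed the global RNG so the run is repeatable (assuming no hidden RNG use).
    torch.random.manual_seed(4711);

    var lin = nn.Linear(10, 1);
    var x = torch.randn(64, 10);
    var y = torch.randn(64, 1);

    // Run once, note the reported final loss, then pin it here as the target.
    var optimizer = torch.optim.SGD(lin.parameters(), learningRate: 0.01);
    TestOptimizer(lin, optimizer, x, y, targetLoss: 0.832f);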

Two more resolved (outdated) review threads on src/TorchSharp/NN/Optimizer.cs.

    optimizer.step();
}
Assert.True(finalLoss < initialLoss);
lostmsu (Contributor)

As mentioned above, we should do a reproducible training run here and check the loss exactly (± an error margin).

NiklasGustafsson (Contributor, Author)

I have to think about that. In the past, we've struggled with reproducible RNG sequences in the unit tests, since the tests are run in parallel. As long as there aren't any hidden random number API calls that use the global RNG, it should be doable by using Generator objects.
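For example, something along these lines (a sketch under the assumption that a Generator can be seeded per test and passed to the random factories; the exact signatures may differ):

    using TorchSharp;
    using static TorchSharp.torch;

    // A per-test Generator with a fixed seed, so parallel tests don't interfere
    // with each other through the global RNG state.
    var gen = new Generator(4711);

    var x = torch.randn(new long[] { 64, 10 }, generator: gen);
    var y = torch.randn(new long[] { 64, 1 }, generator: gen);

    // Weight initialization would also need to be deterministic (e.g. by copying
    // fixed values into the parameters) for the final loss to be exactly reproducible.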

NiklasGustafsson (Contributor, Author)

For example, I don't know how Dropout layers generate the mask, but we can just avoid Dropout in unit tests.

Resolved (outdated) review thread on test/TorchSharpTest/TestTraining.cs.
NiklasGustafsson (Contributor, Author)

@lostmsu -- your review and comments are much appreciated!

GeorgeS2019

> @NiklasGustafsson before jumping into this, what is the motivation to move optimizer implementations into C# code? Every implementation copied from/modeled after the original PyTorch code is a maintenance burden […] one of the major reasons I am not using the TensorFlow.NET project.

@lostmsu You make a fair observation. One critical difference between TorchSharp and TensorFlow.NET is that the unit-test coverage of the LibTorch API surface is, in my impression, significantly higher in TorchSharp than in TensorFlow.NET.

The transparency of that code coverage and the regular alignment with the latest PyTorch are also key advantages of TorchSharp.

NiklasGustafsson (Contributor, Author)

The PR comments, except for the request to revert optimizers to native code, have been taken into account. The optimizer implementation request will be tracked in its own issue (#531) and addressed as long as the long-term cost is not found to be prohibitive.

NiklasGustafsson merged commit 68a8e01 into dotnet:main on Feb 22, 2022.
NiklasGustafsson deleted the optimizers branch on August 5, 2022.

Successfully merging this pull request may close these issues:

  AdamW bug on v0.96.0
  Add support for OptimizerParamGroup

3 participants