Add Optimizer classes to the new frontend #4280

thiagocrepaldi · 2020-06-18T22:42:31Z

Adds the optim.config._OptimizerConfig, optim.config.AdamConfig, optim.config.LambConfig and optim.config.SGDConfig classes.
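As a rough illustration of the kind of class listed above, the sketch below shows one way an optimizer configuration object could store default hyperparameters and fill them into per-group overrides. It is not the code in `config.py`: the class names `OptimizerConfigSketch` and `AdamLikeConfig`, the `epsilon` hyperparameter, and every default value are assumptions made for this example.

```python
class OptimizerConfigSketch:
    """Hypothetical base class: holds an optimizer name, defaults and parameter groups."""

    def __init__(self, name, params=None, defaults=None):
        if not isinstance(name, str) or not name:
            raise ValueError("name must be a non-empty string")
        self.name = name
        self.defaults = dict(defaults or {})
        if "lr" not in self.defaults:
            raise ValueError("defaults must contain a learning rate under 'lr'")
        self.lr = self.defaults["lr"]

        # Each group names the weights it applies to and may override defaults.
        self.params = []
        for group in params or []:
            if "params" not in group:
                raise ValueError("each parameter group needs a 'params' entry")
            filled = dict(self.defaults)  # start from the defaults
            filled.update(group)          # then apply the group's overrides
            self.params.append(filled)


class AdamLikeConfig(OptimizerConfigSketch):
    """Hypothetical subclass; hyperparameter names and values are illustrative."""

    def __init__(self, params=None, lr=0.001, epsilon=1e-8):
        super().__init__("AdamLike", params, {"lr": lr, "epsilon": epsilon})


config = AdamLikeConfig(params=[{"params": ["fc1.weight"], "lr": 0.01}])
```

In this sketch, the group for `fc1.weight` keeps its own `lr` and inherits `epsilon` from the defaults, so later stages can read a complete set of hyperparameters from every group.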

orttraining/orttraining/python/training/optim/config.py

spandantiwari

LGTM.

* Add ORTTrainerOptions class for the new pytorch frontend (#4382)
  Add ORTTrainerOptions class and some placeholders
* Add _ORTTrainerModelDesc to perform validation for model description (#4416)
* Add Loss Scaler classes to the new frontend (#4306)
* Add TrainStepInfo used on the new frontend API (#4256)
* Add Optimizer classes to the new frontend (#4280)
* Add LRScheduler implementation (#4357)
* Add basic ORTTrainer API (#4435)
  This PR presents the public API for ORTTrainer for short-term development. It also validates and saves input parameters, which will be used in the next stages, such as building the ONNX model, post-processing the model and configuring the training session.
* Add opset_version into ORTTrainerOptions and change type of ORTTrainer.loss_fn (#4592)
* Update ModelDescription and minor fix on ORTTrainer ctor (#4605)
  * Update ModelDescription and minor fix on ORTTrainer/ORTTrainerOptions
    This PR keeps the public API intact, but changes how the model description is stored on the backend.
    Currently, users create a dict with two lists of tuples. One list is called 'inputs' and each tuple has the format tuple(name, shape). The second list is called 'outputs' and each tuple can be either tuple(name, shape) or tuple(name, shape, is_loss).
    With this PR, when this dict is passed in to ORTTrainer, it is fully validated as usual. However, tuples are internally replaced by namedtuples and all output tuples will have the tuple(name, shape, is_loss) format instead of is_loss being optionally present.
    In addition to that normalization of the internal representation (which eases coding), two internal methods were created to replace a namedtuple(name, shape) with namedtuple(name, shape, dtype) or namedtuple(name, shape, is_loss, dtype), depending on whether the tuple is an input or an output. This is necessary because ORTTrainer finds out the data type of each input/output during model export to ONNX.
    Finally, a minor fix was done on ORTTrainer. It could initialize ORTTrainerOptions incorrectly when options=None.
  * Rename input name for test
* Add ONNX Model Export to New Frontend (#4612)
  Co-authored-by: Rayan Krishnan <t-rakr@OrtDevTest2v100.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>
  Co-authored-by: Thiago Crepaldi <[email protected]>
* Create training session + minor improvements (#4668)
  Co-authored-by: Rayan Krishnan <t-rakr@OrtDevTest2v100.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>
* Save ONNX model in file (#4671)
  Co-authored-by: Rayan Krishnan <t-rakr@OrtDevTest2v100.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>
* Add eval step (#4674)
  Co-authored-by: Rayan Krishnan <t-rakr@OrtDevTest2v100.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>
* Add train_step (#4677)
  Co-authored-by: Rayan Krishnan <t-rakr@OrtDevTest2v100.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>
* Add LR Scheduler (#4694)
  Co-authored-by: Rayan Krishnan <t-rakr@OrtDevTest2v100.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>
  Co-authored-by: Thiago Crepaldi <[email protected]>
* Add deterministic compute tests (#4716)
  Co-authored-by: Rayan Krishnan <t-rakr@OrtDevTest2v100.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>
  Co-authored-by: Thiago Crepaldi <[email protected]>
* Add legacy vs experimental ORTTrainer accuracy comparison (#4727)
  Co-authored-by: Rayan Krishnan <t-rakr@OrtDevTest2v100.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>
  Co-authored-by: Thiago Crepaldi <[email protected]>
* Add Mixed precision/LossScaler + several fixes (#4739)
  In addition to the mixed precision/loss scaler code, this PR includes:
  * Fix CUDA training
  * Add optimization_step into TrainStepInfo class
  * Refactor LRScheduler to use optimization_step instead of step
  * Updated several default values at ORTTrainerOptions
  * Add initial Gradient Accumulation support (untested)
  * Fix ONNX model post processing
  * Refactor unit tests
* Add ONNX BERT example + minor fixes (#4757)
  * Fix training issue when passing ONNX file into ORTTrainer
  Co-authored-by: Thiago Crepaldi <[email protected]>
  Co-authored-by: Rayan Krishnan <t-rakr@OrtDevTest2v100.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>
* Add Dynamic Shape support (#4758)
* Update DeepSpeed Zero Stage option to a separate option group (#4772)
* Add support to fetches (#4777)
* Add Gradient Accumulation Steps support (#4793)
* Fix Dynamic Axes feature and add unit test (#4795)
* Add frozen weights test (#4807)
* Move new pytorch front-end to 'experimental' namespace (#4814)
* Fix build

Co-authored-by: Rayan-Krishnan <[email protected]>
Co-authored-by: Rayan Krishnan <t-rakr@OrtDevTest2v100.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>
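The model-description normalization described in the #4605 entry above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the ORTTrainer implementation: the names `InputDesc`, `OutputDesc`, `normalize_model_desc` and `with_dtype` are hypothetical, and the validation shown is much lighter than what a real frontend would perform.

```python
from collections import namedtuple

# Hypothetical internal representations; the real names may differ.
InputDesc = namedtuple("InputDesc", ["name", "shape"])
OutputDesc = namedtuple("OutputDesc", ["name", "shape", "is_loss"])


def normalize_model_desc(model_desc):
    """Validate a model description dict and convert its tuples to namedtuples."""
    if not isinstance(model_desc, dict):
        raise TypeError("model_desc must be a dict")
    for key in ("inputs", "outputs"):
        if key not in model_desc:
            raise ValueError(f"model_desc is missing the '{key}' list")

    inputs = []
    for entry in model_desc["inputs"]:
        if len(entry) != 2:
            raise ValueError(f"inputs must be (name, shape), got {entry!r}")
        inputs.append(InputDesc(*entry))

    outputs = []
    for entry in model_desc["outputs"]:
        if len(entry) == 2:
            name, shape = entry
            is_loss = False  # is_loss was omitted, so fill in the default
        elif len(entry) == 3:
            name, shape, is_loss = entry
        else:
            raise ValueError(f"outputs must be (name, shape[, is_loss]), got {entry!r}")
        outputs.append(OutputDesc(name, shape, bool(is_loss)))
    return inputs, outputs


def with_dtype(desc, dtype):
    """Return a new namedtuple with the same fields as desc plus a trailing dtype."""
    fields = desc._fields + ("dtype",)
    typed = namedtuple(type(desc).__name__ + "Typed", fields)
    return typed(*desc, dtype)


inputs, outputs = normalize_model_desc({
    "inputs": [("x", [1, 2])],
    "outputs": [("loss", [], True), ("logits", [1, 2])],
})
typed_input = with_dtype(inputs[0], "float32")
```

Here every output ends up with an explicit `is_loss` field, and `with_dtype` stands in for the step that attaches data types once they are known after export; the string `"float32"` is only a placeholder for whatever dtype object the exporter reports.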

thiagocrepaldi added the training (issues related to ONNX Runtime training; typically submitted using template) and Frontend labels Jun 18, 2020

thiagocrepaldi requested review from BowenBao and spandantiwari June 18, 2020 22:42

thiagocrepaldi requested a review from a team as a code owner June 18, 2020 22:42

thiagocrepaldi assigned thiagocrepaldi and rayankrish Jun 18, 2020

thiagocrepaldi linked an issue Jun 18, 2020 that may be closed by this pull request

[WIP] New PyTorch frontend API #4176

Closed

thiagocrepaldi requested a review from rayankrish June 18, 2020 22:43

thiagocrepaldi force-pushed the thiagofc/new_frontend/OptimizerConfig branch from 3fb07ab to f28252f Compare June 23, 2020 21:28

liqunfu reviewed Jun 25, 2020

orttraining/orttraining/python/training/optim/config.py (outdated, resolved)

thiagocrepaldi force-pushed the thiagofc/new_frontend/OptimizerConfig branch from b7cf3a4 to 1913bf9 Compare June 26, 2020 22:58

spandantiwari reviewed Jun 29, 2020

orttraining/orttraining/python/training/optim/config.py (resolved)

spandantiwari reviewed Jun 29, 2020

orttraining/orttraining/python/training/optim/config.py (resolved)

spandantiwari reviewed Jun 29, 2020

orttraining/orttraining/python/training/optim/config.py (outdated, resolved)

thiagocrepaldi changed the base branch from master to feature/new_pytorch_frontend July 2, 2020 23:13

thiagocrepaldi force-pushed the thiagofc/new_frontend/OptimizerConfig branch from 6185286 to 65a1803 Compare July 8, 2020 23:52

rayankrish approved these changes Jul 10, 2020

rayankrish reviewed Jul 10, 2020

orttraining/orttraining/python/training/optim/config.py (outdated, resolved)

thiagocrepaldi force-pushed the thiagofc/new_frontend/OptimizerConfig branch from 262feaa to e8a5e49 Compare July 15, 2020 21:26

thiagocrepaldi requested review from liqunfu and spandantiwari July 17, 2020 00:02

spandantiwari approved these changes Jul 17, 2020

Thiago Crepaldi added 7 commits July 17, 2020 09:57

Initial implementation

823458f

Fix bugs and add tests

8d69cbc

Update _OptimizerConfig constructor to fulfill param_groups

d0f46d3

Fix param_groups behavior for Adam and Lamb

c12b384

Rename Optimizer classes

51bd478

Add base_lrs to the _OptimizerConfig

d90b622

Address Spandan feedback

b109768

Thiago Crepaldi added 7 commits July 17, 2020 10:15

Rename test file

e151e2a

Fix unit test

c94b46d

Rename _OptimizerConfig params

e108035

Update documentation

39a1ae2

Refactor AdamConfig.weight_decay_mode

08ff37b

Add doc examples

d06ffe9

Fix optim.__init__.py

7719466

thiagocrepaldi force-pushed the thiagofc/new_frontend/OptimizerConfig branch from 521371a to 7719466 Compare July 17, 2020 17:19

thiagocrepaldi merged commit 71ef790 into feature/new_pytorch_frontend Jul 17, 2020

thiagocrepaldi deleted the thiagofc/new_frontend/OptimizerConfig branch July 17, 2020 17:21

thiagocrepaldi mentioned this pull request Jul 31, 2020

[WIP] New PyTorch frontend API #4176

Closed

thiagocrepaldi removed a link to an issue Jul 31, 2020

[WIP] New PyTorch frontend API #4176

Closed

thiagocrepaldi pushed a commit that referenced this pull request Aug 12, 2020

Add Optimizer classes to the new frontend (#4280)

dc152f1

thiagocrepaldi pushed a commit that referenced this pull request Aug 14, 2020

Add Optimizer classes to the new frontend (#4280)

be32a99

thiagocrepaldi pushed a commit that referenced this pull request Aug 15, 2020

Add Optimizer classes to the new frontend (#4280)

076be40

thiagocrepaldi pushed a commit that referenced this pull request Aug 15, 2020

Add Optimizer classes to the new frontend (#4280)

da8b45b
