[MXNET-535] Fix bugs in LR Schedulers and add warmup #11234
Conversation
python/mxnet/lr_scheduler.py (outdated)
@@ -153,18 +153,57 @@ class PolyScheduler(LRScheduler):
    """

    def __init__(self, max_update, base_lr=0.01, pwr=2):
Don't remove base_lr, it will break the API.
Pass it to the super init instead.
python/mxnet/lr_scheduler.py (outdated)
        if warmup_steps <= 0:
            raise ValueError("Warmup steps has to be positive")
        self.warmup_steps = warmup_steps
        self.lrs_updates = {}
what's the point of this cache? Looks like it will always miss
For each batch, __call__ is invoked once per learnable parameter array, so after the first call with a given num_update every subsequent call hits the cache.
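Below is a minimal sketch, assuming a hypothetical wrapper class, of how such a cache pays off; the attribute names (lrs_updates, lr_begin) follow the diff above, everything else is illustrative and not the PR's final code:

class WarmUpWrapper(object):
    def __init__(self, scheduler, warmup_steps, warmup_begin_lr=0.0):
        self.scheduler = scheduler          # main scheduler, e.g. a poly scheduler
        self.warmup_steps = warmup_steps    # number of warmup updates
        self.lr_begin = warmup_begin_lr     # LR at the start of warmup
        self.lrs_updates = {}               # cache: num_update -> LR

    def __call__(self, num_update):
        # several parameter arrays ask for the LR of the same num_update,
        # so the math below runs only once per update step
        if num_update not in self.lrs_updates:
            if num_update < self.warmup_steps:
                # linear ramp from lr_begin up to the main scheduler's base_lr
                increase = (self.scheduler.base_lr - self.lr_begin) \
                    * float(num_update) / self.warmup_steps
                self.lrs_updates[num_update] = self.lr_begin + increase
            else:
                self.lrs_updates[num_update] = self.scheduler(num_update)
        return self.lrs_updates[num_update]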
python/mxnet/lr_scheduler.py (outdated)
            self.lrs_updates[num_update] = self.lr_begin + increase
        else:
            if isinstance(self.scheduler, PolyScheduler):
                self.lrs_updates[num_update] = self.scheduler(num_update - self.warmup_steps)
Why the special case for PolyScheduler?
Is num_update - self.warmup_steps standard? Does TF or PyTorch do it this way?
Why not num_update directly?
PolyScheduler and CosineScheduler (the latter not implemented here) reduce the LR smoothly from a "starting LR" to an "ending LR", for example from 0.1 to 0.
With warmup, we first increase the LR from a small value (e.g. 0) to the starting LR, and then apply the main scheduler. Assuming we warm up for the first 5 epochs and train for 90 epochs in total, the effective number of epochs for the poly scheduler is 85.
As for the piecewise-constant/factor schedulers, they decay the LR only at certain points, so the warmup stage has no effect on them.
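To make the arithmetic concrete, here is a hedged standalone sketch (names and defaults are assumptions, not the code in this PR) of linear warmup followed by polynomial decay, where the decay phase only sees max_update - warmup_steps effective steps:

def warmup_poly_lr(num_update, base_lr=0.1, final_lr=0.0,
                   warmup_steps=5, max_update=90, pwr=2):
    """Linear warmup to base_lr, then polynomial decay to final_lr."""
    if num_update < warmup_steps:
        # warmup: ramp linearly from 0 up to base_lr
        return base_lr * float(num_update) / warmup_steps
    # decay: the poly curve runs over the remaining max_update - warmup_steps steps
    t = min(num_update - warmup_steps, max_update - warmup_steps)
    frac = 1.0 - float(t) / (max_update - warmup_steps)
    return final_lr + (base_lr - final_lr) * frac ** pwr

# e.g. 5 warmup epochs out of 90 total leave 85 effective epochs for the decay
print(warmup_poly_lr(0), warmup_poly_lr(5), warmup_poly_lr(90))  # 0.0 0.1 0.0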
Yeah, for the above reason, but I'm updating the code to remove this special case and pass a warmup_steps param to such schedulers so we can handle it cleanly.
Improved how warmup is handled, please review.
@piiswrong Could you please review? This has a fix for an important bug where MultiFactorScheduler didn't take a base_lr previously. This meant that the example/image_classification/ scripts didn't use the given LR correctly: the LR would drop from x to 0.001 and 0.0001 regardless of the LR given.
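For illustration, a small self-contained sketch (the helper function is hypothetical, not MXNet's API) of how the bug shows up: if the scheduler keeps its own default base LR instead of the one passed by the script, the decayed values are identical no matter what LR was requested.

def multifactor_lr(num_update, steps, factor, base_lr):
    """Piecewise-constant decay: multiply by `factor` after each step in `steps`."""
    lr = base_lr
    for s in steps:
        if num_update > s:
            lr *= factor
    return lr

# Buggy behaviour: base_lr effectively hard-wired to the 0.01 default,
# so after the first step the LR is 0.001 regardless of the script's LR.
print(multifactor_lr(40, steps=[30, 60], factor=0.1, base_lr=0.01))  # 0.001
# Fixed behaviour: the user's base LR is honoured.
print(multifactor_lr(40, steps=[30, 60], factor=0.1, base_lr=0.05))  # 0.005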
@piiswrong @szha @eric-haibin-lin Please review
@szha @eric-haibin-lin please review
python/mxnet/lr_scheduler.py (outdated)
@@ -29,8 +30,31 @@ class LRScheduler(object):
    base_lr : float, optional
        The initial learning rate.
    """
-    def __init__(self, base_lr=0.01):
+    def __init__(self, base_lr=0.01, warmup_steps=0, warmup_begin_lr=0, warmup_mode='linear'):
Do you mind adding documentation for warmup_begin_lr?
There was some documentation for the inherited classes, but not for this abstract base class. Anyway, it's now added for all of them. Please check.
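As a rough sketch of what such documentation could look like for the base class (wording is illustrative, not quoted from the PR; the signature follows the diff above):

class LRScheduler(object):
    """Base class of a learning rate scheduler.

    Parameters
    ----------
    base_lr : float, optional
        The learning rate reached at the end of warmup and used as the
        starting point of the main schedule.
    warmup_steps : int, optional
        Number of updates spent warming up before the main schedule starts.
    warmup_begin_lr : float, optional
        Learning rate at the very beginning of warmup; it is ramped up to
        base_lr over warmup_steps updates.
    warmup_mode : str, optional
        Shape of the warmup ramp, e.g. 'linear' or 'constant'.
    """
    def __init__(self, base_lr=0.01, warmup_steps=0,
                 warmup_begin_lr=0, warmup_mode='linear'):
        self.base_lr = base_lr
        self.warmup_steps = warmup_steps
        self.warmup_begin_lr = warmup_begin_lr
        self.warmup_mode = warmup_mode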
python/mxnet/lr_scheduler.py (outdated)
    """

-    def __init__(self, max_update, base_lr=0.01, pwr=2):
-        super(PolyScheduler, self).__init__(base_lr)
+    def __init__(self, max_update, base_lr=0.01, final_lr=0,
why did you remove pwr? This is API breakage
I've not removed it. Git is getting confused :/ It thinks I've changed PolyScheduler to CosineScheduler when in fact I've modified PolyScheduler and added a new CosineScheduler.
Please refer to #11234 (comment). Hopefully this will give committers confidence to merge.
[Plots of LR decay from unit tests: a table comparing "This PR" vs "Earlier" curves for each interface: CosineScheduler, PolyScheduler, MultiFactorScheduler, FactorScheduler, and the base Scheduler.]
* Add warmup and fix inconsistencies with learning rate schedulers
* add comments
* remove assert