
Learning rate decay #4955

Closed
guyko81 opened this issue Oct 16, 2019 · 4 comments · Fixed by #6199

Comments

@guyko81

guyko81 commented Oct 16, 2019

I have been thinking about robust model building on Important Outliers - outliers that matter, so we don't want the model to pull predictions closer to the average value. That led me to the idea that implementing a learning rate decay would give the model more flexibility: at the beginning it could catch the trivial rules (a trivial-case decision tree) thanks to the high learning rate, and later in the training it could fine-tune the result.

I have made my own GBTree with this additional feature and found that the model learned much faster as well! Not a lot of coding, but a great improvement in performance. I have not tested the robustness (e.g. overfitting) of the model on many datasets, but the one I used showed promising results - no overfitting.

I used learning_rate_start=0.5, learning_rate_min=0.01 and lr_decay=0.95.
First iteration:
lr = learning_rate_start
At each subsequent iteration, the following rule applies:
lr = max(learning_rate_min, lr*lr_decay)
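
In code, a minimal standalone sketch of this schedule (the number of rounds here is just a placeholder):

# Sketch of the decay schedule described above; not tied to any library.
num_boost_round = 100  # placeholder round count
learning_rate_start = 0.5
learning_rate_min = 0.01
lr_decay = 0.95

lr = learning_rate_start
schedule = []
for boosting_round in range(num_boost_round):
    schedule.append(lr)                         # rate applied at this round
    lr = max(learning_rate_min, lr * lr_decay)  # decay, floored at the minimum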

@trivialfis
Member

We have a callback for setting the learning rate, although I haven't really used it myself. Would it help, maybe even simplify things further?

@guyko81
Author

guyko81 commented Oct 16, 2019

My mistake, I was not aware of this callback.
However, a rename might be useful - maybe learning_rate_schedule rather than reset_learning_rate?

And an example code would be nice too.
Should the following work? (I can't try right now, no XGBoost on work computer)

import numpy as np
import xgboost as xgb

def learning_rate_decay(boosting_round, num_boost_round):
    # Exponential decay from learning_rate_start, floored at learning_rate_min.
    learning_rate_start = 0.5
    learning_rate_min = 0.01
    lr_decay = 0.95
    lr = learning_rate_start * np.power(lr_decay, boosting_round)
    return max(learning_rate_min, lr)

xgclassifier = xgb.train(params, xgb_dtrain, num_rounds, callbacks=[xgb.callback.reset_learning_rate(learning_rate_decay)])

and with the new naming (I think it's more obvious):
xgclassifier = xgb.train(params, xgb_dtrain, num_rounds, callbacks=[xgb.callback.learning_rate_schedule(learning_rate_decay)])
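
For reference, a self-contained sketch of how I imagine the call looking end to end (the synthetic data and parameter values are placeholders, and I'm assuming reset_learning_rate also accepts a precomputed list of rates):

import numpy as np
import xgboost as xgb

# Placeholder synthetic data, only to make the sketch runnable on its own.
X = np.random.rand(1000, 10)
y = (X[:, 0] + X[:, 1] > 1.0).astype(int)
xgb_dtrain = xgb.DMatrix(X, label=y)

params = {"objective": "binary:logistic", "max_depth": 3}
num_rounds = 100

# Precomputed schedule equivalent to the decay function above.
rates = [max(0.01, 0.5 * 0.95 ** i) for i in range(num_rounds)]

xgclassifier = xgb.train(params, xgb_dtrain, num_rounds, callbacks=[xgb.callback.reset_learning_rate(rates)])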

@trivialfis
Member

@guyko81 Thanks for the suggestion. I agree with your naming scheme; I'll keep this open so we can have better documentation for callbacks.

@hcho3
Collaborator

hcho3 commented Sep 27, 2020

To new contributors: If you're reading this and interested in writing the document for learning rate decay, please comment here. Feel free to ping me with questions. I am available for help.
