
[XGBoost4J-Spark] Early stopping and best iteration #6893

Closed
candalfigomoro opened this issue Apr 22, 2021 · 22 comments · Fixed by #7252
Comments

@candalfigomoro

This has been asked before (e.g. #3140 (comment)), but no answer was ever given.

In XGBoost4J-Spark we can use early stopping by using setNumEarlyStoppingRounds.

  1. When I call transform(), does it use by default the best iteration (the best number of trees) or the best iteration + num_early_stopping_rounds?
  2. If it uses the best iteration + num_early_stopping_rounds, how can I extract the value of the best iteration so I can set treeLimit to the best iteration?

Thanks

@trivialfis
Member

@wbo4958 probably has some insight.

@candalfigomoro
Author

@CodingCat

@candalfigomoro
Author

@hcho3

@wbo4958
Contributor

wbo4958 commented Apr 30, 2021

@candalfigomoro According to the code at https://github.com/dmlc/xgboost/blob/master/jvm-packages/xgboost4j/src/main/java/ml/dmlc/xgboost4j/java/XGBoost.java#L253, it looks like it uses the best iteration + num_early_stopping_rounds. I have no idea how to get the value of the best iteration; it seems we need to support this.

@candalfigomoro
Author

candalfigomoro commented Apr 30, 2021

> @candalfigomoro According to the code https://github.com/dmlc/xgboost/blob/master/jvm-packages/xgboost4j/src/main/java/ml/dmlc/xgboost4j/java/XGBoost.java#L253. Looks like "it uses the best iteration + num_early_stopping_rounds". And I have no idea how to get the values of the best iteration, Seems we need to support this.

@wbo4958
Thank you for your reply.
I think we need to expose bestScore and bestIteration attributes to be consistent with the Python package (see https://xgboost.readthedocs.io/en/latest/python/python_intro.html#early-stopping), and also because I think it's a pretty important feature.

@trivialfis
Member

@wbo4958 If you are interested in the feature, you can use the SetAttr function in XGBoost to store these attributes inside the model. Also, we support model slicing to slice out trees for returning the best model. Feel free to ping me if you have any questions.

@wbo4958
Contributor

wbo4958 commented Apr 30, 2021

Ok, will add this feature.

@naveenkb
Contributor

Hi @wbo4958 - I too wanted to use this feature in Spark. Just wanted to know if you were able to work on it? If not, I can give it a try.

@wbo4958
Contributor

wbo4958 commented Jun 11, 2021

> Hi @wbo4958 - I too wanted to use this feature in spark. Just wanted to know if you were able to work on it ? If not I can give it a try.

Sorry, @naveenkb, I have been busy with other things recently; please help to add it. Thanks very much!

@naveenkb
Contributor

The XGBoostClassificationModel object has a method called getVersion(). There is not much info about it in the documentation. Based on my experimentation, booster.getVersion() / 2 always returns the latest iteration, even with early stopping. So ((booster.getVersion() / 2) - earlyStoppingRound) gives the bestIteration. Can anyone confirm this, or point out any cases where this won't work?

@trivialfis or @wbo4958 or @CodingCat ?

@candalfigomoro
Author

@naveenkb
Suppose you set max iterations=100, num_early_stopping_rounds=10, and the best iteration is iteration 95. Early stopping never triggers (only 5 non-improving rounds follow iteration 95), so training runs to the cap. If you take the number of iterations - num_early_stopping_rounds, you get iteration 90 instead of iteration 95. So the heuristic doesn't work when num_early_stopping_rounds > max iterations - best iteration. The clean solution would be to expose bestScore and bestIteration.
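This counterexample can be checked with a small pure-Python simulation of minimizing early stopping (the function name and metric values below are made up for illustration; this is not XGBoost4J code):

```python
def simulate_early_stopping(metrics, max_iters, rounds):
    """Toy minimizing early stopping: return (last_iteration, best_iteration)."""
    best_iter, best_val = 0, float("inf")
    for i in range(max_iters):
        if metrics[i] < best_val:
            best_val, best_iter = metrics[i], i
        elif i - best_iter >= rounds:
            return i, best_iter  # early stopping fired
    return max_iters - 1, best_iter  # ran to the iteration cap

# Metric improves until iteration 95, then worsens; cap = 100, rounds = 10.
metrics = [100.0 - i for i in range(96)] + [10.0, 11.0, 12.0, 13.0]
last, best = simulate_early_stopping(metrics, 100, 10)
# last == 99 and best == 95, but last - rounds == 89: subtracting the
# early-stopping rounds recovers the best iteration only when early
# stopping actually fires, not when training hits the iteration cap.
```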

@naveenkb
Contributor

> @wbo4958 If you are interested in the feature, you can use the SetAttr function in XGBoost to store these attributes inside the model. Also, we support model slicing to slice up trees for retuning the best model. Feel free to ping me if you have any questions.

I have added bestIteration using the SetAttr function. Regarding model slicing, I wanted to confirm that it is not yet implemented in Java, right? Please let me know if I am missing something.

@candalfigomoro
Author

> > @wbo4958 If you are interested in the feature, you can use the SetAttr function in XGBoost to store these attributes inside the model. Also, we support model slicing to slice up trees for retuning the best model. Feel free to ping me if you have any questions.
>
> I have added bestIteration using SetAttr function. Regarding model slicing, I wanted to confirm that it is not implemented in Java yet right ? Please let me know if I am missing something

There's a treeLimit parameter (see https://xgboost.readthedocs.io/en/latest/jvm/scaladocs/xgboost4j-spark/ml/dmlc/xgboost4j/scala/spark/XGBoostClassificationModel.html#setTreeLimit(value:Int):XGBoostClassificationModel.this.type), but I've never tried it.

@trivialfis
Member

trivialfis commented Jun 17, 2021

We are in the process of replacing that parameter with the more robust iteration_range. Python and R have already made the transition, and JVM is next.

@candalfigomoro
Author

@naveenkb
Are you going to submit a Pull Request to expose bestIteration and bestScore?

@naveenkb
Contributor

naveenkb commented Jul 8, 2021

@candalfigomoro Sure, sorry for the delay. I will raise a PR in a few days.

@jon-targaryen1995

jon-targaryen1995 commented Jul 13, 2021

Hello,

I was going through the parameters of the XGBoost 4J spark mentioned in

https://xgboost.readthedocs.io/en/latest/jvm/scaladocs/xgboost4j-spark/ml/dmlc/xgboost4j/scala/spark/XGBoostClassificationModel.html#setTreeLimit(value:Int):XGBoostClassificationModel.this.type

The definition of numEarlyStoppingRounds is as follows:

> If non-zero, the training will be stopped after a specified number of consecutive increases in any evaluation metric.

But shouldn't it be "the training will be stopped after a specified number of consecutive non-increases (same or decrease) in any evaluation metric"?

Is there any parameter through which I can set a threshold for early stopping rounds? If the evaluation metric doesn't improve by at-least the threshold within early stopping rounds, the training stops.

Thanks,
Akshay

@candalfigomoro
Author

> But shouldn't it be "the training will be stopped after a specified number of consecutive non-increase (same or decrease) in any evaluation metric"

This is tricky because some metrics need to be minimized (e.g. MSE) while other metrics need to be maximized (e.g. accuracy). See also the setMaximizeEvaluationMetrics() method.
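A minimal sketch of the direction issue in pure Python (the function name is hypothetical, not XGBoost4J API): the improvement check has to flip its comparison based on a maximize flag, which is what setMaximizeEvaluationMetrics controls.

```python
def is_improvement(current, best_so_far, maximize):
    """Direction-aware check: did the evaluation metric improve?"""
    return current > best_so_far if maximize else current < best_so_far

# For an error metric like MSE, lower is better:
is_improvement(0.20, 0.25, maximize=False)  # True
# For a score like accuracy, higher is better:
is_improvement(0.20, 0.25, maximize=True)   # False
```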

@jon-targaryen1995

@candalfigomoro @naveenkb

How do you expose the bestIteration and bestScore attained during training?
Is it implemented in the package?

@naveenkb
Contributor

naveenkb commented Aug 1, 2021

> How do you expose the bestIteration and bestScore attained during training?
> Is it implemented in the package?

```scala
val xgbClassificationModel = xgbClassifier.fit(train)

val bestScore = xgbClassificationModel.nativeBooster.getAttr("bestScore")
val bestIteration = xgbClassificationModel.nativeBooster.getAttr("bestIteration")
```

@trivialfis
Member

TODO: Follow up with documentation.

@trivialfis trivialfis mentioned this issue Aug 2, 2021
@Shadyelgewily

Shadyelgewily commented Aug 17, 2021

This feature would be much appreciated for the XGBoost4J (non-Spark) library as well. We have a situation where the evaluation function does not necessarily decrease as the loss decreases. In fact, in our situation the evaluation function can increase when the loss decreases too far. This is deliberate: we use a quantile loss function and a custom evaluation metric to ensure that the loss does not decrease to zero (if the loss is zero, the predictions are no longer quantiles).

The current implementation means that the model returned after the early stopping rounds is far from optimal for many of our models, even though good performance was reached at earlier iterations.
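As context for the quantile case, here is a minimal pinball (quantile) loss in pure Python (an illustrative sketch only, not the code used in these models): it reaches zero only when the prediction sits exactly on the observation, at which point the prediction no longer behaves as a quantile.

```python
def pinball_loss(y_true, y_pred, q):
    """Pinball (quantile) loss for a single observation at quantile q."""
    diff = y_true - y_pred
    return q * diff if diff >= 0 else (q - 1) * diff

# For a high quantile (q = 0.9), under-prediction is penalized more heavily:
pinball_loss(10.0, 8.0, 0.9)   # 1.8   (under-predicting costs 0.9 per unit)
pinball_loss(10.0, 12.0, 0.9)  # ~0.2  (over-predicting costs 0.1 per unit)
```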
