Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[python-package] No warning if single alias of num_boost_round is passed #6548

Closed
wants to merge 3 commits into from

Conversation

vnherdeiro
Copy link
Contributor

@vnherdeiro vnherdeiro commented Jul 16, 2024

I see a misleading warning log line when I pass a 'num_iterations' argument my training params. There should be no alias warning in that case, only when more than one alias for the number of boosted trees is passed.

By picking the last of the alias in the possibles (available in params), the "tie breaking behaviour" should be conserved.

Copy link
Collaborator

@jameslamb jameslamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for your interest in lightgbm.

Can you please share a minimal, reproducible example that demonstrates the behavior you're claiming this PR fixes? You can use this as a reference: #6014 (comment)

Please also see the significant prior discussion in #6324

@jameslamb jameslamb changed the title No warning if single alias of num_boost_round is passed [python-package] No warning if single alias of num_boost_round is passed Jul 16, 2024
@vnherdeiro
Copy link
Contributor Author

Thanks for the welcome, feedback and advice @jameslamb

First, I just pushed a possible fix for the failing tests 🤞

About the aim of the MR, at my work I use LightGBM extensively in some AWS Sagemaker instances (thanks for contributing to this awesome package!). When I set the parameter n_estimators my screen is populated with the UserWarning line (because of repeated training instances). There may some non-repeat warning config missing on that cloud instance (because locally I cannot reproduce the message to show more than one time), but I also find that the warning message is not helpful. I had to go through the docs and code to be sure my parametrization was working as intended (why warn if I am using a single alias?).

Here are the before/after screenshots:
Screenshot 2024-07-18 at 22 58 35
Screenshot 2024-07-18 at 22 58 57

I read the discussion in #6324 with interest. I would defend that a warning should be raised even if the value of two+ aliases is the same. The reason being, on such instance, the user is most likely mis-using the parametrization and should be made aware of the redundant input information.

@jmoralez
Copy link
Collaborator

In your case the warning is showing up because the train function has an argument num_boost_round, which defaults to 100, so LightGBM is seeing n_estimators=1, num_boost_round=100 and warning you that it's choosing n_estimators. I believe in this case the warning is correct.

@vnherdeiro
Copy link
Contributor Author

@jmoralez I will check it tomorrow =) Will let you know!

@jameslamb
Copy link
Collaborator

@vnherdeiro there is still quite a bit to do to implement the plan agreed to in #6458. For example, cv() needs to be updated with similar behavior as train(), and tests covering those 2 functions and the scikit-learn interface need to be added.

I would really like to fix this as part of the next release (thank you for reminding us!), and we are under some time pressure to get that release out in the next 2 weeks (see #6522).

In this interest of time, would it be ok with you if I close this and put up a separate pull request addressing this? We would really love to have you contribute to LightGBM, but I think this particular change might just be a difficult first contribution.

@vnherdeiro
Copy link
Contributor Author

@jmoralez Have tested your point, indeed I had tried using different aliases, even 'num_boost_round' but always inside params, hence I couldn't get rid of the warning. Thanks for the help!

@jameslamb I am not familiar with the API and never used cv. I believe you this change needs a bigger scope. Agreed to close this PR. I do want to contribute to this project. Do you have any lower hanging fruit to suggest?

@jameslamb
Copy link
Collaborator

I believe you this change needs a bigger scope. Agreed to close this PR.

Thanks very much for understanding!

I'm working on this in #6579, so I'll close this PR.

I do want to contribute to this project. Do you have any lower hanging fruit to suggest?

Thanks very much, we appreciate it! #6361 or #6498 would be good places to start. You can @-mention me on those (or on a draft pull request you open) if you want any help.

When you do that, please use a branch other than master on your fork. I think you'll find that easier to work with, especially if you have multiple pull requests open at the same time.

@jameslamb jameslamb closed this Jul 30, 2024
@jameslamb
Copy link
Collaborator

@vnherdeiro if you're interested in seeing how I implemented + tested this, please see #6579.

Thanks again for your work here and for bringing this issue back to our attention. It's long overdue for a fix.

@vnherdeiro
Copy link
Contributor Author

@jameslamb Thanks for suggesting the changes above as possibilities of contributing. I checked your MR for this issue btw, the new behaviour looks good to me!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants