[R-package] avoid unnecessary computation and add tests for Dataset set_reference() method #4587
Why do we leave the `# Check for empty data` and `# Check for non-existing reference` checks untouched, placed after `# Set known references`? I think it is misleading that some attributes are set (`categorical_feature`, `colnames`, and `predictor`) and then the whole method fails with a fatal error. Doesn't that leave the object in a corrupted, or at least inconsistent, state?
Moreover, I might be wrong, but when the reference has the wrong type,
    self$set_categorical_feature(categorical_feature = reference$.__enclos_env__$private$categorical_feature)
will simply crash unless the following check is placed before it:
    # Reference is unknown
    if (!lgb.is.Dataset(reference)) {
        stop("set_reference: Can only use lgb.Dataset as a reference")
    }
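The ordering concern above can be illustrated with a minimal base-R sketch. Everything here is a hypothetical stand-in (`new_toy_dataset()`, `is_toy_dataset()`, `toy_set_reference()`), not the actual lightgbm R6 class or its `lgb.is.Dataset()` helper; it only shows that running the type check before any assignment leaves the object unchanged when the call fails.

```r
# Toy stand-in for a Dataset object (an environment with a class attribute).
# This is NOT the lightgbm implementation; all names are illustrative.
new_toy_dataset <- function(colnames = NULL) {
    ds <- new.env()
    ds$colnames <- colnames
    ds$reference <- NULL
    class(ds) <- "toy_dataset"
    ds
}

is_toy_dataset <- function(x) inherits(x, "toy_dataset")

toy_set_reference <- function(ds, reference) {
    # Validate first, so a bad reference cannot leave `ds` half-updated.
    if (!is_toy_dataset(reference)) {
        stop("set_reference: Can only use toy_dataset as a reference")
    }
    # Only after validation: copy attributes over from the reference.
    ds$colnames <- reference$colnames
    ds$reference <- reference
    invisible(ds)
}

ds <- new_toy_dataset(colnames = c("a", "b"))

# Passing something that is not a toy_dataset raises an error ...
result <- tryCatch(
    toy_set_reference(ds, "not a dataset"),
    error = function(e) conditionMessage(e)
)

# ... and `ds` keeps its original column names and has no reference set,
# because no attribute was touched before the check failed.
```

If the check sat after the assignments instead, the failed call would already have overwritten `ds$colnames`, which is the inconsistent state described in the comment above.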
Ah yes, you're right! Didn't notice that those should also be moved up sooner. I'll make that change here.
This refactoring is also making me realize that that method doesn't protect against the case where you run
I'll fix that here too.
Ok, just moved these validations up further. I realized while doing this that there are currently no direct tests on
You can confirm that there are no direct tests on that method by running the following on
git grep -E "set.*ref" R-package/tests
/gha run r-valgrind
Workflow R valgrind tests has been triggered! 🚀 Status: success ✔️.
With this PR, the order of code lines in the `set_reference()` function looks good to me, thanks!
Just two minor comments regarding new tests.
Noticed while working on #4586.
`Dataset$set_reference()` allows changing the reference Dataset that a given `Dataset` is based on, which affects (for example) how features are binned.
Using `Dataset$set_reference()` to set `reference` to the existing object doesn't have any effect on the `Dataset` object. This PR proposes moving the existing check for that all the way up to the beginning of the method, to avoid any other unnecessary computation and slightly speed up the method in that case.
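As a rough sketch of the early-exit idea, assuming the check compares the incoming reference against the stored one with `identical()`: the toy object below is a plain environment with hypothetical names (`make_toy_dataset()`, `toy_set_reference()`, an `updates` counter), not the lightgbm `Dataset` class.

```r
# Toy stand-in for a Dataset; NOT the lightgbm R6 class.
make_toy_dataset <- function() {
    ds <- new.env()
    ds$reference <- NULL
    ds$updates <- 0L  # counts how often the non-trivial path ran
    ds
}

toy_set_reference <- function(ds, reference) {
    # Early exit: the requested reference is already the current one,
    # so skip all remaining work.
    if (identical(reference, ds$reference)) {
        return(invisible(ds))
    }
    # Stand-in for the real work (copying attributes, resetting state, ...).
    ds$updates <- ds$updates + 1L
    ds$reference <- reference
    invisible(ds)
}

train <- make_toy_dataset()
valid <- make_toy_dataset()

toy_set_reference(valid, train)  # first call does the work
toy_set_reference(valid, train)  # second call returns immediately

# `valid$updates` is 1L here: the repeated call took the early-exit path.
```

For environments, `identical()` compares object identity rather than contents, so the shortcut only fires when the very same object is passed again.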