Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[R-package] avoid unnecessary computation and add tests for Dataset set_reference() method #4587

Merged
merged 8 commits into from
Sep 10, 2021

Conversation

jameslamb
Copy link
Collaborator

Noticed while working on #4586.

Dataset$set_reference() allows changing the reference Dataset that a given Dataset is based on, which affects (for example) how features are binned.

Using Dataset$set_reference() to set reference to the existing object doesn't have any effect on the Dataset object. This PR proposes moving the existing check for that all the way up to the beginning of that method, to avoid any other unnecessary computation and slightly speed up this method for that case.

Copy link
Collaborator

@StrikerRUS StrikerRUS left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we leave # Check for empty data and # Check for non-existing reference untouched placed after setting # Set known references. I think it is misleading that some attributes are set (categorical_feature, colnames and predictor) but the whole method fails with fatal error. Doesn't it leave object in a corrupted, ok maybe inconsistent, state?

Moreover, I might be wrong but

self$set_categorical_feature(categorical_feature = reference$.__enclos_env__$private$categorical_feature)

will simply crash without

        # Reference is unknown
        if (!lgb.is.Dataset(reference)) {
          stop("set_reference: Can only use lgb.Dataset as a reference")
        }

placed before in case of wrong reference type.

@jameslamb
Copy link
Collaborator Author

Ah yes, you're right! Didn't notice that those should also be moved up sooner. I'll make that change here.

@jameslamb
Copy link
Collaborator Author

This refactoring is also making me realize that that method doesn't protect against the case where you run set_reference(reference = NULL). Right now on master, that results in

Error in self$set_colnames(colnames = reference$get_colnames()) : 
  attempt to apply non-function

I'll fix that here too

@jameslamb jameslamb changed the title [R-package] avoid unnecessary computation in Dataset set_reference() method [R-package] avoid unnecessary computation and add tests for Dataset set_reference() method Sep 2, 2021
@jameslamb
Copy link
Collaborator Author

Ok, just moved these validations up further. I realized while doing this that there are currently no direct tests on Dataset$set_reference(), so added some of to be more confident in this change.

You can confirm that there are no direct tests on that method by running the following on master.

git grep -E "set.*ref" R-package/tests

@jameslamb
Copy link
Collaborator Author

jameslamb commented Sep 4, 2021

/gha run r-valgrind

Workflow R valgrind tests has been triggered! 🚀
https://github.com/microsoft/LightGBM/actions/runs/1201242577

Status: success ✔️.

Copy link
Collaborator

@StrikerRUS StrikerRUS left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With this PR, the order of code lines in set_reference() function looks good to me, thanks!
Just two minor comments regarding new tests.

R-package/tests/testthat/test_dataset.R Outdated Show resolved Hide resolved
R-package/tests/testthat/test_dataset.R Show resolved Hide resolved
@jameslamb jameslamb merged commit a08c37f into master Sep 10, 2021
@jameslamb jameslamb deleted the r/self-reference branch September 10, 2021 04:20
@github-actions
Copy link

This pull request has been automatically locked since there has not been any recent activity since it was closed. To start a new related discussion, open a new issue at https://github.com/microsoft/LightGBM/issues including a reference to this.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Aug 23, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants