[R-package] avoid unnecessary computation and add tests for Dataset set_reference() method #4587

jameslamb · 2021-09-02T06:45:04Z

Noticed while working on #4586.

Dataset$set_reference() allows changing the reference Dataset that a given Dataset is based on, which affects (for example) how features are binned.

Using Dataset$set_reference() to set the reference to the existing object has no effect on the Dataset object. This PR proposes moving the existing check for that case to the very beginning of the method, to avoid unnecessary computation and slightly speed up the method in that case.

StrikerRUS

Why do we leave the # Check for empty data and # Check for non-existing reference blocks untouched, placed after # Set known references? I think it is misleading that some attributes are set (categorical_feature, colnames and predictor) but then the whole method fails with a fatal error. Doesn't that leave the object in a corrupted, or at least inconsistent, state?

Moreover, I might be wrong but

self$set_categorical_feature(categorical_feature = reference$.__enclos_env__$private$categorical_feature)

will simply crash without

        # Reference is unknown
        if (!lgb.is.Dataset(reference)) {
          stop("set_reference: Can only use lgb.Dataset as a reference")
        }

placed before it, in case of a wrong reference type.

jameslamb · 2021-09-02T15:14:54Z

Ah yes, you're right! Didn't notice that those should also be moved up sooner. I'll make that change here.

jameslamb · 2021-09-02T19:04:56Z

This refactoring is also making me realize that the method doesn't protect against the case where you run set_reference(reference = NULL). Right now on master, that results in

Error in self$set_colnames(colnames = reference$get_colnames()) : 
  attempt to apply non-function

I'll fix that here too.
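
The reordering discussed in this thread can be illustrated with a minimal base R sketch: validate the argument and handle the no-op case before any attribute is changed, so a failed call leaves the object untouched. This is not the LightGBM implementation. The names new_toy_dataset and toy_set_reference, the "toy_dataset" class, and the single colnames field are hypothetical stand-ins, and the relative order of the type check and the no-op check is an assumption made for this example.

```r
# Toy stand-in for a Dataset-like object (hypothetical, not lightgbm code).
# An environment is used so that changes happen by reference.
new_toy_dataset <- function(colnames = NULL, reference = NULL) {
  ds <- new.env()
  ds$colnames <- colnames
  ds$reference <- reference
  class(ds) <- "toy_dataset"
  ds
}

toy_set_reference <- function(ds, reference) {
  # 1. Validate the argument before touching any state.
  #    This also rejects reference = NULL with a clear message.
  if (!inherits(reference, "toy_dataset")) {
    stop("set_reference: Can only use toy_dataset as a reference")
  }

  # 2. No-op case: the requested reference is already in use.
  if (identical(ds$reference, reference)) {
    return(invisible(ds))
  }

  # 3. Only now copy attributes from the reference and store it.
  ds$colnames <- reference$colnames
  ds$reference <- reference
  invisible(ds)
}

# Usage: a valid call updates the object, an invalid call fails cleanly.
train <- new_toy_dataset(colnames = c("x", "y"))
valid <- new_toy_dataset(colnames = c("u", "v"))
toy_set_reference(valid, train)
msg <- tryCatch(
  toy_set_reference(valid, NULL),
  error = function(e) conditionMessage(e)
)
```

Because the checks run first, the failed call with NULL does not alter valid: it still carries the column names and reference set by the earlier successful call.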

jameslamb · 2021-09-02T21:14:53Z

Ok, just moved these validations up further. I realized while doing this that there are currently no direct tests on Dataset$set_reference(), so I added some to be more confident in this change.

You can confirm that there are no direct tests on that method by running the following on master.

git grep -E "set.*ref" R-package/tests

jameslamb · 2021-09-04T15:58:25Z

/gha run r-valgrind

Workflow R valgrind tests has been triggered! 🚀
https://github.com/microsoft/LightGBM/actions/runs/1201242577

Status: success ✔️.

StrikerRUS

With this PR, the order of code lines in set_reference() function looks good to me, thanks!
Just two minor comments regarding new tests.

R-package/tests/testthat/test_dataset.R

github-actions · 2023-08-23T16:26:09Z

This pull request has been automatically locked since there has not been any recent activity since it was closed. To start a new related discussion, open a new issue at https://github.com/microsoft/LightGBM/issues including a reference to this.

[R-package] avoid unnecessary computation in Dataset set_reference() method

912be29

jameslamb added the maintenance label Sep 2, 2021

jameslamb requested review from Laurae2, shiyu1994 and StrikerRUS September 2, 2021 06:45

Merge branch 'master' into r/self-reference

5d4d174

StrikerRUS reviewed Sep 2, 2021

jameslamb added 2 commits September 2, 2021 13:52

re-arrange conditions

a4d9300

Merge branch 'master' into r/self-reference

24ee8de

jameslamb changed the title ~~[R-package] avoid unnecessary computation in Dataset set_reference() method~~ [R-package] avoid unnecessary computation and add tests for Dataset set_reference() method Sep 2, 2021

do more validation upfront and add tests

8703a73

jameslamb requested a review from StrikerRUS September 4, 2021 15:57

jameslamb added the awaiting review label Sep 4, 2021

StrikerRUS approved these changes Sep 7, 2021

StrikerRUS removed the awaiting review label Sep 7, 2021

StrikerRUS and others added 3 commits September 9, 2021 16:27

Merge branch 'master' into r/self-reference

e5e3e0b

Update R-package/tests/testthat/test_dataset.R

e0889ee

Co-authored-by: Nikita Titov <[email protected]>

Update R-package/tests/testthat/test_dataset.R

9daae12

jameslamb merged commit a08c37f into master Sep 10, 2021

jameslamb deleted the r/self-reference branch September 10, 2021 04:20

github-actions bot locked as resolved and limited conversation to collaborators Aug 23, 2023
