[R-package] fix segfaults caused by missing Booster and Dataset handles (fixes #4208) #4586

Merged (12 commits) on Sep 25, 2021

jameslamb (Collaborator)

Fixes #4208, as part of #4310.

Currently, the R package can segfault if Booster or Dataset methods are called on an object where the handle to the C++ side is null. This situation can happen if, for example, someone uses saveRDS() on one of these objects and then loads it with readRDS(). That is exactly what happens when restarting an R session, if you have R configured to save your workspace and reload it automatically.

This PR proposes changes to ensure that an informative error is raised in such situations, instead of a segfault.

Notes for Reviewers

#4296 documents the desire for {lightgbm} to guarantee that a Booster / Dataset saved with saveRDS() is usable after loading it with readRDS(). That might be addressed in a future release, but it's out of scope for release 3.3.0 and this PR. At least as of this PR, users will experience informative errors instead of their R sessions crashing.

I was not able to trigger the _AssertDatasetHandleNotNull() error message in tests, since all user-facing paths through the Dataset object either already have protective checks using lgb.is.null.handle() or raise other errors before even trying to invoke C++ code. But I still think those asserts are worth adding, to catch errors that might be introduced in future refactorings of the R package or in code paths that the tests miss.
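To make the failure mode above concrete, here is a minimal C++ sketch. It is not {lightgbm} code. `NativeModel`, `WrapperObject`, `serialize_object()`, `deserialize_object()`, `assert_handle_not_null()` and `get_num_iterations()` are hypothetical names, and the error text is illustrative. The sketch assumes, as the description says, that serialization keeps only plain data, so the reloaded object carries a null pointer to the C++ side. A guard in the spirit of `_AssertDatasetHandleNotNull()` turns the would-be null dereference into an exception the caller can report as an ordinary error.

```cpp
#include <stdexcept>
#include <string>

// Hypothetical stand-in for an object that lives on the C++ side (e.g. a Booster).
struct NativeModel {
  int num_iterations = 10;
};

// Hypothetical R-side wrapper: plain data plus a raw pointer ("handle") to C++ memory.
struct WrapperObject {
  std::string name;
  NativeModel* handle = nullptr;
};

// Hypothetical serializer: only the plain data is written out.
// A memory address is meaningless in another session, so the handle is dropped.
std::string serialize_object(const WrapperObject& obj) {
  return obj.name;
}

// Deserializing restores the plain data, but the handle comes back null.
WrapperObject deserialize_object(const std::string& bytes) {
  WrapperObject obj;
  obj.name = bytes;
  obj.handle = nullptr;
  return obj;
}

// Guard to run before every use of a handle. Throwing (rather than calling
// assert(), which aborts the whole process) lets the caller surface a normal,
// catchable error. The message wording is illustrative only.
void assert_handle_not_null(const NativeModel* handle) {
  if (handle == nullptr) {
    throw std::runtime_error(
        "handle is null; this object may have been saved with saveRDS() "
        "and reloaded with readRDS()");
  }
}

// Hypothetical method: without the guard, handle->num_iterations would
// dereference a null pointer and segfault on a reloaded object.
int get_num_iterations(const WrapperObject& obj) {
  assert_handle_not_null(obj.handle);
  return obj.handle->num_iterations;
}
```

With a `WrapperObject` whose `handle` points at a live `NativeModel`, `get_num_iterations()` returns 10. After a `serialize_object()` / `deserialize_object()` round trip it throws `std::runtime_error` instead of crashing. In a real R extension, a C++ exception must still be caught at the C++/R boundary and converted into an R error; that step is left out of this sketch.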

jameslamb (Collaborator, Author) commented Sep 2, 2021

/gha run r-solaris

Workflow Solaris CRAN check has been triggered! 🚀
https://github.com/microsoft/LightGBM/actions/runs/1195959874

solaris-x86-patched: https://builder.r-hub.io/status/lightgbm_3.2.1.99.tar.gz-1b63d7a7e8194345af9ede7ebb6b95ae
solaris-x86-patched-ods: https://builder.r-hub.io/status/lightgbm_3.2.1.99.tar.gz-48da4a8f64fc459abfa454d49ea6dc40
Reports also have been sent to LightGBM public e-mail: http://www.yopmail.com/lightgbm_rhub_checks
Status: success ✔️.

jameslamb (Collaborator, Author) commented Sep 2, 2021

/gha run r-valgrind

Workflow R valgrind tests has been triggered! 🚀
https://github.com/microsoft/LightGBM/actions/runs/1195960141

Status: failure ❌.

jameslamb changed the title from "WIP: [R-package] fix segfaults caused by missing Booster and Dataset handles (fixes #4208)" to "[R-package] fix segfaults caused by missing Booster and Dataset handles (fixes #4208)" on Sep 2, 2021
jameslamb marked this pull request as ready for review on Sep 2, 2021 (23:16)
jameslamb (Collaborator, Author)

valgrind just timed out after 3 hours, trying again 😫

https://github.com/microsoft/LightGBM/runs/3500789328?check_suite_focus=true

jameslamb (Collaborator, Author) commented Sep 3, 2021

/gha run r-valgrind

Workflow R valgrind tests has been triggered! 🚀
https://github.com/microsoft/LightGBM/actions/runs/1196366556

Status: success ✔️.

jameslamb (Collaborator, Author)

I see this in one place in the test logs for the valgrind job: https://github.com/microsoft/LightGBM/runs/3501825714?check_suite_focus=true

......................Error in self$construct() : 
  Attempting to create a Dataset without any raw data. This can happen if you have called Dataset$finalize() or if this Dataset was saved with saveRDS(). To avoid this error in the future, use lgb.Dataset.save() or Dataset$save_binary() to save lightgbm Datasets.

It's hard to tell exactly which test, but the log shows "saveRDS.lgb.Booster() and readRDS.lgb.Booster():" shortly after the error, so it must come from somewhere between these lines.

context("saveRDS.lgb.Booster() and readRDS.lgb.Booster()")

Interestingly, I don't see that same error in the R 4.0 Windows MSVC job.

https://github.com/microsoft/LightGBM/pull/4586/checks?check_run_id=3500712973

And I'm also curious why that is showing up in the logs but the valgrind test didn't actually fail.

Will have to investigate it tomorrow.

jameslamb mentioned this pull request Sep 3, 2021
jameslamb (Collaborator, Author)

Ah ok, figured it out! The error in the logs of #4586 (comment) is not an issue.

It's coming from this new test added in this PR:

test_that("Booster$new() using a Dataset with a null handle should raise an informative error and not segfault", {
    data(agaricus.train, package = "lightgbm")
    train <- agaricus.train
    dtrain <- lgb.Dataset(train$data, label = train$label)
    dtrain$construct()
    tmp_file <- tempfile(fileext = ".bin")
    saveRDS(dtrain, tmp_file)
    rm(dtrain)
    dtrain <- readRDS(tmp_file)
    expect_error({
        bst <- Booster$new(train_set = dtrain)
    }, regexp = "lgb.Booster: cannot create Booster handle")
})

That "error" is intentionally being caught in this try-catch:

stop("lgb.Booster: cannot create Booster handle")

So when users actually run code similar to that test, they'll get the error "cannot create Booster handle" but then also see the printed text about "Attempting to create a Dataset...".
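The same control flow can be sketched in C++ under one assumption: that the R constructor calls the Dataset code inside something like base R's `try()`, which by default prints the caught error to stderr before the code checks the result and calls `stop()`. That would explain why both messages appear. `construct_dataset()`, `create_booster()` and `kBoosterError` are hypothetical names, and the printed text only imitates the log line quoted earlier.

```cpp
#include <iostream>
#include <stdexcept>
#include <string>

// The error the caller actually receives (the one the test's regexp matches).
const std::string kBoosterError = "lgb.Booster: cannot create Booster handle";

// Hypothetical stand-in for Dataset$construct(): fails when the raw data is gone.
void construct_dataset(bool has_raw_data) {
  if (!has_raw_data) {
    throw std::runtime_error("Attempting to create a Dataset without any raw data.");
  }
}

// Hypothetical stand-in for the Booster constructor.
void create_booster(bool dataset_has_raw_data) {
  bool constructed = false;
  try {
    construct_dataset(dataset_has_raw_data);
    constructed = true;
  } catch (const std::exception& e) {
    // Like a non-silent try(): report the inner error on stderr, then keep going.
    std::cerr << "Error in self$construct() : " << e.what() << "\n";
  }
  if (!constructed) {
    // Replace the inner error with a more specific one for the caller.
    throw std::runtime_error(kBoosterError);
  }
}
```

Calling `create_booster(false)` writes the inner "Attempting to create a Dataset..." message to stderr and then throws the "cannot create Booster handle" error, which is the pair of messages described above; `create_booster(true)` returns normally.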


> Interestingly, I don't see that same error in the R 4.0 Windows MSVC job.

This also is not a problem. It's probably because of how we have to redirect stderr in Windows CI jobs.

function Run-R-Code-Redirect-Stderr {

jameslamb (Collaborator, Author) commented Sep 10, 2021

/gha run r-valgrind

Workflow R valgrind tests has been triggered! 🚀
https://github.com/microsoft/LightGBM/actions/runs/1219956491

Status: failure ❌.

jameslamb (Collaborator, Author) commented Sep 10, 2021

/gha run r-solaris

Workflow Solaris CRAN check has been triggered! 🚀
https://github.com/microsoft/LightGBM/actions/runs/1219956598

solaris-x86-patched: https://builder.r-hub.io/status/lightgbm_3.2.1.99.tar.gz-1883c3e26f194d248848dc03add599c4
Reports also have been sent to LightGBM public e-mail: https://yopmail.com?lightgbm_rhub_checks
Status: failure ❌.

StrikerRUS (Collaborator) left a comment

This PR has a lot of potential merge conflicts with #4597 PR. Which one would you like to merge first?

Review threads (outdated, resolved) on:
R-package/src/lightgbm_R.cpp (2 threads)
R-package/tests/testthat/test_dataset.R
jameslamb (Collaborator, Author)

> This PR has a lot of potential merge conflicts with #4597 PR. Which one would you like to merge first?

Let's merge #4597 first. I'd rather take on the pain of merge conflicts than inflict them on a non-maintainer 😂

jameslamb (Collaborator, Author)

I'll wait to update this until #4613 is also merged.

jameslamb (Collaborator, Author) commented Sep 24, 2021

/gha run r-valgrind

Workflow R valgrind tests has been triggered! 🚀
https://github.com/microsoft/LightGBM/actions/runs/1268255361

Status: success ✔️.

jameslamb (Collaborator, Author) commented Sep 24, 2021

/gha run r-solaris

Workflow Solaris CRAN check has been triggered! 🚀
https://github.com/microsoft/LightGBM/actions/runs/1268255876

solaris-x86-patched: https://builder.r-hub.io/status/lightgbm_3.2.1.99.tar.gz-10422aa86126471181da15ab410b3506
solaris-x86-patched-ods: https://builder.r-hub.io/status/lightgbm_3.2.1.99.tar.gz-631293fa70e74412ab732377555a63e7
Reports also have been sent to LightGBM public e-mail: https://yopmail.com?lightgbm_rhub_checks
Status: success ✔️.

jameslamb (Collaborator, Author) commented Sep 24, 2021

/gha run r-valgrind

Workflow R valgrind tests has been triggered! 🚀
https://github.com/microsoft/LightGBM/actions/runs/1270176694

Status: success ✔️.

jameslamb (Collaborator, Author) commented Sep 24, 2021

/gha run r-solaris

Workflow Solaris CRAN check has been triggered! 🚀
https://github.com/microsoft/LightGBM/actions/runs/1270176972

solaris-x86-patched: https://builder.r-hub.io/status/lightgbm_3.2.1.99.tar.gz-1aecf804abfa4417a2b4e10e1f7275f4
solaris-x86-patched-ods: https://builder.r-hub.io/status/lightgbm_3.2.1.99.tar.gz-d7bfdc73fab743d6b7443781fc6130d1
Reports also have been sent to LightGBM public e-mail: https://yopmail.com?lightgbm_rhub_checks
Status: success ✔️.

StrikerRUS (Collaborator) left a comment

LGTM! 👍

shiyu1994 merged commit f8010d6 into master on Sep 25, 2021
StrikerRUS deleted the fix/r-segfaults branch on Sep 26, 2021 (00:01)
github-actions

This pull request has been automatically locked since there has not been any recent activity since it was closed. To start a new related discussion, open a new issue at https://github.com/microsoft/LightGBM/issues including a reference to this.

github-actions bot locked this pull request as resolved and limited conversation to collaborators on Aug 23, 2023
Successfully merging this pull request may close these issues: [R-package] R handles produce segmentation faults when de-serialized

3 participants