
[python] raise an informative error instead of segfaulting when custom objective produces incorrect output #4815

Merged · 22 commits · Dec 30, 2021

Conversation

@yaxxie (Contributor) commented Nov 19, 2021

The following code can cause a segfault due to LGBM_BoosterUpdateOneIterCustom making an assumption that the passed float arrays match the length of the training data.

import lightgbm
import numpy

X = numpy.random.randn(10_000_000, 5)
Y = numpy.random.choice([0, 1], 10_000_000)

ds = lightgbm.Dataset(X, Y)

def bad_grads(x, y):
    # returns only 2 gradient/Hessian values instead of one per training row
    return numpy.random.randn(2), numpy.random.rand(2)

lightgbm.train({}, ds, fobj=bad_grads)

A faulty fobj function can cause, at best, totally incorrect boosting and, at worst, a segmentation fault. This small patch prevents this from occurring. I noticed it while adding support for LGBM_BoosterUpdateOneIterCustom to the Julia package (see https://github.com/IQVIA-ML/LightGBM.jl/pull/114/files).
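
For comparison, a minimal sketch of a correctly sized fobj for the snippet above (the default objective trains one model per iteration, so one gradient and one Hessian value per training row; the function name is illustrative):

def ok_grads(preds, train_data):
    n = train_data.num_data()  # 10_000_000 rows in the example above
    return numpy.random.randn(n), numpy.random.rand(n)

lightgbm.train({}, ds, fobj=ok_grads)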

@StrikerRUS (Collaborator)

@yaxxie Thanks a lot for this PR!

> making an assumption that the passed float arrays match the length of the training data.

This assumption is wrong. grad and hess should have length n_samples * num_model_per_iteration. num_model_per_iteration equals 1 for regression and binary classification, and equals the number of classes for multiclass classification.
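
For example, with 3 classes the custom objective needs to return gradient and Hessian arrays of n_samples * 3 elements each; a rough sketch (random values just to illustrate the expected shapes):

import numpy as np

def multiclass_grads(preds, train_data):
    # length = n_samples * num_model_per_iteration (here: number of rows * 3 classes)
    n = train_data.num_data() * 3
    return np.random.randn(n), np.random.rand(n)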

Also, what do you think about moving this check to the C++ side

bool TrainOneIter(const score_t* gradients, const score_t* hessians) override {

bool GBDT::TrainOneIter(const score_t* gradients, const score_t* hessians) {

so that all language packages will benefit from it?

@yaxxie (Contributor, author) commented Nov 20, 2021

Thanks @StrikerRUS. I figured the assumption was wrong when I saw some failing tests.

If we move this to the CPP side, it would require an API change, right?

@StrikerRUS (Collaborator)

> If we move this to the CPP side, it would require an API change, right?

Sorry, could you please clarify what API changes you mean?

I believe something like

int64_t total_size = static_cast<int64_t>(num_data_) * num_tree_per_iteration_;
CHECK_EQ(hessians.size(), gradients.size());
CHECK_EQ(hessians.size(), total_size);

would be enough.

@yaxxie (Contributor, author) commented Nov 20, 2021

Happy to try this out, but I can't see how it would work. When the data array comes from outside the lib, how do we know its size?

@StrikerRUS (Collaborator)

> When the data array comes from outside the lib, how do we know its size?

Sorry, didn't get your question.

@yaxxie (Contributor, author) commented Nov 23, 2021

The gradients and hessians arguments are raw float arrays, not std::vectors with a size() method. That makes sense because sometimes they are generated internally (the gradients_ and hessians_ members are std::vectors), but when they come from outside the library (i.e. as pointers passed to LGBM_BoosterUpdateOneIterCustom) we do not, and cannot, know their sizes.

Other C API entry points pass size variables where they need them (for example, when we construct a Dataset from a matrix we pass its dimensions, or when we retrieve string names we pass how much was allocated to the underlying buffer so as not to write past allocated memory).

This is why the API can segfault when passed incorrectly sized gradients: an underlying assumption is made about the size of the allocated buffers which is not (and cannot be) verified by the library itself. We could have the user pass the sizes to the API call, but that would still require changes in all language implementations.

I'm happy for someone to point out where I've gone wrong with the above.

(I tried the code sample provided; it won't compile because a float* has no size() method.)

/home/yaxattax/git/LightGBM/include/LightGBM/utils/log.h:35:9: note: in definition of macro ‘CHECK’
   35 |   if (!(condition))                                                         \
      |         ^~~~~~~~~
/home/yaxattax/git/LightGBM/src/boosting/gbdt.cpp:385:3: note: in expansion of macro ‘CHECK_EQ’
  385 |   CHECK_EQ(hessians.size(), gradients.size());
      |   ^~~~~~~~
/home/yaxattax/git/LightGBM/src/boosting/gbdt.cpp:386:21: error: request for member ‘size’ in ‘hessians’, which is of non-class type ‘const score_t*’ {aka ‘const float*’}
  386 |   CHECK_EQ(hessians.size(), total_size);
      |                     ^~~~

@shiyu1994 (Collaborator)

@yaxxie Thanks for working on this! Yes, I think there should be a change in the C API. Currently the C API only accepts the pointers to the gradients and hessians. If we want to know the length of an array allocated outside the C API, we must add new parameters. But a change in the C API may require changes in the code of every language package.

So maybe doing the check on the Python and R side is preferable. @StrikerRUS WDYT?

@yaxxie (Contributor, author) commented Nov 28, 2021

> So maybe doing the check on the Python and R side is preferable.

We'd also want to update the C API docs to make explicit the length expectation for the float arrays passed.

@StrikerRUS (Collaborator)

@yaxxie Sorry, I didn't notice what type gradients and hessians actually are. I confused them with gradients_ and hessians_.

@shiyu1994

> If we want to know the length of an array allocated outside the C API, we must add new parameters. But a change in the C API may require changes in the code of every language package.

Can we use the fact that those arrays should always have length static_cast<int64_t>(num_data_) * num_tree_per_iteration_? I believe new parameters for the actual lengths make little sense here, as we already know their size in the correct API usage scenario.

> So maybe doing the check on the Python and R side is preferable.

If we cannot do any checks with raw pointers, I'm OK with this way.
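
A rough sketch of what such a check on the Python side could look like inside the Booster update path, just before the pointers are handed to LGBM_BoosterUpdateOneIterCustom (attribute names here are illustrative, not necessarily the exact code this PR adds):

num_data = self.train_set.num_data()   # rows in the training Dataset
num_models = self.__num_class          # models trained per iteration
expected = num_data * num_models
if len(grad) != expected or len(hess) != expected:
    raise ValueError(
        f"Lengths of gradient ({len(grad)}) and Hessian ({len(hess)}) "
        f"should be equal to number of train data ({num_data}) "
        f"times number of models per one iteration ({num_models})"
    )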

@shiyu1994 (Collaborator)

> Can we use the fact that those arrays should always have length static_cast<int64_t>(num_data_) * num_tree_per_iteration_?

@StrikerRUS I think the fix itself is what guarantees this. To check the size on the C API side, we would have to know the exact length of the array allocated on the Python side.

@StrikerRUS (Collaborator) left a comment


Please consider some of my minor suggestions below:

(3 review comments on python-package/lightgbm/basic.py, outdated and resolved)
@yaxxie (Contributor, author) commented Dec 1, 2021

@StrikerRUS Thanks, I will get around to doing these properly soon.
I want to add a simple test, and I also think it would be worthwhile to update the C API documentation to state this requirement on the caller -- could you point me to the right place to make this change?

@StrikerRUS (Collaborator)

@yaxxie Thank you!
Basic tests in which you call the C API directly should be added in the following file:
https://github.com/microsoft/LightGBM/blob/master/tests/c_api_test/test_.py
For basic Python tests that don't use the train() or cv() functions from the engine.py file, use this file:
https://github.com/microsoft/LightGBM/blob/master/tests/python_package_test/test_basic.py
If you want to add cpp tests, they should go somewhere in this folder:
https://github.com/microsoft/LightGBM/tree/master/tests/cpp_tests

As for modifying the C API docs, edit the comments in Doxygen format in this file:
https://github.com/microsoft/LightGBM/blob/master/include/LightGBM/c_api.h

@yaxxie (Contributor, author) commented Dec 21, 2021

@StrikerRUS Anything else needed?

@StrikerRUS (Collaborator)

@jameslamb @shiyu1994 Would you like to be a second reviewer for this PR?

@jameslamb (Collaborator) left a comment


Thanks very much for this!

I followed the conversation with @StrikerRUS and @shiyu1994 and understand why we've chosen to do this check on the Python / R side instead of in C/C++.

Please see two small suggestions to make the tests slightly stricter. Would you please also write up a feature request at https://github.com/microsoft/LightGBM/issues documenting the need to do this same work for the R package?

bad_bst_multi = lgb.Booster({'objective': "none", "num_class": len(classes)}, ds_multiclass)
good_bst_multi = lgb.Booster({'objective': "none", "num_class": len(classes)}, ds_multiclass)
good_bst_binary.update(fobj=_good_gradients)
with pytest.raises(ValueError):

Suggested change
with pytest.raises(ValueError):
with pytest.raises(ValueError, match="number of models per one iteration (1)"):

Could you please use match to look for a specific error message? That way, this test won't silently pass if a change results in lightgbm raising a different, unrelated ValueError.

with pytest.raises(ValueError):
bad_bst_binary.update(fobj=_bad_gradients)
good_bst_multi.update(fobj=_good_gradients)
with pytest.raises(ValueError):

Suggested change
with pytest.raises(ValueError):
with pytest.raises(ValueError, match="number of models per one iteration (3)"):

@jameslamb changed the title from "fix for bad grads causing segfault" to "[python] raise an informative error instead of segfaulting when custom objective produces incorrect output" on Dec 22, 2021
@jameslamb (Collaborator)

@yaxxie @StrikerRUS I just changed the title of this PR to hopefully be a bit more informative for the purposes of release notes

@yaxxie (Contributor, author) commented Dec 22, 2021

@jameslamb I opened #4905 and pushed commit to address your remarks. Please do let me know if anything else is required.

@jameslamb (Collaborator) left a comment


Thanks for the test changes and for opening #4905! Please see a few more small suggestions.

bad_bst_multi = lgb.Booster({'objective': "none", "num_class": len(classes)}, ds_multiclass)
good_bst_multi = lgb.Booster({'objective': "none", "num_class": len(classes)}, ds_multiclass)
good_bst_binary.update(fobj=_good_gradients)
with pytest.raises(ValueError, match="number of models per one iteration \(1\)"):

Suggested change
with pytest.raises(ValueError, match="number of models per one iteration \(1\)"):
with pytest.raises(ValueError, match="number of models per one iteration \\(1\\)"):

Please see the linting errors in https://github.com/microsoft/LightGBM/runs/4613473477?check_suite_focus=true:

./tests/python_package_test/test_basic.py:604:78: W605 invalid escape sequence '('
./tests/python_package_test/test_basic.py:604:81: W605 invalid escape sequence ')'
./tests/python_package_test/test_basic.py:607:79: W605 invalid escape sequence '('
./tests/python_package_test/test_basic.py:607:95: W605 invalid escape sequence ')'

A collaborator replied:

Please don't escape any symbols, for readability purposes. Just add a * symbol at the end:

Suggested change
with pytest.raises(ValueError, match="number of models per one iteration \(1\)"):
with pytest.raises(ValueError, match="number of models per one iteration (1) *"):

@yaxxie (Contributor, author) replied:

The escape is necessary; ( and ) are characters which mean something to the regular expression engine. I'll switch to using re.escape.
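
With re.escape that would look roughly like:

import re

with pytest.raises(ValueError, match=re.escape("number of models per one iteration (1)")):
    bad_bst_binary.update(fobj=_bad_gradients)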

with pytest.raises(ValueError, match="number of models per one iteration \(1\)"):
bad_bst_binary.update(fobj=_bad_gradients)
good_bst_multi.update(fobj=_good_gradients)
with pytest.raises(ValueError, match=f"number of models per one iteration \({len(classes)}\)"):

Suggested change
with pytest.raises(ValueError, match=f"number of models per one iteration \({len(classes)}\)"):
with pytest.raises(ValueError, match=f"number of models per one iteration \\({len(classes)}\\)"):

Comment on lines 593 to 596
X = np.random.randn(100, 5)
y_binary = np.random.choice([0, 1], 100)
classes = [0, 1, 2]
y_multiclass = np.random.choice(classes, 100)

Suggested change
X = np.random.randn(100, 5)
y_binary = np.random.choice([0, 1], 100)
classes = [0, 1, 2]
y_multiclass = np.random.choice(classes, 100)
X = np.random.randn(100, 5)
y_binary = np.array([0] * 50 + [1] * 50)
classes = [0, 1, 2]
y_multiclass = np.array([0] * 33 + [1] * 33 + [2] * 34)

Sorry, just thought of this...can you please remove the randomness from this data construction? Since you're using completely-random data and not testing the produced models, the values don't really matter.

Choosing randomly and using such a small amount of data makes it possible that these tests could fail randomly due to situations like "y_binary is all 0s". It may seem like a small probability, but consider that the Python tests run about 40 times on every commit to every pull request in this project.

@yaxxie (Contributor, author) commented Dec 23, 2021

@jameslamb let me know if a946d28 addresses your concerns

@jameslamb (Collaborator) left a comment


looks ok to me, thanks very much for the help!

(2 review comments on tests/python_package_test/test_basic.py, outdated and resolved)
@@ -572,6 +572,9 @@ LIGHTGBM_C_EXPORT int LGBM_BoosterRefit(BoosterHandle handle,
/*!
* \brief Update the model by specifying gradient and Hessian directly
* (this can be used to support customized loss functions).
* \note
* The length of the arrays referenced by ``grad`` and ``hess`` must be equal to
 * ``num_class * num_train_data``; this is not verified by the library, so the caller must ensure this.
A collaborator commented:

Should this say that this IS verified by the library, or should we simply delete this last sentence? Because we are actually verifying this through this pull request.

@yaxxie (Contributor, author) replied:

These are the C API docs -- the context here is the lightgbm.so library, rather than the Python or R packages. What we're saying is that a caller of LGBM_BoosterUpdateOneIterCustom is responsible for ensuring that the condition is met. I'm happy to tweak the wording, but what the Python package does as a convenience to the user is not applicable here.

@StrikerRUS (Collaborator) left a comment


Thanks for fixing the custom objective function signature in the recent commit.

@github-actions (bot)

This pull request has been automatically locked since there has not been any recent activity since it was closed. To start a new related discussion, open a new issue at https://github.com/microsoft/LightGBM/issues including a reference to this.

@github-actions bot locked as resolved and limited conversation to collaborators on Aug 23, 2023