
Error when seed exceeds 32-bit integer max #5637

Closed
pplonski opened this issue May 6, 2020 · 10 comments · Fixed by #5643

Comments

@pplonski

pplonski commented May 6, 2020

I'm using XGBoost in my AutoML package, which I'm currently adding to automlbenchmark. When running the benchmark tests I got a very strange error about the seed value. The error message and a minimal code example to reproduce it are below. When I use smaller seed values, everything works as expected.

import xgboost  # version 1.0.2
import numpy as np

# Small random binary classification dataset
X = np.random.rand(100, 4)
y = np.random.randint(0, 2, 100)
train = xgboost.DMatrix(X, label=y)

# Seed 3657279955 is larger than the signed 32-bit maximum (2147483647) and fails
model = xgboost.train({"seed": 3657279955}, train)

Error message:

---------------------------------------------------------------------------
XGBoostError                              Traceback (most recent call last)
<ipython-input-135-653bcbdd849c> in <module>
----> 1 model = xgboost.train({"seed": 3657279955}, train)

~/Downloads/numerai_datasets/venv/lib/python3.6/site-packages/xgboost/training.py in train(params, dtrain, num_boost_round, evals, obj, feval, maximize, early_stopping_rounds, evals_result, verbose_eval, xgb_model, callbacks)
    207                            evals=evals,
    208                            obj=obj, feval=feval,
--> 209                            xgb_model=xgb_model, callbacks=callbacks)
    210 
    211 

~/Downloads/numerai_datasets/venv/lib/python3.6/site-packages/xgboost/training.py in _train_internal(params, dtrain, num_boost_round, evals, obj, feval, xgb_model, callbacks)
     72         # Skip the first update if it is a recovery step.
     73         if version % 2 == 0:
---> 74             bst.update(dtrain, i, obj)
     75             bst.save_rabit_checkpoint()
     76             version += 1

~/Downloads/numerai_datasets/venv/lib/python3.6/site-packages/xgboost/core.py in update(self, dtrain, iteration, fobj)
   1247             _check_call(_LIB.XGBoosterUpdateOneIter(self.handle,
   1248                                                     ctypes.c_int(iteration),
-> 1249                                                     dtrain.handle))
   1250         else:
   1251             pred = self.predict(dtrain, training=True)

~/Downloads/numerai_datasets/venv/lib/python3.6/site-packages/xgboost/core.py in _check_call(ret)
    187     """
    188     if ret != 0:
--> 189         raise XGBoostError(py_str(_LIB.XGBGetLastError()))
    190 
    191 

XGBoostError: Invalid Parameter format for seed expect int but value='3657279955'
@hcho3
Collaborator

hcho3 commented May 6, 2020

The seed should fit into a signed 32-bit integer, so it can be at most 2147483647.
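
A quick way to see the boundary (an illustrative sketch; the exact error text may vary between XGBoost versions):

import numpy as np
import xgboost

X = np.random.rand(100, 4)
y = np.random.randint(0, 2, 100)
train = xgboost.DMatrix(X, label=y)

xgboost.train({"seed": 2**31 - 1}, train)  # 2147483647: accepted
try:
    xgboost.train({"seed": 2**31}, train)  # 2147483648: rejected
except xgboost.core.XGBoostError as err:
    print(err)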

@hcho3 hcho3 closed this as completed May 6, 2020
@hcho3
Collaborator

hcho3 commented May 6, 2020

@pplonski Do you have a use case that specifically calls for a seed value exceeding 2147483647?

@pplonski
Author

pplonski commented May 6, 2020

In the automlbenchmark there are seed values larger than 2147483647.

And a Python 3 int easily holds 64-bit values (it is arbitrary precision).

@hcho3
Collaborator

hcho3 commented May 6, 2020

@pplonski Can you elaborate on the benefit of 64-bit seeds? I notice that one of the packages used in automlbenchmark uses a 16-bit seed: https://github.com/openml/automlbenchmark/blob/2f3bb4a6637ea8875abbf7c06f8df649b7f5e2b0/frameworks/AutoWEKA/exec.py#L60.

I'd like to keep 32-bit seeds if possible, since lots of XGBoost model files in the wild store a 32-bit seed, and I'd like to keep them compatible with the latest XGBoost (*).

(*) The compatibility issue arises because the binary model format assumes a fixed layout, and changing the seed field to 64 bits would change that layout. The issue is avoided if the model is saved as JSON (supported in 1.0.0+), but there are still many old models lurking around.
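
For reference, a minimal sketch of saving to the JSON format (available since 1.0.0), which is not tied to the legacy fixed binary layout; the parameter values here are only illustrative:

import numpy as np
import xgboost

X = np.random.rand(100, 4)
y = np.random.randint(0, 2, 100)
train = xgboost.DMatrix(X, label=y)
model = xgboost.train({"seed": 42}, train)

# The ".json" extension selects the JSON serialization format (XGBoost 1.0.0+)
model.save_model("model.json")

loaded = xgboost.Booster()
loaded.load_model("model.json")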

@hcho3 hcho3 reopened this May 6, 2020
@hcho3 hcho3 changed the title Error when seed value is too large Use 64-bit seed? May 6, 2020
@pplonski
Author

pplonski commented May 6, 2020

If the package accepts an int as a seed, then there will be situations that throw such an error. For consistency I think it would be nice to fix it, or at least document that the seed should be a 32-bit integer.

The automlbenchmark package uses a 64-bit seed. Auto-WEKA is just one of the packages it tests.
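
Until 64-bit seeds are supported, one possible workaround on the caller's side (an illustrative sketch, not part of either package) is to fold the benchmark's seed into the signed 32-bit range before passing it to XGBoost:

def fold_seed(seed):
    # Map an arbitrary integer seed into [0, 2**31 - 1]
    return seed % (2**31)

params = {"seed": fold_seed(3657279955)}  # -> {"seed": 1509796307}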

@hcho3
Collaborator

hcho3 commented May 6, 2020

For now I'll add a check on the Python side so that the error message is clearer.
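
A rough sketch of what such a Python-side check could look like (the actual check is in #5638 and may differ in names and wording):

_INT32_MIN, _INT32_MAX = -2**31, 2**31 - 1

def _validate_seed(params):
    # Hypothetical helper: reject seeds outside the signed 32-bit range
    # with a clearer message than the C++ parameter parser gives.
    seed = params.get("seed")
    if seed is not None and not (_INT32_MIN <= int(seed) <= _INT32_MAX):
        raise ValueError(
            "seed must fit into a signed 32-bit integer, got {}".format(seed))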

@hcho3 hcho3 changed the title Use 64-bit seed? Error when seed exceeds 32-int integer max May 6, 2020
@trivialfis
Member

trivialfis commented May 6, 2020

Seed is not saved.

@hcho3
Collaborator

hcho3 commented May 6, 2020

@trivialfis Oops, my bad. I forgot that the generic parameters are not part of the legacy binary format. We could potentially upgrade to a 64-bit seed then?

@hcho3
Collaborator

hcho3 commented May 6, 2020

@pplonski #5638 adds the error message.

@hcho3 hcho3 changed the title Error when seed exceeds 32-int integer max Error when seed exceeds 32-bit integer max May 6, 2020
@trivialfis
Member

trivialfis commented May 6, 2020

@hcho3

We could potentially upgrade to 64-bit seed then?

I think so. ;-)
