
Error when seed exceeds 32-bit integer max #5637

Closed
pplonski opened this issue May 6, 2020 · 10 comments · Fixed by #5643

Comments

@pplonski

pplonski commented May 6, 2020

I'm using XGBoost in my AutoML package, which I'm currently adding to automlbenchmark. When running the benchmark tests I got a very strange error about the seed value. The error message and a minimal code example to reproduce it are below. When I use smaller seed values, everything works as expected.

import xgboost  # version 1.0.2
import numpy as np

# Small random binary classification dataset
X = np.random.rand(100, 4)
y = np.random.randint(0, 2, 100)
train = xgboost.DMatrix(X, label=y)

# Seed 3657279955 is larger than the signed 32-bit maximum (2147483647) and fails
model = xgboost.train({"seed": 3657279955}, train)

Error message:

---------------------------------------------------------------------------
XGBoostError                              Traceback (most recent call last)
<ipython-input-135-653bcbdd849c> in <module>
----> 1 model = xgboost.train({"seed": 3657279955}, train)

~/Downloads/numerai_datasets/venv/lib/python3.6/site-packages/xgboost/training.py in train(params, dtrain, num_boost_round, evals, obj, feval, maximize, early_stopping_rounds, evals_result, verbose_eval, xgb_model, callbacks)
    207                            evals=evals,
    208                            obj=obj, feval=feval,
--> 209                            xgb_model=xgb_model, callbacks=callbacks)
    210 
    211 

~/Downloads/numerai_datasets/venv/lib/python3.6/site-packages/xgboost/training.py in _train_internal(params, dtrain, num_boost_round, evals, obj, feval, xgb_model, callbacks)
     72         # Skip the first update if it is a recovery step.
     73         if version % 2 == 0:
---> 74             bst.update(dtrain, i, obj)
     75             bst.save_rabit_checkpoint()
     76             version += 1

~/Downloads/numerai_datasets/venv/lib/python3.6/site-packages/xgboost/core.py in update(self, dtrain, iteration, fobj)
   1247             _check_call(_LIB.XGBoosterUpdateOneIter(self.handle,
   1248                                                     ctypes.c_int(iteration),
-> 1249                                                     dtrain.handle))
   1250         else:
   1251             pred = self.predict(dtrain, training=True)

~/Downloads/numerai_datasets/venv/lib/python3.6/site-packages/xgboost/core.py in _check_call(ret)
    187     """
    188     if ret != 0:
--> 189         raise XGBoostError(py_str(_LIB.XGBGetLastError()))
    190 
    191 

XGBoostError: Invalid Parameter format for seed expect int but value='3657279955'
@hcho3
Collaborator

hcho3 commented May 6, 2020

The seed should fit into a signed 32-bit integer, so it can be at most 2147483647.
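
A quick way to see the boundary (an illustrative sketch; the exact error text may vary between XGBoost versions):

import numpy as np
import xgboost

X = np.random.rand(100, 4)
y = np.random.randint(0, 2, 100)
train = xgboost.DMatrix(X, label=y)

xgboost.train({"seed": 2**31 - 1}, train)  # 2147483647: accepted
try:
    xgboost.train({"seed": 2**31}, train)  # 2147483648: rejected
except xgboost.core.XGBoostError as err:
    print(err)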

@hcho3 hcho3 closed this as completed May 6, 2020
@hcho3
Collaborator

hcho3 commented May 6, 2020

@pplonski Do you have a use case that specifically calls for a seed value exceeding 2147483647?

@pplonski
Author

pplonski commented May 6, 2020

In the automlbenchmark there are seed values larger than 2147483647.

And a Python 3 int easily holds 64-bit values (it is arbitrary precision).

@hcho3
Collaborator

hcho3 commented May 6, 2020

@pplonski Can you elaborate on the benefit of 64-bit seeds? I notice that one of the packages used in automlbenchmark uses a 16-bit seed: https://github.com/openml/automlbenchmark/blob/2f3bb4a6637ea8875abbf7c06f8df649b7f5e2b0/frameworks/AutoWEKA/exec.py#L60.

I'd like to keep 32-bit seeds if possible, since lots of XGBoost model files in the wild store a 32-bit seed, and I'd like to keep them compatible with the latest XGBoost (*).

(*) The compatibility issue arises because the binary model format assumes a fixed layout, and changing the seed field to 64 bits would change that layout. The issue is avoided if the model is saved as JSON (supported in 1.0.0+), but there are still many old models lurking around.
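
For reference, a minimal sketch of saving to the JSON format (available since 1.0.0), which is not tied to the legacy fixed binary layout; the parameter values here are only illustrative:

import numpy as np
import xgboost

X = np.random.rand(100, 4)
y = np.random.randint(0, 2, 100)
train = xgboost.DMatrix(X, label=y)
model = xgboost.train({"seed": 42}, train)

# The ".json" extension selects the JSON serialization format (XGBoost 1.0.0+)
model.save_model("model.json")

loaded = xgboost.Booster()
loaded.load_model("model.json")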

@hcho3 hcho3 reopened this May 6, 2020
@hcho3 hcho3 changed the title Error when seed value is too large Use 64-bit seed? May 6, 2020
@pplonski
Author

pplonski commented May 6, 2020

If the package accepts an int as a seed, then there will be situations that throw such an error. For consistency I think it would be nice to fix it, or at least document that the seed should be a 32-bit integer.

The automlbenchmark package uses a 64-bit seed. Auto-WEKA is just one of the packages it tests.
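
Until 64-bit seeds are supported, one possible workaround on the caller's side (an illustrative sketch, not part of either package) is to fold the benchmark's seed into the signed 32-bit range before passing it to XGBoost:

def fold_seed(seed):
    # Map an arbitrary integer seed into [0, 2**31 - 1]
    return seed % (2**31)

params = {"seed": fold_seed(3657279955)}  # -> {"seed": 1509796307}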

@hcho3
Collaborator

hcho3 commented May 6, 2020

For now I'll add a check on the Python side so that the error message is clearer.
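
A rough sketch of what such a Python-side check could look like (the actual check is in #5638 and may differ in names and wording):

_INT32_MIN, _INT32_MAX = -2**31, 2**31 - 1

def _validate_seed(params):
    # Hypothetical helper: reject seeds outside the signed 32-bit range
    # with a clearer message than the C++ parameter parser gives.
    seed = params.get("seed")
    if seed is not None and not (_INT32_MIN <= int(seed) <= _INT32_MAX):
        raise ValueError(
            "seed must fit into a signed 32-bit integer, got {}".format(seed))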

@hcho3 hcho3 changed the title Use 64-bit seed? Error when seed exceeds 32-int integer max May 6, 2020
@trivialfis
Member

trivialfis commented May 6, 2020

Seed is not saved.

@hcho3
Collaborator

hcho3 commented May 6, 2020

@trivialfis Oops, my bad. I forgot that the generic parameters are not part of the legacy binary format. We could potentially upgrade to a 64-bit seed then?

@hcho3
Collaborator

hcho3 commented May 6, 2020

@pplonski #5638 adds the error message.

@hcho3 hcho3 changed the title Error when seed exceeds 32-int integer max Error when seed exceeds 32-bit integer max May 6, 2020
@trivialfis
Member

trivialfis commented May 6, 2020

@hcho3

We could potentially upgrade to 64-bit seed then?

I think so. ;-)
