
Training fails when using external memory version for large datasets with instance weights #5866

Closed
prvnsmpth opened this issue Jul 7, 2020 · 2 comments · Fixed by #5870
@prvnsmpth

I am attempting to train an XGBoost model using a large dataset that I cannot completely load in memory. So I decided to use the external memory feature of XGBoost training, like so:

dtrain = xgb.DMatrix(f"data/train.libsvm#train.cache", feature_names=feature_names)
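For reference, the surrounding training code looks roughly like the sketch below. The objective, number of rounds, and feature names are placeholders rather than my exact values; tree_method=gpu_hist matches the traceback further down, and 174 features matches the matrix shape reported in the log.

import xgboost as xgb

# 174 features, matching the matrix shape reported in the log below.
feature_names = [f"f{i}" for i in range(174)]

# External-memory DMatrix: the "#train.cache" suffix tells XGBoost to stream
# the LibSVM file through an on-disk cache instead of loading it all at once.
dtrain = xgb.DMatrix("data/train.libsvm#train.cache", feature_names=feature_names)

# Illustrative parameters only; the failure below occurs with gpu_hist.
params = {"tree_method": "gpu_hist", "objective": "binary:logistic"}
num_rounds = 100
watchlist = [(dtrain, "train")]

model = xgb.train(params, dtrain, num_rounds, watchlist)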

I also need to specify instance weights, so I tried providing them in a separate train.libsvm.weight file:

$ head -5 data/train.libsvm.weight
5.28486226776928e-7
5.28486226776928e-7
5.28486226776928e-7
5.28486226776928e-7
5.28486226776928e-7
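The weight file contains one floating-point weight per line, with line i giving the weight for row i of data/train.libsvm. A minimal sketch of how such a file could be generated (the constant weight is just a placeholder; real weights are computed per instance):

# Sketch: write one weight per data row, in the same order as data/train.libsvm.
with open("data/train.libsvm") as data_file, \
        open("data/train.libsvm.weight", "w") as weight_file:
    for _ in data_file:
        weight_file.write("5.28486226776928e-7\n")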

However, training fails with the following error:

[22:15:43] 11252555x174 matrix with 44859377 entries loaded from data/train.libsvm#train.cache
[22:15:46] 11252555 weights are loaded from data/train.libsvm.weight
Traceback (most recent call last):
  File "train.py", line 104, in <module>
    train(sys.argv[1], sys.argv[2], sys.argv[3], sys.argv[4])
  File "train.py", line 66, in train
    model = xgb.train(params, dtrain, num_rounds, watchlist)
  File "/home/praveen/auto-test-web/auto-test-web/src/ml/venv/lib/python3.8/site-packages/xgboost/training.py", line 208, in train
    return _train_internal(params, dtrain,
  File "/home/praveen/auto-test-web/auto-test-web/src/ml/venv/lib/python3.8/site-packages/xgboost/training.py", line 75, in _train_internal
    bst.update(dtrain, i, obj)
  File "/home/praveen/auto-test-web/auto-test-web/src/ml/venv/lib/python3.8/site-packages/xgboost/core.py", line 1367, in update
    _check_call(_LIB.XGBoosterUpdateOneIter(self.handle,
  File "/home/praveen/auto-test-web/auto-test-web/src/ml/venv/lib/python3.8/site-packages/xgboost/core.py", line 190, in _check_call
    raise XGBoostError(py_str(_LIB.XGBGetLastError()))
xgboost.core.XGBoostError: [22:15:46] /workspace/src/tree/updater_gpu_hist.cu:952: Exception in gpu_hist: [22:15:46] /workspace/src/common/hist_util.cu:287: Check failed: weights.size() == page.offset.Size() - 1 (11252555 vs. 921785

From the error message, it appears the weights file is recognized and weights for all 11252555 instances are loaded. However, because training happens in batches (via the external memory feature), the GPU sketching code only sees 921785 rows at a time, so the check that the weight vector size must match the number of rows in the current page fails.
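Restating the failing check from hist_util.cu with the numbers from the log (purely illustrative Python, not the actual C++ code):

# Illustrative only: the sketching code compares the full weight vector
# against the rows of a single external-memory page, not the whole dataset.
weights_loaded = 11252555   # weights read from data/train.libsvm.weight
rows_in_page = 921785       # rows in the first batched page of the cache
assert weights_loaded == rows_in_page, \
    "Check failed: weights.size() == page.offset.Size() - 1"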

I have also tried specifying weights directly in the LibSVM input file by replacing the label entry with label:weight, but I get exactly the same error.
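For example, a data line that previously started with a bare label, say

1 3:0.5 17:1.2

would become

1:5.28486226776928e-7 3:0.5 17:1.2

(the feature indices and values here are made up; only the label:weight form of the first token is the point).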

I'm using XGBoost version 1.1.0.

@trivialfis
Member

@rongou Could you please help take a look?

@rongou
Contributor

rongou commented Jul 7, 2020

Yeah looks like batched sketching with weights is not supported. Shouldn't be too hard to fix. I'll send out a PR.
