
Training fails when using external memory version for large datasets with instance weights #5866

Closed
prvnsmpth opened this issue Jul 7, 2020 · 2 comments · Fixed by #5870
@prvnsmpth

I am attempting to train an XGBoost model using a large dataset that I cannot completely load in memory. So I decided to use the external memory feature of XGBoost training, like so:

dtrain = xgb.DMatrix(f"data/train.libsvm#train.cache", feature_names=feature_names)
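For reference, the surrounding training code looks roughly like the sketch below. The objective, number of rounds, and feature names are placeholders rather than my exact values; tree_method=gpu_hist matches the traceback further down, and 174 features matches the matrix shape reported in the log.

import xgboost as xgb

# 174 features, matching the matrix shape reported in the log below.
feature_names = [f"f{i}" for i in range(174)]

# External-memory DMatrix: the "#train.cache" suffix tells XGBoost to stream
# the LibSVM file through an on-disk cache instead of loading it all at once.
dtrain = xgb.DMatrix("data/train.libsvm#train.cache", feature_names=feature_names)

# Illustrative parameters only; the failure below occurs with gpu_hist.
params = {"tree_method": "gpu_hist", "objective": "binary:logistic"}
num_rounds = 100
watchlist = [(dtrain, "train")]

model = xgb.train(params, dtrain, num_rounds, watchlist)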

I also need to specify instance weights, so I tried providing them in a separate train.libsvm.weight file:

$ head -5 data/train.libsvm.weight
5.28486226776928e-7
5.28486226776928e-7
5.28486226776928e-7
5.28486226776928e-7
5.28486226776928e-7
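The weight file contains one floating-point weight per line, with line i giving the weight for row i of data/train.libsvm. A minimal sketch of how such a file could be generated (the constant weight is just a placeholder; real weights are computed per instance):

# Sketch: write one weight per data row, in the same order as data/train.libsvm.
with open("data/train.libsvm") as data_file, \
        open("data/train.libsvm.weight", "w") as weight_file:
    for _ in data_file:
        weight_file.write("5.28486226776928e-7\n")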

However, training fails with the following error:

[22:15:43] 11252555x174 matrix with 44859377 entries loaded from data/train.libsvm#train.cache
[22:15:46] 11252555 weights are loaded from data/train.libsvm.weight
Traceback (most recent call last):
  File "train.py", line 104, in <module>
    train(sys.argv[1], sys.argv[2], sys.argv[3], sys.argv[4])
  File "train.py", line 66, in train
    model = xgb.train(params, dtrain, num_rounds, watchlist)
  File "/home/praveen/auto-test-web/auto-test-web/src/ml/venv/lib/python3.8/site-packages/xgboost/training.py", line 208, in train
    return _train_internal(params, dtrain,
  File "/home/praveen/auto-test-web/auto-test-web/src/ml/venv/lib/python3.8/site-packages/xgboost/training.py", line 75, in _train_internal
    bst.update(dtrain, i, obj)
  File "/home/praveen/auto-test-web/auto-test-web/src/ml/venv/lib/python3.8/site-packages/xgboost/core.py", line 1367, in update
    _check_call(_LIB.XGBoosterUpdateOneIter(self.handle,
  File "/home/praveen/auto-test-web/auto-test-web/src/ml/venv/lib/python3.8/site-packages/xgboost/core.py", line 190, in _check_call
    raise XGBoostError(py_str(_LIB.XGBGetLastError()))
xgboost.core.XGBoostError: [22:15:46] /workspace/src/tree/updater_gpu_hist.cu:952: Exception in gpu_hist: [22:15:46] /workspace/src/common/hist_util.cu:287: Check failed: weights.size() == page.offset.Size() - 1 (11252555 vs. 921785

From the error message, it appears the weights file is recognized and weights for all 11252555 instances are loaded. However, because training happens in batches (via the external memory feature), the GPU sketching code only sees 921785 rows at a time, so the check that the weight vector size must match the number of rows in the current page fails.
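Restating the failing check from hist_util.cu with the numbers from the log (purely illustrative Python, not the actual C++ code):

# Illustrative only: the sketching code compares the full weight vector
# against the rows of a single external-memory page, not the whole dataset.
weights_loaded = 11252555   # weights read from data/train.libsvm.weight
rows_in_page = 921785       # rows in the first batched page of the cache
assert weights_loaded == rows_in_page, \
    "Check failed: weights.size() == page.offset.Size() - 1"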

I have also tried specifying weights directly in the LibSVM input file by replacing the label entry with label:weight, but I get exactly the same error.
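For example, a data line that previously started with a bare label, say

1 3:0.5 17:1.2

would become

1:5.28486226776928e-7 3:0.5 17:1.2

(the feature indices and values here are made up; only the label:weight form of the first token is the point).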

I'm using XGBoost version 1.1.0.

@trivialfis
Member

@rongou Could you please help take a look?

@rongou
Contributor

rongou commented Jul 7, 2020

Yeah looks like batched sketching with weights is not supported. Shouldn't be too hard to fix. I'll send out a PR.
