Regression demo is broken #5709

hcho3 · 2020-05-26T02:27:38Z

The demo demo/regression is not working.

$ python mapfeat.py
$ python mknfold.py machine.txt 1
$ ../../xgboost machine.conf
[19:27:05] 175x36 matrix with 1225 entries loaded from machine.txt.train
[19:27:05] 34x34 matrix with 238 entries loaded from machine.txt.test
terminate called after throwing an instance of 'dmlc::Error'
  what():  [19:27:05] /home/phcho/Desktop/xgboost/src/learner.cc:1062: Check failed: 
  learner_model_param_.num_feature == p_fmat->Info().num_col_ (36 vs. 34) : 
  Number of columns does not match number of features in booster.
Stack trace:
  [bt] (0) ../../xgboost(+0x2b8bc) [0x561330f978bc]
  [bt] (1) ../../xgboost(+0x133546) [0x56133109f546]
  [bt] (2) ../../xgboost(+0x138f3a) [0x5613310a4f3a]
  [bt] (3) ../../xgboost(+0x253aa) [0x561330f913aa]
  [bt] (4) ../../xgboost(+0x27e58) [0x561330f93e58]
  [bt] (5) /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xe7) [0x7f0ee3a94b97]
  [bt] (6) ../../xgboost(+0x238ba) [0x561330f8f8ba]


Aborted (core dumped)


hcho3 · 2020-05-26T02:28:49Z

When loading from LIBSVM, we should be able to append empty columns to run predictions, because the test data may not necessarily have all the features. This is because the LIBSVM format does not specify the number of columns in the data.
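A minimal sketch of why the shapes differ, assuming zero-based feature indices as in this demo: a LIBSVM reader can only guess the column count from the largest feature index it sees. The function name `infer_num_columns` and the sample lines are made up for illustration; this is not XGBoost's actual parser.

```python
def infer_num_columns(libsvm_lines):
    """Infer the column count of LIBSVM data as (largest feature index + 1)."""
    max_index = -1
    for line in libsvm_lines:
        tokens = line.split()
        # tokens[0] is the label; the remaining tokens are index:value pairs.
        for token in tokens[1:]:
            index, _ = token.split(":", 1)
            max_index = max(max_index, int(index))
    return max_index + 1


# Hypothetical data: the training rows mention feature 35, the test rows stop at 33.
train_lines = ["1.0 0:0.5 35:1", "0.0 3:2.0"]
test_lines = ["1.0 0:0.5 33:1"]

print(infer_num_columns(train_lines), infer_num_columns(test_lines))
```

With these made-up rows the two files are inferred to have 36 and 34 columns, which is the same mismatch reported in the error above.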

trivialfis · 2020-05-26T09:19:28Z

I prefer leaving it to data preprocessing.

hcho3 · 2020-05-26T09:38:10Z

> I prefer leaving it to data preprocessing.

Not sure what you mean by this. The demo uses a built-in LIBSVM parser, so there is no data preprocessing step.

hcho3 · 2020-05-26T09:46:38Z

More details: in the test set machine.txt.test, features 34 and 35 are missing from every row. So the largest feature ID in machine.txt.test is 33, and XGBoost infers the dimension of the test set to be 34x34.

My hope is that XGBoost handles this scenario gracefully, i.e. test matrices whose last columns are empty. Rather than throwing an error, XGBoost should treat all feature values as missing for features 34 and 35. I believe that's how XGBoost used to behave, since the regression demo has been around since XGBoost version 0.60.
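To make the desired behaviour concrete, here is a rough sketch that widens each test row to the model's feature count and fills absent features with NaN (used here as the "missing" marker). The helper `to_dense_rows` and its `num_feature` parameter are hypothetical names for illustration, and zero-based feature indices are assumed; this is not code from XGBoost.

```python
import math


def to_dense_rows(libsvm_lines, num_feature):
    """Expand LIBSVM rows to num_feature columns, marking absent features as NaN."""
    rows = []
    for line in libsvm_lines:
        tokens = line.split()
        if not tokens:
            continue  # skip blank lines
        row = [math.nan] * num_feature
        # tokens[0] is the label; the remaining tokens are index:value pairs.
        for token in tokens[1:]:
            index_text, value_text = token.split(":", 1)
            index = int(index_text)
            if index >= num_feature:
                # More columns than the model knows about is still an error.
                raise ValueError(f"feature index {index} exceeds model width {num_feature}")
            row[index] = float(value_text)
        rows.append(row)
    return rows


# A hypothetical test row that never mentions features 34 and 35.
rows = to_dense_rows(["1.0 0:0.5 33:1"], num_feature=36)
print(len(rows[0]), math.isnan(rows[0][34]), math.isnan(rows[0][35]))
```

Only the "too few columns" case is padded; a test row with a feature index beyond the model's width still raises, since the model has no trees that could use it.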

hcho3 · 2020-05-28T10:28:24Z

Idea: Use the training argument in XGBoosterPredict() so that we can accept LIBSVM test data with fewer columns than the training data.
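One way the idea could look, written as a Python sketch rather than the real C++ check: be strict about the column count when the data is used for training, and only require "no more columns than the model" at prediction time. The function `check_num_columns` and its parameters are hypothetical, and this is not necessarily how the eventual fix was implemented.

```python
def check_num_columns(num_feature, num_col, training):
    """Validate a data matrix's column count against the model's feature count.

    Hypothetical rule: training data must match exactly, while prediction data
    may have fewer columns (the absent trailing features count as missing).
    """
    if training:
        ok = num_col == num_feature
    else:
        ok = num_col <= num_feature
    if not ok:
        raise ValueError(
            f"Number of columns does not match number of features in booster "
            f"({num_feature} vs. {num_col})"
        )


# The shapes from this issue: a 36-feature model and a 34-column test matrix.
check_num_columns(num_feature=36, num_col=34, training=False)  # accepted
```

Under this rule the demo's test matrix would pass at prediction time, while the same shape mismatch during training would still be reported.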

trivialfis added the type: bug label May 28, 2020

trivialfis mentioned this issue Jul 5, 2020

Add XGBoosterGetNumFeature #5856

Merged

trivialfis mentioned this issue Jul 29, 2020

Fix prediction heuristic #5955

Merged

trivialfis closed this as completed in #5955 Jul 29, 2020

This was referenced Sep 8, 2020

Release xgboost 1.2 with GPU support aws/sagemaker-xgboost-container#133

Closed

Release xgboost 1.2 with GPU support aws/sagemaker-xgboost-container#134

Merged
