Regression demo is broken #5709

Closed
hcho3 opened this issue May 26, 2020 · 5 comments · Fixed by #5955

Comments

@hcho3
Collaborator

hcho3 commented May 26, 2020

The demo in demo/regression is not working:

$ python mapfeat.py
$ python mknfold.py machine.txt 1
$ ../../xgboost machine.conf
[19:27:05] 175x36 matrix with 1225 entries loaded from machine.txt.train
[19:27:05] 34x34 matrix with 238 entries loaded from machine.txt.test
terminate called after throwing an instance of 'dmlc::Error'
  what():  [19:27:05] /home/phcho/Desktop/xgboost/src/learner.cc:1062: Check failed: 
  learner_model_param_.num_feature == p_fmat->Info().num_col_ (36 vs. 34) : 
  Number of columns does not match number of features in booster.
Stack trace:
  [bt] (0) ../../xgboost(+0x2b8bc) [0x561330f978bc]
  [bt] (1) ../../xgboost(+0x133546) [0x56133109f546]
  [bt] (2) ../../xgboost(+0x138f3a) [0x5613310a4f3a]
  [bt] (3) ../../xgboost(+0x253aa) [0x561330f913aa]
  [bt] (4) ../../xgboost(+0x27e58) [0x561330f93e58]
  [bt] (5) /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xe7) [0x7f0ee3a94b97]
  [bt] (6) ../../xgboost(+0x238ba) [0x561330f8f8ba]


Aborted (core dumped)
@hcho3
Collaborator Author

hcho3 commented May 26, 2020

When loading data in LIBSVM format, XGBoost should be able to append empty columns so that prediction can run, because the test data may not contain every feature. The LIBSVM format does not specify the number of columns, so the column count can only be inferred from the largest feature index that actually appears in the file.
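A minimal Python sketch of why the shapes in the log differ, assuming each row holds only a label followed by 0-based index:value pairs (no qid or other extras). infer_num_col is a hypothetical helper, not XGBoost's parser; it follows the rule described in this thread, where the column count is the largest feature index seen plus one. The toy rows are made up to reproduce the 36 vs. 34 mismatch.

```python
def infer_num_col(lines):
    """Infer the column count of LIBSVM-format rows (hypothetical helper).

    LIBSVM rows look like "<label> <index>:<value> ...". There is no header
    recording the number of columns, so the only option is to take the
    largest feature index seen and add one (assuming 0-based indices).
    """
    max_index = -1
    for line in lines:
        tokens = line.split()
        # tokens[0] is the label; the remaining tokens are index:value pairs
        for token in tokens[1:]:
            index, _ = token.split(":", 1)
            max_index = max(max_index, int(index))
    return max_index + 1


# Made-up rows: training data uses features up to 35, test data only up to 33
train_rows = ["1.0 0:2.5 3:1.0 35:4.0", "0.0 1:0.5 34:2.0"]
test_rows = ["1.0 0:1.5 33:3.0", "0.0 2:0.5"]

print(infer_num_col(train_rows))  # 36
print(infer_num_col(test_rows))   # 34
```

Because the test rows never mention features 34 and 35, nothing in the file tells the parser that those columns exist.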

@trivialfis
Member

trivialfis commented May 26, 2020

I prefer leaving it to data preprocessing.

@hcho3
Collaborator Author

hcho3 commented May 26, 2020

> I prefer leaving it to data preprocessing.

Not sure what you mean by this. The demo uses a built-in LIBSVM parser, so there is no data preprocessing step.

@hcho3
Collaborator Author

hcho3 commented May 26, 2020

More details: in the test set machine.txt.test, features 34 and 35 are missing from every row. The largest feature ID in machine.txt.test is therefore 33, so XGBoost infers that the test set has 34 columns (hence the 34x34 shape in the log).

My hope is that XGBoost handles this scenario gracefully, i.e. test matrices whose last columns are empty. Rather than throwing an error, XGBoost should treat all values of features 34 and 35 as missing. I believe that is how XGBoost used to behave, since the regression demo has been around since XGBoost version 0.60.
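The requested behavior can be emulated on the user side, sketched here in Python under two assumptions: the test data has already been densified into lists of floats, and NaN is the missing-value marker (the default for xgboost.DMatrix in the Python package). pad_missing_columns is a hypothetical helper; it right-pads each row so that absent trailing features become missing values instead of a shape error.

```python
import math


def pad_missing_columns(rows, num_feature):
    """Right-pad each dense row with NaN so it has num_feature columns.

    Hypothetical helper. NaN stands for "missing", assuming the model uses
    NaN as its missing-value marker. Rows wider than num_feature are
    rejected, since extra columns cannot be mapped onto trained features.
    """
    padded = []
    for row in rows:
        if len(row) > num_feature:
            raise ValueError(
                f"row has {len(row)} columns, model expects {num_feature}"
            )
        padded.append(list(row) + [math.nan] * (num_feature - len(row)))
    return padded


# A 34-column test row padded to the 36 features seen during training
test = [[1.5] * 34]
padded = pad_missing_columns(test, 36)
print(len(padded[0]))  # 36
```

Padding with NaN rather than 0 matters: a 0 would be read as an observed value, while NaN sends the row down each tree's default (missing) branch.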

@hcho3
Collaborator Author

hcho3 commented May 28, 2020

Idea: use the training argument of XGBoosterPredict() so that we can accept LIBSVM test data with fewer columns than the training data.
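Here is one hypothetical reading of this idea, written in Python for brevity. validate_num_col and its training flag are illustrative assumptions, loosely modeled on the failing check in learner.cc (num_feature == num_col). They are not the actual XGBoost implementation. Training keeps strict equality, while prediction also accepts fewer columns and treats the absent trailing features as missing.

```python
def validate_num_col(num_feature, num_col, training):
    """Hypothetical column check, loosely modeled on learner.cc.

    training=True: the matrix must have exactly num_feature columns.
    training=False (prediction): fewer columns are accepted, on the
    assumption that absent trailing features are treated as missing.
    """
    if training:
        ok = num_col == num_feature
    else:
        ok = num_col <= num_feature
    if not ok:
        raise ValueError(
            "Number of columns does not match number of features in booster "
            f"({num_feature} vs. {num_col})"
        )


# Shapes from the log: booster trained on 36 features, test matrix has 34
validate_num_col(36, 34, training=False)  # accepted during prediction
try:
    validate_num_col(36, 34, training=True)
except ValueError as err:
    print(err)
```

The design choice is to relax only the prediction path. Extra columns are still rejected because they cannot be mapped to any trained feature.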

2 participants