
LightGBM vs XGBoost accuracy/speed #3417

Closed
RAMitchell opened this issue Jun 28, 2018 · 12 comments

@RAMitchell
Member

@kaz-Anova recently pointed out that XGBoost is falling behind LightGBM in accuracy on recent Kaggle competitions.
[attached image]

This refers to the "tree_method":"hist" algorithm. If anyone has time, it would be nice to figure out the root cause.

Speed is also a priority, but I think less so than accuracy.

cc @hcho3
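
As a point of reference (not part of the original report), here is a minimal sketch of the kind of head-to-head run being discussed: XGBoost with tree_method="hist" against LightGBM on the same split. The synthetic dataset and every parameter value below are illustrative assumptions, not the settings used in the Kaggle competitions.

```python
# Illustrative only: compare XGBoost "hist" and LightGBM on identical data.
import xgboost as xgb
import lightgbm as lgb
from sklearn.datasets import make_classification
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=100_000, n_features=50, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

xgb_model = xgb.XGBClassifier(tree_method="hist", n_estimators=500,
                              max_depth=6, learning_rate=0.1)
xgb_model.fit(X_train, y_train)

lgb_model = lgb.LGBMClassifier(n_estimators=500, num_leaves=63,
                               learning_rate=0.1)
lgb_model.fit(X_train, y_train)

print("xgboost hist AUC:", roc_auc_score(y_test, xgb_model.predict_proba(X_test)[:, 1]))
print("lightgbm AUC    :", roc_auc_score(y_test, lgb_model.predict_proba(X_test)[:, 1]))
```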

@hcho3
Collaborator

hcho3 commented Jun 28, 2018

@kaz-Anova @RAMitchell Is the issue specific to CPU-hist, or do you see the same issue with GPU-hist as well?

@RAMitchell
Member Author

This is regarding the CPU version; I'm not sure about the GPU version yet. In my recent experiments, GPU hist commonly outperforms CPU hist by a very small amount.
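
For context, a minimal sketch of how such a CPU-hist vs GPU-hist check might look (assumes a CUDA-capable GPU and uses synthetic data in place of a real benchmark; all parameter values are arbitrary):

```python
# Illustrative only: same data and parameters, "hist" vs "gpu_hist".
import numpy as np
import xgboost as xgb

rng = np.random.default_rng(0)
X = rng.normal(size=(50_000, 30))
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.5, size=50_000) > 0).astype(int)
dtrain = xgb.DMatrix(X, label=y)

for method in ("hist", "gpu_hist"):
    params = {"tree_method": method, "max_depth": 6, "eta": 0.1,
              "objective": "binary:logistic", "eval_metric": "error"}
    history = {}
    xgb.train(params, dtrain, num_boost_round=200,
              evals=[(dtrain, "train")], evals_result=history, verbose_eval=False)
    print(method, "final train error:", history["train"]["error"][-1])
```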

@hcho3
Collaborator

hcho3 commented Jun 28, 2018

@RAMitchell I am willing to investigate it if there is a reproducible example that demonstrates the lower model accuracy. At any rate, I wrote the CPU-hist code when I was pretty new to XGBoost, so I'd like to come back to it and make it better. (One of its glaring shortcomings is that it doesn't support distributed training yet.)

@CodingCat
Member

@hcho3 Is there any current plan to support distributed training?

@hcho3
Collaborator

hcho3 commented Jun 28, 2018

@CodingCat Not that I am aware of. Some people have asked me about a distributed hist updater. How important do you think it is? There are some commonalities between 'approx' and 'hist', one of which is that quantiles are used as split candidates. The major difference is that for 'hist', you start by quantizing the data matrix, which enables some optimizations.

EDIT: If distributed 'hist' is deemed important, I can bring it up to my manager to carve out time to have it implemented.
EDIT 2: This summer, I am mentoring an intern who would like to improve distributed training in XGBoost.
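
To make the quantization point concrete, here is a toy sketch (not the actual XGBoost code) of pre-binning a feature into quantile buckets, which is what lets 'hist' scan small integer bin indices instead of raw floating-point values during split finding:

```python
# Toy illustration of quantile binning, the core idea behind 'hist'.
import numpy as np

def quantize_feature(values, max_bins=256):
    """Map raw feature values to integer bin indices via quantile cut points."""
    cuts = np.unique(np.quantile(values, np.linspace(0, 1, max_bins + 1)[1:-1]))
    return np.searchsorted(cuts, values), cuts

rng = np.random.default_rng(42)
feature = rng.lognormal(size=10_000)
bins, cuts = quantize_feature(feature, max_bins=16)
print("distinct raw values:", np.unique(feature).size)   # ~10,000
print("distinct bins:", np.unique(bins).size)            # at most 16
```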

@CodingCat
Member

Regarding the importance of distributed 'hist': in many companies, like my current employer, distributed training is the major use case, and having a faster algorithm would definitely be helpful for those users.

@RAMitchell
Member Author

RAMitchell commented Jun 28, 2018

@hcho3 Here are a few data points from my recent experiments (https://github.com/RAMitchell/GBM-Benchmarks):
[attached benchmark comparison chart]
I think your hist algorithm is extensively used, so it would be extremely valuable to get some development time on it from you or others.

@hcho3
Collaborator

hcho3 commented Jun 29, 2018

@RAMitchell Thanks for posting the benchmarks. I will take a look. As for the hist algorithm, yes, I'll try to get dev time on it, either from myself or someone else.

@CodingCat Can we arrange an in-person meeting within the next two weeks? (I am currently in Seattle.) I'd like to hear more about your thoughts on future priorities for XGBoost development. The intern I am mentoring would like to meet you as well. If you are available for a meeting, please e-mail me at chohyu01 (at) cs.washington.edu.

@jq

jq commented Jul 2, 2018

+1 for this feature @hcho3

@Laurae2
Contributor

Laurae2 commented Jul 15, 2018

@hcho3 Also worth taking into account: xgboost's CPU histogram is slow mainly because it uses all 64 threads (32 physical cores). One thread is sometimes faster in my benchmarks with very similar parameters, even with the frequency advantage.

Here is an example for 500 iterations on Bosch (depth 6) using an i7-7700K at 4.5 GHz with more data (1 million rows), versus the reported 810 seconds with 64 threads (60% of the approx. 1.2 million rows):

Threads   Time
      1   340s
      2   202s
      3   167s
      4   162s
      5   166s
      6   171s
      7   172s
      8   176s
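
A rough sketch of how one might reproduce this kind of thread-scaling measurement (synthetic data of arbitrary size, not the Bosch setup above; only nthread varies between runs):

```python
# Illustrative only: time the same "hist" training run at different thread counts.
import time
import numpy as np
import xgboost as xgb

rng = np.random.default_rng(0)
X = rng.normal(size=(200_000, 50))
y = (X[:, :5].sum(axis=1) > 0).astype(int)
dtrain = xgb.DMatrix(X, label=y)

for nthread in (1, 2, 4, 8):
    params = {"tree_method": "hist", "max_depth": 6, "eta": 0.1,
              "objective": "binary:logistic", "nthread": nthread}
    start = time.perf_counter()
    xgb.train(params, dtrain, num_boost_round=100)
    print(f"nthread={nthread}: {time.perf_counter() - start:.1f}s")
```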

@RAMitchell
Member Author

RAMitchell commented Jul 15, 2018

This is still a problem because the default uses all CPU threads. We could internally limit the number of threads used by the hist algorithm as a quick fix, but it would be nicer to get to the root of the problem.
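
The user-side version of that quick fix is simply to cap the thread count instead of accepting the all-cores default; a minimal sketch (the value 4 is an arbitrary assumption, not a recommendation):

```python
import xgboost as xgb

# Cap "hist" at 4 threads rather than letting it default to every hardware thread.
clf = xgb.XGBClassifier(tree_method="hist", n_estimators=500,
                        max_depth=6, n_jobs=4)
```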

@hcho3
Collaborator

hcho3 commented Dec 16, 2020

Closing this, since XGBoost has progressed substantially in terms of performance: #3810, szilard/GBM-perf#41. As for accuracy, there are several factors involved:

  • Whether to grow trees depthwise or lossguide (leaf-wise). LightGBM only offers the lossguide equivalent, whereas XGBoost offers both.
  • Whether to encode categorical data directly or require one-hot encoding. XGBoost currently requires one-hot encoding, whereas LightGBM allows direct splits on categorical features. However, see Categorical data support (#6503). (A brief sketch of both knobs follows this list.)
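
A hedged sketch of both knobs above (all parameter values are illustrative): grow_policy="lossguide" approximates LightGBM's leaf-wise growth, and categorical columns are one-hot encoded before training, as XGBoost required at the time of this comment.

```python
# Illustrative only: leaf-wise growth plus manual one-hot encoding in XGBoost.
import pandas as pd
import xgboost as xgb

df = pd.DataFrame({"color": ["red", "blue", "green", "blue"],
                   "size": [1.0, 2.5, 3.1, 0.7],
                   "label": [0, 1, 1, 0]})
X = pd.get_dummies(df[["color", "size"]], columns=["color"]).astype(float)

params = {
    "tree_method": "hist",
    "grow_policy": "lossguide",  # leaf-wise, like LightGBM's default
    "max_leaves": 63,            # analogous to LightGBM's num_leaves
    "max_depth": 0,              # 0 = no depth limit under lossguide
    "objective": "binary:logistic",
}
booster = xgb.train(params, xgb.DMatrix(X, label=df["label"]), num_boost_round=10)
```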

Also, XGBoost has gained back mind share; see this Twitter poll. XGBoost has state-of-the-art performance on GPU and cutting-edge integration with Dask.
