[QUESTION] GPU memory efficiency #6327
Comments
On dask, you can try the `DaskDeviceQuantileDMatrix`. Preferably with a nightly build.
Feel free to close if …
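For context, a minimal sketch of what using `DaskDeviceQuantileDMatrix` looks like; `X` and `y` are assumed dask_cudf collections already on the GPUs and are not from the thread:

```python
# Sketch: training with DaskDeviceQuantileDMatrix, which builds the
# quantile sketch directly from data already on the GPUs instead of
# materializing a full intermediate copy like DaskDMatrix does.
from dask.distributed import Client
from dask_cuda import LocalCUDACluster
import xgboost as xgb

cluster = LocalCUDACluster()          # one dask worker per visible GPU
client = Client(cluster)

# X, y: dask_cudf DataFrame / Series partitioned across the GPUs (assumed)
dtrain = xgb.dask.DaskDeviceQuantileDMatrix(client, X, y)

output = xgb.dask.train(
    client,
    {"tree_method": "gpu_hist"},      # this DMatrix type requires gpu_hist
    dtrain,
    num_boost_round=100,
)
booster = output["booster"]
```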
Thanks, will try. Is there some specific thing the Spark contributors did to get the 5X memory improvement that dask has not yet done?
No. I implemented DDQDM (`DaskDeviceQuantileDMatrix`) based on a quantile sketching algorithm recently. The post you linked is old.
Sorry, I just mean: what is the 5X GPU memory improvement they are referring to?
Also, is the same option possible with the scikit-learn API? And it might be a good idea to allow the scikit-learn API to accept a DMatrix as X, if that's not already possible.
I think it meant comparing converting a GPU dataframe to an XGBoost DMatrix directly against their old approach to saving memory.
Right now, no.
Thanks for the suggestion, that's a possible option. Or maybe we can dispatch based on the tree method and use `DeviceQuantileDMatrix`.
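As an illustration of the "GPU dataframe to DMatrix directly" path mentioned above, here is a hedged sketch; the file name and column names are made up:

```python
# Sketch: a cuDF DataFrame can be handed to xgboost.DMatrix directly,
# so the data never has to round-trip through host memory.
import cudf
import xgboost as xgb

df = cudf.read_csv("train.csv")      # data loaded on the GPU (hypothetical file)
X = df.drop(columns=["label"])
y = df["label"]

dtrain = xgb.DMatrix(X, label=y)     # built from device memory
booster = xgb.train({"tree_method": "gpu_hist"}, dtrain, num_boost_round=100)
```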
Yes, if there were another parameter in the scikit-learn API constructor, or an XGBoost parameter, to choose that option, that would work. It would be aligned with how one has to choose gpu_hist vs. hist, the default of gpu_predictor instead of cpu_predictor (AFAIK with rapids/cudf you can't switch to cpu_predictor), gpu_id = 0 as the default, etc. So either as a parameter or as the default sounds reasonable.
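For reference, a sketch of the two paths being discussed: the scikit-learn wrapper exposes GPU behaviour through constructor parameters such as `tree_method`, `predictor`, and `gpu_id`, while the memory-saving `DeviceQuantileDMatrix` is, at this point in the thread, only reachable through the native API. `X_gpu` and `y_gpu` are assumed cuDF or cupy inputs:

```python
import xgboost as xgb

# scikit-learn API: GPU behaviour is chosen via constructor parameters.
model = xgb.XGBRegressor(
    tree_method="gpu_hist",
    predictor="gpu_predictor",
    gpu_id=0,
)
model.fit(X_gpu, y_gpu)

# Native API: the memory-efficient quantile DMatrix is built directly
# from device data, which the sklearn wrapper does not expose here.
dtrain = xgb.DeviceQuantileDMatrix(X_gpu, y_gpu)
booster = xgb.train({"tree_method": "gpu_hist"}, dtrain, num_boost_round=100)
```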
When is the 1.3.0 release planned? I couldn't find the plan, only old roadmaps. The notes on releases say the plan for when to release is made once the prior release is out, so I suppose there is a plan for 1.3.0? It seems to have good fixes and features for dask.
@pseudotensor Here is the roadmap for 1.3.0: #6031. We will make the release once all the blocking issues are addressed.
Closing as the integer overflow issue is resolved and now the …
https://news.developer.nvidia.com/gpu-accelerated-spark-xgboost/
mentions a roughly 5X improvement in GPU memory usage.
Is this something only in xgboost4j? Or is it also in dmlc xgboost?
I'm asking because, in playing around with multi-GPU using dask, the memory use is quite high: 37M rows by 20 features runs out of GPU memory on 2 11GB GPUs. If there really were 5X to gain, that would be incredible. I don't see any such significant changes in GPU memory usage since the first GPU implementations by @RAMitchell. @teju85
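A hedged sketch of the kind of multi-GPU dask setup described above, using the plain `DaskDMatrix` (the path whose memory use prompted the question); the dataset path and column names are illustrative:

```python
# Sketch: two-GPU dask training with a regular DaskDMatrix. Each worker
# builds a full DMatrix from its partitions at training time, which keeps
# an extra copy of the data in GPU memory.
from dask.distributed import Client
from dask_cuda import LocalCUDACluster
import dask_cudf
import xgboost as xgb

cluster = LocalCUDACluster(n_workers=2)   # e.g. 2 x 11GB GPUs
client = Client(cluster)

df = dask_cudf.read_csv("data-*.csv")     # hypothetical ~37M rows x 20 features
X = df.drop(columns=["target"])
y = df["target"]

dtrain = xgb.dask.DaskDMatrix(client, X, y)
output = xgb.dask.train(
    client,
    {"tree_method": "gpu_hist"},
    dtrain,
    num_boost_round=100,
)
```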