
[FR] Xgboost training run out of memory in some cases, can we add memory threshold config to prevent OOM ? #9372

Closed
WeichenXu123 opened this issue Jul 8, 2023 · 1 comment · Fixed by #9440

WeichenXu123 (Contributor) commented Jul 8, 2023

One of our customers is facing an issue where XGBoost training runs out of memory:

The customer trains an XGBoost model in a distributed setup, and when the "max_depth" parameter is set to a high value, the training routine easily runs out of memory.

The problem is that, for some of their datasets, a given "max_depth" value works fine, while for other datasets the same "max_depth" causes an OOM.

We would like to set a larger "max_depth" in most cases for better model accuracy, but we also need to prevent OOMs from happening, so this is a pain point.

Could we add parameters like "xgboost_train_worker_cpu_memory_usage_threshold" and "xgboost_train_worker_GPU_memory_usage_threshold"? Each training worker would track its own memory usage, and once usage exceeds the threshold it would stop increasing the model depth, finalize the model, and stop training.
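As a user-side stopgap until something like this lands in the library, a similar effect can be approximated with XGBoost's callback API: returning `True` from `after_iteration` asks the learner to stop boosting. The sketch below (class name and threshold value are my own, not from this issue) checks the process's peak RSS via the standard library's `resource` module; note this only caps the number of boosting rounds, not the depth of an individual tree, so it does not fully cover the feature requested here. With xgboost installed, the class should subclass `xgboost.callback.TrainingCallback`.

```python
import resource


class MemoryThresholdCallback:
    """Stop boosting once the process's peak RSS exceeds a threshold.

    Mirrors the shape of xgboost.callback.TrainingCallback: XGBoost
    calls after_iteration(model, epoch, evals_log) after each round,
    and a True return value requests early termination.
    """

    def __init__(self, max_rss_kb):
        self.max_rss_kb = max_rss_kb

    def after_iteration(self, model, epoch, evals_log):
        # ru_maxrss is reported in KiB on Linux (bytes on macOS).
        rss_kb = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
        return rss_kb > self.max_rss_kb
```

Usage would be along the lines of `xgb.train(params, dtrain, callbacks=[MemoryThresholdCallback(max_rss_kb=8 * 1024 * 1024)])`. Stopping mid-tree (as the feature request asks) would need support inside the learner itself, since callbacks only run between boosting rounds.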

Related ticket: #9342

@WeichenXu123 WeichenXu123 changed the title [FR] Xgboost training run out of memory [FR] Xgboost training run out of memory in some cases, can we add memory threshold config to prevent OOM ? Jul 8, 2023
trivialfis (Member) commented Jul 8, 2023

Same issue as #9342. I'm looking into it; it might take some time.
