XGBoost_Ray Train High Memory use #308

chadbreece · 2024-03-28T00:13:12Z

I am trying to train on a 34 GB dataset (size as reported by df.info) across 8 GPUs with 396 GB of RAM. Currently I can only train on half the dataset without OOM errors killing the process. Each GPU ends up loaded with ~10 GB of data. Does that mean the actual data size is 160 GB (8 GPUs * 10 GB * 2 halves of the data)?
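The estimate in the question can be written out as a quick calculation. This is only a sketch of the arithmetic, assuming memory use scales linearly with the fraction of the dataset loaded and that the ~10 GB per GPU figure reflects the loaded data alone. The variable names are illustrative and not part of any xgboost_ray API.

```python
# Back-of-the-envelope estimate using the figures quoted in the post.
num_gpus = 8
per_gpu_loaded_gb = 10    # approximate data observed on each GPU
fraction_trained = 0.5    # only half the dataset fit without OOM
reported_df_gb = 34       # size reported by df.info

# Total footprint across all GPUs for the half that was loaded.
loaded_total_gb = num_gpus * per_gpu_loaded_gb

# Extrapolate to the full dataset (assumes linear scaling with row count).
estimated_full_gb = loaded_total_gb / fraction_trained

# How much larger the estimated footprint is than the reported DataFrame size.
expansion_factor = estimated_full_gb / reported_df_gb

print(f"Loaded across GPUs (half the data): {loaded_total_gb} GB")
print(f"Estimated full-dataset footprint: {estimated_full_gb:.0f} GB")
print(f"Ratio to reported DataFrame size: {expansion_factor:.1f}x")
```

Under these assumptions the full dataset would occupy roughly 160 GB on the GPUs, several times the size df.info reports.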

Any advice on how to train on this much data using XGBoost Ray would be helpful.

showkeyjar · 2024-05-21T07:31:11Z

Same problem.

Even on GPUs with this much memory, xgboost_ray cannot load as much data as plain xgboost can on a single GPU. xgboost_ray often runs into CUDA OOM errors and fails to take advantage of multiple GPUs.
