I am trying to train on a 34 GB dataset (size reported by df.info) across 8 GPUs with 396 GB of RAM. Currently I can only train on half the dataset without OOM errors killing the process. Each GPU ends up loaded with ~10 GB of data. Does that mean the actual data size is 160 GB (8 GPUs * 10 GB * 2 halves of the data)?
Any advice on how to train on this much data using XGBoost Ray would be helpful.
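For reference, the arithmetic in the question can be written out as a small sketch. The function name `estimate_full_gpu_footprint_gb` is hypothetical and only illustrates the calculation, and the inputs are the approximate figures quoted above. The result is probably better read as an estimate of the total GPU memory footprint than as the size of the raw data, since memory in use on a GPU during training usually holds more than the input data alone, and df.info can under-report object columns unless it is called with `memory_usage="deep"`.

```python
def estimate_full_gpu_footprint_gb(per_gpu_gb, num_gpus, fraction_loaded):
    """Scale the observed per-GPU load up to the full dataset.

    Hypothetical helper for illustration; it assumes the load grows
    linearly with the fraction of the dataset that is loaded.
    """
    if not 0 < fraction_loaded <= 1:
        raise ValueError("fraction_loaded must be in (0, 1]")
    return per_gpu_gb * num_gpus / fraction_loaded


# Figures reported in the question (all approximate).
reported_df_gb = 34      # size reported by df.info
per_gpu_gb = 10          # load observed on each GPU
num_gpus = 8
fraction_loaded = 0.5    # only half the dataset fits today

full_gb = estimate_full_gpu_footprint_gb(per_gpu_gb, num_gpus, fraction_loaded)
expansion = full_gb / reported_df_gb

print(f"Estimated full footprint: {full_gb:.0f} GB")   # 160 GB
print(f"Expansion over df.info: {expansion:.1f}x")     # 4.7x
```

Under these assumptions the footprint is roughly 4.7 times the size df.info reports, which suggests the gap comes from how the data is represented and replicated during loading rather than from the dataset itself being 160 GB.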
Even on GPUs with this much capacity, xgboost_ray can normally load less data than plain xgboost running directly on a single GPU. xgboost_ray often hits CUDA OOM errors and fails to take advantage of multiple GPUs.