Distributed hyper parameter optimization for dask. #6525
Thank you for taking this forward.

We are also seriously looking at Ray Distributed, because of its native integration with Kubernetes (https://docs.ray.io/en/master/cluster/kubernetes.html) and its success at Ant Financial in China (https://docs.ray.io/en/master/cluster/kubernetes.html).

My thought is that you won't be able to make it generic enough to fit all frameworks, and it is probably engineering overkill to do so. The one thing I would strongly request you to align towards is Kubernetes compatibility, because it is the management framework with a massive amount of support. It is for this reason that Torch chose to write a thin wrapper over Kubernetes for its distributed training library:
https://pytorch.org/elastic/0.2.1/index.html
https://aws.amazon.com/blogs/containers/fault-tolerant-distributed-machine-learning-training-with-the-torchelastic-controller-for-kubernetes/

So I would suggest looking at this problem from a Kubernetes-first standpoint. Compatibility with every framework is going to take up more of your time than you might be willing to spend.

Sincerely,
Sandeep
@sandys k8s support is in 1.3.
@sandys Use https://kubernetes.dask.org/en/latest/ to create the cluster, and train xgboost dask as usual.
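To make that suggestion concrete for later readers, here is a minimal sketch built on my own assumptions rather than anything posted in this thread: `worker-spec.yml` is a hypothetical pod template, the arrays are synthetic stand-ins for real data, and the classic `dask_kubernetes.KubeCluster` API is assumed to be available.

```python
# Sketch: start a Dask cluster on Kubernetes, then use xgboost.dask as usual.
# Assumes dask-kubernetes and xgboost>=1.3 are installed; worker-spec.yml is a
# hypothetical pod template describing the worker containers.
import dask.array as da
import xgboost as xgb
from dask.distributed import Client
from dask_kubernetes import KubeCluster

cluster = KubeCluster.from_yaml("worker-spec.yml")  # workers run as Kubernetes pods
cluster.scale(4)                                    # request four worker pods
client = Client(cluster)

# Synthetic data; in practice these would be dask arrays or dataframes read from storage.
X = da.random.random((100_000, 20), chunks=(10_000, 20))
y = da.random.random(100_000, chunks=10_000)

dtrain = xgb.dask.DaskDMatrix(client, X, y)
result = xgb.dask.train(
    client,
    {"objective": "reg:squarederror", "tree_method": "hist"},
    dtrain,
    num_boost_round=100,
)
booster = result["booster"]  # one model, trained across all worker pods
```

Once the `Client` is connected, `xgboost.dask` does not care whether the workers are Kubernetes pods or plain processes, which is what makes the "train as usual" part work.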
Yes, although dask has some independent issues with Kubernetes on the scaling side. Ray uses the Kubernetes scheduler, while Dask brings a centralised scheduler of its own.

In any case, I would strongly suggest XGBoost pick one framework and play well with Kubernetes, rather than expose bindings for many, many frameworks to work with XGBoost.

Just my $0.02.
@sandys Thanks for your feedback. k8s is definitely important to us.
Hi @trivialfis, thanks for your work on this. Just reading these threads now. I see in the other issue you recommend using. What would it take to, in your words, "design python wrappers that can glue them together"?
Hi @CarterFendley, there is ongoing work on HPO. Could you please take a look at https://github.com/coiled/dask-xgboost-nyctaxi and see if it helps?
Right now, integration between dask and various single-node machine learning libraries is implemented as standalone dask extensions like dask-ml and dask-optuna. These can be used with xgboost when xgboost is performing single-node training, that is, using `XGBRegressor` and friends with them instead of `xgboost.dask.DaskXGBRegressor`. If users want to train one model on the entire dataset, the dask interface is required. The underlying issue is that xgboost is itself a distributed learning library employing an MPI-like communication framework, while those extensions are designed to extend single-node libraries. To resolve this, we need to design Python wrappers that can glue them together.

Optuna is an exception: since it uses xgboost's callback mechanism, the xgboost.dask interface can be adapted to Optuna. I will submit some changes with demos later. Others, like grid search, are more difficult to implement.
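The demos mentioned above are not included in this thread; the sketch below is only one possible shape for them, on my own assumptions: a `LocalCluster` stands in for whatever cluster you actually run, the data is synthetic, and the hyperparameter ranges are arbitrary. Each Optuna trial launches one full distributed training run through `xgb.dask.train`, while the study itself stays on the client.

```python
# Sketch: drive distributed xgboost.dask training from an Optuna study.
# Assumes dask[distributed], optuna and xgboost>=1.3 are installed.
import dask.array as da
import optuna
import xgboost as xgb
from dask.distributed import Client, LocalCluster

client = Client(LocalCluster(n_workers=2))  # stand-in for a real cluster

# Synthetic data split into training and validation DaskDMatrix objects.
X = da.random.random((50_000, 20), chunks=(10_000, 20))
y = da.random.random(50_000, chunks=10_000)
dtrain = xgb.dask.DaskDMatrix(client, X[:40_000], y[:40_000])
dvalid = xgb.dask.DaskDMatrix(client, X[40_000:], y[40_000:])


def objective(trial):
    params = {
        "objective": "reg:squarederror",
        "tree_method": "hist",
        "eta": trial.suggest_float("eta", 0.01, 0.3, log=True),
        "max_depth": trial.suggest_int("max_depth", 3, 10),
        "subsample": trial.suggest_float("subsample", 0.5, 1.0),
    }
    # One distributed training run per trial; every dask worker participates.
    result = xgb.dask.train(
        client, params, dtrain, num_boost_round=100, evals=[(dvalid, "valid")]
    )
    # xgb.dask.train returns {"booster": ..., "history": ...}; minimise final RMSE.
    return result["history"]["valid"]["rmse"][-1]


study = optuna.create_study(direction="minimize")
study.optimize(objective, n_trials=20)
print(study.best_params)
```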
Related: #5347
cc @pseudotensor @sandys
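For readers arriving from the HPO angle, here is a rough illustration (mine, not from this issue) of the split described above: a dask-based search extension driving the single-node `XGBRegressor`, versus the dask interface training one model on the whole distributed dataset. The dask-ml searcher is only one example of such an extension, and the data is synthetic.

```python
# Illustration of the two paths: distributed search over single-node fits,
# versus distributed training of a single model.  Assumes dask[distributed],
# dask-ml, scikit-learn and xgboost are installed.
import dask.array as da
import xgboost as xgb
from dask.distributed import Client, LocalCluster
from dask_ml.model_selection import GridSearchCV
from sklearn.datasets import make_regression

client = Client(LocalCluster(n_workers=2))  # stand-in for a real cluster

# Path 1: each candidate fits a plain XGBRegressor on in-memory data;
# dask only parallelises the search over candidates.
X, y = make_regression(n_samples=10_000, n_features=20, random_state=0)
search = GridSearchCV(
    xgb.XGBRegressor(tree_method="hist"),
    {"max_depth": [4, 6, 8], "learning_rate": [0.05, 0.1]},
)
search.fit(X, y)

# Path 2: the data is a dask collection partitioned across workers and a single
# booster is trained on all of it; generic single-node HPO wrappers cannot
# drive this path, which is the gluing problem described in this issue.
dX = da.random.random((1_000_000, 20), chunks=(100_000, 20))
dy = da.random.random(1_000_000, chunks=100_000)
model = xgb.dask.DaskXGBRegressor(tree_method="hist")
model.client = client
model.fit(dX, dy)
```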