Ray on spark implementation #28771
Conversation
CC @jjyao Ready for a first-pass review :)
python/ray/spark/__init__.py
Outdated
num_spark_tasks,
head_options=None,
worker_options=None,
ray_temp_dir="/tmp/ray/temp",
Provide a parameter to control the ray_temp_dir path; this is useful when the default temp dir's disk capacity is insufficient.
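As a hypothetical usage sketch: only the parameters come from the diff above; the init_cluster name and the example path are placeholders, not this PR's confirmed API.

```python
# Hypothetical sketch: the function name `init_cluster` and the path below
# are placeholders; only the parameters appear in the diff above.
import ray.spark

ray.spark.init_cluster(
    num_spark_tasks=4,
    head_options=None,
    worker_options=None,
    # Point Ray's temp dir at a volume with enough free disk space
    # instead of the default /tmp location.
    ray_temp_dir="/mnt/large-disk/ray/temp",
)
```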
python/ray/spark/__init__.py
Outdated
context.barrier()
task_id = context.partitionId()

# TODO: remove temp dir when ray worker exits.
I don't think this is something we could offer except as best effort, which suggests it isn't the right mechanism for cleanup. Instead, why not set the temp dir to a known location and remove that in a wrapper script after the Ray worker exits?
@ericl Yes, that makes sense. I now use a wrapper script that registers a SIGTERM handler; the handler deletes the temp dir.
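A minimal sketch of such a wrapper, assuming the temp dir path and the worker command are passed on the command line (illustrative only, not the exact script in this PR):

```python
# wrapper.py -- illustrative sketch, not the exact script from this PR.
# Usage: python wrapper.py <temp_dir> <worker_cmd...>
# Launches the worker command and removes the temp dir when the wrapper
# receives SIGTERM, or when the child process exits on its own.
import shutil
import signal
import subprocess
import sys

temp_dir = sys.argv[1]      # temp dir to clean up
worker_cmd = sys.argv[2:]   # e.g. ["ray", "start", "--head", ...]

proc = subprocess.Popen(worker_cmd)

def handle_sigterm(signum, frame):
    # Forward the termination to the child, then clean up the temp dir.
    proc.terminate()
    proc.wait()
    shutil.rmtree(temp_dir, ignore_errors=True)
    sys.exit(0)

signal.signal(signal.SIGTERM, handle_sigterm)

proc.wait()
shutil.rmtree(temp_dir, ignore_errors=True)
```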
Force-pushed from 999e051 to ecc0139.
Why are these changes needed?
REP: ray-project/enhancements#14
Commands to run tests:
Prerequisite
Testing on local machine (requires a Linux system)
First, merge the latest master into the ray-on-spark branch, then cd ray-repo.
Install the latest Ray dev version: pip install -U https://s3-us-west-2.amazonaws.com/ray-wheels/latest/ray-3.0.0.dev0-cp38-cp38-manylinux2014_x86_64.whl (for other Python versions, see https://docs.ray.io/en/latest/ray-overview/installation.html#install-nightlies).
Then run python python/ray/setup-dev.py -y, which links the installed Ray package to your local ray repo source directory.
Finally, run the tests: pytest python/ray/tests/spark/test_ray_on_spark.py -s
Testing on Databricks Runtime
Install this PR branch's Ray package in a Databricks notebook:
https://e2-dogfood.staging.cloud.databricks.com/?o=6051921418418893#notebook/2110795073425386/command/2110795073425387
Then use my testing notebook:
https://e2-dogfood.staging.cloud.databricks.com/?o=6051921418418893#notebook/2110795073420548/command/2110795073420549
Debugging tips
Checking the Ray processes' output logs is painful because the files are scattered across every Spark cluster worker node. For easier testing, create a Spark cluster with only one worker machine. Then, by default, you can check the following local disk paths:
"ray start" script output:
Other Ray process log output:
A warning message like the following will be printed:
Note that the log path is a local path on each Spark cluster node, so for non-driver nodes you have to run a Spark job to collect those files, like the sketch below.
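A minimal sketch of such a collection job, assuming a known log path on each node (the path and the partition count are illustrative assumptions, not values from this PR):

```python
# Illustrative sketch: reads an assumed log path on each executor node and
# collects the contents back to the Spark driver.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
sc = spark.sparkContext

LOG_PATH = "/tmp/ray/session_latest/logs/ray_start.log"  # assumed path

def read_log(_):
    import socket
    try:
        with open(LOG_PATH) as f:
            return [(socket.gethostname(), f.read())]
    except OSError:
        return []

# Use enough partitions that every worker node is likely hit at least once.
num_parts = sc.defaultParallelism
logs = sc.parallelize(range(num_parts), num_parts).mapPartitions(read_log).collect()

for host, content in logs:
    print(f"==== {host} ====")
    print(content)
```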
Related issue number
Checks
I've signed off every commit (git commit -s) in this PR.
I've run scripts/format.sh to lint the changes in this PR.