From f7bc05e5b183d0a1a47e78de0dbbe98b56b20a60 Mon Sep 17 00:00:00 2001
From: Carson Wang
Date: Thu, 15 Sep 2022 17:20:22 +0800
Subject: [PATCH 1/3] Add document about Ray autoscaling

---
 doc/spark_on_ray.md | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/doc/spark_on_ray.md b/doc/spark_on_ray.md
index 99a8888b..3a840285 100644
--- a/doc/spark_on_ray.md
+++ b/doc/spark_on_ray.md
@@ -60,6 +60,11 @@ spark = raydp.init_spark(...,enable_hive=True)
 spark.sql("select * from db.xxx").show()
 ```
+
+### Autoscaling
+You can use RayDP with Ray autoscaling. When you call `raydp.init_spark`, the autoscaler will try to increase the number of worker nodes if the current capacity of the cluster can't meet the resource demands. However currently there is a known issue in Ray autoscaling. The autoscaler's default strategy is to avoid launching GPU nodes if there aren't any GPU tasks at all. So if you configure a single worker node type with GPU, by default the autosaler will not launch nodes to start Spark executors on them. To resolve the issue, you can either set the environment variable "AUTOSCALER_CONSERVE_GPU_NODES" to 0 or configure multiple node types that at least one is CPU only node.
+
+
 ### Logging
 + Driver Log: By default, the spark driver log level is WARN. After getting a Spark session by running `spark = raydp.init_spark`, you can change the log level for example `spark.sparkContext.setLogLevel("INFO")`. You will also see some AppMaster INFO logs on the driver. This is because Ray redirects the actor logs to driver by default. To disable logging to driver, you can set it in Ray init `ray.init(log_to_driver=False)`
 + Executor Log: The spark executor logs are stored in Ray's logging directory. By default they are available at /tmp/ray/session_\*/logs/java-worker-\*.log

From 018c103ef39e768f98fec41ac1449b3711b5be98 Mon Sep 17 00:00:00 2001
From: Carson Wang
Date: Thu, 15 Sep 2022 17:26:30 +0800
Subject: [PATCH 2/3] Fix typo

---
 doc/spark_on_ray.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/doc/spark_on_ray.md b/doc/spark_on_ray.md
index 3a840285..61fd2fa6 100644
--- a/doc/spark_on_ray.md
+++ b/doc/spark_on_ray.md
@@ -62,7 +62,7 @@ spark.sql("select * from db.xxx").show()
 
 ### Autoscaling
-You can use RayDP with Ray autoscaling. When you call `raydp.init_spark`, the autoscaler will try to increase the number of worker nodes if the current capacity of the cluster can't meet the resource demands. However currently there is a known issue in Ray autoscaling. The autoscaler's default strategy is to avoid launching GPU nodes if there aren't any GPU tasks at all. So if you configure a single worker node type with GPU, by default the autosaler will not launch nodes to start Spark executors on them. To resolve the issue, you can either set the environment variable "AUTOSCALER_CONSERVE_GPU_NODES" to 0 or configure multiple node types that at least one is CPU only node.
+You can use RayDP with Ray autoscaling. When you call `raydp.init_spark`, the autoscaler will try to increase the number of worker nodes if the current capacity of the cluster can't meet the resource demands. However currently there is a known issue in Ray autoscaling. The autoscaler's default strategy is to avoid launching GPU nodes if there aren't any GPU tasks at all. So if you configure a single worker node type with GPU, by default the autoscaler will not launch nodes to start Spark executors on them. To resolve the issue, you can either set the environment variable "AUTOSCALER_CONSERVE_GPU_NODES" to 0 or configure multiple node types that at least one is CPU only node.
 
 
 ### Logging

From ecde602adc1f487d748a04bb8c54131b01b51c41 Mon Sep 17 00:00:00 2001
From: Carson Wang
Date: Fri, 16 Sep 2022 17:04:25 +0800
Subject: [PATCH 3/3] Add issue link

---
 doc/spark_on_ray.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/doc/spark_on_ray.md b/doc/spark_on_ray.md
index 61fd2fa6..59ee0395 100644
--- a/doc/spark_on_ray.md
+++ b/doc/spark_on_ray.md
@@ -62,7 +62,7 @@ spark.sql("select * from db.xxx").show()
 
 ### Autoscaling
-You can use RayDP with Ray autoscaling. When you call `raydp.init_spark`, the autoscaler will try to increase the number of worker nodes if the current capacity of the cluster can't meet the resource demands. However currently there is a known issue in Ray autoscaling. The autoscaler's default strategy is to avoid launching GPU nodes if there aren't any GPU tasks at all. So if you configure a single worker node type with GPU, by default the autoscaler will not launch nodes to start Spark executors on them. To resolve the issue, you can either set the environment variable "AUTOSCALER_CONSERVE_GPU_NODES" to 0 or configure multiple node types that at least one is CPU only node.
+You can use RayDP with Ray autoscaling. When you call `raydp.init_spark`, the autoscaler will try to increase the number of worker nodes if the current capacity of the cluster can't meet the resource demands. However, there is currently a known issue ([#20476](https://github.com/ray-project/ray/issues/20476)) in Ray autoscaling: the autoscaler's default strategy is to avoid launching GPU nodes if there aren't any GPU tasks at all. So if you configure a single worker node type with GPU, by default the autoscaler will not launch nodes to start Spark executors on them. To resolve the issue, either set the environment variable `AUTOSCALER_CONSERVE_GPU_NODES` to 0, or configure multiple node types so that at least one is CPU-only.
 
 
 ### Logging
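The workaround and logging settings described in the final version of the documentation can be sketched as follows. This is a hedged example, not part of the patches above: the `raydp.init_spark` argument values are illustrative assumptions, and the cluster-dependent calls are left commented out because they require a running Ray cluster.

```python
import os

# Workaround for Ray issue #20476: by default the autoscaler avoids launching
# GPU nodes when no GPU tasks are pending, so Spark executors may never be
# scheduled on a GPU-only worker type. Setting this to "0" disables that
# behavior. It must be set in the autoscaler's environment before it starts.
os.environ["AUTOSCALER_CONSERVE_GPU_NODES"] = "0"

# The calls below are illustrative (the argument values are assumptions) and
# need a live Ray cluster, so they are commented out in this sketch.
#
# import ray
# import raydp
#
# ray.init(log_to_driver=False)  # stop Ray redirecting AppMaster logs to the driver
# spark = raydp.init_spark(app_name="autoscaling-demo",
#                          num_executors=4,
#                          executor_cores=2,
#                          executor_memory="4g")
# spark.sparkContext.setLogLevel("INFO")  # raise the driver log level from the default WARN
```

The alternative fix, configuring multiple node types with at least one CPU-only type, is done in the cluster launcher YAML rather than in application code.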