
Commit

Replace legacy commands with 'dask worker' and 'dask scheduler'.
wilson committed Feb 2, 2023
1 parent e528d12 commit ddf78ca
Showing 9 changed files with 53 additions and 48 deletions.
5 changes: 1 addition & 4 deletions dask_cloudprovider/aws/ec2.py
@@ -265,9 +265,6 @@ class EC2Cluster(VMCluster):
It is assumed that the ``ami`` will not have Docker installed (or the NVIDIA drivers for GPU instances).
If ``bootstrap`` is ``True`` these dependencies will be installed on instance start. If you are using
a custom AMI which already has these dependencies, set this to ``False``.
-worker_command: string (optional)
-The command workers should run when starting. By default this will be ``"dask-worker"`` unless
-``instance_type`` is a GPU instance in which case ``dask-cuda-worker`` will be used.
ami: string (optional)
The base OS AMI to use for scheduler and workers.
@@ -340,7 +337,7 @@ class EC2Cluster(VMCluster):
The Docker image to run on all instances.
This image must have a valid Python environment and have ``dask`` installed in order for the
-``dask-scheduler`` and ``dask-worker`` commands to be available. It is recommended the Python
+``dask scheduler`` and ``dask worker`` commands to be available. It is recommended the Python
environment matches your local environment where ``EC2Cluster`` is being created from.
For GPU instance types the Docker image must have NVIDIA drivers and ``dask-cuda`` installed.
79 changes: 44 additions & 35 deletions dask_cloudprovider/aws/ecs.py
@@ -77,7 +77,7 @@ class Task:
AWS resource tags to be applied to any resources that are created.
name: str (optional)
-Name for the task. Currently used for the --name command line argument to dask-worker.
+Name for the task. Currently used for the --name command line argument to `dask worker`.
platform_version: str (optional)
Version of the AWS Fargate platform to use, e.g. "1.4.0" or "LATEST". This
@@ -368,7 +368,7 @@ class Scheduler(Task):
scheduler_timeout: str
Time of inactivity after which to kill the scheduler.
scheduler_extra_args: List[str] (optional)
-Any extra command line arguments to pass to dask-scheduler, e.g. ``["--tls-cert", "/path/to/cert.pem"]``
+Any extra command line arguments to pass to ``dask scheduler``, e.g. ``["--tls-cert", "/path/to/cert.pem"]``
Defaults to `None`, no extra command line arguments.
kwargs: Dict()
@@ -386,7 +386,8 @@ def __init__(
self.task_type = "scheduler"
self._overrides = {
"command": [
-"dask-scheduler",
+"dask",
+"scheduler",
"--idle-timeout",
scheduler_timeout,
]
@@ -434,24 +435,25 @@ def __init__(
self._mem = mem
self._gpu = gpu
self._nthreads = nthreads
+_command = [
+"dask",
+"cuda" if self._gpu else None,
+"worker",
+self.scheduler,
+"--name",
+str(self.name),
+"--nthreads",
+"{}".format(
+max(int(self._cpu / 1024), 1) if nthreads is None else self._nthreads
+),
+"--memory-limit",
+"{}GB".format(int(self._mem / 1024)),
+"--death-timeout",
+"60",
+]
+_command = [e for e in _command if e is not None]
self._overrides = {
-"command": [
-"dask-cuda-worker" if self._gpu else "dask-worker",
-self.scheduler,
-"--name",
-str(self.name),
-"--nthreads",
-"{}".format(
-max(int(self._cpu / 1024), 1)
-if nthreads is None
-else self._nthreads
-),
-"--memory-limit",
-"{}GB".format(int(self._mem / 1024)),
-"--death-timeout",
-"60",
-]
-+ (list() if not extra_args else extra_args)
+"command": _command + (list() if not extra_args else extra_args)
}
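For reference, the None-filtering pattern this hunk introduces can be sketched as a standalone helper. The function name, parameter names, and example values below are illustrative, not part of the commit; only the list-building and filtering mirror the `Worker.__init__` change above.

```python
def build_worker_command(scheduler, cpu_shares, mem_mb, gpu=False,
                         nthreads=None, extra_args=None):
    """Assemble a ``dask worker`` / ``dask cuda worker`` invocation.

    A sketch of the pattern used in Worker.__init__ above: the "cuda"
    element is a placeholder that is dropped when no GPU is requested.
    """
    command = [
        "dask",
        "cuda" if gpu else None,  # placeholder, filtered out below on CPU
        "worker",
        scheduler,
        "--nthreads",
        str(max(int(cpu_shares / 1024), 1) if nthreads is None else nthreads),
        "--memory-limit",
        "{}GB".format(int(mem_mb / 1024)),
        "--death-timeout",
        "60",
    ]
    # Remove the None so the CPU path reads "dask worker ..." cleanly
    return [e for e in command if e is not None] + (extra_args or [])

print(build_worker_command("tcp://10.0.0.1:8786", 2048, 4096)[:3])
# → ['dask', 'worker', 'tcp://10.0.0.1:8786']
print(build_worker_command("tcp://10.0.0.1:8786", 2048, 4096, gpu=True)[:3])
# → ['dask', 'cuda', 'worker']
```

Building one list and filtering it avoids duplicating the shared arguments across the GPU and CPU branches, which is what the removed `"dask-cuda-worker" if self._gpu else "dask-worker"` version required.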


@@ -503,7 +505,7 @@ class ECSCluster(SpecCluster, ConfigMixin):
Defaults to ``8786``
scheduler_extra_args: List[str] (optional)
-Any extra command line arguments to pass to dask-scheduler, e.g. ``["--tls-cert", "/path/to/cert.pem"]``
+Any extra command line arguments to pass to ``dask scheduler``, e.g. ``["--tls-cert", "/path/to/cert.pem"]``
Defaults to `None`, no extra command line arguments.
scheduler_task_definition_arn: str (optional)
@@ -549,7 +551,7 @@ class ECSCluster(SpecCluster, ConfigMixin):
Defaults to `None`, meaning that the task definition will be created along with the cluster, and cleaned up once
the cluster is shut down.
worker_extra_args: List[str] (optional)
-Any extra command line arguments to pass to dask-worker, e.g. ``["--tls-cert", "/path/to/cert.pem"]``
+Any extra command line arguments to pass to ``dask worker``, e.g. ``["--tls-cert", "/path/to/cert.pem"]``
Defaults to `None`, no extra command line arguments.
worker_task_kwargs: dict (optional)
@@ -698,7 +700,7 @@ class ECSCluster(SpecCluster, ConfigMixin):
... worker_gpu=1)
Setting the ``worker_gpu`` option to something other than ``None`` will cause the cluster
-to run ``dask-cuda-worker`` as the worker startup command. Setting this option will also change
+to run ``dask cuda worker`` as the worker startup command. Setting this option will also change
the default Docker image to ``rapidsai/rapidsai:latest``; if you're using a custom image
you must ensure the NVIDIA CUDA toolkit is installed with a version that matches the host machine
along with ``dask-cuda``.
@@ -1189,7 +1191,8 @@ async def _create_scheduler_task_definition_arn(self):
"memoryReservation": self._scheduler_mem,
"essential": True,
"command": [
-"dask-scheduler",
+"dask",
+"scheduler",
"--idle-timeout",
self._scheduler_timeout,
]
@@ -1259,17 +1262,23 @@ async def _create_worker_task_definition_arn(self):
"resourceRequirements": resource_requirements,
"essential": True,
"command": [
-"dask-cuda-worker" if self._worker_gpu else "dask-worker",
-"--nthreads",
-"{}".format(
-max(int(self._worker_cpu / 1024), 1)
-if self._worker_nthreads is None
-else self._worker_nthreads
-),
-"--memory-limit",
-"{}MB".format(int(self._worker_mem)),
-"--death-timeout",
-"60",
+e
+for e in [
+"dask",
+"cuda" if self._worker_gpu else None,
+"worker",
+"--nthreads",
+"{}".format(
+max(int(self._worker_cpu / 1024), 1)
+if self._worker_nthreads is None
+else self._worker_nthreads
+),
+"--memory-limit",
+"{}MB".format(int(self._worker_mem)),
+"--death-timeout",
+"60",
+]
+if e is not None
+]
+ (
list()
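The task-definition hunk above writes the same None-filter inline, as a single generator expression inside the ``"command"`` list. A minimal sketch of that variant (the helper name and the concrete CPU/memory values are hypothetical; only the filtering shape matches the commit):

```python
def worker_container_command(cpu, mem_mb, gpu=False, nthreads=None, extra_args=None):
    """Sketch of the inline generator-expression form used in
    _create_worker_task_definition_arn above."""
    return [
        e
        for e in [
            "dask",
            "cuda" if gpu else None,  # dropped by the `if e is not None` filter
            "worker",
            "--nthreads",
            str(max(int(cpu / 1024), 1) if nthreads is None else nthreads),
            "--memory-limit",
            "{}MB".format(int(mem_mb)),
            "--death-timeout",
            "60",
        ]
        if e is not None
    ] + (extra_args or [])

print(worker_container_command(4096, 8192))
# → ['dask', 'worker', '--nthreads', '4', '--memory-limit', '8192MB', '--death-timeout', '60']
```

The inline form avoids the temporary ``_command`` variable, which suits the ECS code path where the command list is embedded directly in the container definition dict.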
2 changes: 1 addition & 1 deletion dask_cloudprovider/azure/azurevm.py
@@ -309,7 +309,7 @@ class AzureVMCluster(VMCluster):
The Docker image to run on all instances.
This image must have a valid Python environment and have ``dask`` installed in order for the
-``dask-scheduler`` and ``dask-worker`` commands to be available. It is recommended the Python
+``dask scheduler`` and ``dask worker`` commands to be available. It is recommended the Python
environment matches your local environment where ``AzureVMCluster`` is being created from.
For GPU instance types the Docker image must have NVIDIA drivers and ``dask-cuda`` installed.
1 change: 0 additions & 1 deletion dask_cloudprovider/cloudprovider.yaml
@@ -40,7 +40,6 @@ cloudprovider:
availability_zone: null # The availability zone to start your clusters. By default AWS will select the AZ with most free capacity.
bootstrap: true # It is assumed that the AMI does not have Docker and needs bootstrapping. Set this to false if using a custom AMI with Docker already installed.
auto_shutdown: true # Shutdown instances automatically if the scheduler or worker services time out.
-# worker_command: "dask-worker" # The command for workers to run. If the instance_type is a GPU instance dask-cuda-worker will be used.
ami: null # AMI ID to use for all instances. Defaults to latest Ubuntu 20.04 image.
instance_type: "t2.micro" # Instance type for the scheduler and all workers
scheduler_instance_type: "t2.micro" # Instance type for the scheduler
2 changes: 1 addition & 1 deletion dask_cloudprovider/digitalocean/droplet.py
@@ -119,7 +119,7 @@ class DropletCluster(VMCluster):
The Docker image to run on all instances.
This image must have a valid Python environment and have ``dask`` installed in order for the
-``dask-scheduler`` and ``dask-worker`` commands to be available. It is recommended the Python
+``dask scheduler`` and ``dask worker`` commands to be available. It is recommended the Python
environment matches your local environment where ``EC2Cluster`` is being created from.
For GPU instance types the Docker image must have NVIDIA drivers and ``dask-cuda`` installed.
2 changes: 1 addition & 1 deletion dask_cloudprovider/gcp/instances.py
@@ -435,7 +435,7 @@ class GCPCluster(VMCluster):
The Docker image to run on all instances.
This image must have a valid Python environment and have ``dask`` installed in order for the
-``dask-scheduler`` and ``dask-worker`` commands to be available. It is recommended the Python
+``dask scheduler`` and ``dask worker`` commands to be available. It is recommended the Python
environment matches your local environment where ``EC2Cluster`` is being created from.
For GPU instance types the Docker image must have NVIDIA drivers and ``dask-cuda`` installed.
2 changes: 1 addition & 1 deletion dask_cloudprovider/gcp/tests/test_gcp.py
@@ -61,7 +61,7 @@ async def test_get_cloud_init():
docker_args="--privileged",
extra_bootstrap=["gcloud auth print-access-token"],
)
-assert "dask-scheduler" in cloud_init
+assert "dask scheduler" in cloud_init
assert "# Bootstrap" in cloud_init
assert " --privileged " in cloud_init
assert "- gcloud auth print-access-token" in cloud_init
4 changes: 2 additions & 2 deletions dask_cloudprovider/generic/vmcluster.py
@@ -184,7 +184,7 @@ class VMCluster(SpecCluster):
The Docker image to run on all instances.
This image must have a valid Python environment and have ``dask`` installed in order for the
-``dask-scheduler`` and ``dask-worker`` commands to be available. It is recommended the Python
+``dask scheduler`` and ``dask worker`` commands to be available. It is recommended the Python
environment matches your local environment where ``EC2Cluster`` is being created from.
For GPU instance types the Docker image must have NVIDIA drivers and ``dask-cuda`` installed.
@@ -366,7 +366,7 @@ def get_cloud_init(cls, *args, **kwargs):
cluster.auto_shutdown = False
return cluster.render_cloud_init(
image=cluster.options["docker_image"],
-command="dask-scheduler --version",
+command="dask scheduler --version",
docker_args=cluster.options["docker_args"],
extra_bootstrap=cluster.options["extra_bootstrap"],
gpu_instance=cluster.gpu_instance,
4 changes: 2 additions & 2 deletions doc/source/gpus.rst
@@ -10,7 +10,7 @@ Each cluster manager handles this differently but generally you will need to con

- Configure the hardware to include GPUs. This may be by changing the hardware type or adding accelerators.
- Ensure the OS/Docker image has the NVIDIA drivers. For Docker images it is recommended to use the [RAPIDS images](https://hub.docker.com/r/rapidsai/rapidsai/).
-- Set the ``worker_module`` config option to ``dask_cuda.cli.dask_cuda_worker`` or ``worker_command`` option to ``dask-cuda-worker``.
+- Set the ``worker_module`` config option to ``dask_cuda.cli.dask_cuda_worker`` or set ``resources`` to include ``GPU=n`` where ``n`` is the number of GPUs you require. This will cause ``dask cuda worker`` to be used in place of ``dask worker``.

In the following AWS :class:`dask_cloudprovider.aws.EC2Cluster` example we set the ``ami`` to be a Deep Learning AMI with NVIDIA drivers, the ``docker_image`` to RAPIDS, the ``instance_type``
to ``p3.2xlarge`` which has one NVIDIA Tesla V100 and the ``worker_module`` to ``dask_cuda.cli.dask_cuda_worker``.
@@ -24,4 +24,4 @@ to ``p3.2xlarge`` which has one NVIDIA Tesla V100 and the ``worker_module`` to `
bootstrap=False,
filesystem_size=120)
-See each cluster manager's example sections for info on starting a GPU cluster.
\ No newline at end of file
+See each cluster manager's example sections for info on starting a GPU cluster.
