[Core] Upgrade ray to 2.4.0 (#1734)
* update the patches

* upgrade node providers

* fix azure config.py

* print sky queue

* add back azure disk size

* fix job manager

* fix hash

* longer timeout

* fix test smoke

* Remove the patch for job_manager

* longer timeout for azure_region test

* address comments

* format

* fix templates

* pip install --exists-action

* Upgrade to 2.3 instead

* upgrade to ray 2.3

* update patches for 2.4

* adopt changes for azure providers: a777a028b8dbd7bbae9a7393c98f6cd65f98a5f5

* fix license

* fix patch for log monitor

* sleep longer for the multi-echo

* longer waiting time

* longer wait time

* fix click dependencies

* update setup.py

* Fix #1618 (comment)

* fix #1618 (comment)

* revert test_smoke

* fix comment

* revert to w instead of wipe

* rewording

* minor fix
Michaelvll authored May 26, 2023
1 parent f733415 commit ef08910
Showing 31 changed files with 330 additions and 219 deletions.
4 changes: 2 additions & 2 deletions docs/source/reference/local/setup.rst
@@ -14,13 +14,13 @@ For further reference, `here <https://docs.ray.io/en/latest/ray-core/configure.h
Installing SkyPilot dependencies
-----------------------------------

-SkyPilot On-prem requires :code:`python3`, :code:`ray==2.0.1`, and :code:`sky` to be setup on all local nodes and globally available to all users.
+SkyPilot On-prem requires :code:`python3`, :code:`ray==2.4.0`, and :code:`sky` to be setup on all local nodes and globally available to all users.

To install Ray and SkyPilot for all users, run the following commands on all local nodes:

.. code-block:: console
-$ pip3 install ray[default]==2.0.1
+$ pip3 install ray[default]==2.4.0
$ # SkyPilot requires python >= 3.7.
$ pip3 install skypilot
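A quick way to confirm that a node satisfies the pin above is to compare the installed distribution version against the expected string. The following is a minimal sketch using only the standard library (`importlib.metadata`, so Python 3.8 or newer); the helper names `installed_version` and `matches_pin` are hypothetical and are not part of SkyPilot or Ray.

```python
from importlib import metadata
from typing import Optional

# Assumption: the pin mirrors the version named in the docs change above.
EXPECTED_RAY_VERSION = '2.4.0'


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of `package`, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def matches_pin(found: Optional[str], expected: str) -> bool:
    """True only when a version was found and equals the pin exactly."""
    return found is not None and found == expected


if __name__ == '__main__':
    found = installed_version('ray')
    if matches_pin(found, EXPECTED_RAY_VERSION):
        print(f'ray=={found} is installed.')
    else:
        print(f'Expected ray=={EXPECTED_RAY_VERSION}, found {found}.')
```

Running the script on each local node reports whether the globally installed Ray matches the pinned release.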
2 changes: 1 addition & 1 deletion examples/local/cluster-config.yaml
@@ -4,7 +4,7 @@
# The system administrator must have `sudo` access to the local nodes.
# Requirements:
# 1) Python (> 3.6) on all nodes.
-# 2) Ray CLI (= 2.0.1) on all nodes.
+# 2) Ray CLI (= 2.4.0) on all nodes.
#
# Example usage:
# >> sky admin deploy cluster-config.yaml
8 changes: 4 additions & 4 deletions sky/backends/cloud_vm_ray_backend.py
@@ -203,8 +203,8 @@ def add_prologue(self,
self.job_id = job_id
# Should use 'auto' or 'ray://<internal_head_ip>:10001' rather than
# 'ray://localhost:10001', or 'ray://127.0.0.1:10001', for public cloud.
-# Otherwise, it will a bug of ray job failed to get the placement group
-# in ray <= 2.0.1.
+# Otherwise, ray will fail to get the placement group because of a bug
+# in ray job.
# TODO(mluo): Check why 'auto' not working with on-prem cluster and
# whether the placement group issue also occurs in on-prem cluster.
ray_address = 'ray://localhost:10001' if is_local else 'auto'
@@ -1486,7 +1486,7 @@ def ray_up():
# Downside is existing tasks on the cluster will keep running
# (which may be ok with the semantics of 'sky launch' twice).
# Tracked in https://github.com/ray-project/ray/issues/20402.
-# Ref: https://github.com/ray-project/ray/blob/releases/2.2.0/python/ray/autoscaler/sdk/sdk.py#L16-L49 # pylint: disable=line-too-long
+# Ref: https://github.com/ray-project/ray/blob/releases/2.4.0/python/ray/autoscaler/sdk/sdk.py#L16-L49 # pylint: disable=line-too-long
script_path = write_ray_up_script_with_patched_launch_hash_fn(
cluster_config_file, ray_up_kwargs={'no_restart': True})

@@ -1721,7 +1721,7 @@ def _ensure_cluster_ray_started(self, handle: 'CloudVmRayResourceHandle',
if isinstance(launched_resources.cloud, clouds.Local):
raise RuntimeError(
'The command `ray status` errored out on the head node '
-'of the local cluster. Check if ray[default]==2.0.1 '
+'of the local cluster. Check if ray[default]==2.4.0 '
'is installed or running correctly.')
backend.run_on_head(handle, 'ray stop', use_cached_head_ip=False)

4 changes: 2 additions & 2 deletions sky/backends/monkey_patches/monkey_patch_ray_up.py
@@ -33,7 +33,7 @@
from ray.autoscaler import sdk


-# Ref: https://github.com/ray-project/ray/blob/releases/2.2.0/python/ray/autoscaler/_private/util.py#L392-L404
+# Ref: https://github.com/ray-project/ray/blob/releases/2.4.0/python/ray/autoscaler/_private/util.py#L396-L408
def monkey_patch_hash_launch_conf(node_conf, auth):
hasher = hashlib.sha1()
# For hashing, we replace the path to the key with the key
@@ -50,7 +50,7 @@ def monkey_patch_hash_launch_conf(node_conf, auth):
return hasher.hexdigest()


-# Ref: https://github.com/ray-project/ray/blob/840215bc09e942b50cad0ab2db96a8fdc79217c1/python/ray/autoscaler/_private/commands.py#L854-L912
+# Ref: https://github.com/ray-project/ray/blob/releases/2.4.0/python/ray/autoscaler/_private/commands.py#L854-L912
def monkey_patch_should_create_new_head(
head_node_id,
new_launch_hash,
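The patched `monkey_patch_hash_launch_conf` shown above hashes a launch configuration with SHA-1, replacing the path to the key with the key itself. The sketch below illustrates the general idea of a stable, location-independent hash over a node config and its auth section. It is a simplified stand-in, not Ray's or SkyPilot's implementation: `stable_launch_hash` and its `key_contents` parameter are hypothetical, and key text is passed in directly rather than read from disk.

```python
import hashlib
import json
from typing import Any, Dict


def stable_launch_hash(node_conf: Dict[str, Any], auth: Dict[str, Any],
                       key_contents: Dict[str, str]) -> str:
    """Hash a node config and auth section independent of key locations.

    `key_contents` (hypothetical) maps auth fields such as
    'ssh_private_key' to the key text that replaces the key path.
    """
    full_auth = dict(auth)  # Copy so the caller's dict is not mutated.
    for field, content in key_contents.items():
        if field in full_auth:
            full_auth[field] = content
    # sort_keys makes the serialization independent of dict ordering.
    payload = json.dumps([node_conf, full_auth], sort_keys=True)
    return hashlib.sha1(payload.encode('utf-8')).hexdigest()


if __name__ == '__main__':
    conf = {'InstanceType': 'm5.large', 'DiskSize': 256}
    auth_a = {'ssh_user': 'ubuntu', 'ssh_private_key': '/home/a/.ssh/key'}
    auth_b = {'ssh_user': 'ubuntu', 'ssh_private_key': '/home/b/.ssh/key'}
    keys = {'ssh_private_key': 'KEY-TEXT'}
    # Same key text at different paths yields the same hash.
    print(stable_launch_hash(conf, auth_a, keys) ==
          stable_launch_hash(conf, auth_b, keys))
```

Because the key path is swapped for the key text before hashing, two machines that store the same key in different locations produce the same launch hash.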
2 changes: 1 addition & 1 deletion sky/design_docs/onprem-design.md
@@ -9,7 +9,7 @@
- Does not support different types of accelerators within the same node (intranode).

## Installing Ray and SkyPilot
-- Admin installs Ray==2.0.1 and SkyPilot globally on all machines. It is assumed that the admin regularly keeps SkyPilot updated on the cluster.
+- Admin installs Ray==2.4.0 and SkyPilot globally on all machines. It is assumed that the admin regularly keeps SkyPilot updated on the cluster.
- Python >= 3.7 for all users.
- When a regular user runs `sky launch`, a local version of SkyPilot will be installed on the machine for each user. The local installation of Ray is specified in `sky/templates/local-ray.yml.j2`.

33 changes: 18 additions & 15 deletions sky/setup_files/setup.py
@@ -65,9 +65,8 @@ def parse_readme(readme: str) -> str:

install_requires = [
'wheel',
-# NOTE: ray 2.0.1 requires click<=8.0.4,>=7.0; We disable the
-# shell completion for click<8.0 for backward compatibility.
-'click<=8.0.4,>=7.0',
+# NOTE: ray requires click>=7.0
+'click>=7.0',
# NOTE: required by awscli. To avoid ray automatically installing
# the latest version.
'colorama<0.4.5',
@@ -84,22 +83,26 @@ def parse_readme(readme: str) -> str:
# PrettyTable with version >=2.0.0 is required for the support of
# `add_rows` method.
'PrettyTable>=2.0.0',
-# Lower local ray version is not fully supported, due to the
-# autoscaler issues (also tracked in #537).
-'ray[default]>=1.9.0,<=2.3.0',
+# Lower version of ray will cause dependency conflict for
+# click/grpcio/protobuf.
+'ray[default]>=2.2.0,<=2.4.0',
'rich',
'tabulate',
-'typing-extensions',
+# Light weight requirement, can be replaced with "typing" once
+# we deprecate Python 3.7 (this will take a while).
+"typing_extensions; python_version < '3.8'",
'filelock>=3.6.0',
-# This is used by ray. The latest 1.44.0 will generate an error
-# `Fork support is only compatible with the epoll1 and poll
-# polling strategies`
-'grpcio>=1.32.0,<=1.43.0',
+# Adopted from ray's setup.py:
+# Tracking issue: https://github.com/ray-project/ray/issues/30984
+"grpcio >= 1.32.0, <= 1.49.1; python_version < '3.10' and sys_platform == 'darwin'", # noqa:E501
+"grpcio >= 1.42.0, <= 1.49.1; python_version >= '3.10' and sys_platform == 'darwin'", # noqa:E501
+# Original issue: https://github.com/ray-project/ray/issues/33833
+"grpcio >= 1.32.0, <= 1.51.3; python_version < '3.10' and sys_platform != 'darwin'", # noqa:E501
+"grpcio >= 1.42.0, <= 1.51.3; python_version >= '3.10' and sys_platform != 'darwin'", # noqa:E501
'packaging',
-# The latest 4.21.1 will break ray. Enforce < 4.0.0 until Ray releases the
-# fix.
-# https://github.com/ray-project/ray/pull/25211
-'protobuf<4.0.0',
+# Adopted from ray's setup.py:
+# https://github.com/ray-project/ray/blob/86fab1764e618215d8131e8e5068f0d493c77023/python/setup.py#L326
+'protobuf >= 3.15.3, != 3.19.5',
'psutil',
'pulp',
]
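The new bound `ray[default]>=2.2.0,<=2.4.0` above admits any release inside a closed range. A minimal sketch of that check follows, assuming plain dotted numeric versions with the same number of components. `parse_version` and `in_closed_range` are hypothetical helpers, and real tooling should use a full PEP 440 implementation instead of this simplification, which does not handle pre-releases, epochs, or local version labels.

```python
from typing import Tuple


def parse_version(version: str) -> Tuple[int, ...]:
    """Split a dotted numeric version such as '2.4.0' into integers."""
    return tuple(int(part) for part in version.split('.'))


def in_closed_range(version: str, lower: str, upper: str) -> bool:
    """Return True if lower <= version <= upper, compared numerically."""
    return parse_version(lower) <= parse_version(version) <= parse_version(
        upper)


if __name__ == '__main__':
    for candidate in ('2.0.1', '2.2.0', '2.3.1', '2.4.0', '2.10.0'):
        allowed = in_closed_range(candidate, '2.2.0', '2.4.0')
        print(f'ray {candidate}: {"allowed" if allowed else "rejected"}')
```

Comparing integer tuples rather than raw strings matters here: as strings, `'2.10.0'` would sort before `'2.4.0'` and be wrongly accepted.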
14 changes: 7 additions & 7 deletions sky/skylet/LICENSE
@@ -203,19 +203,19 @@
--------------------------------------------------------------------------------

Code in providers/azure from
-https://github.com/ray-project/ray/tree/ray-2.0.1/python/ray/autoscaler/_private/_azure
-Git commit of the release 2.0.1: 03b6bc7b5a305877501110ec04710a9c57011479
+https://github.com/ray-project/ray/tree/ray-2.4.0/python/ray/autoscaler/_private/_azure
+Git commit of the release 2.4.0: a777a028b8dbd7bbae9a7393c98f6cd65f98a5f5

Code in providers/gcp from
-https://github.com/ray-project/ray/tree/ray-2.0.1/python/ray/autoscaler/_private/gcp
-Git commit of the release 2.0.1: 03b6bc7b5a305877501110ec04710a9c57011479
+https://github.com/ray-project/ray/tree/ray-2.4.0/python/ray/autoscaler/_private/gcp
+Git commit of the release 2.4.0: 45ffe6eb99d96488fdec187bb47a4a78d9b5ee92

Code in providers/aws from
-https://github.com/ray-project/ray/tree/ray-2.0.1/python/ray/autoscaler/_private/aws
-Git commit of the release 2.0.1: 03b6bc7b5a305877501110ec04710a9c57011479
+https://github.com/ray-project/ray/tree/ray-2.4.0/python/ray/autoscaler/_private/aws
+Git commit of the release 2.4.0: c27859fa49f6470b98743bdce8288c7242d89699


-Copyright 2016-2022 Ray developers
+Copyright 2016-2023 Ray developers

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
2 changes: 1 addition & 1 deletion sky/skylet/constants.py
@@ -2,7 +2,7 @@

SKY_LOGS_DIRECTORY = '~/sky_logs'
SKY_REMOTE_WORKDIR = '~/sky_workdir'
-SKY_REMOTE_RAY_VERSION = '2.0.1'
+SKY_REMOTE_RAY_VERSION = '2.4.0'

# TODO(mluo): Make explicit `sky launch -c <name> ''` optional.
UNINITIALIZED_ONPREM_CLUSTER_MESSAGE = (
4 changes: 2 additions & 2 deletions sky/skylet/job_lib.py
@@ -392,7 +392,7 @@ def update_job_status(job_owner: str,
during job cancelling, we still need this to handle the staleness problem,
caused by instance restarting and other corner cases (if any).
-This function should only be run on the remote instance with ray==2.0.1.
+This function should only be run on the remote instance with ray==2.4.0.
"""
if len(job_ids) == 0:
return []
@@ -402,7 +402,7 @@ def update_job_status(job_owner: str,

job_client = _create_ray_job_submission_client()

-# In ray 2.0.1, job_client.list_jobs returns a list of JobDetails,
+# In ray 2.4.0, job_client.list_jobs returns a list of JobDetails,
# which contains the job status (str) and submission_id (str).
job_detail_lists: List['ray_pydantic.JobDetails'] = job_client.list_jobs()
