[Core] Upgrade ray to 2.4.0 #1734

Merged
merged 44 commits into from
May 26, 2023
Changes from 33 commits
Commits
44 commits
ca94eba
update the patches
Michaelvll Jan 23, 2023
1651e7f
upgrade node providers
Michaelvll Jan 23, 2023
55dd5b4
Merge branch 'master' of github.com:concretevitamin/sky-experiments i…
Michaelvll Jan 24, 2023
dc3c14f
Merge branch 'master' of github.com:concretevitamin/sky-experiments i…
Michaelvll Jan 24, 2023
d4ea222
fix azure config.py
Michaelvll Jan 24, 2023
ffb6f7b
print sky queue
Michaelvll Jan 24, 2023
d33593e
add back azure disk size
Michaelvll Jan 24, 2023
380d8b6
fix job manager
Michaelvll Jan 26, 2023
16fe424
Merge branch 'master' of github.com:concretevitamin/sky-experiments i…
Michaelvll Jan 26, 2023
c21d46d
fix hash
Michaelvll Jan 26, 2023
99cc5dc
longer timeout
Michaelvll Jan 26, 2023
5ad228d
fix test smoke
Michaelvll Jan 26, 2023
e3f0c60
Remove the patch for job_manager
Michaelvll Jan 26, 2023
3e42635
longer timeout for azure_region test
Michaelvll Jan 27, 2023
e0d8e7c
Merge branch 'master' of github.com:concretevitamin/sky-experiments i…
Michaelvll Feb 1, 2023
0cb298b
address comments
Michaelvll Feb 1, 2023
366173b
Merge branch 'master' of github.com:concretevitamin/sky-experiments i…
Michaelvll Feb 13, 2023
caee0e1
format
Michaelvll Feb 13, 2023
4e280a3
fix templates
Michaelvll Feb 13, 2023
582b0ba
pip install --exists-action
Michaelvll Feb 13, 2023
4351433
Upgrade to 2.3 instead
Michaelvll Feb 27, 2023
79627b8
upgrade to ray 2.3
Michaelvll Feb 27, 2023
9cc992e
Merge branch 'master' of github.com:concretevitamin/sky-experiments i…
Michaelvll Feb 27, 2023
daeaf02
Merge branches 'upgrade-ray-2.3' and 'master' of github.com:concretev…
Michaelvll Mar 26, 2023
e0a41ac
Merge branch 'master' of github.com:skypilot-org/skypilot into upgrad…
Michaelvll May 9, 2023
32f2b7e
update patches for 2.4
Michaelvll May 9, 2023
00cb9e9
adopt changes for azure providers: a777a028b8dbd7bbae9a7393c98f6cd65f…
Michaelvll May 9, 2023
68b7283
fix license
Michaelvll May 9, 2023
73bd1bf
fix patch for log monitor
Michaelvll May 9, 2023
d392b66
sleep longer for the multi-echo
Michaelvll May 9, 2023
90f88b0
longer waiting time
Michaelvll May 10, 2023
827eb7d
longer wait time
Michaelvll May 10, 2023
af74c2f
Merge branch 'master' of github.com:skypilot-org/skypilot into upgrad…
Michaelvll May 15, 2023
76e137f
Merge branch 'master' of github.com:skypilot-org/skypilot into upgrad…
Michaelvll May 23, 2023
c31874c
fix click dependencies
Michaelvll May 23, 2023
9814617
update setup.py
Michaelvll May 23, 2023
52571e9
Fix https://github.com/skypilot-org/skypilot/pull/1618#discussion_r11…
Michaelvll May 23, 2023
b312244
fix https://github.com/skypilot-org/skypilot/pull/1618#discussion_r11…
Michaelvll May 23, 2023
d070e39
revert test_smoke
Michaelvll May 23, 2023
621de78
fix comment
Michaelvll May 23, 2023
b30bf5d
revert to w instead of wipe
Michaelvll May 23, 2023
b414ddb
rewording
Michaelvll May 24, 2023
fc9eaf2
Merge branch 'master' of github.com:skypilot-org/skypilot into upgrad…
Michaelvll May 26, 2023
24fb0b4
minor fix
Michaelvll May 26, 2023
4 changes: 2 additions & 2 deletions docs/source/reference/local/setup.rst
@@ -14,13 +14,13 @@ For further reference, `here <https://docs.ray.io/en/latest/ray-core/configure.h
Installing SkyPilot dependencies
-----------------------------------

-SkyPilot On-prem requires :code:`python3`, :code:`ray==2.0.1`, and :code:`sky` to be setup on all local nodes and globally available to all users.
+SkyPilot On-prem requires :code:`python3`, :code:`ray==2.4.0`, and :code:`sky` to be setup on all local nodes and globally available to all users.

To install Ray and SkyPilot for all users, run the following commands on all local nodes:

.. code-block:: console

-$ pip3 install ray[default]==2.0.1
+$ pip3 install ray[default]==2.4.0

$ # SkyPilot requires python >= 3.7.
$ pip3 install skypilot
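
One way to confirm that a node satisfies the pinned version after running the commands above is to query the installed package metadata. The sketch below is illustrative only and is not part of this PR: `version_matches` and `EXPECTED_RAY_VERSION` are hypothetical names, and `importlib.metadata` needs Python 3.8 or newer.

```python
from importlib import metadata
from typing import Optional

# Assumption: mirrors the version pinned in the docs change above.
EXPECTED_RAY_VERSION = '2.4.0'


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of `package`, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def version_matches(package: str, expected: str) -> bool:
    """True only when `package` is installed at exactly `expected`."""
    return installed_version(package) == expected


if __name__ == '__main__':
    if version_matches('ray', EXPECTED_RAY_VERSION):
        print('ray version OK')
    else:
        print(f'expected ray=={EXPECTED_RAY_VERSION}, '
              f'found {installed_version("ray")}')
```

Running the script on each local node reports either a match or the version that was actually found.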
2 changes: 1 addition & 1 deletion examples/local/cluster-config.yaml
@@ -4,7 +4,7 @@
# The system administrator must have `sudo` access to the local nodes.
# Requirements:
# 1) Python (> 3.6) on all nodes.
-# 2) Ray CLI (= 2.0.1) on all nodes.
+# 2) Ray CLI (= 2.3.0) on all nodes.
#
# Example usage:
# >> sky admin deploy cluster-config.yaml
6 changes: 3 additions & 3 deletions sky/backends/cloud_vm_ray_backend.py
@@ -201,7 +201,7 @@ def add_prologue(self,
# Should use 'auto' or 'ray://<internal_head_ip>:10001' rather than
# 'ray://localhost:10001', or 'ray://127.0.0.1:10001', for public cloud.
# Otherwise, it triggers a bug where the ray job fails to get the placement group
-# in ray <= 2.0.1.
+# in ray <= 2.4.0.
# TODO(mluo): Check why 'auto' not working with on-prem cluster and
# whether the placement group issue also occurs in on-prem cluster.
ray_address = 'ray://localhost:10001' if is_local else 'auto'
@@ -1483,7 +1483,7 @@ def ray_up():
# Downside is existing tasks on the cluster will keep running
# (which may be ok with the semantics of 'sky launch' twice).
# Tracked in https://github.com/ray-project/ray/issues/20402.
-# Ref: https://github.com/ray-project/ray/blob/releases/2.2.0/python/ray/autoscaler/sdk/sdk.py#L16-L49 # pylint: disable=line-too-long
+# Ref: https://github.com/ray-project/ray/blob/releases/2.4.0/python/ray/autoscaler/sdk/sdk.py#L16-L49 # pylint: disable=line-too-long
script_path = write_ray_up_script_with_patched_launch_hash_fn(
cluster_config_file, ray_up_kwargs={'no_restart': True})

@@ -1718,7 +1718,7 @@ def _ensure_cluster_ray_started(self, handle: 'CloudVmRayResourceHandle',
if isinstance(launched_resources.cloud, clouds.Local):
raise RuntimeError(
'The command `ray status` errored out on the head node '
-'of the local cluster. Check if ray[default]==2.0.1 '
+'of the local cluster. Check if ray[default]==2.4.0 '
'is installed or running correctly.')
backend.run_on_head(handle, 'ray stop', use_cached_head_ip=False)

4 changes: 2 additions & 2 deletions sky/backends/monkey_patches/monkey_patch_ray_up.py
@@ -33,7 +33,7 @@
from ray.autoscaler import sdk


-# Ref: https://github.com/ray-project/ray/blob/releases/2.2.0/python/ray/autoscaler/_private/util.py#L392-L404
+# Ref: https://github.com/ray-project/ray/blob/releases/2.4.0/python/ray/autoscaler/_private/util.py#L396-L408
def monkey_patch_hash_launch_conf(node_conf, auth):
hasher = hashlib.sha1()
# For hashing, we replace the path to the key with the key
@@ -50,7 +50,7 @@ def monkey_patch_hash_launch_conf(node_conf, auth):
return hasher.hexdigest()
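
The body of the patched function is collapsed in this view. As a rough illustration of the idea stated in its comment (hash the key contents rather than the key path), here is a minimal sketch. It is not the code from this PR: the `ssh_private_key`/`ssh_public_key` field names, the `read_key` parameter, and the function name are assumptions made for the example.

```python
import hashlib
import json
import os
from typing import Any, Callable, Dict

# Assumption: auth entries that hold a path to a key file.
KEY_FIELDS = ('ssh_private_key', 'ssh_public_key')


def _read_text(path: str) -> str:
    """Read a key file, expanding '~' in the path."""
    with open(os.path.expanduser(path), encoding='utf-8') as f:
        return f.read()


def hash_launch_conf_sketch(
        node_conf: Dict[str, Any],
        auth: Dict[str, Any],
        read_key: Callable[[str], str] = _read_text) -> str:
    """Hash a launch config, substituting key contents for key paths."""
    full_auth = dict(auth)
    for field in KEY_FIELDS:
        if field in auth:
            # Replace the path with the key material itself, so the hash
            # does not change when the same key is stored elsewhere.
            full_auth[field] = read_key(auth[field])
    payload = json.dumps([node_conf, full_auth], sort_keys=True)
    return hashlib.sha1(payload.encode('utf-8')).hexdigest()
```

Because the hash depends on the key contents, moving the same key to a different path leaves the launch hash unchanged, while any change to the node config produces a new one.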


-# Ref: https://github.com/ray-project/ray/blob/840215bc09e942b50cad0ab2db96a8fdc79217c1/python/ray/autoscaler/_private/commands.py#L854-L912
+# Ref: https://github.com/ray-project/ray/blob/releases/2.4.0/python/ray/autoscaler/_private/commands.py#L854-L912
def monkey_patch_should_create_new_head(
head_node_id,
new_launch_hash,
2 changes: 1 addition & 1 deletion sky/design_docs/onprem-design.md
@@ -9,7 +9,7 @@
- Does not support different types of accelerators within the same node (intranode).

## Installing Ray and SkyPilot
-- Admin installs Ray==2.0.1 and SkyPilot globally on all machines. It is assumed that the admin regularly keeps SkyPilot updated on the cluster.
+- Admin installs Ray==2.4.0 and SkyPilot globally on all machines. It is assumed that the admin regularly keeps SkyPilot updated on the cluster.
- Python >= 3.7 for all users.
- When a regular user runs `sky launch`, a local version of SkyPilot will be installed on the machine for each user. The local installation of Ray is specified in `sky/templates/local-ray.yml.j2`.

7 changes: 3 additions & 4 deletions sky/setup_files/setup.py
@@ -65,9 +65,8 @@ def parse_readme(readme: str) -> str:

install_requires = [
'wheel',
-# NOTE: ray 2.0.1 requires click<=8.0.4,>=7.0; We disable the
-# shell completion for click<8.0 for backward compatibility.
-'click<=8.0.4,>=7.0',
+# NOTE: ray>=2.3.0 requires click>=7.0
+'click>=7.0',
# NOTE: required by awscli. To avoid ray automatically installing
# the latest version.
'colorama<0.4.5',
@@ -86,7 +85,7 @@ def parse_readme(readme: str) -> str:
'PrettyTable>=2.0.0',
# Lower local ray version is not fully supported, due to the
# autoscaler issues (also tracked in #537).
-'ray[default]>=1.9.0,<=2.3.0',
+'ray[default]>=1.9.0,<=2.4.0',
'rich',
'tabulate',
'typing-extensions',
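
To make the effect of the widened `ray[default]>=1.9.0,<=2.4.0` bound concrete, the following sketch checks a plain numeric version string against an inclusive range. It is an illustration only: `in_supported_range` is a hypothetical helper, and it handles only dotted integer versions, whereas pip resolves specifiers with full PEP 440 rules.

```python
from typing import Tuple


def parse_version(version: str) -> Tuple[int, ...]:
    """Turn a dotted numeric version such as '2.4.0' into (2, 4, 0)."""
    return tuple(int(part) for part in version.split('.'))


def in_supported_range(version: str,
                       lower: str = '1.9.0',
                       upper: str = '2.4.0') -> bool:
    """True when lower <= version <= upper (both bounds inclusive)."""
    return (parse_version(lower) <= parse_version(version) <=
            parse_version(upper))
```

With the defaults above, `in_supported_range('2.4.0')` is accepted, while `in_supported_range('2.4.1')` falls outside the upper bound.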
14 changes: 7 additions & 7 deletions sky/skylet/LICENSE
@@ -203,19 +203,19 @@
--------------------------------------------------------------------------------

Code in providers/azure from
-https://github.com/ray-project/ray/tree/ray-2.0.1/python/ray/autoscaler/_private/_azure
-Git commit of the release 2.0.1: 03b6bc7b5a305877501110ec04710a9c57011479
+https://github.com/ray-project/ray/tree/ray-2.4.0/python/ray/autoscaler/_private/_azure
+Git commit of the release 2.4.0: a777a028b8dbd7bbae9a7393c98f6cd65f98a5f5

Code in providers/gcp from
-https://github.com/ray-project/ray/tree/ray-2.0.1/python/ray/autoscaler/_private/gcp
-Git commit of the release 2.0.1: 03b6bc7b5a305877501110ec04710a9c57011479
+https://github.com/ray-project/ray/tree/ray-2.4.0/python/ray/autoscaler/_private/gcp
+Git commit of the release 2.4.0: 45ffe6eb99d96488fdec187bb47a4a78d9b5ee92

Code in providers/aws from
-https://github.com/ray-project/ray/tree/ray-2.0.1/python/ray/autoscaler/_private/aws
-Git commit of the release 2.0.1: 03b6bc7b5a305877501110ec04710a9c57011479
+https://github.com/ray-project/ray/tree/ray-2.4.0/python/ray/autoscaler/_private/aws
+Git commit of the release 2.4.0: c27859fa49f6470b98743bdce8288c7242d89699


-Copyright 2016-2022 Ray developers
+Copyright 2016-2023 Ray developers

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
2 changes: 1 addition & 1 deletion sky/skylet/constants.py
@@ -2,7 +2,7 @@

SKY_LOGS_DIRECTORY = '~/sky_logs'
SKY_REMOTE_WORKDIR = '~/sky_workdir'
-SKY_REMOTE_RAY_VERSION = '2.0.1'
+SKY_REMOTE_RAY_VERSION = '2.4.0'

# TODO(mluo): Make explicit `sky launch -c <name> ''` optional.
UNINITIALIZED_ONPREM_CLUSTER_MESSAGE = (
4 changes: 2 additions & 2 deletions sky/skylet/job_lib.py
@@ -392,7 +392,7 @@ def update_job_status(job_owner: str,
during job cancelling, we still need this to handle the staleness problem,
caused by instance restarting and other corner cases (if any).
-This function should only be run on the remote instance with ray==2.0.1.
+This function should only be run on the remote instance with ray==2.4.0.
"""
if len(job_ids) == 0:
return []
@@ -402,7 +402,7 @@ def update_job_status(job_owner: str,

job_client = _create_ray_job_submission_client()

-# In ray 2.0.1, job_client.list_jobs returns a list of JobDetails,
+# In ray 2.4.0, job_client.list_jobs returns a list of JobDetails,
# which contains the job status (str) and submission_id (str).
job_detail_lists: List['ray_pydantic.JobDetails'] = job_client.list_jobs()
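
The rest of this function is collapsed here. As a hedged illustration of how such a list might be consumed, the sketch below indexes job details by submission ID. It relies only on the two attributes named in the comment above (`status` and `submission_id`); `statuses_by_submission_id` is a hypothetical helper, not code from this PR, and stand-in objects replace real `JobDetails` so the example runs without Ray installed.

```python
from types import SimpleNamespace
from typing import Any, Dict, Iterable


def statuses_by_submission_id(job_details: Iterable[Any]) -> Dict[str, Any]:
    """Map each job's submission_id to its status.

    Entries without a submission_id are skipped, since they cannot be
    matched back to a submitted job.
    """
    return {
        detail.submission_id: detail.status
        for detail in job_details
        if detail.submission_id is not None
    }


# Stand-ins for the objects returned by job_client.list_jobs().
example_details = [
    SimpleNamespace(submission_id='1-user', status='RUNNING'),
    SimpleNamespace(submission_id='2-user', status='SUCCEEDED'),
    SimpleNamespace(submission_id=None, status='PENDING'),
]
example_statuses = statuses_by_submission_id(example_details)
```

A lookup such as `example_statuses.get('1-user')` then gives the status for one job, and IDs missing from the mapping can be treated as stale.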
