[Core] Upgrade ray to 2.3.0 #1618

Closed
wants to merge 23 commits into from
Changes from 20 commits

Commits (23 total):
ca94eba
update the patches
Michaelvll Jan 23, 2023
1651e7f
upgrade node providers
Michaelvll Jan 23, 2023
55dd5b4
Merge branch 'master' of github.com:concretevitamin/sky-experiments i…
Michaelvll Jan 24, 2023
dc3c14f
Merge branch 'master' of github.com:concretevitamin/sky-experiments i…
Michaelvll Jan 24, 2023
d4ea222
fix azure config.py
Michaelvll Jan 24, 2023
ffb6f7b
print sky queue
Michaelvll Jan 24, 2023
d33593e
add back azure disk size
Michaelvll Jan 24, 2023
380d8b6
fix job manager
Michaelvll Jan 26, 2023
16fe424
Merge branch 'master' of github.com:concretevitamin/sky-experiments i…
Michaelvll Jan 26, 2023
c21d46d
fix hash
Michaelvll Jan 26, 2023
99cc5dc
longer timeout
Michaelvll Jan 26, 2023
5ad228d
fix test smoke
Michaelvll Jan 26, 2023
e3f0c60
Remove the patch for job_manager
Michaelvll Jan 26, 2023
3e42635
longer timeout for azure_region test
Michaelvll Jan 27, 2023
e0d8e7c
Merge branch 'master' of github.com:concretevitamin/sky-experiments i…
Michaelvll Feb 1, 2023
0cb298b
address comments
Michaelvll Feb 1, 2023
366173b
Merge branch 'master' of github.com:concretevitamin/sky-experiments i…
Michaelvll Feb 13, 2023
caee0e1
format
Michaelvll Feb 13, 2023
4e280a3
fix templates
Michaelvll Feb 13, 2023
582b0ba
pip install --exists-action
Michaelvll Feb 13, 2023
4351433
Upgrade to 2.3 instead
Michaelvll Feb 27, 2023
79627b8
upgrade to ray 2.3
Michaelvll Feb 27, 2023
9cc992e
Merge branch 'master' of github.com:concretevitamin/sky-experiments i…
Michaelvll Feb 27, 2023
4 changes: 2 additions & 2 deletions docs/source/reference/local/setup.rst
@@ -14,13 +14,13 @@ For further reference, `here <https://docs.ray.io/en/latest/ray-core/configure.h
Installing SkyPilot dependencies
-----------------------------------

-SkyPilot On-prem requires :code:`python3`, :code:`ray==2.0.1`, and :code:`sky` to be setup on all local nodes and globally available to all users.
+SkyPilot On-prem requires :code:`python3`, :code:`ray==2.2.0`, and :code:`sky` to be setup on all local nodes and globally available to all users.

To install Ray and SkyPilot for all users, run the following commands on all local nodes:

.. code-block:: console

-$ pip3 install ray[default]==2.0.1
+$ pip3 install ray[default]==2.2.0

$ # SkyPilot requires python >= 3.6.
$ pip3 install skypilot
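The setup guide pins `ray[default]==2.2.0` on every local node. A minimal sketch of a per-node check, assuming it runs under the same `python3` that SkyPilot uses. `check_node` and `installed_version` are hypothetical helpers, not part of SkyPilot, and `importlib.metadata` needs Python 3.8 or newer (stricter than the Python >= 3.6 floor stated in the docs):

```python
from importlib import metadata
from typing import Optional, Tuple

# Pin taken from the setup guide above.
REQUIRED_RAY_VERSION = '2.2.0'


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def check_node(required: str = REQUIRED_RAY_VERSION,
               dist_name: str = 'ray') -> Tuple[bool, str]:
    """Hypothetical pre-flight check for one node (not part of SkyPilot)."""
    found = installed_version(dist_name)
    if found is None:
        return False, f'{dist_name} is not installed'
    if found != required:
        return False, f'{dist_name}=={found} found, expected {required}'
    return True, f'{dist_name}=={found} OK'
```

Running `print(check_node())` on each node would report either the matching version or what is missing.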
2 changes: 1 addition & 1 deletion examples/local/cluster-config.yaml
@@ -4,7 +4,7 @@
# The system administrator must have `sudo` access to the local nodes.
# Requirements:
# 1) Python (> 3.6) on all nodes.
-# 2) Ray CLI (= 2.0.1) on all nodes.
+# 2) Ray CLI (= 2.2.0) on all nodes.
#
# Example usage:
# >> sky admin deploy cluster-config.yaml
4 changes: 2 additions & 2 deletions sky/backends/cloud_vm_ray_backend.py
@@ -194,7 +194,7 @@ def add_prologue(self,
# Should use 'auto' or 'ray://<internal_head_ip>:10001' rather than
# 'ray://localhost:10001', or 'ray://127.0.0.1:10001', for public cloud.
# Otherwise, it will hit a bug where the ray job fails to get the placement group
-# in ray <= 2.0.1.
+# in ray <= 2.2.0.
Member: This seems unneeded?

Member: Reminder

Collaborator Author: Removed this in #1734.

# TODO(mluo): Check why 'auto' not working with on-prem cluster and
# whether the placement group issue also occurs in on-prem cluster.
ray_address = 'ray://localhost:10001' if is_local else 'auto'
@@ -1623,7 +1623,7 @@ def _ensure_cluster_ray_started(self,
if isinstance(launched_resources.cloud, clouds.Local):
raise RuntimeError(
'The command `ray status` errored out on the head node '
-'of the local cluster. Check if ray[default]==2.0.1 '
+'of the local cluster. Check if ray[default]==2.2.0 '
'is installed or running correctly.')
backend.run_on_head(handle, 'ray stop', use_cached_head_ip=False)
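This hunk only changes the version in the error message, but the visible context appears to follow a common pattern: when `ray status` fails, a local (on-prem) cluster raises, while other clouds go on to run `ray stop` on the head node. A rough sketch of that branch using `subprocess`, where `ensure_ray_started` and the `stop_ray` callback are hypothetical stand-ins for the backend's `run_on_head` calls, not SkyPilot code:

```python
import subprocess
from typing import Callable, Sequence


def ensure_ray_started(status_cmd: Sequence[str], is_local: bool,
                       stop_ray: Callable[[], None]) -> bool:
    """Hypothetical sketch of the branch shown in the hunk above.

    Returns True if `status_cmd` exits with 0. On failure, a local cluster
    raises RuntimeError; any other cluster calls `stop_ray` and returns False.
    """
    result = subprocess.run(list(status_cmd), capture_output=True, text=True)
    if result.returncode == 0:
        return True
    if is_local:
        raise RuntimeError(
            'The command `ray status` errored out on the head node '
            'of the local cluster. Check if ray[default]==2.2.0 '
            'is installed or running correctly.')
    stop_ray()
    return False
```

Passing the stop action as a callback keeps the sketch testable without a real cluster.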

2 changes: 1 addition & 1 deletion sky/design_docs/onprem-design.md
@@ -9,7 +9,7 @@
- Does not support different types of accelerators within the same node (intranode).

## Installing Ray and SkyPilot
-- Admin installs Ray==2.0.1 and SkyPilot globally on all machines. It is assumed that the admin regularly keeps SkyPilot updated on the cluster.
+- Admin installs Ray==2.2.0 and SkyPilot globally on all machines. It is assumed that the admin regularly keeps SkyPilot updated on the cluster.
- Python >= 3.6 for all users.
- When a regular user runs `sky launch`, a local version of SkyPilot will be installed on the machine for each user. The local installation of Ray is specified in `sky/templates/local-ray.yml.j2`.

4 changes: 2 additions & 2 deletions sky/setup_files/setup.py
@@ -64,9 +64,9 @@ def parse_readme(readme: str) -> str:

install_requires = [
'wheel',
-# NOTE: ray 2.0.1 requires click<=8.0.4,>=7.0; We disable the
+# NOTE: ray 2.2.0 requires click>=7.0; We disable the
Member: Local ray versions may be older than 2.2. Does that mean the click<=8.0.4 constraint is still needed?

Collaborator Author: I decided to upgrade the local ray version to 2.2.0 because of a number of dependency conflicts in #1734. Wdyt?

# shell completion for click<8.0 for backward compatibility.
-'click<=8.0.4,>=7.0',
+'click>=7.0',
# NOTE: required by awscli. To avoid ray automatically installing
# the latest version.
'colorama<0.4.5',
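The setup.py change drops the `<=8.0.4` upper bound on `click`. To see what that admits, here is a simplified sketch that evaluates only `>=` and `<=` clauses on plain release numbers. Real installers apply the full PEP 440 rules (for example through the `packaging` library's `SpecifierSet`); `parse` and `satisfies` here are hypothetical helpers:

```python
from typing import Tuple


def parse(version: str) -> Tuple[int, ...]:
    """Turn '8.0.4' into (8, 0, 4), dropping trailing zeros so 7 == 7.0."""
    parts = [int(piece) for piece in version.split('.')]
    while parts and parts[-1] == 0:
        parts.pop()
    return tuple(parts)


def satisfies(version: str, spec: str) -> bool:
    """Check `version` against comma-separated '>=' / '<=' clauses only.

    Hypothetical, simplified stand-in for PEP 440 specifiers: no
    pre-releases, wildcards, or other operators.
    """
    v = parse(version)
    for clause in spec.split(','):
        clause = clause.strip()
        if clause.startswith('>='):
            if v < parse(clause[2:]):
                return False
        elif clause.startswith('<='):
            if v > parse(clause[2:]):
                return False
        else:
            raise ValueError(f'unsupported clause: {clause!r}')
    return True


OLD_CLICK_SPEC = '<=8.0.4,>=7.0'  # Before this PR.
NEW_CLICK_SPEC = '>=7.0'          # After this PR.
```

Under the old constraint a click version such as 8.1.3 is rejected; under the new one it is accepted.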
14 changes: 7 additions & 7 deletions sky/skylet/LICENCE
@@ -203,19 +203,19 @@
--------------------------------------------------------------------------------

Code in providers/azure from
-https://github.com/ray-project/ray/tree/ray-2.0.1/python/ray/autoscaler/_private/_azure
-Git commit of the release 2.0.1: 03b6bc7b5a305877501110ec04710a9c57011479
+https://github.com/ray-project/ray/tree/ray-2.2.0/python/ray/autoscaler/_private/_azure
+Git commit of the release 2.2.0: 840215bc09e942b50cad0ab2db96a8fdc79217c1

Code in providers/gcp from
-https://github.com/ray-project/ray/tree/ray-2.0.1/python/ray/autoscaler/_private/gcp
-Git commit of the release 2.0.1: 03b6bc7b5a305877501110ec04710a9c57011479
+https://github.com/ray-project/ray/tree/ray-2.2.0/python/ray/autoscaler/_private/gcp
+Git commit of the release 2.2.0: 840215bc09e942b50cad0ab2db96a8fdc79217c1

Code in providers/aws from
-https://github.com/ray-project/ray/tree/ray-2.0.1/python/ray/autoscaler/_private/aws
-Git commit of the release 2.0.1: 03b6bc7b5a305877501110ec04710a9c57011479
+https://github.com/ray-project/ray/tree/ray-2.2.0/python/ray/autoscaler/_private/aws
+Git commit of the release 2.2.0: 840215bc09e942b50cad0ab2db96a8fdc79217c1


-Copyright 2016-2022 Ray developers
+Copyright 2016-2023 Ray developers

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
2 changes: 1 addition & 1 deletion sky/skylet/constants.py
@@ -2,7 +2,7 @@

SKY_LOGS_DIRECTORY = '~/sky_logs'
SKY_REMOTE_WORKDIR = '~/sky_workdir'
-SKY_REMOTE_RAY_VERSION = '2.0.1'
+SKY_REMOTE_RAY_VERSION = '2.2.0'

# TODO(mluo): Make explicit `sky launch -c <name> ''` optional.
UNINITIALIZED_ONPREM_CLUSTER_MESSAGE = (
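`SKY_REMOTE_RAY_VERSION` keeps the Ray pin in one place, yet this PR still edits the same version string by hand in docs, comments, and error messages. A small sketch of deriving a pinned install command from the constant in code paths; `ray_install_command` is hypothetical and not a SkyPilot function:

```python
import shlex

SKY_REMOTE_RAY_VERSION = '2.2.0'  # Value from the hunk above.


def ray_install_command(version: str = SKY_REMOTE_RAY_VERSION) -> str:
    """Hypothetical helper: build the pip command that pins Ray."""
    requirement = f'ray[default]=={version}'
    # Quote the requirement so the shell does not treat `[default]` as a glob.
    return f'pip3 install {shlex.quote(requirement)}'
```

With this shape, a future bump (such as the move to 2.3 in the later commits) would touch the constant rather than each generated command.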
4 changes: 2 additions & 2 deletions sky/skylet/job_lib.py
@@ -389,7 +389,7 @@ def update_job_status(job_owner: str,
during job cancelling, we still need this to handle the staleness problem,
caused by instance restarting and other corner cases (if any).

-This function should only be run on the remote instance with ray==2.0.1.
+This function should only be run on the remote instance with ray==2.2.0.
"""
if len(job_ids) == 0:
return []
@@ -399,7 +399,7 @@ def update_job_status(job_owner: str,

job_client = _create_ray_job_submission_client()

-# In ray 2.0.1, job_client.list_jobs returns a list of JobDetails,
+# In ray 2.2.0, job_client.list_jobs returns a list of JobDetails,
# which contains the job status (str) and submission_id (str).
job_detail_lists: List['ray_pydantic.JobDetails'] = job_client.list_jobs()
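The comment above relies on two fields of each `JobDetails` entry. A minimal sketch of indexing job statuses by `submission_id`. On a real cluster the list would come from `JobSubmissionClient.list_jobs()` in `ray.job_submission`; `FakeJobDetails` and `status_by_submission_id` are hypothetical stand-ins so the example runs without Ray:

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class FakeJobDetails:
    """Hypothetical stand-in for the two JobDetails fields used above."""
    submission_id: str
    status: str


def status_by_submission_id(details: List[FakeJobDetails]) -> Dict[str, str]:
    """Index job statuses by submission_id (the last entry wins on duplicates)."""
    return {d.submission_id: d.status for d in details}
```

A dict keyed by `submission_id` lets the caller look up each job it tracks in constant time instead of rescanning the list.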
