Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
* Working Ray K8s node provider based on SSH
* wip
* working provisioning with SkyPilot and ssh config
* working provisioning with SkyPilot and ssh config
* Updates to master
* ray2.3
* Clean up docs
* multiarch build
* hacking around ray start
* more port fixes
* fix up default instance selection
* fix resource selection
* Add provisioning timeout by checking if pods are ready
* Working mounting
* Remove catalog
* fixes
* fixes
* Fix ssh-key auth to create unique secrets
* Fix for ContainerCreating timeout
* Fix head node ssh port caching
* mypy
* lint
* fix ports
* typo
* cleanup
* cleanup
* wip
* Update setup
* readme updates
* lint
* Fix failover
* Fix failover
* optimize setup
* Fix sync down logs for k8s
* test wip
* instance name parsing wip
* Fix instance name parsing
* Merge fixes for query_status
* [k8s_cloud] Delete k8s service resources. (#2105) Delete k8s service resources. - 'sky down' for Kubernetes cloud to remove cluster service resources.
* Status refresh WIP
* refactor to kubernetes adaptor
* tests wip
* clean up auth
* wip tests
* cli
* cli
* sky local up/down cli
* cli
* lint
* lint
* lint
* Speed up kind cluster creation
* tests
* lint
* tests
* handling for non-reachable clusters
* Invalid kubeconfig handling
* Timeout for sky check
* code cleanup
* lint
* Do not raise error if GPUs requested, return empty list
* Address comments
* comments
* lint
* Remove public key upload
* add shebang
* comments
* change permissions
* remove chmod
* merge 2241
* add todo
* Handle kube config management for sky local commands (#2253)
* Set current-context (if available) after sky local down and remove incorrect prompt in sky local up
* Warn user of kubeconfig context switch during sky local up
* Use Optional instead of Union
* Switch context in create_cluster if cluster already exists.
* fix typo
* update sky check error msg after sky local down
* lint
* update timeout check
* fix import error
* Fix kube API access from within cluster (load_incluster_auth)
* lint
* lint
* working autodown and sky status -r
* lint
* add test_kubernetes_autodown
* lint
* address comments
* address comments
* lint
* deletion timeouts wip
* [k8s_cloud] Ray pod not created under current context namespace. (#2302) 'namespace' exists under 'context' key.
* head ssh port namespace fix
* [k8s-cloud] Typo in sky local --help. (#2308) Typo.
* [k8s-cloud] Set build_image.sh to be executable. (#2307)
* Set build_image.sh to be executable.
* Use TAG to easily switch between registries.
* remove ingress
* remove debug statements
* UX and readme updates
* lint
* fix logging for 409 retry
* lint
* lint
* comments
* remove k8s from default clouds to run

---------

Co-authored-by: Avi Weit <[email protected]>
Co-authored-by: Hemil Desai <[email protected]>
1 parent 4d51a89, commit 4045cf3
Showing 37 changed files with 3,000 additions and 63 deletions.
@@ -0,0 +1,50 @@
FROM continuumio/miniconda3:22.11.1

# TODO(romilb): Investigate if this image can be consolidated with the skypilot
# client image (`Dockerfile`)

# Initialize conda for root user, install ssh and other local dependencies
RUN apt update -y && \
    apt install gcc rsync sudo patch openssh-server pciutils nano fuse -y && \
    rm -rf /var/lib/apt/lists/* && \
    apt remove -y python3 && \
    conda init

# Setup SSH and generate hostkeys
RUN mkdir -p /var/run/sshd && \
    sed -i 's/PermitRootLogin prohibit-password/PermitRootLogin yes/' /etc/ssh/sshd_config && \
    sed 's@session\s*required\s*pam_loginuid.so@session optional pam_loginuid.so@g' -i /etc/pam.d/sshd && \
    cd /etc/ssh/ && \
    ssh-keygen -A

# Setup new user named sky and add to sudoers. Also add /opt/conda/bin to sudo path.
RUN useradd -m -s /bin/bash sky && \
    echo "sky ALL=(ALL) NOPASSWD:ALL" >> /etc/sudoers && \
    echo 'Defaults secure_path="/opt/conda/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin"' > /etc/sudoers.d/sky

# Switch to sky user
USER sky

# Install SkyPilot pip dependencies
RUN pip install wheel Click colorama cryptography jinja2 jsonschema && \
    pip install networkx oauth2client pandas pendulum PrettyTable && \
    pip install ray==2.4.0 rich tabulate filelock && \
    pip install packaging 'protobuf<4.0.0' pulp && \
    pip install awscli boto3 pycryptodome==3.12.0 && \
    pip install docker kubernetes

# Add /home/sky/.local/bin/ to PATH
RUN echo 'export PATH="$PATH:$HOME/.local/bin"' >> ~/.bashrc

# Install SkyPilot. This is purposely separate from installing SkyPilot
# dependencies to optimize rebuild time
COPY --chown=sky . /skypilot/sky/

# TODO(romilb): Installing SkyPilot may not be necessary since ray up will do it
RUN cd /skypilot/ && \
    sudo mv -v sky/setup_files/* . && \
    pip install ".[aws]"

# Set WORKDIR and initialize conda for sky user
WORKDIR /home/sky
RUN conda init
@@ -0,0 +1,140 @@
"""Kubernetes adaptors"""

# pylint: disable=import-outside-toplevel

import functools
import os

from sky.utils import ux_utils, env_options

kubernetes = None
urllib3 = None

_configured = False
_core_api = None
_auth_api = None
_networking_api = None
_custom_objects_api = None

# Timeout to use for API calls
API_TIMEOUT = 5


def import_package(func):

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        global kubernetes
        global urllib3
        if kubernetes is None:
            try:
                import kubernetes as _kubernetes
                import urllib3 as _urllib3
            except ImportError:
                # TODO(romilb): Update this message to point to installation
                # docs when they are ready.
                raise ImportError('Failed to import dependencies for '
                                  'Kubernetes. Run `pip install kubernetes` '
                                  'to install them.') from None
            kubernetes = _kubernetes
            urllib3 = _urllib3
        return func(*args, **kwargs)

    return wrapper


@import_package
def get_kubernetes():
    return kubernetes


@import_package
def _load_config():
    global _configured
    if _configured:
        return
    try:
        # Load in-cluster config if running in a pod.
        # Environment variables set by Kubernetes for service discovery do
        # not show up in SkyPilot tasks. For now, we work around this by
        # using the DNS name instead of environment variables.
        # See issue: https://github.com/skypilot-org/skypilot/issues/2287
        os.environ['KUBERNETES_SERVICE_HOST'] = 'kubernetes.default.svc'
        os.environ['KUBERNETES_SERVICE_PORT'] = '443'
        kubernetes.config.load_incluster_config()
    except kubernetes.config.config_exception.ConfigException:
        try:
            kubernetes.config.load_kube_config()
        except kubernetes.config.config_exception.ConfigException as e:
            suffix = ''
            if env_options.Options.SHOW_DEBUG_INFO.get():
                suffix += f' Error: {str(e)}'
            # Check if exception was due to no current-context
            if 'Expected key current-context' in str(e):
                err_str = ('Failed to load Kubernetes configuration. '
                           'Kubeconfig does not contain any valid context(s).'
                           f'{suffix}\n'
                           ' If you were running a local Kubernetes '
                           'cluster, run `sky local up` to start the cluster.')
            else:
                err_str = (
                    'Failed to load Kubernetes configuration. '
                    f'Please check if your kubeconfig file is valid.{suffix}')
            with ux_utils.print_exception_no_traceback():
                raise ValueError(err_str) from None
    _configured = True


@import_package
def core_api():
    global _core_api
    if _core_api is None:
        _load_config()
        _core_api = kubernetes.client.CoreV1Api()

    return _core_api


@import_package
def auth_api():
    global _auth_api
    if _auth_api is None:
        _load_config()
        _auth_api = kubernetes.client.RbacAuthorizationV1Api()

    return _auth_api


@import_package
def networking_api():
    global _networking_api
    if _networking_api is None:
        _load_config()
        _networking_api = kubernetes.client.NetworkingV1Api()

    return _networking_api


@import_package
def custom_objects_api():
    global _custom_objects_api
    if _custom_objects_api is None:
        _load_config()
        _custom_objects_api = kubernetes.client.CustomObjectsApi()

    return _custom_objects_api


@import_package
def api_exception():
    return kubernetes.client.rest.ApiException


@import_package
def config_exception():
    return kubernetes.config.config_exception.ConfigException


@import_package
def max_retry_error():
    return urllib3.exceptions.MaxRetryError