Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Bug] Enable ResourceQuota by adding Resources for the health-check init container #1043

Merged

Conversation

kevin85421
Copy link
Member

@kevin85421 kevin85421 commented Apr 24, 2023

Why are these changes needed?

Flyte will create ResourceQuota for each Kubernetes namespace. To support ResourceQuota, each container in the namespace requires setting resource limits. However, the injected health-check init container does not specify the resource limits.

Why am I using the same resource configurations for the init container as those used in the Ray container?

In Kubernetes Pod, all init containers will be executed in a sequence, one by one, and each init container must complete successfully before the next one starts. On the other hands, when all init containers complete successfully, all application containers will start in parallel, and the startup order is arbitrary.

For example, we have 3 init containers and 3 application containers. The execution order will be:

init1 -> init2 -> init3 -> app1, app2, app3

The Pod's effective request/limit for a resource is the higher of:

  • the sum of all app containers request/limit for a resource
  • the effective init request/limit for a resource

Example: max(max(init1, init2, init3), sum(app1, app2, app3))

Hence, using the same resource configurations for the init container as those used in the Ray container will not waste resources.

Related issue number

Slack thread: https://ray-distributed.slack.com/archives/C02GFQ82JPM/p1681294121762329

Checks

  • I've made sure the tests are passing.
  • Testing Strategy
    • Unit tests
    • Manual tests
    • This PR is not tested :(
# Step 0: Build the Docker image for this PR
# Step 1: Install the KubeRay operator with this new Docker image
helm install kuberay-operator kuberay/kuberay-operator --version 0.5.0 --set image.repository=controller,image.tag=latest

# Step 2: Create a namespace flyte and ResourceQuota.
kubectl create ns flyte
kubectl apply -f resource_quota.yaml

# resource_quota.yaml
apiVersion: v1
kind: List
items:
- apiVersion: v1
  kind: ResourceQuota
  metadata:
    name: project-quota
    namespace: flyte
  spec:
    hard:
      limits.cpu: "8"
      limits.memory: 3000Mi

# Step 3: Install a RayCluster with resource limit/request for both head & worker groups in the `flyte` namespace.
# => Expected behavior: Both head & worker Pods should be created successfully.
helm install raycluster kuberay/ray-cluster --version 0.5.1 -n flyte
  • If you use helm install kuberay-operator kuberay/kuberay-operator --version 0.5.0 in Step 1, only the head Pod will be created in Step 3.

@kevin85421
Copy link
Member Author

@pingsutw would you mind testing this change with Flyte? Thanks!

@kevin85421 kevin85421 marked this pull request as ready for review April 24, 2023 18:09
Copy link
Contributor

@pingsutw pingsutw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just tested it. works for us, thanks

@kevin85421 kevin85421 merged commit 39562b5 into ray-project:master Apr 25, 2023
lowang-bh pushed a commit to lowang-bh/kuberay that referenced this pull request Sep 24, 2023
…nit container (ray-project#1043)

Enable ResourceQuota by adding Resources for the health-check init container
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants