Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RayCluster created by RayService set death info env for ray container #419

Merged
merged 3 commits into from
Jul 26, 2022

Conversation

brucez-anyscale
Copy link
Contributor

@brucez-anyscale brucez-anyscale commented Jul 26, 2022

Why are these changes needed?

For Ray Serve, if we set the death info time as 0, it will increase SLA. This pr sets the env for RayService create Ray containers.

spec:
  containers:
  - args:
    - 'ulimit -n 65536; ray start --head  --num-cpus=2  --metrics-export-port=8080  --dashboard-agent-listen-port=52365  --memory=2147483648  --block  --dashboard-host=0.0.0.0  --node-ip-address=$MY_POD_IP  --object-store-memory=100000000  --port=6379 '
    command:
    - /bin/bash
    - -c
    - --
    env:
    - name: MY_POD_IP
      valueFrom:
        fieldRef:
          apiVersion: v1
          fieldPath: status.podIP
    - name: RAY_IP
      value: 127.0.0.1
    - name: RAY_PORT
      value: "6379"
    - name: RAY_timeout_ms_task_wait_for_death_info
      value: "0"
    - name: RAY_ADDRESS
      value: 127.0.0.1:6379

Related issue number

Checks

  • I've made sure the tests are passing.
  • Testing Strategy
    • Unit tests
    • Manual tests
    • This PR is not tested :(

@@ -513,6 +513,13 @@ func setContainerEnvVars(pod *v1.Pod, rayContainerIndex int, rayNodeType rayiov1
portEnv := v1.EnvVar{Name: RAY_PORT, Value: headPort}
container.Env = append(container.Env, portEnv)
}
if createdByRayService {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think thinking if there's better way to indicate a cluster is created by RayServe controller.

Can we leverage

KubernetesCreatedByLabelKey = "app.kubernetes.io/created-by"
and pass into the method.

This help us not expand method arguments if we need some customization on clusters created by RayJob etc.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sg

@Jeffwan
Copy link
Collaborator

Jeffwan commented Jul 26, 2022

The change looks good to me. Let's merge it after test complete

@Jeffwan Jeffwan merged commit ce84f04 into master Jul 26, 2022
@Jeffwan Jeffwan deleted the brucez/addEnv branch July 26, 2022 22:29
@@ -513,6 +513,13 @@ func setContainerEnvVars(pod *v1.Pod, rayContainerIndex int, rayNodeType rayiov1
portEnv := v1.EnvVar{Name: RAY_PORT, Value: headPort}
container.Env = append(container.Env, portEnv)
}
if strings.ToLower(creator) == RayServiceCreatorLabelValue {
// Only add this env for Ray Service cluster to improve service SLA.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we only need this in Ray Service? what about other scenarios?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@iycheng mentioned this will increase SLA for ray service.
But for ray job, it will ignore the error info which is not good for observability. So for now, we keep it only for RayService.

lowang-bh pushed a commit to lowang-bh/kuberay that referenced this pull request Sep 24, 2023
…ray-project#419)

* RayCluster created by RayService set death info env for ray container

* update
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants