Exposing Serve Service #1117

Merged: 16 commits, Jun 8, 2023

Contributor

@kodwanis kodwanis commented May 26, 2023

Why are these changes needed?

This PR exposes the serve service and all of its fields as part of the RayService CRD. This allows users to pass custom labels, annotations, and other fields to the Serve service.

This is similar to exposing the head service to users, which was addressed in #1040.

Here are the definitions of the priorities:

Name: If it's specified in ServeService, it overrides the default.

Namespace: Ignore the user-provided namespace in ServeService and enforce the same namespace as the RayCluster.

Annotations: User-provided annotations are passed as is.

Selector: Ignore user-specified selector fields and keep the default fields.

Labels: Default label values take priority over user-specified values for the same keys. Any other user-specified labels are merged in.

Ports: Keep the behavior consistent with adding only the serve port. If the serve port is defined in rayCluster, it takes priority and user-specified ports are ignored. If the serve port is not defined in rayCluster, the user-specified serve port is added. All ports other than the serve port are ignored.

Type: If it's specified in ServeService, it overrides the default.
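
The priority rules above can be sketched in Go roughly as follows. This is a minimal illustration, not the code from this PR: `ServiceSketch` and `mergeServeService` are hypothetical names, plain structs stand in for the Kubernetes Service type so the example runs on its own, and all field values in `main` are made up.

```go
package main

import "fmt"

// ServiceSketch is a hypothetical, trimmed-down stand-in for a Kubernetes Service.
type ServiceSketch struct {
	Name        string
	Namespace   string
	Type        string
	Labels      map[string]string
	Annotations map[string]string
	Selector    map[string]string
}

// mergeServeService applies the priorities listed in the PR description
// (hypothetical helper, not the operator's actual function).
func mergeServeService(defaults, user ServiceSketch, clusterNamespace string) ServiceSketch {
	out := ServiceSketch{
		Name:        defaults.Name,
		Namespace:   clusterNamespace, // Namespace: the user-provided value is ignored.
		Type:        defaults.Type,
		Labels:      map[string]string{},
		Annotations: map[string]string{},
		Selector:    map[string]string{},
	}
	// Name and Type: a user-provided value overrides the default.
	if user.Name != "" {
		out.Name = user.Name
	}
	if user.Type != "" {
		out.Type = user.Type
	}
	// Annotations: passed through as is.
	for k, v := range user.Annotations {
		out.Annotations[k] = v
	}
	// Labels: user labels are merged first, then defaults overwrite on conflict.
	for k, v := range user.Labels {
		out.Labels[k] = v
	}
	for k, v := range defaults.Labels {
		out.Labels[k] = v
	}
	// Selector: only the default selector is kept.
	for k, v := range defaults.Selector {
		out.Selector[k] = v
	}
	return out
}

func main() {
	defaults := ServiceSketch{
		Name:     "example-serve-svc",
		Type:     "ClusterIP",
		Labels:   map[string]string{"ray.io/service": "example"},
		Selector: map[string]string{"ray.io/serve": "true"},
	}
	user := ServiceSketch{
		Namespace:   "other-namespace",
		Type:        "LoadBalancer",
		Labels:      map[string]string{"ray.io/service": "user-value", "custom-label": "custom-value"},
		Annotations: map[string]string{"custom-annotation": "custom-value"},
		Selector:    map[string]string{"app": "something-else"},
	}
	fmt.Printf("%+v\n", mergeServeService(defaults, user, "default"))
}
```

Copying user labels first and defaults second is one simple way to get "defaults win on conflict, everything else is merged" without an explicit key check.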

Expected result

The serve-svc should match the service defined by the user. It should contain the user-defined labels, annotations, type, name (if passed), and "serve" port (if not defined in the cluster config).

Expected output for ray-service.custom-serve-service.yaml:

apiVersion: v1
kind: Service
metadata:
  creationTimestamp: 'XX'
  labels:
    ray.io/serve: custom-ray-serve-service-name-serve
    ray.io/service: custom-ray-serve-service-name
    custom-label: custom-ray-serve-service-label
  annotations:
    custom-annotation: custom-ray-serve-service-annotation
  name: custom-ray-serve-service-name-serve-svc
  namespace: default
  ownerReferences:
    - apiVersion: ray.io/v1alpha1
      blockOwnerDeletion: true
      controller: true
      kind: RayService
      name: custom-ray-serve-service-name-
      uid: xxx
  resourceVersion: '3265983567'
  uid: xxx
spec:
  clusterIP: 172.30.233.159
  clusterIPs:
    - 172.30.233.159
  internalTrafficPolicy: Cluster
  ipFamilies:
    - IPv4
  ipFamilyPolicy: SingleStack
  ports:
    - name: serve
      port: 8000
      protocol: TCP
      targetPort: 8000
  selector:
    ray.io/cluster: custom-ray-serve-service-name-serve-raycluster-xxxx
    ray.io/serve: 'true'
  sessionAffinity: None
  type: LoadBalancer
status:
  loadBalancer: {}

Related issue number

Closes #1034

Checks

  • I've made sure the tests are passing.
  • Testing Strategy
    • Unit tests
    • Manual tests
    • This PR is not tested :(

@kodwanis kodwanis marked this pull request as ready for review May 26, 2023 17:02
@architkulkarni architkulkarni self-assigned this May 30, 2023
Contributor

@architkulkarni architkulkarni left a comment

It looks like there's a fair amount of duplicated code and test code from #1040, is it possible to refactor it to avoid duplicating code as much as possible? For example, by pulling out common code into helper functions.

For the test code, I'm less sure, but I think it's possible to use "table-driven tests" to parametrize multiple tests that are very similar.

Will give this a detailed review as soon as I can.
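
For readers unfamiliar with the term, a table-driven check might look like the sketch below. It is only an illustration of the pattern, not the test code from this PR: `mergeLabels`, `labelCase`, and `runLabelCases` are hypothetical names, and a plain loop is used instead of the `testing` package so the example runs standalone.

```go
package main

import (
	"fmt"
	"reflect"
)

// mergeLabels is a hypothetical helper under test: defaults win on key conflicts.
func mergeLabels(defaults, user map[string]string) map[string]string {
	out := make(map[string]string, len(defaults)+len(user))
	for k, v := range user {
		out[k] = v
	}
	for k, v := range defaults {
		out[k] = v
	}
	return out
}

// labelCase is one row of the table: a name, the inputs, and the expected output.
type labelCase struct {
	name     string
	defaults map[string]string
	user     map[string]string
	want     map[string]string
}

// labelCases returns the table; adding a scenario means adding one row.
func labelCases() []labelCase {
	return []labelCase{
		{
			name:     "no user labels",
			defaults: map[string]string{"ray.io/service": "example"},
			user:     nil,
			want:     map[string]string{"ray.io/service": "example"},
		},
		{
			name:     "extra user label is merged",
			defaults: map[string]string{"ray.io/service": "example"},
			user:     map[string]string{"custom-label": "custom-value"},
			want:     map[string]string{"ray.io/service": "example", "custom-label": "custom-value"},
		},
		{
			name:     "default wins on conflict",
			defaults: map[string]string{"ray.io/service": "example"},
			user:     map[string]string{"ray.io/service": "user-value"},
			want:     map[string]string{"ray.io/service": "example"},
		},
	}
}

// runLabelCases runs every row through the same assertion and returns the
// names of the rows that failed.
func runLabelCases(cases []labelCase) []string {
	var failed []string
	for _, tc := range cases {
		if got := mergeLabels(tc.defaults, tc.user); !reflect.DeepEqual(got, tc.want) {
			failed = append(failed, tc.name)
		}
	}
	return failed
}

func main() {
	failed := runLabelCases(labelCases())
	fmt.Printf("%d case(s) failed: %v\n", len(failed), failed)
}
```

In a real `_test.go` file the loop body would typically be wrapped in `t.Run(tc.name, ...)` so that each row is reported as its own subtest.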

@kodwanis
Contributor Author

It looks like there's a fair amount of duplicated code and test code from #1040, is it possible to refactor it to avoid duplicating code as much as possible? For example, by pulling out common code into helper functions.

For the test code, I'm less sure, but I think it's possible to use "table-driven tests" to parametrize multiple tests that are very similar.

Will give this a detailed review as soon as I can.

@architkulkarni Added helper function to reduce the duplicated code both in the implementation and test files. Please take a look.

Signed-off-by: Siddharth Kodwani <[email protected]>
Signed-off-by: Siddharth Kodwani <[email protected]>
Contributor

@architkulkarni architkulkarni left a comment

Thanks for the refactor, looks good! Just one nit about the warning but it shouldn't block the merge.

Before merging I would like to get @kevin85421's opinion on just one point, which is the handling of ports. @kevin85421 do you think it makes sense? I don't have any strong preferences here for how to handle them.

Comment on lines 177 to 178
// Add DeafultServePort if it is already not added and ignore any custom ports
// Keeping this consistentent with adding only serve port in serve service
Contributor

Suggested change
- // Add DeafultServePort if it is already not added and ignore any custom ports
- // Keeping this consistentent with adding only serve port in serve service
+ // Add DefaultServePort if it is already not added and ignore any custom ports
+ // Keeping this consistent with adding only serve port in serve service

Contributor

Does it make sense to print or log a warning when we see a custom port and ignore it?

Contributor Author

I can add the warning.
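
A rough sketch of the port handling discussed in this thread, including a warning for ignored ports, could look like this. It is not the code from this PR: `PortSketch`, `resolveServePorts`, and the constants are hypothetical, and the fallback to port 8000 when neither side defines a serve port is an assumption based on the port shown in the expected output.

```go
package main

import (
	"fmt"
	"log"
)

// PortSketch is a hypothetical stand-in for a Kubernetes ServicePort.
type PortSketch struct {
	Name string
	Port int32
}

const (
	servePortName           = "serve" // port name used in the expected output
	fallbackServePort int32 = 8000    // assumption: used only if no serve port is defined anywhere
)

// resolveServePorts returns the single port the serve service should expose.
// A serve port from the cluster config wins; otherwise the user's serve port
// is used; every other user port is ignored with a warning.
func resolveServePorts(clusterPorts, userPorts []PortSketch) []PortSketch {
	chosen := PortSketch{Name: servePortName, Port: fallbackServePort}
	found := false
	for _, p := range clusterPorts {
		if p.Name == servePortName {
			chosen, found = p, true
			break
		}
	}
	for _, p := range userPorts {
		if !found && p.Name == servePortName {
			chosen, found = p, true
			continue
		}
		log.Printf("warning: ignoring custom port %q (%d) on the serve service", p.Name, p.Port)
	}
	return []PortSketch{chosen}
}

func main() {
	cluster := []PortSketch{{Name: "serve", Port: 8000}}
	user := []PortSketch{{Name: "serve", Port: 9000}, {Name: "metrics", Port: 8080}}
	fmt.Println(resolveServePorts(cluster, user)) // cluster serve port wins
	fmt.Println(resolveServePorts(nil, user))     // user serve port is used
}
```

Logging inside the loop means each ignored port produces its own warning, which makes it easier for a user to see which entries in their spec had no effect.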

@architkulkarni
Contributor

@kodwanis Sorry, do you mind fixing the last merge conflict?

Signed-off-by: Siddharth Kodwani <[email protected]>
@kodwanis
Contributor Author

kodwanis commented Jun 5, 2023

@kodwanis Sorry, do you mind fixing the last merge conflict?

@architkulkarni resolved the conflict and added log statement for ports

kodwanis and others added 2 commits June 5, 2023 12:10
Signed-off-by: Siddharth Kodwani <[email protected]>
@kevin85421 kevin85421 self-assigned this Jun 5, 2023
Member

@kevin85421 kevin85421 left a comment

Would you mind adding the expected result of ray-service.custom-serve-service.yaml in the PR description? I will clone this branch and manually test it. Thanks!

ray-operator/controllers/ray/common/service.go (outdated, resolved)
ray-operator/controllers/ray/common/service_test.go (outdated, resolved)
@kodwanis
Contributor Author

kodwanis commented Jun 6, 2023

Would you mind adding the expected result of ray-service.custom-serve-service.yaml in the PR description? I will clone this branch and manually test it. Thanks!

@kevin85421 Updated the PR description with expected output for serve-svc

Member

@kevin85421 kevin85421 left a comment

Others look good to me.

ray-operator/controllers/ray/common/service.go (outdated, resolved)
@kevin85421
Member

I tested it manually, comparing the results with and without this PR. The upper one corresponds to the results with this PR applied, while the lower one represents the results without this PR. I will attempt to rerun the failed nightly compatibility tests. However, the nightly tests have become quite unstable recently, possibly due to breaking changes in the Ray master branch. If it still cannot pass the test, we may consider merging this PR without passing that test.

[Screenshot: Screen Shot 2023-06-06 at 2 59 44 PM]

@Jeffwan
Collaborator

Jeffwan commented Jun 6, 2023

Hi, I suggest holding this change and introducing it in the beta APIs. #1146 has been added, so we can grow new fields in the latest API version and controller version.

@kodwanis
Contributor Author

kodwanis commented Jun 6, 2023

Hi, I suggest holding this change and introducing it in the beta APIs. #1146 has been added, so we can grow new fields in the latest API version and controller version.

@Jeffwan This has been a critical blocker from an adoption point of view for months now, and it is a must-have for load balancing traffic to Ray Serve.

@kevin85421
Member

Hi @Jeffwan, we may not update the version of RayService for v0.6.0. Instead, we will update the versions for RayJob and RayCluster in v0.6.0. I believe it is safe to merge this PR as it doesn't introduce any backward compatibility issues or cause significant delays for users. This PR is similar to Archit's PR #1040. Is it OK for you? Thanks!

@kevin85421
Member

Experiments:

# Run test
RAY_IMAGE=rayproject/ray:nightly OPERATOR_IMAGE=controller:latest python3 tests/compatibility-test.py RayFTTestCase.test_ray_serve 2>&1 | tee log

# Check Ray commit (in head Pod)
python3 -c "import ray; print(ray.__commit__)"

@kodwanis
Contributor Author

kodwanis commented Jun 7, 2023

Experiments:

# Run test
RAY_IMAGE=rayproject/ray:nightly OPERATOR_IMAGE=controller:latest python3 tests/compatibility-test.py RayFTTestCase.test_ray_serve 2>&1 | tee log

# Check Ray commit (in head Pod)
python3 -c "import ray; print(ray.__commit__)"

Thanks for testing it out manually.

Member

@kevin85421 kevin85421 left a comment

I am not sure why it consistently fails on GitHub Actions, but it passes on my devbox. We need to prioritize #1058.

[Screenshot: Screen Shot 2023-06-07 at 5 10 25 PM]

Experiments:

# Run test
RAY_IMAGE=rayproject/ray:nightly OPERATOR_IMAGE=controller:latest python3 tests/compatibility-test.py RayFTTestCase.test_ray_serve 2>&1 | tee log

# Check Ray commit (in head Pod)
python3 -c "import ray; print(ray.__commit__)"

@kevin85421
Member

Merging this PR before it passes the nightly compatibility test. The failure is unrelated to this PR. See #1117 (review) for more details.

@kevin85421 kevin85421 merged commit e430a93 into ray-project:master Jun 8, 2023
lowang-bh pushed a commit to lowang-bh/kuberay that referenced this pull request Sep 24, 2023
Development

Successfully merging this pull request may close these issues.

[Feature] Expose Serve service annotations
4 participants