Ray: restricted Pod Security Standards for enterprise security and Kubeflow integration #29665
Comments
This part is beyond near-term scope; for now, "Ray node == K8s pod" is the only architecture we can feasibly maintain.
This is something the Ray team must nail down. Adding the experts from the Ray team to comment on secure image building.
Is there a scanner or tool we can use to check the gap in our current Docker images (and then plan how to fix it)?
Yes, let's start with https://kubernetes.io/docs/concepts/security/pod-security-admission/. Use a modern cluster (1.24+) with it enabled, and you can audit any violations. Kubernetes 1.24-1.25 is what Kubeflow 1.7 will require, and it supports the Pod Security Admission controller. Here is an example, but with the wrong profile (baseline instead of restricted): https://kubernetes.io/docs/tasks/configure-pod-container/enforce-standards-namespace-labels/
You need to add USER 1000:0 at the end of your Dockerfiles. Then run your CI/CD tests; I bet you will already get some error messages about file permissions. I think that is the first step, and afterwards I would focus on PSS.
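The namespace labelling described in the linked Kubernetes example can be sketched as a small manifest generator. This is a minimal sketch, assuming the standard Pod Security Admission label keys (`pod-security.kubernetes.io/<mode>` and `<mode>-version`) with the `restricted` profile instead of `baseline`; the function name `restricted_namespace` and the namespace name `ray-pss-test` are hypothetical. Setting `enforce=False` keeps only `audit` and `warn`, which matches the "audit any violations first" approach.

```python
import json


def restricted_namespace(name: str, enforce: bool = True, version: str = "latest") -> dict:
    """Build a Namespace manifest labelled for Pod Security Admission.

    With enforce=False only the audit and warn modes are set, so violations
    are reported (audit log, client warnings) without pods being rejected.
    """
    modes = ["audit", "warn"] + (["enforce"] if enforce else [])
    labels = {}
    for mode in modes:
        labels[f"pod-security.kubernetes.io/{mode}"] = "restricted"
        labels[f"pod-security.kubernetes.io/{mode}-version"] = version
    return {
        "apiVersion": "v1",
        "kind": "Namespace",
        "metadata": {"name": name, "labels": labels},
    }


if __name__ == "__main__":
    # kubectl accepts JSON manifests, e.g.:
    #   python restricted_ns.py | kubectl apply -f -
    print(json.dumps(restricted_namespace("ray-pss-test"), indent=2))
```

Starting with `enforce=False` on an existing namespace surfaces violations as warnings without breaking running workloads; switching to enforcement is then a one-label change.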
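Whether a Dockerfile ends with a numeric, non-root `USER` such as `1000:0` can be checked before the image is ever deployed. This is a hypothetical lint sketch, not an existing Ray or Kubernetes tool: it resets on each `FROM` so only the final build stage counts, and it ignores line continuations and `ARG` substitution. The numeric requirement matters because the kubelet can only verify `runAsNonRoot` against a numeric UID.

```python
import re


def final_user(dockerfile_text: str):
    """Return the argument of the last USER instruction in the final stage, or None."""
    user = None
    for line in dockerfile_text.splitlines():
        if re.match(r"\s*FROM\s", line, re.IGNORECASE):
            user = None  # a new build stage starts; earlier USER lines no longer apply
            continue
        match = re.match(r"\s*USER\s+(\S+)", line, re.IGNORECASE)
        if match:
            user = match.group(1)
    return user


def runs_as_numeric_non_root(dockerfile_text: str) -> bool:
    """True if the final stage sets USER to a numeric, non-zero UID (e.g. 1000:0).

    A named user such as 'ray' fails the kubelet's runAsNonRoot verification
    unless runAsUser is also set in the pod spec, so it is rejected here.
    """
    user = final_user(dockerfile_text)
    if user is None:
        return False  # conservative: the base image's user may well be root
    uid = user.split(":", 1)[0]
    return uid.isdigit() and int(uid) != 0
```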
I believe we already meet the requirements regarding UID, GID, ports, etc. (see ray/docker/base-deps/Dockerfile, lines 16 to 17 in 69b3e3c).
We can test against a modern cluster and audit violations. I think there is a misunderstanding here: Ray, on its critical path and in its default configuration, does not run as root at all and does not use podman or any other container tools. The snippet you linked in the thread refers to an experimental feature that we have not yet recommended to any user (see ray/python/ray/_private/runtime_env/container.py, lines 26 to 37 in 3e357c8).
This means Ray deployments on K8s should not require any privileges.
@simon-mo that sounds amazing. Yes, please test it in enforcing mode with some workloads, and then you can also advertise in the main GitHub README that it runs under the Kubernetes restricted PSS. This goes for the operator as well as the cluster. Once that is solved, the integration into Kubeflow is rather straightforward: just some boring controller and UI writing, RBAC rules, Istio policies, etc. for automation. This pod security part was worrying me the most ;-) For the Notebook integration (Notebooks will be renamed to Workbenches), we will also find something simple.
We can use this workload as the example: https://docs.ray.io/en/master/cluster/kubernetes/examples/ml-example.html
I will take a look at this issue. Action items:
(1) Run KubeRay on Kubernetes v1.25 (KubeRay currently supports v1.19 to v1.24). Note that Pod Security Admission becomes stable in Kubernetes v1.25.
(2) Create a namespace with the label
(3) Verify that the policies are really enforced by deploying a pod that either uses SYS_ADMIN or runs as root.
(4) Deploy a RayCluster in that namespace.
(5) Check whether any restricted policies are violated.
(6) Run an example end-to-end workload.
(7) Check again whether any restricted policies are violated.
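For step (3), a deliberately non-compliant probe pod is useful: if the API server admits it in the labelled namespace, enforcement is not active. This is a minimal sketch; the pod name `pss-probe`, the `busybox` image and the function name are hypothetical choices. The pod violates the restricted profile in two ways at once (`runAsUser: 0` and adding `SYS_ADMIN`).

```python
import json
import sys


def violating_probe_pod(namespace: str) -> dict:
    """A pod the 'restricted' profile must reject: it runs as root and adds SYS_ADMIN."""
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": "pss-probe", "namespace": namespace},
        "spec": {
            "containers": [{
                "name": "probe",
                "image": "busybox",
                "command": ["sleep", "3600"],
                "securityContext": {
                    "runAsUser": 0,
                    "capabilities": {"add": ["SYS_ADMIN"]},
                },
            }],
        },
    }


if __name__ == "__main__":
    # Usage: python probe.py <namespace> | kubectl apply --dry-run=server -f -
    # A Forbidden error from the API server means the restricted profile is enforced.
    ns = sys.argv[1] if len(sys.argv) > 1 else "default"
    print(json.dumps(violating_probe_pod(ns), indent=2))
```

Using `--dry-run=server` runs admission without actually creating the pod, so the probe leaves nothing behind even if enforcement turns out to be missing.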
Alright, now that #31563 is merged, I think only ray-project/kuberay#866 is missing. We can then start the integration into Kubeflow on the 30th of January, if you have time then, @kevin85421.
We decided to integrate with Kubeflow without a Docker image update from Ray. See kubeflow/manifests#2383 for more details.
Description
@Jeffwan @DmitriGekhtman related to kubeflow/kubeflow#6680 and ray-project/kuberay#502
You can build the OCI images however you want; you just need to adhere to the official Kubernetes "Restricted" Pod Security Standard (PSS): https://kubernetes.io/docs/concepts/security/pod-security-standards/#restricted .
If you have a good architecture, just enforcing the restricted PSS on your namespace should be enough. You can also set the pod securityContext manually to the values from the restricted set. The most important rules are to block anything that starts with "host", privilege escalation, and running as root. If your images crash, please check whether you forgot to set proper file permissions on the working directories. If you need help at that level, feel free to reach out on the Kubeflow Slack or LinkedIn. I can also provide you with PodSecurityPolicies if your clusters are below 1.23.
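The restricted securityContext values mentioned above can also be checked mechanically against a pod spec before deployment. This is a hypothetical checker covering only a subset of the restricted profile (host namespaces, hostPath, hostPort, privileged mode, privilege escalation, non-root, capabilities, seccomp); it is not the admission controller's own logic, which remains the authority. Container-level settings override pod-level ones, as in Kubernetes.

```python
RESTRICTED_CONTAINER_CONTEXT = {
    "allowPrivilegeEscalation": False,
    "runAsNonRoot": True,
    "capabilities": {"drop": ["ALL"]},
    "seccompProfile": {"type": "RuntimeDefault"},
}


def restricted_violations(pod_spec: dict) -> list:
    """Return human-readable violations of a subset of the 'restricted' PSS."""
    problems = []
    for field in ("hostNetwork", "hostPID", "hostIPC"):
        if pod_spec.get(field):
            problems.append(f"pod: {field} must not be true")
    for vol in pod_spec.get("volumes", []):
        if "hostPath" in vol:
            problems.append(f"volume {vol.get('name')}: hostPath is not allowed")
    pod_ctx = pod_spec.get("securityContext", {})
    containers = pod_spec.get("containers", []) + pod_spec.get("initContainers", [])
    for c in containers:
        name = c.get("name", "?")
        ctx = c.get("securityContext", {})
        if ctx.get("privileged"):
            problems.append(f"{name}: privileged must be false")
        if ctx.get("allowPrivilegeEscalation") is not False:
            problems.append(f"{name}: allowPrivilegeEscalation must be false")
        # Container values override pod-level values.
        if not ctx.get("runAsNonRoot", pod_ctx.get("runAsNonRoot", False)):
            problems.append(f"{name}: runAsNonRoot must be true")
        if ctx.get("runAsUser", pod_ctx.get("runAsUser")) == 0:
            problems.append(f"{name}: runAsUser must not be 0")
        caps = ctx.get("capabilities", {})
        if "ALL" not in caps.get("drop", []):
            problems.append(f"{name}: capabilities must drop ALL")
        extra = set(caps.get("add", [])) - {"NET_BIND_SERVICE"}
        if extra:
            problems.append(f"{name}: may only add NET_BIND_SERVICE, found {sorted(extra)}")
        seccomp = ctx.get("seccompProfile", pod_ctx.get("seccompProfile", {}))
        if seccomp.get("type") not in ("RuntimeDefault", "Localhost"):
            problems.append(f"{name}: seccompProfile must be RuntimeDefault or Localhost")
        for port in c.get("ports", []):
            if port.get("hostPort"):
                problems.append(f"{name}: hostPort is not allowed")
    return problems
```

A container whose `securityContext` equals `RESTRICTED_CONTAINER_CONTEXT` yields no violations, which makes that dict a reasonable starting point for setting the values manually.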
There are thousands of guides out there on building images properly, but the most common points are these:
For example, use USER 1000:0 at the end of the Dockerfiles that build the OCI image. Make sure working directories are created with 777 file permissions, and of course do not use capabilities such as SYS_ADMIN or SYS_CHROOT; drop all of them (the restricted PSS enforces this). Use networking ports above 1024, do not use insecure setuid or setgid binaries, and so on: the same things you would do for any proper Linux userspace application.
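Two of the image-level points above, writable working directories and no setuid/setgid binaries, can be checked or applied from a build-time script. This is a minimal sketch with hypothetical helper names; in practice you would point `find_setid_files` at directories such as `/usr` or `/bin` inside the image rather than `/`, to avoid walking `/proc`.

```python
import os
import stat


def find_setid_files(root: str) -> list:
    """List regular files under root that have the setuid or setgid bit set."""
    flagged = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for filename in filenames:
            path = os.path.join(dirpath, filename)
            mode = os.lstat(path).st_mode  # lstat: do not follow symlinks
            if stat.S_ISREG(mode) and mode & (stat.S_ISUID | stat.S_ISGID):
                flagged.append(path)
    return sorted(flagged)


def make_shared_workdir(path: str, mode: int = 0o777) -> None:
    """Create a working directory writable by an arbitrary runtime UID."""
    os.makedirs(path, exist_ok=True)
    # chmod explicitly, because the mode passed to makedirs is masked by the umask.
    os.chmod(path, mode)
```

Failing the image build when `find_setid_files` returns anything turns the "no setuid/setgid binaries" rule into a CI check instead of a convention.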
As long as all pods (worker/head/raylet) run under the restricted PSS, that is fine. This is needed to make the cluster harder to compromise, as described in the Kubernetes PSS documentation linked above. Adding isolation within the worker/raylet pod is not essential for Kubeflow integration (though still desirable in general), since users will only have access to their own on-demand KubeRay clusters in their own namespaces. They could damage the clusters in their own namespace anyway if they wanted to.
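Applying the restricted settings to every Ray pod can be sketched as a transformation over a RayCluster manifest. This assumes KubeRay's RayCluster layout, where the head pod template lives under `spec.headGroupSpec.template` and each worker group's under `spec.workerGroupSpecs[].template`; the function name `harden_raycluster` is hypothetical. Note that it overwrites any conflicting keys the user already set, such as an added `NET_BIND_SERVICE` capability.

```python
import copy

RESTRICTED_CONTEXT = {
    "allowPrivilegeEscalation": False,
    "runAsNonRoot": True,
    "capabilities": {"drop": ["ALL"]},
    "seccompProfile": {"type": "RuntimeDefault"},
}


def harden_raycluster(raycluster: dict) -> dict:
    """Return a copy of a RayCluster manifest with RESTRICTED_CONTEXT merged into
    every container (and init container) of the head group and each worker group."""
    hardened = copy.deepcopy(raycluster)  # leave the caller's manifest untouched
    spec = hardened["spec"]
    templates = [spec["headGroupSpec"]["template"]]
    templates += [group["template"] for group in spec.get("workerGroupSpecs", [])]
    for template in templates:
        pod_spec = template["spec"]
        for container in pod_spec.get("containers", []) + pod_spec.get("initContainers", []):
            ctx = container.setdefault("securityContext", {})
            ctx.update(copy.deepcopy(RESTRICTED_CONTEXT))
    return hardened
```

Existing unrelated keys such as `runAsUser: 1000` are preserved, and `runAsNonRoot: true` only passes at runtime if the image's user is numeric and non-zero.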
This can of course happen in parallel with the other integration tasks.
So far I have only checked the KubeRay implementation here: #14077 (comment). Please point me to the other implementation that you want to use instead, and to the Dockerfiles for the corresponding OCI images.
Use case
Official Kubernetes Enterprise security standards and integration with Kubeflow.