Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Docker] (Kubeflow integration) Only UID=1000 has the write access of /home/ray in Ray images #30959

Closed
kevin85421 opened this issue Dec 8, 2022 · 13 comments · Fixed by #31563
Assignees
Labels
enhancement Request for new feature and/or capability

Comments

@kevin85421
Copy link
Member

Description

This issue is a part of the integration between KubeRay and Kubeflow. See ray-project/kuberay#750 (comment) for some context.

In KinD (a Kubernetes distribution) cluster, when we use the command kubectl exec .. to log in to a ray Pod, the UID will be the same as $RAY_UID (i.e. 1000) in base-deps/Dockerfile.

ARG RAY_UID=1000
ARG RAY_GID=100
RUN apt-get update -y \
&& apt-get install -y sudo tzdata \
&& useradd -ms /bin/bash -d /home/ray ray --uid $RAY_UID --gid $RAY_GID \
&& usermod -aG sudo ray \
&& echo 'ray ALL=NOPASSWD: ALL' >> /etc/sudoers \
&& rm -rf /var/lib/apt/lists/* \
&& apt-get clean
USER $RAY_UID

For OpenShift (a Kubernetes distribution), a random non-root UID will be used when we log in to a ray Pod. However, only UID=1000 has the write access of /home/ray. Therefore, the error message of Permission denied will be reported. As follows, only ray (UID = 1000) has rwx (read, write, execute) access to /home/ray. Others only have r-x (read & execute) access to /home/ray.

> ls -l /home/
drwxr-xr-x 1 ray users 4096 Dec  7 17:18 ray

To reproduce it, we can follow instructions in pod-security.md, and add runAsUser and runAsGroup to the securityContext of ray-head in ray-cluster.pod-security.yaml.

 securityContext:
    runAsUser: 1001190000
    runAsGroup: 0
    allowPrivilegeEscalation: false
    capabilities:
      drop: ["ALL"]
    runAsNonRoot: true
    seccompProfile:
      type: RuntimeDefault 

After the RayCluster is ready, use kubectl exec ... to log in to the head Pod. Next, execute the following commands.

(base) I have no name!@raycluster-pod-security-head-nlfmj:~$ id
uid=1001190000 gid=0(root) groups=0(root)
(base) I have no name!@raycluster-pod-security-head-nlfmj:~$ pwd
/home/ray
(base) I have no name!@raycluster-pod-security-head-nlfmj:~$ touch 123
touch: cannot touch '123': Permission denied

Use case

This is a part of the integration between KubeRay and Kubeflow. See ray-project/kuberay#750 for some context.

@kevin85421 kevin85421 added the enhancement Request for new feature and/or capability label Dec 8, 2022
@kevin85421
Copy link
Member Author

@kevin85421
Copy link
Member Author

@DmitriGekhtman can you help tag the owner of Dockerfiles? I cannot find the owners of ray/docker in CODEOWNERS. If the owners do not have time to work on this issue, I can work on it. Thank you!

@kevin85421 kevin85421 changed the title [Docker] Only UID=1000 has the write access of /home/ray in Ray images [Docker] (Kubeflow integration) Only UID=1000 has the write access of /home/ray in Ray images Dec 8, 2022
@DmitriGekhtman
Copy link
Contributor

The Dockerfiles are arcane work of @ijrsvt :D
I imagine he might be a bit busy, but I'm sure he can answer any questions that come up.

@ijrsvt
Copy link
Contributor

ijrsvt commented Dec 8, 2022

@kevin85421 What exactly do you want to change with the base Docker images?

@kevin85421
Copy link
Member Author

@kevin85421 What exactly do you want to change with the base Docker images?

Thank @ijrsvt for your reply! As mentioned in the PR description, we want to open the write access of /home/ray for other UIDs (!= 1000). I am not a security expert, so I am not sure whether this change will cause any negative impact on security or not.

@ijrsvt
Copy link
Contributor

ijrsvt commented Dec 9, 2022

Ahh! Totally missed that. Can we change the permissions to: 775 (User: RWX, Group: RWX, Others: R_X)?

@kevin85421
Copy link
Member Author

Ahh! Totally missed that. Can we change the permissions to: 775 (User: RWX, Group: RWX, Others: R_X)?

"During the creation of a project or namespace, OpenShift assigns a User ID (UID) range, a supplemental group ID (GID) range, and unique SELinux MCS labels to the project or namespace. ... When a Pod is deployed into the namespace, by default, OpenShift will use the first UID and first GID from this range to run the Pod. Any attempt by a Pod definition to specify a UID outside the assigned range will fail and requires special privileges." (A Guide to OpenShift and UIDs, RedHat)

775 will not work because we cannot assume OpenShift will choose RAY_GID (i.e. 100) as its GID. I think we should use 777, but I am not sure whether 777 will cause any security concern or not.

@juliusvonkohout
Copy link

Amazing guys. i was busy with personal stuff but i will catch up again. write me on slack if you need help

@juliusvonkohout
Copy link

@kevin85421 @DmitriGekhtman

Openshift uses GID 0 by default so definitely not 100. Yes 777 is the right way to support all Kubernetes distributions.
Everything in your container is executed as UID 1000 so far i guess. i do not see where the security problem is. For virtual machines with varying users and processes this this could be relevant, but not for properly written OCI containers with a single user. This is not only relevant for Openshift but also other enterprise security level clusters.

Is there anything else preventing you from moving forward?

@kevin85421
Copy link
Member Author

@kevin85421 @DmitriGekhtman

Openshift uses GID 0 by default so definitely not 100. Yes 777 is the right way to support all Kubernetes distributions. Everything in your container is executed as UID 1000 so far i guess. i do not see where the security problem is. For virtual machines with varying users and processes this this could be relevant, but not for properly written OCI containers with a single user. This is not only relevant for Openshift but also other enterprise security level clusters.

Is there anything else preventing you from moving forward?

Gentle ping @ijrsvt. Thank you!

@ijrsvt
Copy link
Contributor

ijrsvt commented Jan 5, 2023

@kevin85421 Yeah, I think we can modify it to be 777

@kevin85421
Copy link
Member Author

kevin85421 commented Jan 28, 2023

777 is too open for ssh, so #31563 may be reverted. See #32025 for more details. Any ideas for other solutions? cc @juliusvonkohout @ijrsvt

@kevin85421 kevin85421 reopened this Jan 28, 2023
@kevin85421
Copy link
Member Author

We decided to integrate Kubeflow without this update. (kubeflow/manifests#2383)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement Request for new feature and/or capability
Projects
None yet
4 participants