[Feature] Ray restricted podsecuritystandards for enterprise security and Kubeflow integration #750

Merged · 21 commits · Dec 8, 2022
121 changes: 121 additions & 0 deletions docs/guidance/pod-security.md
@@ -0,0 +1,121 @@
# Pod Security

Kubernetes defines three Pod Security Standards, `privileged`, `baseline`, and `restricted`, to broadly
cover the security spectrum. The `privileged` standard allows known privilege escalations, and thus it is not
safe enough for security-critical applications.

This document describes how to configure a RayCluster YAML file to apply the `restricted` Pod security standard. The following
references can help you understand this document better:

* [Kubernetes - Pod Security Standards](https://kubernetes.io/docs/concepts/security/pod-security-standards/#restricted)
* [Kubernetes - Pod Security Admission](https://kubernetes.io/docs/concepts/security/pod-security-admission/)
* [Kubernetes - Auditing](https://kubernetes.io/docs/tasks/debug/debug-cluster/audit/)
* [KinD - Auditing](https://kind.sigs.k8s.io/docs/user/auditing/)

# Step 1: Create a KinD cluster
```bash
# Path: ray-operator/config/security
kind create cluster --config kind-config.yaml --image=kindest/node:v1.24.0
```
The `kind-config.yaml` enables audit logging with the audit policy defined in `audit-policy.yaml`. The `audit-policy.yaml`
file defines an auditing policy that listens to Pod events in the namespace `pod-security`. With this policy, we can check
whether our Pods violate the policies in the `restricted` standard.

The feature [Pod Security Admission](https://kubernetes.io/docs/concepts/security/pod-security-admission/) was first
introduced in Kubernetes v1.22 (alpha) and became stable in Kubernetes v1.25. In addition, KubeRay currently supports
Kubernetes from v1.19 to v1.24. (At the time of writing, we have not tested KubeRay with Kubernetes v1.25.) Hence, we use **Kubernetes v1.24** in this step.
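
As a sanity check, you can confirm that the KinD cluster is up and runs the expected Kubernetes version (a minimal sketch; `kind-kind` is KinD's default context name, and newer kubectl releases have removed the deprecated `--short` flag):
```bash
kubectl cluster-info --context kind-kind
kubectl version --short  # The "Server Version" should report v1.24.x.
```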

# Step 2: Check the audit logs
```bash
docker exec kind-control-plane cat /var/log/kubernetes/kube-apiserver-audit.log
```
The log should be empty because the namespace `pod-security` does not exist.
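
To double-check, you can count the log lines directly; this is a quick sanity check, and an empty (or not yet existing) file simply means no matching events have been recorded:
```bash
docker exec kind-control-plane wc -l /var/log/kubernetes/kube-apiserver-audit.log
# 0 /var/log/kubernetes/kube-apiserver-audit.log
```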

# Step 3: Create the `pod-security` namespace
```bash
kubectl create ns pod-security
kubectl label --overwrite ns pod-security \
pod-security.kubernetes.io/warn=restricted \
pod-security.kubernetes.io/warn-version=latest \
pod-security.kubernetes.io/audit=restricted \
pod-security.kubernetes.io/audit-version=latest \
pod-security.kubernetes.io/enforce=restricted \
pod-security.kubernetes.io/enforce-version=latest
```
With the `pod-security.kubernetes.io` labels, the built-in Kubernetes Pod security admission controller applies the
`restricted` Pod security standard to all Pods in the namespace `pod-security`. The label
`pod-security.kubernetes.io/enforce=restricted` means that a Pod will be rejected if it violates the policies defined in the
`restricted` security standard. See [Pod Security Admission](https://kubernetes.io/docs/concepts/security/pod-security-admission/) for more details about the labels.
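
If you apply the labels to a namespace that already contains Pods, a server-side dry run reports which existing Pods would violate the standard without persisting the label change (this mirrors the Kubernetes documentation):
```bash
# Preview violations without actually applying the enforce label.
kubectl label --dry-run=server --overwrite ns pod-security \
  pod-security.kubernetes.io/enforce=restricted
```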

# Step 4: Install the KubeRay operator
Update the field `securityContext` in `helm-chart/kuberay-operator/values.yaml`:
```yaml
securityContext:
  allowPrivilegeEscalation: false
  capabilities:
    drop: ["ALL"]
  runAsNonRoot: true
  seccompProfile:
    type: RuntimeDefault
```
Then install the KubeRay operator chart:
```bash
# Path: helm-chart/kuberay-operator
helm install -n pod-security kuberay-operator .
```
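
To verify the installation, check that the operator Pod itself passes the `restricted` standard and reaches the `Running` state (the Pod name suffix will differ in your cluster):
```bash
kubectl get pods -n pod-security
# NAME                               READY   STATUS    RESTARTS   AGE
# kuberay-operator-8b6d55dbb-t8msf   1/1     Running   0          30s
```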

# Step 5: Create a RayCluster (Choose either Step 5.1 or Step 5.2)
* If you choose Step 5.1, no Ray Pod will be created in the namespace `pod-security`.
* If you choose Step 5.2, the Ray Pods are created successfully.

## Step 5.1: Create a RayCluster without proper `securityContext` configurations
```bash
# Path: ray-operator/config/samples
kubectl apply -n pod-security -f ray-cluster.complete.yaml

# Wait 20 seconds and check audit logs for the error messages.
docker exec kind-control-plane cat /var/log/kubernetes/kube-apiserver-audit.log

# Example error messages
# "pods \"raycluster-complete-head-fkbf5\" is forbidden: violates PodSecurity \"restricted:latest\": allowPrivilegeEscalation != false (container \"ray-head\" must set securityContext.allowPrivilegeEscalation=false) ...

kubectl get pod -n pod-security
# NAME READY STATUS RESTARTS AGE
# kuberay-operator-8b6d55dbb-t8msf 1/1 Running 0 62s

# Clean up the RayCluster
kubectl delete rayclusters.ray.io -n pod-security raycluster-complete
# raycluster.ray.io "raycluster-complete" deleted
```
No Ray Pod is created in the namespace `pod-security`; instead, the audit logs record error messages describing the policy violations.
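
Besides the audit log, the namespace events and the operator logs also record the rejections. The following is a minimal sketch; the log selector assumes the Helm chart's default `app.kubernetes.io/name` label:
```bash
# Recent namespace events include the PodSecurity rejection reasons.
kubectl get events -n pod-security --sort-by=.lastTimestamp

# The KubeRay operator also logs the failed Pod creations.
kubectl logs -n pod-security -l app.kubernetes.io/name=kuberay-operator --tail=20
```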

## Step 5.2: Create a RayCluster with proper `securityContext` configurations
```bash
# Path: ray-operator/config/security
kubectl apply -n pod-security -f ray-cluster.pod-security.yaml

# Wait for the RayCluster to converge, then check the audit logs for the messages.
docker exec kind-control-plane cat /var/log/kubernetes/kube-apiserver-audit.log

# Forward the dashboard port
kubectl port-forward --address 0.0.0.0 svc/raycluster-pod-security-head-svc -n pod-security 8265:8265

# Log in to the head Pod
kubectl exec -it -n pod-security ${YOUR_HEAD_POD} -- bash

# (Head Pod) Run a sample job in the Pod
python3 samples/xgboost_example.py

# Check the job status in the dashboard on your browser.
# http://127.0.0.1:8265/#/job => The job status should be "SUCCEEDED".

# (Head Pod) Make sure Python dependencies can be installed under `restricted` security standard
pip3 install jsonpatch
echo $? # Check the exit code of `pip3 install jsonpatch`. It should be 0.

# Clean up the RayCluster
kubectl delete -n pod-security -f ray-cluster.pod-security.yaml
# raycluster.ray.io "raycluster-pod-security" deleted
# configmap "xgboost-example" deleted
```
One head Pod and one worker Pod will be created as specified in `ray-cluster.pod-security.yaml`.
First, we log in to the head Pod, run an XGBoost example script, and check the job
status in the dashboard. Next, we use `pip` to install a Python dependency (`jsonpatch`); the exit code of the `pip` command should be 0.
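
To fill in `${YOUR_HEAD_POD}` above, you can look the head Pod up by the `ray.io/node-type` label that KubeRay attaches to Ray Pods (a sketch assuming the default labels):
```bash
YOUR_HEAD_POD=$(kubectl get pods -n pod-security \
  -l ray.io/node-type=head -o jsonpath='{.items[0].metadata.name}')
echo "$YOUR_HEAD_POD"
```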
4 changes: 4 additions & 0 deletions helm-chart/kuberay-operator/values.yaml
@@ -58,3 +58,7 @@ rbacEnable: true

batchScheduler:
enabled: false

# Set up `securityContext` to improve Pod security.
# See https://github.com/ray-project/kuberay/blob/master/docs/guidance/pod-security.md for further guidance.
securityContext: {}
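
For example, the `restricted`-compatible settings from Step 4 of the guide can be supplied at install time instead of editing the chart in place. This is a sketch; the values file name `security-values.yaml` is hypothetical:
```bash
cat <<'EOF' > security-values.yaml
securityContext:
  allowPrivilegeEscalation: false
  capabilities:
    drop: ["ALL"]
  runAsNonRoot: true
  seccompProfile:
    type: RuntimeDefault
EOF

helm install -n pod-security kuberay-operator . -f security-values.yaml
```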
3 changes: 1 addition & 2 deletions ray-operator/config/samples/ray-cluster.complete.yaml
Expand Up @@ -11,8 +11,7 @@ metadata:
name: raycluster-complete
spec:
rayVersion: '2.1.0'
-  ######################headGroupSpec#################################
-  # Ray head pod template and specs
+  # Ray head pod configuration
headGroupSpec:
# Kubernetes Service Type, valid values are 'ClusterIP', 'NodePort' and 'LoadBalancer'
serviceType: ClusterIP
15 changes: 15 additions & 0 deletions ray-operator/config/security/audit-policy.yaml
@@ -0,0 +1,15 @@
apiVersion: audit.k8s.io/v1 # This is required.
kind: Policy
# Don't generate audit events for all requests in RequestReceived stage.
omitStages:
- "RequestReceived"
rules:
  # Log Pod changes at Metadata level
- level: Metadata
resources:
- group: ""
# Resource "pods" doesn't match requests to any subresource of pods,
# which is consistent with the RBAC policy.
resources: ["pods"]
# This rule only applies to resources in the "pod-security" namespace.
namespaces: ["pod-security"]
29 changes: 29 additions & 0 deletions ray-operator/config/security/kind-config.yaml
@@ -0,0 +1,29 @@
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
nodes:
- role: control-plane
kubeadmConfigPatches:
- |
kind: ClusterConfiguration
apiServer:
# enable auditing flags on the API server
extraArgs:
audit-log-path: /var/log/kubernetes/kube-apiserver-audit.log
audit-policy-file: /etc/kubernetes/policies/audit-policy.yaml
# mount new files / directories on the control plane
extraVolumes:
- name: audit-policies
hostPath: /etc/kubernetes/policies
mountPath: /etc/kubernetes/policies
readOnly: true
pathType: "DirectoryOrCreate"
- name: "audit-logs"
hostPath: "/var/log/kubernetes"
mountPath: "/var/log/kubernetes"
readOnly: false
pathType: DirectoryOrCreate
# mount the local file on the control plane
extraMounts:
- hostPath: ./audit-policy.yaml
containerPath: /etc/kubernetes/policies/audit-policy.yaml
readOnly: true
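
Once the cluster is up, you can confirm that the extra mounts worked and the policy file is visible inside the control-plane node (a quick sanity check):
```bash
docker exec kind-control-plane ls /etc/kubernetes/policies
# audit-policy.yaml
```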
175 changes: 175 additions & 0 deletions ray-operator/config/security/ray-cluster.pod-security.yaml
@@ -0,0 +1,175 @@
# The resource requests and limits in this config are too small for production!
# For examples with more realistic resource configuration, see
# ray-cluster.complete.large.yaml and
# ray-cluster.autoscaler.large.yaml.
apiVersion: ray.io/v1alpha1
kind: RayCluster
metadata:
labels:
controller-tools.k8s.io: "1.0"
# A unique identifier for the head node and workers of this cluster.
name: raycluster-pod-security
spec:
rayVersion: '2.1.0'
# Ray head pod configuration
headGroupSpec:
# Kubernetes Service Type, valid values are 'ClusterIP', 'NodePort' and 'LoadBalancer'
serviceType: ClusterIP
# for the head group, replicas should always be 1.
# headGroupSpec.replicas is deprecated in KubeRay >= 0.3.0.
replicas: 1
    # The following params are used to complete the ray start command: ray start --head --block --dashboard-host='0.0.0.0' ...
rayStartParams:
dashboard-host: '0.0.0.0'
block: 'true'
    # Pod template
template:
spec:
containers:
- name: ray-head
image: rayproject/ray-ml:2.1.0
ports:
- containerPort: 6379
name: gcs
- containerPort: 8265
name: dashboard
- containerPort: 10001
name: client
lifecycle:
preStop:
exec:
command: ["/bin/sh","-c","ray stop"]
volumeMounts:
- mountPath: /tmp/ray
name: ray-logs
- mountPath: /home/ray/samples
name: ray-example-configmap
resources:
limits:
cpu: 1
memory: 2Gi
requests:
cpu: 1
memory: 2Gi
securityContext:
allowPrivilegeEscalation: false
capabilities:
drop: ["ALL"]
runAsNonRoot: true
seccompProfile:
type: RuntimeDefault
volumes:
- name: ray-logs
emptyDir: {}
- name: ray-example-configmap
configMap:
name: ray-example
# An array of keys from the ConfigMap to create as files
items:
- key: xgboost_example.py
path: xgboost_example.py
workerGroupSpecs:
  # the number of Pod replicas in this worker group
- replicas: 1
minReplicas: 1
maxReplicas: 10
    # logical group name; here it is called large-group; it can also be a functional name
groupName: large-group
# if worker pods need to be added, we can simply increment the replicas
# if worker pods need to be removed, we decrement the replicas, and populate the podsToDelete list
# the operator will remove pods from the list until the number of replicas is satisfied
# when a pod is confirmed to be deleted, its name will be removed from the list below
#scaleStrategy:
# workersToDelete:
# - raycluster-complete-worker-large-group-bdtwh
# - raycluster-complete-worker-large-group-hv457
# - raycluster-complete-worker-large-group-k8tj7
# the following params are used to complete the ray start: ray start --block
rayStartParams:
block: 'true'
    # Pod template
template:
spec:
containers:
- name: ray-worker
image: rayproject/ray-ml:2.1.0
          # Environment variables to set in the container. Optional.
# Refer to https://kubernetes.io/docs/tasks/inject-data-application/define-environment-variable-container/
lifecycle:
preStop:
exec:
command: ["/bin/sh","-c","ray stop"]
          # Use volumeMounts. Optional.
# Refer to https://kubernetes.io/docs/concepts/storage/volumes/
volumeMounts:
- mountPath: /tmp/ray
name: ray-logs
resources:
limits:
cpu: 4
memory: 2Gi
requests:
cpu: 1
memory: 2Gi
securityContext:
allowPrivilegeEscalation: false
capabilities:
drop: ["ALL"]
runAsNonRoot: true
seccompProfile:
type: RuntimeDefault
initContainers:
# the env var $RAY_IP is set by the operator if missing, with the value of the head service name
- name: init-myservice
image: busybox:1.28
# Change the cluster postfix if you don't have a default setting
command: ['sh', '-c', "until nslookup $RAY_IP.$(cat /var/run/secrets/kubernetes.io/serviceaccount/namespace).svc.cluster.local; do echo waiting for myservice; sleep 2; done"]
securityContext:
runAsUser: 1000
allowPrivilegeEscalation: false
capabilities:
drop: ["ALL"]
runAsNonRoot: true
seccompProfile:
type: RuntimeDefault
# use volumes
# Refer to https://kubernetes.io/docs/concepts/storage/volumes/
volumes:
- name: ray-logs
emptyDir: {}
---
apiVersion: v1
kind: ConfigMap
metadata:
name: ray-example
data:
xgboost_example.py: |
import ray
from ray.train.xgboost import XGBoostTrainer
from ray.air.config import ScalingConfig

# Load data.
dataset = ray.data.read_csv("s3://anonymous@air-example-data/breast_cancer.csv")

# Split data into train and validation.
train_dataset, valid_dataset = dataset.train_test_split(test_size=0.3)

trainer = XGBoostTrainer(
scaling_config=ScalingConfig(
# Number of workers to use for data parallelism.
num_workers=1,
# Whether to use GPU acceleration.
use_gpu=False,
),
label_column="target",
num_boost_round=20,
params={
# XGBoost specific params
"objective": "binary:logistic",
# "tree_method": "gpu_hist", # uncomment this to use GPU for training
"eval_metric": ["logloss", "error"],
},
datasets={"train": train_dataset, "valid": valid_dataset},
)
result = trainer.fit()
print(result.metrics)
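
As an alternative to running the script with `kubectl exec`, the same example can be submitted through Ray's Job Submission CLI once the dashboard port is forwarded (a sketch; `ray job submit` is available in Ray 2.1, and the script path is the ConfigMap mount path inside the head Pod):
```bash
# Submit the example through the forwarded dashboard at http://127.0.0.1:8265.
ray job submit --address http://127.0.0.1:8265 \
  -- python /home/ray/samples/xgboost_example.py
```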