Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Empty/missing Kubernetes securityContexts #258

Closed
eero-t opened this issue Jun 4, 2024 · 10 comments
Closed

Empty/missing Kubernetes securityContexts #258

eero-t opened this issue Jun 4, 2024 · 10 comments
Assignees

Comments

@eero-t
Copy link
Contributor

eero-t commented Jun 4, 2024

I would expect seeing pod container securityContexts like this:

securityContext:
  allowPrivilegeEscalation: false
  readOnlyRootFilesystem: true
  seccompProfile:
    type: RuntimeDefault
  capabilities:
     drop: [ "ALL" ]

And runAsUser setting for something else than the default root [1].

However, securityContexts in this project are either not set, or empty:

$ git grep securityContext
CodeGen/kubernetes/manifests/gaudi/codegen.yaml:      securityContext: {}
CodeGen/kubernetes/manifests/gaudi/codegen.yaml:          securityContext: {}
CodeGen/kubernetes/manifests/gaudi/codegen.yaml:      securityContext: {}
CodeGen/kubernetes/manifests/gaudi/codegen.yaml:          securityContext: {}
CodeGen/kubernetes/manifests/gaudi/codegen.yaml:      securityContext: null
CodeGen/kubernetes/manifests/gaudi/codegen.yaml:          securityContext: null
CodeGen/kubernetes/manifests/xeon/codegen.yaml:      securityContext: {}
CodeGen/kubernetes/manifests/xeon/codegen.yaml:          securityContext: {}
CodeGen/kubernetes/manifests/xeon/codegen.yaml:      securityContext: {}
CodeGen/kubernetes/manifests/xeon/codegen.yaml:          securityContext: {}
CodeGen/kubernetes/manifests/xeon/codegen.yaml:      securityContext: null
CodeGen/kubernetes/manifests/xeon/codegen.yaml:          securityContext: null

For more info, see:

[1] At least for Xeon. Device access (e.g. for Gaudi) may require root user if container runtime is not properly configured: https://kubernetes.io/blog/2021/11/09/non-root-containers-and-devices/

@eero-t eero-t changed the title Empty Kubernetes securityContexts Empty/missing Kubernetes securityContexts Jun 4, 2024
@yongfengdu
Copy link
Collaborator

I'll look at this issue.
Could you provide more information about what issue the empty securityContexts cause?
I've verified this runs fine with Gaudi-device-plugin without any special priviledge settings, will learn more from your link.

@yongfengdu yongfengdu self-assigned this Jun 5, 2024
@eero-t
Copy link
Contributor Author

eero-t commented Jun 5, 2024

Could you provide more information about what issue the empty securityContexts cause?

Such pods cannot be run in clusters with more strict pod security policies (see the "pod-security-standards" link).

In general all unnecessary container privileges should be dropped to reduce likelihood of subverted containers being also able to take over their host. "Defense in depth" etc.

@yongfengdu yongfengdu assigned lianhao and unassigned yongfengdu Jun 6, 2024
@lianhao
Copy link
Collaborator

lianhao commented Jun 7, 2024

The manifest files are generated by helmchart from GenAIInfra repo. We'll figure out minimum privileges for the workload containers running successfully, and create PR there first, both for xeon case & gaudi xeon, then later populate them here

@lianhao
Copy link
Collaborator

lianhao commented Jun 21, 2024

setting runAsUser to non-root user will results the tgi pod(image: ghcr.io/huggingface/text-generation-inference:1.4) crash, the log shows something like: RuntimeError: cannot cache function 'create_fsm_info': no locator available for file '/opt/conda/lib/python3.10/site-packages/outlines/fsm/regex.py'

Will investigate more into this

@lianhao
Copy link
Collaborator

lianhao commented Jun 26, 2024

The pod security has been added into the helm chart(see PR opea-project/GenAIInfra#133). The manifests here are generated by helm chart from GenAIInfra repo. Currently, we're in the process of discussion how to generate and use those kind of manifest with GMC and is expecting quite major changes to the k8s manifest. So we'll defer the manifest update here until that is resolved. Please see issue opea-project/GenAIInfra#129 for details tracking.

@eero-t
Copy link
Contributor Author

eero-t commented Jun 26, 2024

Thanks, the merged PR looks good, but there are few things that could be improved:

  • /mnt is not a good host mount point. Dirs mounted from host should be very specific (e.g. /mnt/opea-models), not top level host directories (in worst case, /mnt could e.g. include remote home directory mount points or other security sensitive data)
  • Anything that does not write to disk, should use read-only root fs setting

@lianhao
Copy link
Collaborator

lianhao commented Jul 5, 2024

Thanks, the merged PR looks good, but there are few things that could be improved:

  • /mnt is not a good host mount point. Dirs mounted from host should be very specific (e.g. /mnt/opea-models), not top level host directories (in worst case, /mnt could e.g. include remote home directory mount points or other security sensitive data)
  • Anything that does not write to disk, should use read-only root fs setting

PR opea-project/GenAIInfra#153 should have resolved this

@eero-t
Copy link
Contributor Author

eero-t commented Jul 5, 2024

PR opea-project/GenAIInfra#153 should have resolved this

Yes, looks good! Any idea when those changes get also to this (GenAIExamples) repository?

@lianhao
Copy link
Collaborator

lianhao commented Jul 25, 2024

The newly added manifests for CodeGen/ChatQnA/DocSum/CodeTrans has security context setting now

@lianhao
Copy link
Collaborator

lianhao commented Aug 29, 2024

closed as CodeGen/ChatQnA/DocSum/CodeTrans has security context setting now

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants