Added Nvidia GPU support to the buildah-remote task #1529
base: main
I left you a couple of review comments. The buildah-remote tasks are generated by running /hack/generate-buildah-remote.sh in this repo, which keeps them consistent with the normal buildah task. You need to modify the main.go called by generate-buildah-remote.sh so that running it produces the same diff this PR has. Once you run it, the PR should have 3 changed files: the generate script, buildah-remote/0.1/buildah-remote.yaml and buildah-remote/0.2/buildah-remote.yaml. After that, you also need to run /hack/generate-ta-tasks.sh, which will update 2 more files (the trusted artifacts versions of the 2 buildah-remote tasks). Summary: you will modify 1 file (main.go), run two generate commands, and add all those changes here.
Thanks, @brianwcook! PTAL
/ok-to-test
@@ -445,6 +445,11 @@ spec:
REMOTESSHEOF
chmod +x scripts/script-build.sh

PODMAN_NVIDIA_ARGS=()
if [[ "$PLATFORM" == "linux-g"* ]]; then
I have tried in the past to not depend on the semantics of the PLATFORM parameter. @ifireball @mshaposhnik, what do you think?
The use of the PLATFORM parameter like this would fall in line with the functionality requested in https://issues.redhat.com/browse/KONFLUX-4073.
Could you describe what these changes do, how and why? The code change on its own doesn't give me much to work with.
@@ -445,6 +445,11 @@ spec:
REMOTESSHEOF
chmod +x scripts/script-build.sh

PODMAN_NVIDIA_ARGS=()
if [[ "$PLATFORM" == "linux-g"* ]]; then
PODMAN_NVIDIA_ARGS+=("--device nvidia.com/gpu=all" "--security-opt=label=disable")
What are the implications of --security-opt=label=disable?
I would also consider dropping this PR in favor of #1530, although that one seems like it potentially gives the user too much control.
The goal here is to allow Konflux builds to access Nvidia GPUs on machines that have them. An example is running PyTorch during a container build: https://github.com/openshift/lightspeed-rag-content/blob/main/Containerfile. This PR is one building block towards supporting this scenario. The others are AWS instance type(s) in the multi-platform controller (https://github.com/redhat-appstudio/infra-deployments/blob/0b936310854c7b4031b967eda33ad8399f12da60/components/multi-platform-controller/production/common/host-config.yaml#L528) and an AMI with Nvidia drivers. For platforms whose names start with "linux-g", this PR tells podman to pass through Nvidia GPU devices to the containers it runs.
I'm not too sure about it beyond the obvious, but this came from the Nvidia docs: https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/1.14.2/cdi-support.html. Upd: attempted dropping
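For readers following the thread, here is a minimal, self-contained sketch of the platform check described above. It is an illustration only, not the code in this PR: the function name `nvidia_args` and the platform string `linux-g6xlarge/amd64` are hypothetical, and podman itself is never invoked. Unlike the diff, the sketch stores `--device` and its value as separate array elements; in bash, a single element such as `"--device nvidia.com/gpu=all"` remains one word (including the space) when the array is later expanded as `"${PODMAN_NVIDIA_ARGS[@]}"`. How the task actually expands the array is not shown in this conversation.

```shell
#!/usr/bin/env bash

# Print the extra podman arguments for a platform, one per line.
# Assumption: platform names starting with "linux-g" denote GPU hosts,
# mirroring the pattern match used in the diff.
nvidia_args() {
  local platform="$1"
  local -a args=()
  if [[ "$platform" == "linux-g"* ]]; then
    # The flag and its value are separate elements so that a quoted
    # array expansion yields separate words.
    args+=("--device" "nvidia.com/gpu=all" "--security-opt=label=disable")
  fi
  # Print nothing at all for non-GPU platforms.
  if ((${#args[@]} > 0)); then
    printf '%s\n' "${args[@]}"
  fi
}

# Demo with a hypothetical platform name; this only prints the arguments.
nvidia_args "linux-g6xlarge/amd64"
```

A caller could collect the output into an array (for example with `mapfile -t`) and append it to the podman command line, so builds on non-GPU platforms run with no extra flags.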