Inject rocm runtimes to runtime-images folder #628

Merged

atheo89
Member

@atheo89 atheo89 commented Jul 19, 2024

Related to: https://issues.redhat.com/browse/RHOAIENG-9680
Depends on: openshift/release#54567

Description

Add rocm runtimes to runtime-images folder

Merge criteria:

  • The commits are squashed in a cohesive manner and have meaningful messages.
  • Testing instructions have been added in the PR body (for PRs involving changes that are not immediately obvious).
  • The developer has manually tested the changes and verified that the changes work.

@atheo89 atheo89 changed the title from "[WIP] Add rocm runtimes to runtime-images folder" to "[WIP] Inject rocm runtimes to runtime-images folder" on Jul 19, 2024
Member

@jiridanek jiridanek left a comment

Looks like the other JSON files around it, so I guess it should be correct.
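
The "looks like the other JSON files" check above can be made mechanical. The following is only a minimal sketch, not something used in this PR: it assumes the runtime definitions are plain JSON objects sitting side by side in one folder, and the helper names `key_shape` and `shape_mismatches` are hypothetical. It compares the nested key paths of a new file against each sibling and reports differences; it does not validate values such as image hashes.

```python
import json
from pathlib import Path


def key_shape(value, prefix=""):
    """Return the set of dotted key paths found in nested JSON objects."""
    paths = set()
    if isinstance(value, dict):
        for key, child in value.items():
            path = f"{prefix}.{key}" if prefix else key
            paths.add(path)
            paths |= key_shape(child, path)
    return paths


def shape_mismatches(candidate: Path, folder: Path):
    """Compare the candidate's key paths with every other *.json in folder.

    Returns {sibling_name: {"missing": [...], "extra": [...]}} where
    "missing" lists keys the sibling has but the candidate lacks, and
    "extra" lists keys only the candidate has. Matching siblings are omitted.
    """
    wanted = key_shape(json.loads(candidate.read_text()))
    report = {}
    for sibling in sorted(folder.glob("*.json")):
        if sibling.resolve() == candidate.resolve():
            continue
        have = key_shape(json.loads(sibling.read_text()))
        if have != wanted:
            report[sibling.name] = {
                "missing": sorted(have - wanted),
                "extra": sorted(wanted - have),
            }
    return report
```

An empty result means the new file has the same key structure as every sibling, which is the same level of assurance as the visual comparison.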

@atheo89
Member Author

atheo89 commented Aug 12, 2024

Opened a fix PR on OCP CI to resolve the naming of the rocm runtimes on the 2024a build branch:
openshift/release#55441. Once this gets merged, I can proceed to fill in the correct image hashes in this one.

@atheo89 atheo89 force-pushed the RHOAIENG-9680-inject-runtimes branch from 3cf53fc to d588eea on August 12, 2024 13:22
@atheo89 atheo89 changed the title from "[WIP] Inject rocm runtimes to runtime-images folder" to "Inject rocm runtimes to runtime-images folder" on Aug 12, 2024
@atheo89
Member Author

atheo89 commented Aug 12, 2024

This PR is ready for review

Contributor

openshift-ci bot commented Aug 12, 2024

@atheo89: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name | Commit | Details | Required | Rerun command
ci/prow/amd-runtimes-ubi9-e2e-tests | 3cf53fc | link | true | /test amd-runtimes-ubi9-e2e-tests
ci/prow/notebook-rocm-ubi9-python-3-9-pr-image-mirror | 3cf53fc | link | true | /test notebook-rocm-ubi9-python-3-9-pr-image-mirror
ci/prow/runtime-rocm-pytorch-ubi9-python-3-9-pr-image-mirror | 3cf53fc | link | true | /test runtime-rocm-pytorch-ubi9-python-3-9-pr-image-mirror
ci/prow/runtime-rocm-tensorflow-ubi9-python-3-9-pr-image-mirror | 3cf53fc | link | true | /test runtime-rocm-tensorflow-ubi9-python-3-9-pr-image-mirror
ci/prow/rocm-runtimes-ubi9-e2e-tests | 3cf53fc | link | true | /test rocm-runtimes-ubi9-e2e-tests
ci/prow/runtimes-ubi8-e2e-tests | 3cf53fc | link | true | /test runtimes-ubi8-e2e-tests
ci/prow/runtimes-ubi9-e2e-tests | 3cf53fc | link | true | /test runtimes-ubi9-e2e-tests

@jiridanek
Member

Habana seems to be having problems

Installing collected packages: typing-extensions, triton, nvidia-nvtx-cu12, nvidia-nvjitlink-cu12, nvidia-nccl-cu12, nvidia-curand-cu12, nvidia-cufft-cu12, nvidia-cuda-runtime-cu12, nvidia-cuda-nvrtc-cu12, nvidia-cuda-cupti-cu12, nvidia-cublas-cu12, nvidia-cusparse-cu12, nvidia-cudnn-cu12, lightning-utilities, nvidia-cusolver-cu12, lightning-habana, torch, torchmetrics, pytorch-lightning, lightning
  Attempting uninstall: typing-extensions
    Found existing installation: typing_extensions 4.5.0
    Uninstalling typing_extensions-4.5.0:
      Successfully uninstalled typing_extensions-4.5.0
  Attempting uninstall: torch
    Found existing installation: torch 2.1.0a0+gitf8b6084
    Uninstalling torch-2.1.0a0+gitf8b6084:
      Successfully uninstalled torch-2.1.0a0+gitf8b6084
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
tensorflow-cpu 2.12.1 requires tensorboard<2.13,>=2.12, but you have tensorboard 2.11.2 which is incompatible.
tensorflow-cpu 2.12.1 requires typing-extensions<4.6.0,>=3.6.6, but you have typing-extensions 4.12.2 which is incompatible.
kfp 2.7.0 requires protobuf<5,>=4.21.1, but you have protobuf 3.20.3 which is incompatible.
kfp-kubernetes 1.2.0 requires protobuf<5,>=4.21.1, but you have protobuf 3.20.3 which is incompatible.

I know that's not caused by this change, but it's a problem nonetheless: https://prow.ci.openshift.org/view/gs/test-platform-results/pr-logs/pull/opendatahub-io_notebooks/628/pull-ci-opendatahub-io-notebooks-main-images/1822987134265462784
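
To make the conflicts in the log above easier to triage, the resolver messages can be grouped by the installed package that violates a requirement. This is a rough sketch under the assumption that pip keeps the "X requires Y, but you have Z which is incompatible." wording seen in this log; the names `parse_conflicts` and `group_by_installed` are hypothetical helpers, not part of pip or of this repository.

```python
import re
from collections import defaultdict

# Matches lines such as:
#   kfp 2.7.0 requires protobuf<5,>=4.21.1, but you have protobuf 3.20.3 which is incompatible.
# The wording is taken from the log in this thread, not from a documented pip format.
CONFLICT = re.compile(
    r"^(?P<package>\S+) (?P<version>\S+) requires (?P<requirement>.+?), "
    r"but you have (?P<installed>\S+) (?P<installed_version>\S+) "
    r"which is incompatible\.$"
)


def parse_conflicts(log_text):
    """Return one dict per conflict line found in the pip output."""
    conflicts = []
    for line in log_text.splitlines():
        match = CONFLICT.match(line.strip())
        if match:
            conflicts.append(match.groupdict())
    return conflicts


def group_by_installed(conflicts):
    """Group conflicts by the installed package and version that break them."""
    grouped = defaultdict(list)
    for c in conflicts:
        key = f"{c['installed']} {c['installed_version']}"
        grouped[key].append(f"{c['package']} {c['version']} wants {c['requirement']}")
    return dict(grouped)


LOG = """\
tensorflow-cpu 2.12.1 requires tensorboard<2.13,>=2.12, but you have tensorboard 2.11.2 which is incompatible.
tensorflow-cpu 2.12.1 requires typing-extensions<4.6.0,>=3.6.6, but you have typing-extensions 4.12.2 which is incompatible.
kfp 2.7.0 requires protobuf<5,>=4.21.1, but you have protobuf 3.20.3 which is incompatible.
kfp-kubernetes 1.2.0 requires protobuf<5,>=4.21.1, but you have protobuf 3.20.3 which is incompatible.
"""

if __name__ == "__main__":
    for installed, wanted_by in group_by_installed(parse_conflicts(LOG)).items():
        print(installed)
        for entry in wanted_by:
            print("  " + entry)
```

Grouped this way, the log shows that the installed protobuf 3.20.3 is rejected by both kfp and kfp-kubernetes, while tensorflow-cpu conflicts with the installed tensorboard and typing-extensions.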

@jstourac
Member

/lgtm

Member

@harshad16 harshad16 left a comment

/lgtm
/approve

thanks for the work 💯

Contributor

openshift-ci bot commented Aug 13, 2024

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: harshad16

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@harshad16
Member

/override ci/prow/images
/override ci/prow/notebooks-ubi9-e2e-tests
/override ci/prow/rocm-notebooks-e2e-tests

Contributor

openshift-ci bot commented Aug 13, 2024

@harshad16: Overrode contexts on behalf of harshad16: ci/prow/images, ci/prow/notebooks-ubi9-e2e-tests, ci/prow/rocm-notebooks-e2e-tests

@openshift-merge-bot openshift-merge-bot bot merged commit 38a8d17 into opendatahub-io:main Aug 13, 2024
18 checks passed
@atheo89 atheo89 deleted the RHOAIENG-9680-inject-runtimes branch October 23, 2024 08:03
4 participants