Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[ci/gpu] Move ML dependency install to Dockerfile #21711

Merged
merged 7 commits into from
Feb 1, 2022
Merged
Show file tree
Hide file tree
Changes from 5 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
5 changes: 4 additions & 1 deletion .buildkite/Dockerfile.gpu
Original file line number Diff line number Diff line change
Expand Up @@ -6,13 +6,14 @@ ARG REMOTE_CACHE_URL
ARG BUILDKITE_PULL_REQUEST
ARG BUILDKITE_COMMIT
ARG BUILDKITE_PULL_REQUEST_BASE_BRANCH
ARG PYTHON=3.7

ENV DEBIAN_FRONTEND=noninteractive
ENV TZ=America/Los_Angeles

ENV BUILDKITE=true
ENV CI=true
ENV PYTHON=3.6
ENV PYTHON=$PYTHON
ENV RAY_USE_RANDOM_PORTS=1
ENV RAY_DEFAULT_BUILD=1
ENV RAY_INSTALL_JAVA=1
Expand Down Expand Up @@ -61,6 +62,8 @@ COPY . .
RUN ./ci/travis/ci.sh init
RUN bash --login -i ./ci/travis/ci.sh build

RUN bash --login -i ./ci/travis/install-dependencies.sh

# Run determine test to run
RUN bash --login -i -c "python ./ci/travis/determine_tests_to_run.py --output=json > affected_set.json"
RUN cat affected_set.json
12 changes: 5 additions & 7 deletions .buildkite/pipeline.gpu.large.yml
Original file line number Diff line number Diff line change
Expand Up @@ -2,7 +2,9 @@
conditions: ["RAY_CI_TRAIN_AFFECTED"]
commands:
- cleanup() { if [ "${BUILDKITE_PULL_REQUEST}" = "false" ]; then ./ci/travis/upload_build_info.sh; fi }; trap cleanup EXIT
- TRAIN_TESTING=1 TUNE_TESTING=1 INSTALL_HOROVOD=1 ./ci/travis/install-dependencies.sh
- PYTHON=3.6 TRAIN_TESTING=1 TUNE_TESTING=1 INSTALL_HOROVOD=1 ./ci/travis/install-dependencies.sh
# Because Python version changed, we need to re-install Ray here
- rm -rf ./python/ray/thirdparty_files; rm -rf ./python/ray/pickle5_files; ./ci/travis/ci.sh build
Comment on lines +6 to +7
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be implemented directly within install-dependencies.sh? Seems weird to have the option to specify a PYTHON version but also needing to do this.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Many tests currently separately call ./ci/travis/ci.sh build. You are right that this is not ideal, but I would say let's keep it out of the scope of this PR and review this in the future when refactoring CI structure

- pip install -Ur ./python/requirements_ml_docker.txt
- ./ci/travis/env_info.sh
- bazel test --config=ci $(./scripts/bazel_export_options) --build_tests_only --test_tag_filters=gpu,gpu_only python/ray/train/...
Expand All @@ -11,9 +13,7 @@
conditions: ["RAY_CI_TRAIN_AFFECTED"]
commands:
- cleanup() { if [ "${BUILDKITE_PULL_REQUEST}" = "false" ]; then ./ci/travis/upload_build_info.sh; fi }; trap cleanup EXIT
- TRAIN_TESTING=1 DATA_PROCESSING_TESTING=1 PYTHON=3.7 ./ci/travis/install-dependencies.sh
# Because Python version changed, we need to re-install Ray here
- rm -rf ./python/ray/thirdparty_files; rm -rf ./python/ray/pickle5_files; ./ci/travis/ci.sh build
- PYTHON=3.7 TRAIN_TESTING=1 DATA_PROCESSING_TESTING=1 ./ci/travis/install-dependencies.sh
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since the default version is now 3.7, should PYTHON=3.7 be removed?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd keep it here for verbosity

- pip install -Ur ./python/requirements_ml_docker.txt
- ./ci/travis/env_info.sh
- bazel test --config=ci $(./scripts/bazel_export_options) --build_tests_only --test_tag_filters=datasets_train doc/...
Expand All @@ -22,9 +22,7 @@
conditions: ["RAY_CI_RLLIB_AFFECTED"]
commands:
- cleanup() { if [ "${BUILDKITE_PULL_REQUEST}" = "false" ]; then ./ci/travis/upload_build_info.sh; fi }; trap cleanup EXIT
- RLLIB_TESTING=1 PYTHON=3.7 ./ci/travis/install-dependencies.sh
# Because Python version changed, we need to re-install Ray here
- rm -rf ./python/ray/thirdparty_files; rm -rf ./python/ray/pickle5_files; ./ci/travis/ci.sh build
- PYTHON=3.7 RLLIB_TESTING=1 ./ci/travis/install-dependencies.sh
- pip install -Ur ./python/requirements_ml_docker.txt
- ./ci/travis/env_info.sh
# --jobs 2 is necessary as we only need to have at least 2 gpus on the machine
Expand Down
5 changes: 1 addition & 4 deletions .buildkite/pipeline.gpu.yml
Original file line number Diff line number Diff line change
Expand Up @@ -12,10 +12,7 @@
conditions: ["RAY_CI_RLLIB_AFFECTED"]
commands:
- cleanup() { if [ "${BUILDKITE_PULL_REQUEST}" = "false" ]; then ./ci/travis/upload_build_info.sh; fi }; trap cleanup EXIT
- RLLIB_TESTING=1 PYTHON=3.7 ./ci/travis/install-dependencies.sh
# Because Python version changed, we need to re-install Ray here
- rm -rf ./python/ray/thirdparty_files; rm -rf ./python/ray/pickle5_files; ./ci/travis/ci.sh build
- pip install -Ur ./python/requirements_ml_docker.txt
krfricke marked this conversation as resolved.
Show resolved Hide resolved
- PYTHON=3.7 RLLIB_TESTING=1 ./ci/travis/install-dependencies.sh
- ./ci/travis/env_info.sh
# --jobs 1 is necessary as we only have 1 GPU on the machine and running tests in parallel
# would cause timeouts as the other scripts would wait for the GPU to become available.
Expand Down