[ci] Move to new hierarchical docker structure + pipeline (#28641)
This PR moves our buildkite pipeline to a new hierarchical structure and will be used with the new buildkite pipeline.

When merging this PR, the old behavior will still work, i.e. the old pipeline is still in place.

After merging this PR, we can build the base images for the master branch, and then switch the CI pipelines to use the new build structure.

Once this switch has been done, the following files will be removed:

- `./.buildkite/pipeline.yml` - this has been split into `pipeline.test.yml` and `pipeline.build.yml`
- `./.buildkite/Dockerfile` - this has been moved (and split) to `./ci/docker/`
- `./.buildkite/Dockerfile.gpu` - this has been moved (and split) to `./ci/docker/`


The new structure is as follows:

- `./ci/docker` contains hierarchical docker files that will be built by the pipeline.
  - `Dockerfile.base_test` contains common dependencies
  - `Dockerfile.base_build` inherits from it and adds build-specific dependencies, e.g. llvm, nvm, java
  - `Dockerfile.base_ml` inherits from `base_test` and adds ML dependencies, e.g. torch, tensorflow
  - `Dockerfile.base_gpu` depends on a cuda image and otherwise has the same contents as `base_test` and `base_ml` combined

In each build, we do the following:

- `Dockerfile.build` is built on top of `Dockerfile.base_build`. Dependencies are re-installed, which is mostly a no-op (except if they changed from when the base image was built)
- `Dockerfile.test` is built on top of `Dockerfile.base_test`, and the extracted Ray installation from `Dockerfile.build` is injected
- The same is true respectively for `ml` and `gpu`.
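
The image hierarchy described above can be written down as a small parent map. This is only an illustrative sketch: the short image names, the `"cuda"` placeholder for the external CUDA image, and the `lineage` helper are assumptions made for this example and are not part of the PR.

```python
# Parent of each base image, following the description above.
# "cuda" is a placeholder for the external CUDA image (assumed name).
BASE_PARENTS = {
    "base_test": None,
    "base_build": "base_test",
    "base_ml": "base_test",
    "base_gpu": "cuda",
}

# Per-build images and the base image each one is built on top of.
BUILD_PARENTS = {
    "build": "base_build",
    "test": "base_test",
    "ml": "base_ml",
    "gpu": "base_gpu",
}


def lineage(image):
    """Return the chain of images from `image` up to its root."""
    parents = {**BASE_PARENTS, **BUILD_PARENTS}
    chain = [image]
    # Walk upwards until we reach an image with no known parent.
    while parents.get(chain[-1]) is not None:
        chain.append(parents[chain[-1]])
    return chain


print(lineage("ml"))
print(lineage("gpu"))
```

Reading a lineage from right to left gives the order in which the layers have to exist before a per-build image can be produced.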

The pipelines have been split, and a new attribute `NO_WHEELS_REQUIRED` has been added to identify tests that can be started early. Early start means that the last available branch image is used and the current code revision is checked out on top of it.
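
As a rough sketch of the early-start idea, the steps of a pipeline could be partitioned as shown below. The real logic lives in the buildkite-ci-pipelines repository and is not reproduced here; the step dictionaries, the `split_early_start` helper, and the `is_pull_request` flag are hypothetical. The restriction to PR builds follows the README added in this commit.

```python
NO_WHEELS_REQUIRED = "NO_WHEELS_REQUIRED"


def split_early_start(steps, is_pull_request):
    """Partition steps into (early, regular).

    A step can start early only on PR builds and only if it carries
    the NO_WHEELS_REQUIRED condition. Early steps would run on the last
    available branch image with the PR revision checked out on top.
    """
    early, regular = [], []
    for step in steps:
        can_start_early = (
            is_pull_request
            and NO_WHEELS_REQUIRED in step.get("conditions", [])
        )
        (early if can_start_early else regular).append(step)
    return early, regular


# Hypothetical steps, for illustration only.
steps = [
    {"label": "lint", "conditions": [NO_WHEELS_REQUIRED]},
    {"label": "core tests", "conditions": []},
]
early, regular = split_early_start(steps, is_pull_request=True)
print([s["label"] for s in early], [s["label"] for s in regular])
```

Steps in the regular group have to wait until the images for the current revision have been built.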

See https://github.com/ray-project/buildkite-ci-pipelines/ for the pipeline logic.

Additionally, this PR identified two CI regressions that haven't been caught previously, namely the minimal install tests that didn't properly install the respective Python versions, and some runtime environment tests that don't work with later Ray versions. These should be addressed separately and I'll create issues for them once this PR is merged.

Signed-off-by: Kai Fricke <[email protected]>
Signed-off-by: Artur Niederfahrenhorst <[email protected]>
Co-authored-by: Artur Niederfahrenhorst <[email protected]>
krfricke and ArturNiederfahrenhorst authored Sep 22, 2022
1 parent a3c97b4 commit ee2a8da
Showing 31 changed files with 1,162 additions and 157 deletions.
5 changes: 4 additions & 1 deletion .buildkite/Dockerfile
@@ -4,7 +4,7 @@ ARG REMOTE_CACHE_URL
ARG BUILDKITE_PULL_REQUEST
ARG BUILDKITE_COMMIT
ARG BUILDKITE_PULL_REQUEST_BASE_BRANCH
-ARG PYTHON=3.6
+ARG PYTHON=3.7
ARG INSTALL_DEPENDENCIES

ENV DEBIAN_FRONTEND=noninteractive
@@ -51,6 +51,9 @@ ENV LC_ALL=en_US.utf8
ENV LANG=en_US.utf8
RUN echo "ulimit -c 0" >> /root/.bashrc

ENV BUILD=1
ENV DL=1

# Setup Bazel caches
RUN (echo "build --remote_cache=${REMOTE_CACHE_URL}" >> /root/.bazelrc); \
(if [ "${BUILDKITE_PULL_REQUEST}" != "false" ]; then (echo "build --remote_upload_local_results=false" >> /root/.bazelrc); fi); \
3 changes: 3 additions & 0 deletions .buildkite/Dockerfile.gpu
@@ -53,6 +53,9 @@ ENV LC_ALL=en_US.utf8
ENV LANG=en_US.utf8
RUN echo "ulimit -c 0" >> /root/.bashrc

ENV BUILD=1
ENV DL=1

# Setup Bazel caches
RUN (echo "build --remote_cache=${REMOTE_CACHE_URL}" >> /root/.bazelrc); \
(if [ "${BUILDKITE_PULL_REQUEST}" != "false" ]; then (echo "build --remote_upload_local_results=false" >> /root/.bazelrc); fi); \
30 changes: 30 additions & 0 deletions .buildkite/README.md
@@ -0,0 +1,30 @@
# Buildkite pipelines

This directory contains buildkite pipelines used to start CI tests.

Each pipeline contains buildkite steps that are parsed and executed according to the
[Buildkite pipeline specification](https://buildkite.com/docs/pipelines).

## Conditions

An extra optional field `conditions` is defined, which includes conditions under which tests are run.
The script `ci/pipeline/determine_tests_to_run.py` determines changed files in a PR and only kicks off
tests that include at least one of the conditions. If no condition is specified, the test is always run.
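
The filtering rule above can be sketched as follows. This is not the code of `ci/pipeline/determine_tests_to_run.py`, which is not shown in this diff; the `select_steps` helper, the step dictionaries, and the condition names are hypothetical and only illustrate the "at least one condition, or none at all" rule.

```python
def select_steps(steps, triggered):
    """Return the steps that should run.

    `steps` is a list of dicts with an optional "conditions" list;
    `triggered` is the set of condition names derived from the files
    changed in the PR.
    """
    selected = []
    for step in steps:
        conditions = step.get("conditions") or []
        # No conditions: always run. Otherwise at least one must match.
        if not conditions or any(c in triggered for c in conditions):
            selected.append(step)
    return selected


# Hypothetical condition names, for illustration only.
steps = [
    {"label": "always", "conditions": []},
    {"label": "python tests", "conditions": ["EXAMPLE_PYTHON_CHANGED"]},
    {"label": "docs tests", "conditions": ["EXAMPLE_DOCS_CHANGED"]},
]
selected = select_steps(steps, triggered={"EXAMPLE_PYTHON_CHANGED"})
print([s["label"] for s in selected])
```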

A special case is the `NO_WHEELS_REQUIRED` condition. If this is present, it indicates that the test can
be run with the latest available binaries - in this case the test can be started early, as it will re-use
the latest branch image and only check out the current code revision in the PR. This early kick off will
only trigger on PR builds, not on branch builds.

## Pipelines

This directory should be considered with respect to the docker images located in `ci/docker`.

- `pipeline.build.yml` contains jobs that require build dependencies. This includes all tests that re-build
Ray (e.g. when switching Python versions). The tests are run on the `build.Dockerfile` image.
- `pipeline.test.yml` contains jobs that only require an installed Ray and a small subset of dependencies,
notably excluding ML libraries such as Tensorflow or Torch. The tests are run on the `test.Dockerfile` image.
- `pipeline.ml.yml` contains jobs that require ML libraries Tensorflow and Torch to be available. The tests
are run on the `ml.Dockerfile` image.
- `pipeline.gpu.yml` contains jobs that require one GPU. The tests are run on the `gpu.Dockerfile` image.
- `pipeline.gpu.large.yml` contains jobs that require multiple GPUs (currently 4). The tests are run on the `gpu.Dockerfile` image.