Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[ci/gpu] Move ML dependency install to Dockerfile #21711

Merged
merged 7 commits into from
Feb 1, 2022

Conversation

krfricke
Copy link
Contributor

Why are these changes needed?

Instead of installing dependencies in each Buildkite job, let's move this to the Dockerfile instead.
This will update GPU tests to always use Python 3.7.

Cc @richardliaw @amogkam @matthewdeng will updating to 3.7 be a problem? Most GPUs tests are already installing Python 3.7, the exception are the Train tests.

Related issue number

Checks

  • I've run scripts/format.sh to lint the changes in this PR.
  • I've included any doc changes needed for https://docs.ray.io/en/master/.
  • I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
  • Testing Strategy
    • Unit tests
    • Release tests
    • This PR is not tested :(

Copy link
Contributor

@matthewdeng matthewdeng left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

3.7 should be fine for Train!

Comment on lines 66 to 67
RUN (if [ "${INSTALL_DEPENDENCIES}" = "ML" ]; then RLLIB_TESTING=1 TRAIN_TESTING=1 TUNE_TESTING=1 bash --login -i ./ci/travis/install-dependencies.sh; fi)
RUN (if [ "${INSTALL_DEPENDENCIES}" = "ML" ]; then pip install -Ur ./python/requirements_ml_docker.txt; fi)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm of the opinion that we should try to keep dependency installation to be the minimal set needed for each test suite.

If we always install the superset of dependencies, our tests may not be reflective of real user setups and may miss some cases. Though perhaps for GPU we do intentionally want to include at least the ML docker image dependencies.... thoughts?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The goal of moving the installs into the Dockerfile is to speedup subsequent pipeline steps, e.g. reduce the cumulative installation time. The dependencies that benefit the most from this are presumably large dependencies, like tensorflow and torch.

The decision we'll have to make here is if we care about testing a minimal setup, and if we want to include these checks in the CI pipeline. We do have a minimal dependency test step, but it only tests a small subset at the moment.

That said, I agree that we should probably limit the default dependencies here.

Actually, let's hold off this PR for now, I think I have an idea on how to generally solve this problem.

Comment on lines +6 to +7
# Because Python version changed, we need to re-install Ray here
- rm -rf ./python/ray/thirdparty_files; rm -rf ./python/ray/pickle5_files; ./ci/travis/ci.sh build
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be implemented directly within install-dependencies.sh? Seems weird to have the option to specify a PYTHON version but also needing to do this.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Many tests currently separately call ./ci/travis/ci.sh build. You are right that this is not ideal, but I would say let's keep it out of the scope of this PR and review this in the future when refactoring CI structure

- TRAIN_TESTING=1 DATA_PROCESSING_TESTING=1 PYTHON=3.7 ./ci/travis/install-dependencies.sh
# Because Python version changed, we need to re-install Ray here
- rm -rf ./python/ray/thirdparty_files; rm -rf ./python/ray/pickle5_files; ./ci/travis/ci.sh build
- PYTHON=3.7 TRAIN_TESTING=1 DATA_PROCESSING_TESTING=1 ./ci/travis/install-dependencies.sh
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since the default version is now 3.7, should PYTHON=3.7 be removed?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd keep it here for verbosity

.buildkite/pipeline.gpu.yml Show resolved Hide resolved
@bveeramani
Copy link
Member

‼️ ACTION REQUIRED ‼️

We've switched our code formatter from YAPF to Black (see #21311).

To prevent issues with merging your code, here's what you'll need to do:

  1. Install Black
pip install -I black==21.12b0
  1. Format changed files with Black
curl -o format-changed.sh https://gist.githubusercontent.com/bveeramani/42ef0e9e387b755a8a735b084af976f2/raw/7631276790765d555c423b8db2b679fd957b984a/format-changed.sh
chmod +x ./format-changed.sh
./format-changed.sh
rm format-changed.sh
  1. Commit your changes.
git add --all
git commit -m "Format Python code with Black"
  1. Merge master into your branch.
git pull upstream master
  1. Resolve merge conflicts (if necessary).

After running these steps, you'll have the updated format.sh.

@krfricke krfricke merged commit b51b5af into ray-project:master Feb 1, 2022
@krfricke krfricke deleted the ci/gpu-dependencies branch February 1, 2022 12:04
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants