-
Notifications
You must be signed in to change notification settings - Fork 5.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[ci/gpu] Move ML dependency install to Dockerfile #21711
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
3.7 should be fine for Train!
.buildkite/Dockerfile.gpu
Outdated
RUN (if [ "${INSTALL_DEPENDENCIES}" = "ML" ]; then RLLIB_TESTING=1 TRAIN_TESTING=1 TUNE_TESTING=1 bash --login -i ./ci/travis/install-dependencies.sh; fi) | ||
RUN (if [ "${INSTALL_DEPENDENCIES}" = "ML" ]; then pip install -Ur ./python/requirements_ml_docker.txt; fi) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm of the opinion that we should try to keep dependency installation to be the minimal set needed for each test suite.
If we always install the superset of dependencies, our tests may not be reflective of real user setups and may miss some cases. Though perhaps for GPU we do intentionally want to include at least the ML docker image dependencies.... thoughts?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The goal of moving the installs into the Dockerfile is to speedup subsequent pipeline steps, e.g. reduce the cumulative installation time. The dependencies that benefit the most from this are presumably large dependencies, like tensorflow and torch.
The decision we'll have to make here is if we care about testing a minimal setup, and if we want to include these checks in the CI pipeline. We do have a minimal dependency test step, but it only tests a small subset at the moment.
That said, I agree that we should probably limit the default dependencies here.
Actually, let's hold off this PR for now, I think I have an idea on how to generally solve this problem.
# Because Python version changed, we need to re-install Ray here | ||
- rm -rf ./python/ray/thirdparty_files; rm -rf ./python/ray/pickle5_files; ./ci/travis/ci.sh build |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this be implemented directly within install-dependencies.sh
? Seems weird to have the option to specify a PYTHON
version but also needing to do this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Many tests currently separately call ./ci/travis/ci.sh build
. You are right that this is not ideal, but I would say let's keep it out of the scope of this PR and review this in the future when refactoring CI structure
- TRAIN_TESTING=1 DATA_PROCESSING_TESTING=1 PYTHON=3.7 ./ci/travis/install-dependencies.sh | ||
# Because Python version changed, we need to re-install Ray here | ||
- rm -rf ./python/ray/thirdparty_files; rm -rf ./python/ray/pickle5_files; ./ci/travis/ci.sh build | ||
- PYTHON=3.7 TRAIN_TESTING=1 DATA_PROCESSING_TESTING=1 ./ci/travis/install-dependencies.sh |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since the default version is now 3.7, should PYTHON=3.7
be removed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd keep it here for verbosity
|
Why are these changes needed?
Instead of installing dependencies in each Buildkite job, let's move this to the Dockerfile instead.
This will update GPU tests to always use Python 3.7.
Cc @richardliaw @amogkam @matthewdeng will updating to 3.7 be a problem? Most GPUs tests are already installing Python 3.7, the exception are the Train tests.
Related issue number
Checks
scripts/format.sh
to lint the changes in this PR.