Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Clean up ray-ml requirements #23325

Merged
merged 18 commits into from
Mar 25, 2022
Merged

Clean up ray-ml requirements #23325

merged 18 commits into from
Mar 25, 2022

Conversation

ddelange
Copy link
Contributor

@ddelange ddelange commented Mar 18, 2022

Why are these changes needed?

In https://github.com/ray-project/ray/blob/ray-1.11.0/docker/ray-ml/Dockerfile, the order of pip install commands currently matters (potentially a lot). It would be good to run one big pip install command to avoid ending up with a broken env:

$ pip install \
  -r https://raw.githubusercontent.com/ray-project/ray/ray-1.11.0/python/requirements.txt \
  -r https://raw.githubusercontent.com/ray-project/ray/ray-1.11.0/python/requirements/ml/requirements_rllib.txt \
  -r https://raw.githubusercontent.com/ray-project/ray/ray-1.11.0/python/requirements/ml/requirements_dl.txt \
  -r https://raw.githubusercontent.com/ray-project/ray/ray-1.11.0/python/requirements/ml/requirements_train.txt \
  -r https://raw.githubusercontent.com/ray-project/ray/ray-1.11.0/python/requirements/ml/requirements_tune.txt \
  -r https://raw.githubusercontent.com/ray-project/ray/ray-1.11.0/python/requirements/ml/requirements_upstream.txt

...

ERROR: Cannot install tensorflow==2.5.1 and tensorflow==2.6.0 because these package versions have conflicting dependencies.

The conflict is caused by:
    The user requested tensorflow==2.5.1
    The user requested tensorflow==2.6.0

Related issue number

Checks

  • I've run scripts/format.sh to lint the changes in this PR.
  • I've included any doc changes needed for https://docs.ray.io/en/master/.
  • I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
  • Testing Strategy
    • Unit tests
    • Release tests
    • This PR is not tested :(

@ddelange ddelange changed the title Unpin tensorflow Clean up tensorflow requirements Mar 18, 2022
# TODO(amogkam): Remove after https://github.com/tensorflow/tensorflow/issues/52922 is fixed.
keras==2.6.0
tensorflow==2.6.0
tensorflow~=2.6.2
Copy link
Contributor Author

@ddelange ddelange Mar 18, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe remove here and only have it only in python/requirements.txt?

Copy link
Contributor Author

@ddelange ddelange Mar 18, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Anything against allowing tensorflow 2.7 or 2.8 btw? Probably one of the breaking changes in 2.7? Or is it pinned to sync with tensorflow-probability releases?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Usually it's a breaking change but cc @sven1977 for more details

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like this sync up is necessary indeed. There's a note in their docs:

Note: Since TensorFlow is not included as a dependency of the TensorFlow Probability package (in setup.py), you must explicitly install the TensorFlow package (tensorflow or tensorflow-gpu). This allows us to maintain one package instead of separate packages for CPU and GPU-enabled TensorFlow.

Which is actually outdated as tensorflow-gpu is deprecated. They could simply have the correct tensorflow requirement range specified in the probability package requirements, and there would be no need to manually sync up here anymore (i.e. simply only specify tensorflow-probability==XXX in python/requirements.txt).

@ddelange ddelange changed the title Clean up tensorflow requirements Clean up ray-ml requirements Mar 18, 2022
Copy link
Contributor

@krfricke krfricke left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is great, thank. I'd like to get a quick glance from @sven1977 and @amogkam (and maybe @matthewdeng) to see if they expect any issues with these changes.


tensorflow~=2.6.2
tensorflow-probability~=0.14.1
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is keras installed as a dependency in tensorflow automatically?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

exactly. There was a small problem back in the day (#20032) for a single version, but that's fixed and so tensorflow will request the right version of keras

# TODO(amogkam): Remove after https://github.com/tensorflow/tensorflow/issues/52922 is fixed.
keras==2.6.0
tensorflow==2.6.0
tensorflow~=2.6.2
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Usually it's a breaking change but cc @sven1977 for more details

python/requirements.txt Outdated Show resolved Hide resolved
@ddelange
Copy link
Contributor Author

It seems test_memstat is consistently failing, but CI doesn't tell us which test exactly fails. How can I access the detailed logs? Or even better: anyone an idea why this change can cause this test to fail?

//python/ray/tests:test_memstat       FAILED in 3 out of 3 in 68.5s

Copy link
Contributor

@amogkam amogkam left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks a bunch for doing this @ddelange!

Just left a couple minor suggestions!

&& $HOME/anaconda3/bin/pip --no-cache-dir install -U -r requirements_ml_docker.txt \
&& $HOME/anaconda3/bin/pip --no-cache-dir install -U -r requirements_upstream.txt \
&& $HOME/anaconda3/bin/pip --no-cache-dir install -U pip \
&& $HOME/anaconda3/bin/pip --no-cache-dir install -U \
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice thanks for doing this!

python/requirements.txt Outdated Show resolved Hide resolved
python/requirements.txt Outdated Show resolved Hide resolved
python/requirements/ml/requirements_rllib.txt Outdated Show resolved Hide resolved
python/requirements.txt Outdated Show resolved Hide resolved
python/requirements/ml/requirements_rllib.txt Outdated Show resolved Hide resolved
python/requirements/ml/requirements_dl.txt Outdated Show resolved Hide resolved
@ddelange ddelange requested a review from amogkam March 23, 2022 19:04
Copy link
Contributor

@amogkam amogkam left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! Please ping again once CI finishes! Thanks again @ddelange!

@@ -85,7 +85,6 @@ pytest-lazy-fixture
pytest-timeout
pytest-virtualenv
scikit-learn==0.22.2
tensorflow==2.5.1
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

does this change need to be documented anywhere?

@ddelange
Copy link
Contributor Author

ddelange commented Mar 24, 2022

test_tensorflow times out, but I can't seem to find any relevant tracebacks in the UI. How can I access the logfiles mentioned there?

I have a feeling that removing tensorflow from base requirements has consequences for CI (cc @amogkam), but I'm a bit lost tbh. Every time a different subset of test suites seem to fail, also before removing tensorflow there 😅

@ddelange
Copy link
Contributor Author

Can it be that rllib tests are being run without installing requirements_rllib.txt?
From Small & Client:

File "/Users/ec2-user/.buildkite-agent/builds/bk-mac1-experimental-queue-i-078ec7b108f388153-1/ray-project/ray-builders-pr/python/ray/rllib/agents/cql/cql_tf_policy.py", line 50, in <module>
    def _repeat_tensor(t: tf.Tensor, n: int):
AttributeError: 'NoneType' object has no attribute 'Tensor'

From Medium K-Z:

TIMEOUT: //python/ray/tests:test_tensorflow

From Serve Tests

==================== Test output for //python/ray/serve:tutorial_tensorflow:
--
  | Traceback (most recent call last):
  | File "/root/.cache/bazel/_bazel_root/5fe90af4e7d1ed9fcf52f59e39e126f5/execroot/com_github_ray_project_ray/bazel-out/k8-opt/bin/python/ray/serve/tutorial_tensorflow.runfiles/com_github_ray_project_ray/python/ray/serve/examples/doc/tutorial_tensorflow.py", line 46, in <module>
  | train_and_save_model()
  | File "/root/.cache/bazel/_bazel_root/5fe90af4e7d1ed9fcf52f59e39e126f5/execroot/com_github_ray_project_ray/bazel-out/k8-opt/bin/python/ray/serve/tutorial_tensorflow.runfiles/com_github_ray_project_ray/python/ray/serve/examples/doc/tutorial_tensorflow.py", line 18, in train_and_save_model
  | import tensorflow as tf
  | ModuleNotFoundError: No module named 'tensorflow'

@ddelange
Copy link
Contributor Author

All tests are now passing except for Windows test_standalone

  | serve.start(detached=True, _checkpoint_path=f"file://{tmp_path}")
  | hello.deploy()
  | world.deploy()
  | >       assert check("hello")
  | E       AssertionError: assert False
  | E        +  where False = <function test_local_store_recovery.<locals>.check at 0x000001CC17AE93A0>('hello')
  |  
  | \\?\C:\Users\ContainerAdministrator\AppData\Local\temp\Bazel.runfiles_c6bml4jv\runfiles\com_github_ray_project_ray\python\ray\serve\tests\test_standalone.py:584: AssertionError

@amogkam I see it's the top flakey test for serve, what do you guys usually do in this case? Can you confirm this is indeed flakiness?

@amogkam
Copy link
Contributor

amogkam commented Mar 25, 2022

Just reran the test...let's see if it passes this time around.

But it does look unrelated to your changes so I think we should be good here!

@ddelange
Copy link
Contributor Author

@amogkam looks like we're green!
bike front flip

@krfricke krfricke merged commit e109c13 into ray-project:master Mar 25, 2022
@krfricke
Copy link
Contributor

This is amazing. Thanks!

krfricke added a commit to krfricke/ray that referenced this pull request Apr 4, 2022
In https://github.com/ray-project/ray/blob/ray-1.11.0/docker/ray-ml/Dockerfile, the order of pip install commands currently matters (potentially a lot). It would be good to run one big pip install command to avoid ending up with a broken env.

Co-authored-by: Kai Fricke <[email protected]>
architkulkarni pushed a commit that referenced this pull request Apr 8, 2022
* [release tests] Pin gym everywhere (#23349)

* [rllib] Pin gym everywhere (#23384)

This PR Pins gym in the app config.yaml's for rllib and tune so that release tests are no longer broken by the new gym version.

* [RLlib] Pin Gym Everywhere and turn off gpu for recsim tests (#23452)

* [ci] Clean up ray-ml requirements (#23325)

In https://github.com/ray-project/ray/blob/ray-1.11.0/docker/ray-ml/Dockerfile, the order of pip install commands currently matters (potentially a lot). It would be good to run one big pip install command to avoid ending up with a broken env.

Co-authored-by: Kai Fricke <[email protected]>

* Fix merge conflict

* Also copy requirements_train.txt

Co-authored-by: Avnish Narayan <[email protected]>
Co-authored-by: ddelange <[email protected]>
Co-authored-by: Amog Kamsetty <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants