Clean up ray-ml requirements #23325

ddelange · 2022-03-18T08:51:09Z

Why are these changes needed?

In https://github.com/ray-project/ray/blob/ray-1.11.0/docker/ray-ml/Dockerfile, the order of pip install commands currently matters (potentially a lot). It would be good to run one big pip install command to avoid ending up with a broken env:

$ pip install \
  -r https://raw.githubusercontent.com/ray-project/ray/ray-1.11.0/python/requirements.txt \
  -r https://raw.githubusercontent.com/ray-project/ray/ray-1.11.0/python/requirements/ml/requirements_rllib.txt \
  -r https://raw.githubusercontent.com/ray-project/ray/ray-1.11.0/python/requirements/ml/requirements_dl.txt \
  -r https://raw.githubusercontent.com/ray-project/ray/ray-1.11.0/python/requirements/ml/requirements_train.txt \
  -r https://raw.githubusercontent.com/ray-project/ray/ray-1.11.0/python/requirements/ml/requirements_tune.txt \
  -r https://raw.githubusercontent.com/ray-project/ray/ray-1.11.0/python/requirements/ml/requirements_upstream.txt

...

ERROR: Cannot install tensorflow==2.5.1 and tensorflow==2.6.0 because these package versions have conflicting dependencies.

The conflict is caused by:
    The user requested tensorflow==2.5.1
    The user requested tensorflow==2.6.0

Related issue number

Checks

I've run scripts/format.sh to lint the changes in this PR.
I've included any doc changes needed for https://docs.ray.io/en/master/.
I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
Testing Strategy
- Unit tests
- Release tests
- This PR is not tested :(

ddelange · 2022-03-18T09:23:40Z

python/requirements/ml/requirements_dl.txt

-# TODO(amogkam): Remove after https://github.com/tensorflow/tensorflow/issues/52922 is fixed.
-keras==2.6.0
-tensorflow==2.6.0
+tensorflow~=2.6.2


maybe remove here and only have it only in python/requirements.txt?

Anything against allowing tensorflow 2.7 or 2.8 btw? Probably one of the breaking changes in 2.7? Or is it pinned to sync with tensorflow-probability releases?

Usually it's a breaking change but cc @sven1977 for more details

Looks like this sync up is necessary indeed. There's a note in their docs:

Note: Since TensorFlow is not included as a dependency of the TensorFlow Probability package (in setup.py), you must explicitly install the TensorFlow package (tensorflow or tensorflow-gpu). This allows us to maintain one package instead of separate packages for CPU and GPU-enabled TensorFlow.

Which is actually outdated as tensorflow-gpu is deprecated. They could simply have the correct tensorflow requirement range specified in the probability package requirements, and there would be no need to manually sync up here anymore (i.e. simply only specify tensorflow-probability==XXX in python/requirements.txt).

python/requirements/ml/requirements_rllib.txt

krfricke

This is great, thank. I'd like to get a quick glance from @sven1977 and @amogkam (and maybe @matthewdeng) to see if they expect any issues with these changes.

krfricke · 2022-03-23T10:08:41Z

python/requirements/ml/requirements_dl.txt


+tensorflow~=2.6.2
+tensorflow-probability~=0.14.1


Is keras installed as a dependency in tensorflow automatically?

exactly. There was a small problem back in the day (#20032) for a single version, but that's fixed and so tensorflow will request the right version of keras

krfricke · 2022-03-23T10:08:59Z

python/requirements/ml/requirements_dl.txt

-# TODO(amogkam): Remove after https://github.com/tensorflow/tensorflow/issues/52922 is fixed.
-keras==2.6.0
-tensorflow==2.6.0
+tensorflow~=2.6.2


Usually it's a breaking change but cc @sven1977 for more details

python/requirements.txt

ddelange · 2022-03-23T14:35:16Z

It seems test_memstat is consistently failing, but CI doesn't tell us which test exactly fails. How can I access the detailed logs? Or even better: anyone an idea why this change can cause this test to fail?

//python/ray/tests:test_memstat       FAILED in 3 out of 3 in 68.5s

amogkam

Thanks a bunch for doing this @ddelange!

Just left a couple minor suggestions!

amogkam · 2022-03-23T16:08:41Z

docker/ray-ml/Dockerfile

-    && $HOME/anaconda3/bin/pip --no-cache-dir install -U -r requirements_ml_docker.txt \
-    && $HOME/anaconda3/bin/pip --no-cache-dir install -U -r requirements_upstream.txt \
+    && $HOME/anaconda3/bin/pip --no-cache-dir install -U pip \
+    && $HOME/anaconda3/bin/pip --no-cache-dir install -U \


Nice thanks for doing this!

python/requirements.txt

python/requirements/ml/requirements_rllib.txt

python/requirements.txt

python/requirements/ml/requirements_rllib.txt

python/requirements/ml/requirements_dl.txt

Co-authored-by: Kai Fricke <[email protected]>

amogkam

LGTM! Please ping again once CI finishes! Thanks again @ddelange!

ddelange · 2022-03-24T06:49:36Z

python/requirements.txt

@@ -85,7 +85,6 @@ pytest-lazy-fixture
 pytest-timeout
 pytest-virtualenv
 scikit-learn==0.22.2
-tensorflow==2.5.1


does this change need to be documented anywhere?

ddelange · 2022-03-24T09:13:54Z

test_tensorflow times out, but I can't seem to find any relevant tracebacks in the UI. How can I access the logfiles mentioned there?

I have a feeling that removing tensorflow from base requirements has consequences for CI (cc @amogkam), but I'm a bit lost tbh. Every time a different subset of test suites seem to fail, also before removing tensorflow there 😅

ddelange · 2022-03-24T12:56:13Z

Can it be that rllib tests are being run without installing requirements_rllib.txt?
From Small & Client:

File "/Users/ec2-user/.buildkite-agent/builds/bk-mac1-experimental-queue-i-078ec7b108f388153-1/ray-project/ray-builders-pr/python/ray/rllib/agents/cql/cql_tf_policy.py", line 50, in <module>
    def _repeat_tensor(t: tf.Tensor, n: int):
AttributeError: 'NoneType' object has no attribute 'Tensor'

From Medium K-Z:

TIMEOUT: //python/ray/tests:test_tensorflow

From Serve Tests

==================== Test output for //python/ray/serve:tutorial_tensorflow:
--
  | Traceback (most recent call last):
  | File "/root/.cache/bazel/_bazel_root/5fe90af4e7d1ed9fcf52f59e39e126f5/execroot/com_github_ray_project_ray/bazel-out/k8-opt/bin/python/ray/serve/tutorial_tensorflow.runfiles/com_github_ray_project_ray/python/ray/serve/examples/doc/tutorial_tensorflow.py", line 46, in <module>
  | train_and_save_model()
  | File "/root/.cache/bazel/_bazel_root/5fe90af4e7d1ed9fcf52f59e39e126f5/execroot/com_github_ray_project_ray/bazel-out/k8-opt/bin/python/ray/serve/tutorial_tensorflow.runfiles/com_github_ray_project_ray/python/ray/serve/examples/doc/tutorial_tensorflow.py", line 18, in train_and_save_model
  | import tensorflow as tf
  | ModuleNotFoundError: No module named 'tensorflow'

python/requirements/ml/requirements_rllib.txt

ddelange · 2022-03-25T06:37:24Z

All tests are now passing except for Windows test_standalone

  | serve.start(detached=True, _checkpoint_path=f"file://{tmp_path}")
  | hello.deploy()
  | world.deploy()
  | >       assert check("hello")
  | E       AssertionError: assert False
  | E        +  where False = <function test_local_store_recovery.<locals>.check at 0x000001CC17AE93A0>('hello')
  |  
  | \\?\C:\Users\ContainerAdministrator\AppData\Local\temp\Bazel.runfiles_c6bml4jv\runfiles\com_github_ray_project_ray\python\ray\serve\tests\test_standalone.py:584: AssertionError

@amogkam I see it's the top flakey test for serve, what do you guys usually do in this case? Can you confirm this is indeed flakiness?

amogkam · 2022-03-25T06:41:34Z

Just reran the test...let's see if it passes this time around.

But it does look unrelated to your changes so I think we should be good here!

ddelange · 2022-03-25T08:33:01Z

@amogkam looks like we're green!

krfricke · 2022-03-25T15:59:59Z

This is amazing. Thanks!

In https://github.com/ray-project/ray/blob/ray-1.11.0/docker/ray-ml/Dockerfile, the order of pip install commands currently matters (potentially a lot). It would be good to run one big pip install command to avoid ending up with a broken env. Co-authored-by: Kai Fricke <[email protected]>

* [release tests] Pin gym everywhere (#23349) * [rllib] Pin gym everywhere (#23384) This PR Pins gym in the app config.yaml's for rllib and tune so that release tests are no longer broken by the new gym version. * [RLlib] Pin Gym Everywhere and turn off gpu for recsim tests (#23452) * [ci] Clean up ray-ml requirements (#23325) In https://github.com/ray-project/ray/blob/ray-1.11.0/docker/ray-ml/Dockerfile, the order of pip install commands currently matters (potentially a lot). It would be good to run one big pip install command to avoid ending up with a broken env. Co-authored-by: Kai Fricke <[email protected]> * Fix merge conflict * Also copy requirements_train.txt Co-authored-by: Avnish Narayan <[email protected]> Co-authored-by: ddelange <[email protected]> Co-authored-by: Amog Kamsetty <[email protected]>

ddelange added 2 commits March 18, 2022 09:31

Unpin tensorflow

f4c8b14

Clean up requirements_ml_docker

164eed5

ddelange requested review from amogkam, sven1977, richardliaw and matthewdeng as code owners March 18, 2022 09:12

ddelange added 2 commits March 18, 2022 10:13

Bump 2.6.2<=tensorflow<2.6.*

10474d2

One big pip install command

29ae855

ddelange changed the title ~~Unpin tensorflow~~ Clean up tensorflow requirements Mar 18, 2022

ddelange commented Mar 18, 2022

View reviewed changes

ddelange added 4 commits March 18, 2022 10:26

Typo

40cf14b

Typo

4d2cde8

Allow bugfix releases of tensorflow-probability

a35ebc9

Fix compatibility between rllib and train reqs

e466fb2

ddelange commented Mar 18, 2022

View reviewed changes

python/requirements/ml/requirements_rllib.txt Outdated Show resolved Hide resolved

ddelange changed the title ~~Clean up tensorflow requirements~~ Clean up ray-ml requirements Mar 18, 2022

ddelange added 2 commits March 18, 2022 16:17

Add back -U

7e807a3

Merge branch 'ray-project:master' into patch-1

b482d7f

krfricke approved these changes Mar 23, 2022

View reviewed changes

krfricke assigned matthewdeng, sven1977, amogkam and krfricke Mar 23, 2022

Merge branch 'ray-project:master' into patch-1

a19cbd6

amogkam requested changes Mar 23, 2022

View reviewed changes

ddelange commented Mar 23, 2022

View reviewed changes

python/requirements.txt Outdated Show resolved Hide resolved

python/requirements/ml/requirements_rllib.txt Outdated Show resolved Hide resolved

python/requirements/ml/requirements_dl.txt Outdated Show resolved Hide resolved

ddelange and others added 2 commits March 23, 2022 20:02

Apply suggestions from code review

e57f5ec

Co-authored-by: Kai Fricke <[email protected]>

PR Suggestion

d5e322f

ddelange requested a review from amogkam March 23, 2022 19:04

amogkam approved these changes Mar 23, 2022

View reviewed changes

ddelange added 2 commits March 24, 2022 06:52

Revert tensorflow to 2.6.2

ac8a2fb

Merge branch 'ray-project:master' into patch-1

4af1217

ddelange commented Mar 24, 2022

View reviewed changes

ddelange added 2 commits March 24, 2022 14:26

Install requirements_dl.txt in CI

f6b0105

Merge branch 'ray-project:master' into patch-1

25d6bf1

ddelange commented Mar 24, 2022

View reviewed changes

python/requirements/ml/requirements_rllib.txt Outdated Show resolved Hide resolved

Leftover

42767be

krfricke merged commit e109c13 into ray-project:master Mar 25, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Clean up ray-ml requirements #23325

Clean up ray-ml requirements #23325

ddelange commented Mar 18, 2022 •

edited

Loading

ddelange Mar 18, 2022 •

edited

Loading

ddelange Mar 18, 2022 •

edited

Loading

krfricke Mar 23, 2022

ddelange Mar 23, 2022

krfricke left a comment

krfricke Mar 23, 2022

ddelange Mar 23, 2022

krfricke Mar 23, 2022

ddelange commented Mar 23, 2022

amogkam left a comment

amogkam Mar 23, 2022

amogkam left a comment

ddelange Mar 24, 2022

ddelange commented Mar 24, 2022 •

edited

Loading

ddelange commented Mar 24, 2022

ddelange commented Mar 25, 2022

amogkam commented Mar 25, 2022

ddelange commented Mar 25, 2022

krfricke commented Mar 25, 2022


		tensorflow~=2.6.2
		tensorflow-probability~=0.14.1

Clean up ray-ml requirements #23325

Clean up ray-ml requirements #23325

Conversation

ddelange commented Mar 18, 2022 • edited Loading

Why are these changes needed?

Related issue number

Checks

ddelange Mar 18, 2022 • edited Loading

Choose a reason for hiding this comment

ddelange Mar 18, 2022 • edited Loading

Choose a reason for hiding this comment

krfricke Mar 23, 2022

Choose a reason for hiding this comment

ddelange Mar 23, 2022

Choose a reason for hiding this comment

krfricke left a comment

Choose a reason for hiding this comment

krfricke Mar 23, 2022

Choose a reason for hiding this comment

ddelange Mar 23, 2022

Choose a reason for hiding this comment

krfricke Mar 23, 2022

Choose a reason for hiding this comment

ddelange commented Mar 23, 2022

amogkam left a comment

Choose a reason for hiding this comment

amogkam Mar 23, 2022

Choose a reason for hiding this comment

amogkam left a comment

Choose a reason for hiding this comment

ddelange Mar 24, 2022

Choose a reason for hiding this comment

ddelange commented Mar 24, 2022 • edited Loading

ddelange commented Mar 24, 2022

ddelange commented Mar 25, 2022

amogkam commented Mar 25, 2022

ddelange commented Mar 25, 2022

krfricke commented Mar 25, 2022

ddelange commented Mar 18, 2022 •

edited

Loading

ddelange Mar 18, 2022 •

edited

Loading

ddelange Mar 18, 2022 •

edited

Loading

ddelange commented Mar 24, 2022 •

edited

Loading