Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Revert "Revert "Global logging format changes" (#34126)" #34182

Merged
merged 3 commits into from
May 19, 2023

Conversation

peytondmurray
Copy link
Contributor

@peytondmurray peytondmurray commented Apr 7, 2023

This PR changes how the logging configuration for Ray is set, and changes the format of log messages.

Why are these changes needed?

Changes

  • Attempts to consolidate logging configuration by introducing reasonable defaults in ray/log.py.
  • This new logging configuration is done once in ray/__init__.py at the top of the module. Subsequent calls to the configuration are ignored.
  • A logger for ray.rllib is configured at the WARN level, to address Revert "Simplify logging configuration. (#30863)" #31858. With this change,
    Revert "Simplify logging configuration. (#30863)" #31858 can be reverted, again simplifying and consolidating logging configuration.
  • Modified test_output.py::test_logger_config to test only the logger config, not launch a ray cluster. The test was failing intermittently, I think due to a race condition between the launch of the cluster and the reading of the subprocess's stdout, and anyway it wasn't necessary to call ray.init here to check that logging was configured correctly.
  • Modified python/ray/tune/tests/test_commands.py::test_ls_with_cfg to test the underlying data, not what gets printed to stdout (which has changed with the new logging system).
  • Modified a logging message in ray.tune.automl.search_policy.AutoMLSearcher.on_trial_complete, which in certain cases emits a logging message which tries to format a NoneType into a %f during log message formatting. This was a previously-undetected bug which showed up because the default log level is now INFO. This fixes a test that was failing in test_automl_searcher.py::AutoMLSearcherTest.

New changes since the revert

  • Added propagate = False to the ray logger, which matches the behavior of the current master branch. The implication of this is that some tests broke, because in the current master branch some tests called ray.init(setup_logging=False), and no logging configuration was generated, i.e. the default behavior was to propagate messages up to the root logger. Tests which rely on caplog therefore expect to be able to examine log messages this way because caplog attaches a special logging handler to the root logger.

The fix for these tests is to always use the propagate_logs pytest fixture, which was already being used in many places, everywhere caplog was being used. This ensures messages are propagated beyond the ray logger to the root logger, and that caplog would be able to examine log messages once again.

Checks

  • I've signed off every commit(by using the -s flag, i.e., git commit -s) in this PR.
  • I've run scripts/format.sh to lint the changes in this PR.
  • I've included any doc changes needed for https://docs.ray.io/en/master/.
    • I've added any new APIs to the API Reference. For example, if I added a
      method in Tune, I've added it in doc/source/tune/api/ under the
      corresponding .rst file.
  • I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
  • Testing Strategy
    • Unit tests
    • Release tests
    • This PR is not tested :(

@rkooo567
Copy link
Contributor

@krfricke @sihanwang41 can you guys approve this PR again? It is the revert-revert

@rkooo567
Copy link
Contributor

Also, @c21

python/ray/data/dataset.py Outdated Show resolved Hide resolved
@peytondmurray peytondmurray force-pushed the global-logging-2 branch 2 times, most recently from edb8fbc to 42f540c Compare April 13, 2023 07:16
@peytondmurray peytondmurray force-pushed the global-logging-2 branch 8 times, most recently from ef402d4 to 1772154 Compare April 14, 2023 05:22
@rkooo567
Copy link
Contributor

cc @peytondmurray seems like it is ready to go? Can you convert it to regular PR?

@krfricke @c21 @sihanwang41 please approve the PR. It is the revert-revert, and there shouldn't be any new change (the issue was we set propagate=True to the root logger, which was the changed behavior)

@peytondmurray
Copy link
Contributor Author

linux://python/ray/tests:test_advanced_9: Flaky
linux://python/ray/tests:test_autoscaler_e2e: Not sure about this one. No obvious connection to logging
Documentation: This one does not appear to be related to logging, there's some issue with a torch trainer example.
linux://python/ray/train:test_accelerate_trainer_gpu: Test has been broken for a while
linux://python/ray/tune:test_commands: A bit flaky, seems like it took a bit too long for a test to run.
windows://python/ray/tests:test_object_store_metrics: Flaky

Copy link
Contributor

@angelinalg angelinalg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@rkooo567
Copy link
Contributor

cc @peytondmurray can you resolve the conflict before merging it?

@rkooo567 rkooo567 added the @author-action-required The PR author is responsible for the next step. Remove tag to send back to the reviewer. label May 10, 2023
@peytondmurray
Copy link
Contributor Author

Huh, so I rebased on master and the huggingface test merge conflict went away, but then there was another conflict about ray/data/datastream.py, which I guess got reverted a little while ago. Anyway, it all seems fixed now.

@peytondmurray peytondmurray force-pushed the global-logging-2 branch 2 times, most recently from 574e1fb to a97b7fd Compare May 11, 2023 19:34
@peytondmurray
Copy link
Contributor Author

Rebased to get rid of the agent.py import conflict.

@peytondmurray
Copy link
Contributor Author

caplog was used without propagate_logs fixture on the new widget tests. Fixed now, will check back in ~2h to see if it passes.

@peytondmurray
Copy link
Contributor Author

@rkooo567 About the test_autoscaler.py failure, I tried testing locally and every time I ran the AutoscalingTest.testDynamicScaling<number> tests different ones passed/failed:

image

image

So after talking with @DmitriGekhtman about this:

It's probably not related to logging.
You'd probably see the same intermittent failures if you run the tests against the master branch.

I tried testing on the master branch and indeed I am seeing the same flaky test failures:

image

image

@peytondmurray
Copy link
Contributor Author

@rkooo567 Looks like the failing test was fixed on master. Is there any chance we can get this merged?

@rkooo567
Copy link
Contributor

Failures seem unrelated

@rkooo567
Copy link
Contributor

Test failing tests seem to fail in the master as well

@rkooo567 rkooo567 merged commit ce7764b into ray-project:master May 19, 2023
@peytondmurray peytondmurray deleted the global-logging-2 branch May 19, 2023 00:55
@pcmoritz pcmoritz mentioned this pull request May 23, 2023
8 tasks
pcmoritz added a commit that referenced this pull request May 24, 2023
Since #34182, the docs build on M1 mac had been broken
scv119 pushed a commit to scv119/ray that referenced this pull request Jun 16, 2023
…ay-project#34182)

Attempts to consolidate logging configuration by introducing reasonable defaults in ray/log.py.
This new logging configuration is done once in ray/__init__.py at the top of the module. Subsequent calls to the configuration are ignored.
A logger for ray.rllib is configured at the WARN level, to address Revert "Simplify logging configuration. (ray-project#30863)" ray-project#31858. With this change,
Revert "Simplify logging configuration. (ray-project#30863)" ray-project#31858 can be reverted, again simplifying and consolidating logging configuration.
Modified test_output.py::test_logger_config to test only the logger config, not launch a ray cluster. The test was failing intermittently, I think due to a race condition between the launch of the cluster and the reading of the subprocess's stdout, and anyway it wasn't necessary to call ray.init here to check that logging was configured correctly.
Modified python/ray/tune/tests/test_commands.py::test_ls_with_cfg to test the underlying data, not what gets printed to stdout (which has changed with the new logging system).
Modified a logging message in ray.tune.automl.search_policy.AutoMLSearcher.on_trial_complete, which in certain cases emits a logging message which tries to format a NoneType into a %f during log message formatting. This was a previously-undetected bug which showed up because the default log level is now INFO. This fixes a test that was failing in test_automl_searcher.py::AutoMLSearcherTest.
scv119 pushed a commit to scv119/ray that referenced this pull request Jun 16, 2023
Since ray-project#34182, the docs build on M1 mac had been broken
arvind-chandra pushed a commit to lmco/ray that referenced this pull request Aug 31, 2023
…ay-project#34182)

Attempts to consolidate logging configuration by introducing reasonable defaults in ray/log.py.
This new logging configuration is done once in ray/__init__.py at the top of the module. Subsequent calls to the configuration are ignored.
A logger for ray.rllib is configured at the WARN level, to address Revert "Simplify logging configuration. (ray-project#30863)" ray-project#31858. With this change,
Revert "Simplify logging configuration. (ray-project#30863)" ray-project#31858 can be reverted, again simplifying and consolidating logging configuration.
Modified test_output.py::test_logger_config to test only the logger config, not launch a ray cluster. The test was failing intermittently, I think due to a race condition between the launch of the cluster and the reading of the subprocess's stdout, and anyway it wasn't necessary to call ray.init here to check that logging was configured correctly.
Modified python/ray/tune/tests/test_commands.py::test_ls_with_cfg to test the underlying data, not what gets printed to stdout (which has changed with the new logging system).
Modified a logging message in ray.tune.automl.search_policy.AutoMLSearcher.on_trial_complete, which in certain cases emits a logging message which tries to format a NoneType into a %f during log message formatting. This was a previously-undetected bug which showed up because the default log level is now INFO. This fixes a test that was failing in test_automl_searcher.py::AutoMLSearcherTest.

Signed-off-by: e428265 <[email protected]>
arvind-chandra pushed a commit to lmco/ray that referenced this pull request Aug 31, 2023
Since ray-project#34182, the docs build on M1 mac had been broken

Signed-off-by: e428265 <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
@author-action-required The PR author is responsible for the next step. Remove tag to send back to the reviewer.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants