Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adjust the TuneCallback class for Pytorch-lightning v1.6+ (removing o… #27770

Closed
wants to merge 2 commits into from

Conversation

davzaman
Copy link
Contributor

Why are these changes needed?

This is a duplicate of #27766 but with the changes that were supposed to be there (accidentally made changes while the PR was being accepted while trying to change the commits to have the sign off).

Getting a ton of warnings from pytorch-lightning about deprecation of certain hooks. The TuneCallback has old hooks such as keyboard_interrupt which have been deprecated. I updated the callback to the new hooks and updated the integration tests.

The deprecated hooks and their new versions (In pl v1.6+) are:

  • on_keyboard_interrupt -> on_exception
  • on_init_start -> just deprecated no new version
  • on_init_end -> just deprecated no new version
  • on_batch_start -> on_train_batch_start
  • on_batch_end -> on_train_batch_end
  • on_epoch_end -> on_<train/validation/test>_epoch_end
  • on_epoch_start -> on_<train/validation/test>_epoch_start

Related issue number

Closes #27487

Checks

  • I've signed off every commit(by using the -s flag, i.e., git commit -s) in this PR.
  • I've run scripts/format.sh to lint the changes in this PR.
  • I've included any doc changes needed for https://docs.ray.io/en/master/.
  • I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/

I spent a few hours trying to wrangle my environment. Unfortunately I'm on windows using WSL and have an old version of Debian for some reason which only supports gcc-6 (not even close to gcc-9 which is required for building the project so I can run the tests). Given this roadblock and the fact that the fixes were minimal and simple I decided to push out the changes.

  • Testing Strategy
    • Unit tests
    • Release tests
    • This PR is not tested :(

…r modifying deprecated functions), reorganized the order to match the rest of the functions, along with the integration tests.

Signed-off-by: Davina Zaman <[email protected]>
def on_epoch_start(self, trainer: Trainer, pl_module: LightningModule):
if "epoch_start" in self._on:
def on_train_epoch_start(self, trainer: Trainer, pl_module: LightningModule):
if "train_epoch_start" in self._on:
self._handle(trainer, pl_module)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@richardliaw should we ensure backwards compatible support for PTL < 1.6?

If so, we can do some versioning checks of pytorch_lightning similar to https://github.com/ray-project/ray/pull/27395/files

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So I looked into implementing this. My proposed solution would be to have an additional class, something like TuneCallbackBackwardCompat that is the old class, and then we inherit from either TuneCallback or the backward compatible version in the remaining classes TuneReportCallback, _TuneCheckpointCallback, TuneReportCheckpointCallback.

Additionally in the tests, we'd need to test both, right?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah we should ensure backwards compat support. Having a backwards compat class seems good to me, thanks!

Copy link
Contributor Author

@davzaman davzaman Aug 19, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@richardliaw Awesome, I've added those changes in the newest commit

@zhe-thoughts
Copy link
Collaborator

Thanks @davzaman for the contribution and @matthewdeng for the initial review!

Ping @richardliaw for the question "should we ensure backwards compatible support for PTL < 1.6?"

@stale
Copy link

stale bot commented Sep 22, 2022

This pull request has been automatically marked as stale because it has not had recent activity. It will be closed in 14 days if no further activity occurs. Thank you for your contributions.

  • If you'd like to keep this open, just leave any comment, and the stale label will be removed.

@stale stale bot added the stale The issue is stale. It will be closed within 7 days unless there are further conversation label Sep 22, 2022
@davzaman
Copy link
Contributor Author

@richardliaw Does everything look good to go?

@stale
Copy link

stale bot commented Oct 8, 2022

Hi again! The issue will be closed because there has been no more activity in the 14 days since the last message.

Please feel free to reopen or open a new issue if you'd still like it to be addressed.

Again, you can always ask for help on our discussion forum or Ray's public slack channel.

Thanks again for opening the issue!

@stale stale bot closed this Oct 8, 2022
@stale stale bot removed the stale The issue is stale. It will be closed within 7 days unless there are further conversation label Oct 11, 2022
Copy link
Contributor

@matthewdeng matthewdeng left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for the late review. Generally looks good to me. cc @amogkam for another pass!

from ray import tune
from ray.tune.integration.pytorch_lightning import (
TuneReportCallback,
TuneReportCheckpointCallback,
_TuneCheckpointCallback,
)

ray_pl_use_master = v_parse(version("pytorch_lightning")) >= v_parse("1.6")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.


logger = logging.getLogger(__name__)

ray_pl_use_master = v_parse(version("pytorch_lightning")) >= v_parse("1.6")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Would rename this to something more descriptive since master may change in the future (e.g. use_ptl_1_6_api)

@davzaman
Copy link
Contributor Author

davzaman commented Nov 10, 2022

Sorry it took me a while to get to this. I'm seeing @amogkam's commit 60cde11 from a PR #30045 that uses a different solution (dynamic rather than using a flag to decide on using one set of hooks over the other). In that case, the changes I made have already been integrated in a sense. The only thing missing is the update to https://github.com/ray-project/ray/blob/master/python/ray/tests/ray_lightning/simple_tune.py as you mentioned, as well as rerunning https://github.com/ray-project/ray/blob/master/doc/source/tune/examples/tune-pytorch-lightning.ipynb to regenerate the outputs without the deprecation warnings.

In that case, how would you like to proceed? We can probably close this pr and just add a commit for the remaining couple of things?

@davzaman
Copy link
Contributor Author

davzaman commented Dec 8, 2022

Checking in to see your thoughts re: my previous comment @matthewdeng :)

@matthewdeng
Copy link
Contributor

@davzaman yeah that sounds great! Closing this PR and creating a new one with the remaining changes makes sense to me.

@stale
Copy link

stale bot commented Jan 15, 2023

This pull request has been automatically marked as stale because it has not had recent activity. It will be closed in 14 days if no further activity occurs. Thank you for your contributions.

  • If you'd like to keep this open, just leave any comment, and the stale label will be removed.

@stale stale bot added the stale The issue is stale. It will be closed within 7 days unless there are further conversation label Jan 15, 2023
@stale
Copy link

stale bot commented Feb 2, 2023

Hi again! The issue will be closed because there has been no more activity in the 14 days since the last message.

Please feel free to reopen or open a new issue if you'd still like it to be addressed.

Again, you can always ask for help on our discussion forum or Ray's public slack channel.

Thanks again for opening the issue!

@stale stale bot closed this Feb 2, 2023
@Minxiangliu
Copy link

Hi @davzaman ,
I use your modification and run the test, but get the following error, do you have any suggestions?
TypeError: on_exception() takes 3 positional arguments but 4 were given

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
stale The issue is stale. It will be closed within 7 days unless there are further conversation
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Tune: Python] Pytorch-lightning deprecated hooks in TuneReportCallback
6 participants