Revert new Hydra launch behavior #15737

Merged
merged 12 commits into from
Nov 21, 2022

Conversation

awaelchli
Contributor

@awaelchli awaelchli commented Nov 19, 2022

What does this PR do?

Closes #15727
Reverts #11617 without undoing the refactor
Fixes #15689

Temporarily removes multirun support for Hydra.

I verified that the reported issue is gone using the provided script.
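
For readers unfamiliar with the launch mechanics discussed here, the following is a minimal sketch of the general technique of re-invoking a Hydra script once per additional DDP rank. It is an illustration under assumptions, not the code changed in this PR: the helper name `build_child_command` and its parameters are hypothetical, and the example paths and arguments are made up. The only Hydra-specific pieces are the `hydra.run.dir` and `hydra.job.name` overrides, which are standard Hydra config keys.

```python
import sys
from typing import List, Sequence


def build_child_command(argv: Sequence[str], run_dir: str, local_rank: int) -> List[str]:
    """Build the command line for one child DDP process (hypothetical helper)."""
    # Re-run the same script with the same user-supplied arguments.
    command = [sys.executable, *argv]
    # Pin Hydra's single-run output directory to the parent's directory and
    # give each rank its own job name so the per-job log files do not collide.
    command += [
        f'hydra.run.dir="{run_dir}"',
        f"hydra.job.name=train_ddp_process_{local_rank}",
    ]
    return command


# Example values below are placeholders, not taken from the PR.
cmd = build_child_command(["train.py", "trainer.devices=2"], "/tmp/outputs/run1", local_rank=1)
print(cmd[1:])
```

Since `hydra.run.dir` only governs Hydra's single-run mode, a scheme along these lines would not by itself cover multirun, which seems consistent with multirun support being temporarily removed above.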

Before submitting

  • Was this discussed/approved via a GitHub issue? (not for typos and docs)
  • Did you read the contributor guideline, Pull Request section?
  • Did you make sure your PR does only one thing, instead of bundling different changes together?
  • Did you make sure to update the documentation with your changes? (if necessary)
  • Did you write any new necessary tests? (not for typos and docs)
  • Did you verify new and existing tests pass locally with your changes?
  • Did you update the CHANGELOG? (not for typos, docs, test updates, or internal minor changes/refactorings)

PR review

Anyone in the community is free to review the PR once the tests have passed.
Before you start reviewing make sure you have read Review guidelines. In short, see the following bullet-list:

  • Is this pull request ready for review? (if not, please submit in draft mode)
  • Check that all items from Before submitting are resolved
  • Make sure the title is self-explanatory and the description concisely explains the PR
  • Add labels and milestones (and optionally projects) to the PR so it can be classified

Did you have fun?

I made sure I had fun coding 🙃

cc @Borda @justusschock @awaelchli @akihironitta

@github-actions github-actions bot added the pl Generic label for PyTorch Lightning package label Nov 19, 2022
@awaelchli awaelchli changed the title WIP: Revert new Hydra cwd behavior WIP: Revert new Hydra launch behavior Nov 19, 2022
@awaelchli awaelchli added 3rd party Related to a 3rd-party breaking change Includes a breaking change labels Nov 19, 2022
@awaelchli awaelchli added this to the v1.8.x milestone Nov 19, 2022
@awaelchli awaelchli added the strategy: ddp DistributedDataParallel label Nov 19, 2022
@awaelchli awaelchli changed the title WIP: Revert new Hydra launch behavior Revert new Hydra launch behavior Nov 19, 2022
@awaelchli awaelchli marked this pull request as ready for review November 19, 2022 15:45
Contributor

@tchaton tchaton left a comment

LGTM !

@mergify mergify bot added the ready PRs ready to be merged label Nov 21, 2022
@awaelchli
Contributor Author

Holding off on merging until @SeanNaren has reviewed the changes :)

@mergify mergify bot added has conflicts and removed ready PRs ready to be merged labels Nov 21, 2022
Contributor

@carmocca carmocca left a comment

Also fixes #15545?

Contributor

@SeanNaren SeanNaren left a comment

Can confirm the changes fix our issue on the NeMo side, thanks so much @awaelchli :)

@SeanNaren SeanNaren mentioned this pull request Nov 21, 2022
8 tasks
@github-actions
Contributor

github-actions bot commented Nov 21, 2022

⚡ Required checks status: All passing 🟢

Groups summary

🟢 pytorch_lightning: Tests workflow
Check ID Status
pl-cpu (macOS-11, pytorch, 3.8, 1.11) success
pl-cpu (macOS-11, pytorch, 3.9, 1.12) success
pl-cpu (macOS-11, pytorch, 3.10, 1.13) success
pl-cpu (macOS-11, pytorch, 3.8, 1.10, oldest) success
pl-cpu (ubuntu-20.04, pytorch, 3.8, 1.10) success
pl-cpu (ubuntu-20.04, pytorch, 3.9, 1.11) success
pl-cpu (ubuntu-20.04, pytorch, 3.10, 1.12) success
pl-cpu (ubuntu-20.04, pytorch, 3.10, 1.13) success
pl-cpu (ubuntu-20.04, pytorch, 3.7, 1.10, oldest) success
pl-cpu (windows-2022, pytorch, 3.9, 1.11) success
pl-cpu (windows-2022, pytorch, 3.10, 1.12) success
pl-cpu (windows-2022, pytorch, 3.10, 1.13) success
pl-cpu (windows-2022, pytorch, 3.7, 1.10, oldest) success
pl-cpu (slow, macOS-11, pytorch, 3.7, 1.11) success
pl-cpu (slow, ubuntu-20.04, pytorch, 3.7, 1.11) success
pl-cpu (slow, windows-2022, pytorch, 3.7, 1.11) success
pl-cpu (macOS-11, lightning, 3.8, 1.13) success
pl-cpu (ubuntu-20.04, lightning, 3.8, 1.13) success
pl-cpu (windows-2022, lightning, 3.8, 1.13) success
🟢 pytorch_lightning: Azure GPU
Check ID Status
pytorch-lightning (GPUs) success
🟢 pytorch_lightning: Azure HPU
Check ID Status
pytorch-lightning (HPUs) success
🟢 pytorch_lightning: Azure IPU
Check ID Status
pytorch-lightning (IPUs) success
🟢 pytorch_lightning: Docs
Check ID Status
make-doctest (pytorch) success
make-html (pytorch) success
🟢 lightning_lite: CPU workflow
Check ID Status
lite-cpu (macOS-11, lite, 3.8, 1.11) success
lite-cpu (macOS-11, lite, 3.9, 1.12) success
lite-cpu (macOS-11, lite, 3.10, 1.13) success
lite-cpu (macOS-11, lite, 3.7, 1.10, oldest) success
lite-cpu (ubuntu-20.04, lite, 3.8, 1.10) success
lite-cpu (ubuntu-20.04, lite, 3.9, 1.11) success
lite-cpu (ubuntu-20.04, lite, 3.10, 1.12) success
lite-cpu (ubuntu-20.04, lite, 3.10, 1.13) success
lite-cpu (ubuntu-20.04, lite, 3.7, 1.10, oldest) success
lite-cpu (windows-2022, lite, 3.9, 1.11) success
lite-cpu (windows-2022, lite, 3.10, 1.12) success
lite-cpu (windows-2022, lite, 3.10, 1.13) success
lite-cpu (windows-2022, lite, 3.7, 1.10, oldest) success
lite-cpu (macOS-11, lightning, 3.8, 1.13) success
lite-cpu (ubuntu-20.04, lightning, 3.8, 1.13) success
lite-cpu (windows-2022, lightning, 3.8, 1.13) success
🟢 lightning_lite: Azure GPU
Check ID Status
lightning-lite (GPUs) success
🟢 mypy
Check ID Status
mypy success
🟢 install
Check ID Status
install-pkg (ubuntu-22.04, app, 3.7) success
install-pkg (ubuntu-22.04, app, 3.10) success
install-pkg (ubuntu-22.04, lite, 3.7) success
install-pkg (ubuntu-22.04, lite, 3.10) success
install-pkg (ubuntu-22.04, pytorch, 3.7) success
install-pkg (ubuntu-22.04, pytorch, 3.10) success
install-pkg (ubuntu-22.04, lightning, 3.7) success
install-pkg (ubuntu-22.04, lightning, 3.10) success
install-pkg (macOS-12, app, 3.7) success
install-pkg (macOS-12, app, 3.10) success
install-pkg (macOS-12, lite, 3.7) success
install-pkg (macOS-12, lite, 3.10) success
install-pkg (macOS-12, pytorch, 3.7) success
install-pkg (macOS-12, pytorch, 3.10) success
install-pkg (macOS-12, lightning, 3.7) success
install-pkg (macOS-12, lightning, 3.10) success
install-pkg (windows-2022, app, 3.7) success
install-pkg (windows-2022, app, 3.10) success
install-pkg (windows-2022, lite, 3.7) success
install-pkg (windows-2022, lite, 3.10) success
install-pkg (windows-2022, pytorch, 3.7) success
install-pkg (windows-2022, pytorch, 3.10) success
install-pkg (windows-2022, lightning, 3.7) success
install-pkg (windows-2022, lightning, 3.10) success

This comment was automatically generated and updates for 60 minutes every 180 seconds.

Thank you for your contribution! 💜

@mergify mergify bot added ready PRs ready to be merged and removed has conflicts ready PRs ready to be merged labels Nov 21, 2022
@Borda Borda enabled auto-merge (squash) November 21, 2022 19:42
@Borda Borda merged commit 88b2e5a into master Nov 21, 2022
@Borda Borda deleted the feature/revert-hydra-cwd branch November 21, 2022 20:19
Borda pushed a commit that referenced this pull request Nov 21, 2022
* revert new hydra cwd behavior
* remove debug statements
* changelog

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Jirka <[email protected]>
Co-authored-by: Jirka Borovec <[email protected]>

(cherry picked from commit 88b2e5a)
lantiga added a commit that referenced this pull request Nov 23, 2022
* chlog update

* Fix typo in script name (#15724)

(cherry picked from commit d925077)

* Torch inference mode for prediction (#15719)

torch inference mode for prediction

(cherry picked from commit 08d14ec)

* [App] Update multi-node examples (#15700)

Co-authored-by: Jirka Borovec <[email protected]>
Co-authored-by: Carlos Mocholí <[email protected]>
(cherry picked from commit 8306797)

* feature(docs/app/lit_tabs): add works (#15731)

(cherry picked from commit 1a31d13)

* [App] Fix VSCode IDE debugger (#15747)

(cherry picked from commit 6714ca7)

* Update tensorboard requirement from <2.11.0,>=2.9.1 to >=2.9.1,<2.12.0 in /requirements (#15746)

Update tensorboard requirement in /requirements

Updates the requirements on [tensorboard](https://github.com/tensorflow/tensorboard) to permit the latest version.
- [Release notes](https://github.com/tensorflow/tensorboard/releases)
- [Changelog](https://github.com/tensorflow/tensorboard/blob/master/RELEASE.md)
- [Commits](tensorflow/tensorboard@2.9.1...2.11.0)

---
updated-dependencies:
- dependency-name: tensorboard
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <[email protected]>

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
(cherry picked from commit 0b58b69)

* Update beautifulsoup4 requirement from <=4.8.2 to <4.11.2 in /requirements (#15745)

* Update beautifulsoup4 requirement in /requirements

Updates the requirements on [beautifulsoup4](https://www.crummy.com/software/BeautifulSoup/bs4/) to permit the latest version.

---
updated-dependencies:
- dependency-name: beautifulsoup4
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <[email protected]>

* Apply suggestions from code review

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Jirka Borovec <[email protected]>
(cherry picked from commit 1ffbe1b)

* [App] Fix multi-node pytorch example CI (#15753)

(cherry picked from commit bc797fd)

* [App] Improve `LightningTrainerScript` start-up time (#15751)

(cherry picked from commit c2c1974)

* Enable Probot CheckGroup v5 (#15670)

(cherry picked from commit 6c8ee01)

* [App] Enable properties for the Lightning flow (#15750)

(cherry picked from commit 5cfb176)

* test for Enable setting property (#15755)

Co-authored-by: thomas chaton <[email protected]>
Co-authored-by: Ethan Harris <[email protected]>
(cherry picked from commit ba14038)

* Move s3fs to cloud extras (#15729)

Co-authored-by: Luca Antiga <[email protected]>
(cherry picked from commit dd75906)

* Revert new Hydra launch behavior (#15737)

* revert new hydra cwd behavior
* remove debug statements
* changelog

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Jirka <[email protected]>
Co-authored-by: Jirka Borovec <[email protected]>

(cherry picked from commit 88b2e5a)

* FCCV Docs (#15598)

* add custom data iter docs

* add custom data iter docs

* Update docs/source-pytorch/data/custom_data_iterables.rst

* remove ToDevice

* nit

* Update docs/source-pytorch/data/custom_data_iterables.rst

Co-authored-by: Luca Antiga <[email protected]>

* clarification for @lantiga

* typo

* Update docs/source-pytorch/data/custom_data_iterables.rst

* Update docs/source-pytorch/data/custom_data_iterables.rst

* Update docs/source-pytorch/data/custom_data_iterables.rst

Co-authored-by: Jirka Borovec <[email protected]>
Co-authored-by: Akihiro Nitta <[email protected]>
Co-authored-by: Luca Antiga <[email protected]>
(cherry picked from commit 006fde9)

* Switch from tensorboard to tensorboardx in logger (#15728)

* Switch from tensorboard to tensorboardx in logger
* Warn if log_graph is set to True but tensorboard is not installed
* Fix warning message formatting
* Apply suggestions from code review
* simplify for TBX as required pkg
* docs example
* chlog
* tbx 2.2

Co-authored-by: Luca Antiga <[email protected]>
Co-authored-by: William Falcon <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Carlos Mocholí <[email protected]>
Co-authored-by: Jirka Borovec <[email protected]>
Co-authored-by: Jirka <[email protected]>

(cherry picked from commit 9c2eb52)

* resolve conflicts

* Fix azure path excludes (#15756)

Co-authored-by: Jirka Borovec <[email protected]>
(cherry picked from commit aef94ce)

* Disable XSRF protection in StreamlitFrontend to support upload in localhost (#15684)

* Enable CORS in StreamlitFrontend to support upload
* Only disable XSRF when running on localhost
* Update test
* Use utility fn to detect if localhost

Co-authored-by: Luca Antiga <[email protected]>
(cherry picked from commit ed3eef0)

* Enable Probot CheckGroup v5.1 (#15763)

(cherry picked from commit c55f80f)

* Bump pytest from 7.1.3 to 7.2.0 in /requirements (#15677)

Bumps [pytest](https://github.com/pytest-dev/pytest) from 7.1.3 to 7.2.0.
- [Release notes](https://github.com/pytest-dev/pytest/releases)
- [Changelog](https://github.com/pytest-dev/pytest/blob/main/CHANGELOG.rst)
- [Commits](pytest-dev/pytest@7.1.3...7.2.0)

---
updated-dependencies:
- dependency-name: pytest
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <[email protected]>

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
(cherry picked from commit cfb27bd)

* Fix the `examples/app_dag` App (#14359)

* Fix app dag example
* Add test
* Update doc
* Update tests/tests_app_examples/test_app_dag.py

Co-authored-by: Sherin Thomas <[email protected]>
(cherry picked from commit 2b61c92)

* mergify: drop ready for draft (#15766)

(cherry picked from commit 1a07a9c)

* lightning delete cluster CLI command help text update (#15760)

* updated the lightning delete cluster CLI command help text output
* updated changelog
* typo fix
* Apply suggestions from code review

Co-authored-by: Jirka Borovec <[email protected]>
(cherry picked from commit 75b0573)

* Deduplicate top level lightning CLI command groups (#15761)

* unify remove and delete command groups & the add and delete command groups
* added changelog
* fix tests
* Apply suggestions from code review

Co-authored-by: Jirka Borovec <[email protected]>
(cherry picked from commit 7b2788e)

* releasing 1.8.3

* CI: lite on GPU

* Fix App Docs for lightning ssh-keys command (#15773)

fixed ssh-keys docs

(cherry picked from commit 317591d)

Co-authored-by: thomas chaton <[email protected]>
Co-authored-by: yiftachbeer <[email protected]>
Co-authored-by: Sherin Thomas <[email protected]>
Co-authored-by: Ethan Harris <[email protected]>
Co-authored-by: Yurij Mikhalevich <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Carlos Mocholí <[email protected]>
Co-authored-by: Luca Antiga <[email protected]>
Co-authored-by: Adrian Wälchli <[email protected]>
Co-authored-by: Justus Schock <[email protected]>
Co-authored-by: Kaushik B <[email protected]>
Co-authored-by: Rick Izzo <[email protected]>
nisheethlahoti added a commit to nisheethlahoti/lightning that referenced this pull request Jul 27, 2023
* Create different hydra output subdirectories for processes started by DDP
* Support experimental-rerun
* If rerun is not enabled but multi-run used, raise explicit error
Reverts parts of Lightning-AI#15737
Labels
3rd party Related to a 3rd-party breaking change Includes a breaking change pl Generic label for PyTorch Lightning package ready PRs ready to be merged strategy: ddp DistributedDataParallel
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Hydra + DDP Fails for NeMo after Hydra refactor in 1.8
6 participants