Use hydra.run.dir (not os.getcwd) for DDP subprocesses' run dir #18145
Codecov Report
Coverage diff, master vs. #18145:
Coverage: 83% -> 61% (-22%)
Files: 431 -> 426 (-5)
Lines: 32799 -> 32710 (-89)
Hits: 27364 -> 19936 (-7428)
Misses: 5435 -> 12774 (+7339)
All the failing tests fail with "ModuleNotFound: torch", which cannot possibly be caused by this PR.
@aniketmaurya could you take a look at this? We discussed it in our call yesterday.
@nisheethlahoti Thanks for the PR. The change here looks good overall, but we need to make sure we don't regress on the issue here: #15689. If you would be so kind as to go over these two issues and check whether there are any conflicts between your approach and the concerns raised there, that would be very helpful (forgive me, I'm a bit unfamiliar with Hydra).
Don't worry about this, we will take care of it once the PR can be merged :)
The behaviour defined in #11617 actually seems better than what's proposed here (since it handles multi-run properly and has separate subdirectories for all the different PL processes), with one exception: it actually tries to read the config from the However, this could also be fixed by reinstating the code in #11617, removing the line
Update to previous comment: Removing the
Thanks @nisheethlahoti for your helpful analysis! Based on this I suggest we move forward with your PR as is.
This is good news. I would suggest that we split it into two parts: this PR, and multi-run support in a separate PR.
Would it be possible to add a small unit test in
Done
tests/tests_pytorch/strategies/launchers/test_subprocess_script.py
Awesome! Thank you @nisheethlahoti 🎉
Co-authored-by: Jirka Borovec <[email protected]> (cherry picked from commit ebbd538)
What does this PR do?
Fixes #16694
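The change named in the PR title can be sketched roughly as follows. This is a minimal illustration, not the code merged in this PR: `build_ddp_subprocess_cmd` and its parameters are hypothetical names, and the run dir is passed in as a plain string (in a Hydra app it would presumably be read from the Hydra config rather than from `os.getcwd()`). `hydra.run.dir` is the Hydra override key the title refers to.

```python
import sys
from typing import List, Sequence


def build_ddp_subprocess_cmd(argv: Sequence[str], hydra_run_dir: str) -> List[str]:
    """Hypothetical helper: build the command line for one DDP child process.

    `argv` is the script path followed by the user's original overrides.
    `hydra_run_dir` is the parent's Hydra run dir (assumed to be resolved
    by the caller from the Hydra config, not from os.getcwd()).
    """
    command = [sys.executable, *argv]
    # Pin the child to the parent's run dir explicitly. Relying on the
    # current working directory only works if Hydra changed into the run dir.
    command.append(f"hydra.run.dir={hydra_run_dir}")
    return command


if __name__ == "__main__":
    cmd = build_ddp_subprocess_cmd(
        ["train.py", "trainer.devices=2"],
        "outputs/2023-07-25/10-00-00",
    )
    print(cmd[1:])
```

Passing the run dir as an explicit override keeps the child processes writing to the same directory as the parent regardless of what the working directory is at launch time.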