[tune] Disable pytorch-lightning multiprocessing per default #28335
Conversation
Signed-off-by: Kai Fricke <[email protected]>
Thanks @krfricke! It seems this is a problem only with PTL 1.7 (Lightning-AI/pytorch-lightning#14292). Should we update the PTL version used in CI to test that these changes work?
Good idea, I'll add this to this PR.
@amogkam unfortunately we land in a dependency loop here: we can't upgrade to PTL 1.7.x because ray-lightning 0.3.0 is not compatible with it, but upgrading ray-lightning for compatibility requires changes to that library, and its CI won't pass because trials can hang as long as this fix is not merged.
Signed-off-by: Kai Fricke [email protected]
Why are these changes needed?
PyTorch Lightning uses multiprocessing pools by default (e.g., for device lookup), which can lead to hangs (see #28328). This PR sets an environment variable to disable this behavior until #28328 is addressed.
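For context, a minimal sketch of what "setting an environment variable to disable this" can look like on the integration side. The variable name `PL_DISABLE_FORK` is an assumption based on the PTL 1.7 issue linked above; the actual PR diff is authoritative.

```python
# Sketch only, not the exact PR diff: disable PyTorch Lightning's fork-based
# device lookup via an environment variable. It must be set before
# pytorch_lightning is imported, since the device lookup can run at import /
# Trainer-construction time. PL_DISABLE_FORK is assumed here, not confirmed.
import os

os.environ.setdefault("PL_DISABLE_FORK", "1")

import pytorch_lightning as pl  # noqa: E402

trainer = pl.Trainer(max_epochs=1)
```

Using `setdefault` rather than an unconditional assignment lets users who explicitly want the forking behavior override the default from their own environment.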
Related issue number
Closes #28197
Checks
I've signed off every commit (by using the -s flag, i.e., git commit -s) in this PR.
I've run scripts/format.sh to lint the changes in this PR.