[Prototype] [Core] Make expensive subpackage imports dynamic. #27658
    def __getattr__(name: str):
        if name in _subpackages:
            return importlib.import_module("." + name, __name__)
        raise AttributeError(f"module {__name__!r} has no attribute {name!r}")
Note that, similar to an object's __getattr__ method, this __getattr__ will only be called if normal attribute lookup (e.g. via module.__getattribute__) fails, so this will not run for any of the above subpackages that are eagerly imported and should only run for the packages in _subpackages (or non-existent module attributes, which will error anyway).
    # TODO(Clark): Remove this once we drop Python 3.6 support.
    from ray import autoscaler  # noqa: F401
    from ray import data  # noqa: F401
    from ray import workflow  # noqa: F401
We could implement something similar for Python 3.6 via a "replace module with class hack":

    class module(types.ModuleType):
        __all__ = __all__

        def __init__(self, orig_module: types.ModuleType):
            self._orig_module = orig_module

        def __getattr__(self, name: str):
            try:
                return getattr(self._orig_module, name)
            except AttributeError as e:
                if name in _subpackages:
                    return importlib.import_module("." + name, self._orig_module.__name__)
                raise e from None

    sys.modules[__name__] = module(sys.modules[__name__])

But IMO this is more hacky than it's worth since we're going to be dropping Python 3.6 support in the near-term.
This pull request has been automatically marked as stale because it has not had recent activity. It will be closed in 14 days if no further activity occurs. Thank you for your contributions.
Hi again! The issue will be closed because there has been no more activity in the 14 days since the last message. Please feel free to reopen or open a new issue if you'd still like it to be addressed. Again, you can always ask for help on our discussion forum or Ray's public slack channel. Thanks again for opening the issue!
This one seems to be pretty good? What was the original reason why we decided to defer this PR?
I'd love to merge it. What's the main concern of this approach?
@rkooo567 it was originally deferred because the
LGTM. Maybe we can just add minimal tests? Also the approach seems reasonable. Are there any concerns off the top of your head about doing this?
     from ray import internal  # noqa: E402,F401
    -from ray import util  # noqa: E402
    +from ray import util  # noqa: E402,F401
Just OOC, why did it work before?
Not sure 🧐
    del os
    del logging
    del sys
maybe just leave it as it is? A little concern there might have been a reason why we didn't del... haha
We previously weren't importing sys in this module, so this shouldn't be a change in behavior. The sys import was added here: https://github.com/ray-project/ray/pull/27658/files#diff-f95026a08bcb464b58b036437876716d21d3b8630e61258303bcd5384d1d707cR4
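For context on the `del` question, here is a minimal sketch of what deleting an imported name at the end of a module does. It is not taken from Ray: the module `cleanup_demo`, its source, and `platform_name` are hypothetical, and the module is built in memory with `exec` only to keep the example self-contained.

```python
import types

# Hypothetical module source: imports two stdlib modules, then deletes one.
source = """
import os
import sys


def platform_name():
    # The global name `sys` is looked up at call time, not definition time.
    return sys.platform


PATH_SEP = os.sep
del sys
"""

demo = types.ModuleType("cleanup_demo")
exec(source, demo.__dict__)

# `os` was left in the namespace, so it is reachable as an attribute.
has_os = hasattr(demo, "os")
# `sys` was deleted, so it is no longer reachable through the module.
has_sys = hasattr(demo, "sys")

# A function that still refers to the deleted global fails when called.
try:
    demo.platform_name()
    call_failed = False
except NameError:
    call_failed = True
```

The deletion keeps the name out of the module's public namespace, but it is only safe when nothing in the module refers to that name after import time, which may be the kind of concern behind "a reason why we didn't del".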
Also can you merge the latest master? I'd like to run many_tasks to see if the perf regression is fixed.
    @@ -150,7 +150,7 @@ def check(self):
         runtime_env = {"py_modules": [S3_PACKAGE_URI]}

         # Note: We should set a bigger timeout if downloads the s3 package slowly.
    -    get_timeout = 10
    +    get_timeout = 2
Is this test even valid with this reduced timeout?
why do we need this change in this PR?
The test fails otherwise, where the timeout error is never raised
Will merge after a couple of runs of many_tasks. Please do not merge latest master @clarkzinzow. I'd like to check if this fixes the issue without @jiaodong's PR that will be merged soon.
    @@ -47,6 +54,43 @@ def test_non_ray_modules():
             assert "ray" in str(mod), f"Module {mod} should not be reachable via ray.{name}"


    +def test_dynamic_subpackage_import():
love the test!
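The body of test_dynamic_subpackage_import is not shown in this thread. As a rough sketch of the kind of check such a test might perform, the following builds a throwaway package on disk and verifies that its subpackage is only imported on attribute access. Everything here is an assumption for illustration: the package name `lazy_demo_pkg`, the subpackage `heavy`, and `VALUE` are hypothetical and unrelated to Ray's actual layout.

```python
import importlib
import os
import sys
import tempfile
import textwrap

# Build a throwaway package: lazy_demo_pkg/ with a "heavy" subpackage.
root = tempfile.mkdtemp()
pkg_dir = os.path.join(root, "lazy_demo_pkg")
os.makedirs(os.path.join(pkg_dir, "heavy"))

# The package __init__ uses the same module-level __getattr__ pattern
# as the diff above, with a hypothetical _subpackages list.
with open(os.path.join(pkg_dir, "__init__.py"), "w") as f:
    f.write(textwrap.dedent(
        """\
        import importlib

        _subpackages = ["heavy"]


        def __getattr__(name):
            if name in _subpackages:
                return importlib.import_module("." + name, __name__)
            raise AttributeError(f"module {__name__!r} has no attribute {name!r}")
        """
    ))

with open(os.path.join(pkg_dir, "heavy", "__init__.py"), "w") as f:
    f.write("VALUE = 42\n")

sys.path.insert(0, root)
importlib.invalidate_caches()

import lazy_demo_pkg  # noqa: E402

# Importing the parent package must not import the subpackage.
loaded_before = "lazy_demo_pkg.heavy" in sys.modules

# First attribute access goes through __getattr__ and triggers the import.
value = lazy_demo_pkg.heavy.VALUE
loaded_after = "lazy_demo_pkg.heavy" in sys.modules

# The import system binds the submodule on the parent, so later accesses
# are served by normal attribute lookup without calling __getattr__ again.
cached_on_parent = "heavy" in vars(lazy_demo_pkg)
```

A real test for Ray would presumably run the check in a fresh interpreter, since other tests in the same process may already have imported the subpackages.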
@rkooo567 sounds good, although before merging we should double-check the failing
promising sign btw. It passes the test https://buildkite.com/ray-project/release-tests-pr/builds/18731#0183f7a5-b88f-4269-b1ba-b03fdf4f9c8e. I am rerunning it
@rkooo567 No spooky CI failures in the most recent run, worker registration timeout adjustment for the runtime env test appears to have done the trick.
@rkooo567 All currently failing tests were flaky in master at the time of the run, do you think this is good to merge?
Tests are looking good, merging!
…ray-project#27658)" This reverts commit 241a02e.
This is a quick and relatively safer attempt to address ray-project#29324 In ray-project#28418 we attempted to unify ray.air utils with shared utils function but triggered expensive ray.data imports. Where longer term and more robust solution should be ray-project#27658 Signed-off-by: Weichen Xu <[email protected]>
…oject#27658) Certain Ray subpackages are expensive to import, either due to their size, dependencies, or their import-time logic that must be executed. E.g. a change in a Datasets import caused a regression for the many_tasks nightly test, despite many_tasks only using Ray Core. This PR delays the import of these expensive subpackages until attribute access, e.g. none of the Datasets code will be run until ray.data is accessed. Signed-off-by: Weichen Xu <[email protected]>
…ray-project#27658)" (ray-project#29659) This reverts commit 241a02e, reverting PR ray-project#27658. This PR was making some GCS tests flaky (somehow). Signed-off-by: Weichen Xu <[email protected]>
…dynamic. (ray-project#27658)" (ray-project#29659)" (ray-project#30219) This reverts commit b0bd270. Signed-off-by: Weichen Xu <[email protected]>
Certain Ray subpackages are expensive to import, either due to their size, dependencies, or their import-time logic that must be executed. E.g. a change in a Datasets import caused a regression for the many_tasks nightly test, despite many_tasks only using Ray Core.
This PR delays the import of these expensive subpackages until attribute access, e.g. none of the Datasets code will be run until ray.data is accessed.
Related issue number
Closes #27606, closes #29557
Checks
Commits are signed off (git commit -s) in this PR.
scripts/format.sh was run to lint the changes in this PR.