-
Notifications
You must be signed in to change notification settings - Fork 5.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix bug in which remote function redefinition doesn't happen. #6175
Conversation
Can one of the admins verify this patch? |
Test FAILed. |
@edoakes I just tried the following on 3840 cores. @ray.remote
def f():
@ray.remote
def g():
return 1
return ray.get(g.remote())
%time results = ray.get([g.remote() for _ in range(3840 // 2)]) It took 1 minute, which of course is absurdly slow because of the N^2 effect. One possible way to help (though certainly not the perfect fix) is to detect the scenario where the N^2 effect is happening (e.g., by detecting that many of the remote functions defined have the same source code) and give a good warning to the user (e.g., tell them to pull the remote function definition back to the driver instead of having it defined in the workers. |
Test PASSed. |
Test PASSed. |
Test PASSed. |
Test PASSed. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As we discussed this looks good, let's just be sure to give a good warning when there are many redefinitions and fix it by pulling from workers in the near future.
a6fd9a5
to
dfe9eba
Compare
Test PASSed. |
dfe9eba
to
8ef02fe
Compare
Test FAILed. |
python/ray/function_manager.py
Outdated
# Source code may not be available: | ||
# e.g. Cython or Python interpreter. | ||
function_source_hash = b"" | ||
hasher = hashlib.sha1() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just do hashlib.sha1(pickled_function).digest()
python/ray/function_manager.py
Outdated
"""The identifier is used to detect excessive duplicate exports. | ||
|
||
The identifier is used to determine when the same function or class is | ||
exported many times. This can yielf false positives. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
typo
python/ray/function_manager.py
Outdated
collision_identifier = function_or_class.__name__ | ||
|
||
# Return a hash of the identifier in case it is too large. | ||
hasher = hashlib.sha1() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same as above
python/ray/import_thread.py
Outdated
@@ -30,13 +32,19 @@ class ImportThread(object): | |||
redis_client: the redis client used to query exports. | |||
threads_stopped (threading.Event): A threading event used to signal to | |||
the thread that it should exit. | |||
imported_collision_identifiers: This is a dicitonary mapping strings |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
imported_collision_identifiers: This is a dicitonary mapping strings | |
imported_collision_identifiers: This is a dictionary mapping strings |
python/ray/import_thread.py
Outdated
@@ -30,13 +32,19 @@ class ImportThread(object): | |||
redis_client: the redis client used to query exports. | |||
threads_stopped (threading.Event): A threading event used to signal to | |||
the thread that it should exit. | |||
imported_collision_identifiers: This is a dicitonary mapping strings | |||
containing the collision identifier of the remote functions that |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
strings containing the collision identifier of the remote functions that have been exported
-> collision identifiers for exported remote functions
python/ray/import_thread.py
Outdated
imported_collision_identifiers: This is a dicitonary mapping strings | ||
containing the collision identifier of the remote functions that | ||
have been exported to the number of times each remote function has | ||
been imported. this is used to provide good error messages when the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
been imported. this is used to provide good error messages when the | |
been imported. This is used to provide good error messages when the |
python/ray/import_thread.py
Outdated
containing the collision identifier of the remote functions that | ||
have been exported to the number of times each remote function has | ||
been imported. this is used to provide good error messages when the | ||
same function is exported many many times. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same function is exported many many times. | |
same function is exported many times. |
python/ray/import_thread.py
Outdated
if key.startswith(b"RemoteFunction"): | ||
collision_identifier, function_name = ( | ||
self.redis_client.hmget( | ||
key, ["collision_identifier", "name"])) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
key, ["collision_identifier", "name"])) | |
key, ["collision_identifier", "function_name"])) |
This would be more clear and matches class_name
used for actors. If this requires making many changes, don't worry about it.
python/ray/import_thread.py
Outdated
"https://github.com/ray-project/ray/issues/6240 for more " | ||
"discussion.") | ||
|
||
if key.startswith(b"RemoteFunction"): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we abstract this whole if/else
block into a helper function that returns a collision identifier, type, and name? Better to separate out the redis implementation details from the actual logic.
Test PASSed. |
Test PASSed. |
76085d9
to
8cf3e5a
Compare
Test FAILed. |
dis
module.dis.dis
is sufficiently nondeterministic that we don't always get the warning when we should. However, I haven't noticed this happening.