Performance issues with defining remote functions and actor classes from within tasks. #6240
Comments
FYI I get this issue from Ray Tune's internals, which sent me here: `WARNING import_thread.py:126 -- The actor 'WrappedFunc' has been exported 100 times. It's possible that this warning is accidental, but this may indicate that the same remote function is being defined repeatedly from within many tasks and exported to all of the workers. This can be a performance issue and can be resolved by defining the remote function on the driver instead. See #6240 for more discussion.` I'm using remote calls within my trainable function (each Tune task has 3k sub-tasks), but I'm defining the remote function outside of the trainable function (like in the second example), so I'm not sure why this would still apply?
@markgoodhead I get the same warning; how can I solve it?
I also have the same issue. I defined two functions, where one remote task calls the other hundreds of times, depending on the number of rows.
Got the same warning with Tune. No Ray calls in my trainable function.
Consider the following code.
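(The original snippet isn't preserved in this capture; below is a minimal sketch of the anti-pattern, reconstructed from the description that follows, using the names `f` and `g` from that description.)

```python
import ray

ray.init()

# Anti-pattern: `g` is defined inside `f`, so every execution of `f`
# defines a fresh copy of `g` and exports it to all workers.
@ray.remote
def f():
    @ray.remote
    def g():
        return 1

    return ray.get(g.remote())

# 10 copies of `f` may run on 10 different workers; each one exports
# its own copy of `g` to every worker.
results = ray.get([f.remote() for _ in range(10)])
```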
If the 10 copies of `f` are scheduled on 10 different workers, they will all define `g`. Each copy of `g` will be pickled, exported through Redis, and then imported by each worker process. So there is an N^2 effect here.

Ideally, we would deduplicate the imports. However, there doesn't appear to be an easy way to determine whether the `g` functions that are exported are actually the same or not. If you just look at the body of the function (e.g., with `inspect.getsource`), then you will think two functions are the same whenever they have the same body, even if they close over different variables in the environment. We could compare the serialized strings generated by cloudpickle, but cloudpickle is nondeterministic, so the same function pickled in different processes will often give rise to different strings, and therefore not enough deduplication would happen.

In #6175, we're settling for not doing any deduplication but giving a warning whenever the source returned by `inspect.getsource` looks the same (see the sketch below for why source comparison alone is not enough).

The longer-term solution will likely be to remove the N^2 effect, e.g., perhaps by treating remote functions as objects stored in the object store (instead of Redis) or perhaps by having the workers pull the remote functions from Redis when needed (instead of pushing the remote functions proactively to the workers).
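A small self-contained illustration (the helper names here are hypothetical, not from the issue) of why comparing `inspect.getsource` output over-merges functions that close over different variables:

```python
import inspect

def make_task(n):
    # Both functions returned below have identical source text...
    def g(x):
        return x + n
    return g

g1, g2 = make_task(1), make_task(2)

# ...so a source-based comparison considers them "the same",
print(inspect.getsource(g1) == inspect.getsource(g2))  # True
# even though they close over different values and behave differently.
print(g1(0), g2(0))  # 1 2
```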
Workaround
Modify the above code to define the remote function on the driver instead. E.g.,
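(Again, the original snippet isn't preserved here; this is a minimal sketch of the fix, mirroring the reconstruction above.)

```python
import ray

ray.init()

# Define `g` once on the driver, so it is pickled and exported a single time.
@ray.remote
def g():
    return 1

@ray.remote
def f():
    # `f` captures the driver-defined `g` instead of redefining it.
    return ray.get(g.remote())

results = ray.get([f.remote() for _ in range(10)])
```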
You can look at the different values of `len(ray.worker.global_worker.redis_client.lrange('Exports', 0, -1))` produced by the two workloads.
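For example, a rough way to compare the two workloads, using the internal Redis-backed export list from the snippet above (internal to Ray versions of this era, not a public API; `f` refers to the sketches above):

```python
def num_exports():
    # Length of the internal 'Exports' list in Redis.
    return len(ray.worker.global_worker.redis_client.lrange('Exports', 0, -1))

before = num_exports()
ray.get([f.remote() for _ in range(10)])
print(num_exports() - before)
# Under the anti-pattern this count grows with the number of workers;
# with the driver-side definition it stays small.
```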