
Performance issues with defining remote functions and actor classes from within tasks. #6240

Open
robertnishihara opened this issue Nov 22, 2019 · 5 comments
Labels
enhancement (Request for new feature and/or capability), P2 (Important issue, but not time-critical)

Comments

@robertnishihara
Collaborator

robertnishihara commented Nov 22, 2019

Consider the following code.

import ray
ray.init(num_cpus=10)

@ray.remote
def f():
    @ray.remote
    def g():
        return 1
    return ray.get(g.remote())

ray.get([f.remote() for _ in range(10)])

If the 10 copies of f are scheduled on 10 different workers, they will all define g. Each copy of g will be pickled and exported through Redis and then imported by each worker process. So there is an N^2 effect here.

Ideally, we would deduplicate the imports. However, there doesn't appear to be an easy way to determine if the g functions that are exported are actually the same or not. If you just look at the body of the function (e.g., with inspect.getsource), then you will think that two functions are the same if they have the same body but close over different variables in the environment. We can compare the serialized strings generated by cloudpickle, but cloudpickle is nondeterministic, so the same function pickled in different processes will often give rise to different strings. Therefore not enough deduplication will happen.
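
To make the pitfall concrete, here is a minimal sketch in plain Python (not Ray code; make_adder is just an illustrative helper) showing two closures whose inspect.getsource output is identical even though they close over different values, so a source-based comparison would wrongly deduplicate them:

import inspect

def make_adder(n):
    def add(x):
        return x + n
    return add

add_one = make_adder(1)
add_two = make_adder(2)

# The source text is identical for both closures ...
assert inspect.getsource(add_one) == inspect.getsource(add_two)
# ... but they capture different values of n and behave differently.
assert add_one(10) == 11 and add_two(10) == 12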

In #6175, we're settling for not doing any deduplication but giving a warning whenever the source returned by inspect.getsource looks the same.

The longer term solution will likely be to remove the N^2 effect, e.g., perhaps by treating remote functions as objects stored in the object store (instead of Redis) or perhaps by having the workers pull the remote functions from Redis when needed (instead of pushing the remote functions proactively to the workers).

Workaround

Modify the above code to define the remote function on the driver instead. E.g.,

import ray
ray.init(num_cpus=10)

@ray.remote
def g():
    return 1

@ray.remote
def f():
    return ray.get(g.remote())

ray.get([f.remote() for _ in range(10)])

You can look at the different values of len(ray.worker.global_worker.redis_client.lrange('Exports', 0, -1)) produced by the two workloads.
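
For example, something like the following can be appended to each of the two scripts (a rough sketch, assuming a Ray version from that era in which the driver's worker exposes redis_client, as in the expression above); the count should be noticeably larger for the nested-definition version:

num_exports = len(ray.worker.global_worker.redis_client.lrange('Exports', 0, -1))
print("Number of exports:", num_exports)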

@markgoodhead
Contributor

FYI I get this issue for Ray Tune's internals which sent me here:

WARNING import_thread.py:126 -- The actor 'WrappedFunc' has been exported 100 times. It's possible that this warning is accidental, but this may indicate that the same remote function is being defined repeatedly from within many tasks and exported to all of the workers. This can be a performance issue and can be resolved by defining the remote function on the driver instead. See #6240 for more discussion.

I'm using remote calls within my trainable function (each Tune task has 3k sub-tasks), but I'm defining it outside of the Trainable function (like in the second example), so I'm not sure why this would still apply?

@amogkam
Contributor

amogkam commented Jul 29, 2020

@markgoodhead WrappedFunc is part of the Tune internals. The warning shouldn't be due to your subtasks.

@rkooo567 rkooo567 self-assigned this Oct 12, 2020
@rkooo567 rkooo567 added the enhancement and P2 labels Oct 12, 2020
@yywangvr

yywangvr commented Mar 17, 2021

> FYI I get this issue for Ray Tune's internals which sent me here:
>
> WARNING import_thread.py:126 -- The actor 'WrappedFunc' has been exported 100 times. It's possible that this warning is accidental, but this may indicate that the same remote function is being defined repeatedly from within many tasks and exported to all of the workers. This can be a performance issue and can be resolved by defining the remote function on the driver instead. See #6240 for more discussion.
>
> I'm using remote calls within my trainable function (each Tune task has 3k sub-tasks), but I'm defining it outside of the Trainable function (like in the second example), so I'm not sure why this would still apply?

I get the same warning; how can I solve it?

@oakkas

oakkas commented Sep 20, 2021

I also have the same issue. I defined two functions, where one remote task calls the other hundreds of times, depending on the number of rows.

@jmakov
Contributor

jmakov commented Sep 23, 2021

Got the same warning with Tune. No Ray calls in my trainable function.
