This PR forces a `SPREAD` scheduling strategy for scan tasks when using
the Ray runner. This should result in better load balancing of read
tasks across the Ray cluster, yielding:
- better utilization of the aggregate network bandwidth of the cluster,
- better memory stability due to a more even post-read object
distribution,
- better performance of downstream parallel compute operations due to a
more even distribution of data over the compute bandwidth of the
cluster.
Closes #1940
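To make the load-balancing argument concrete, here is a toy placement model in plain Python. It is only a sketch: it does not use Ray and does not reproduce Ray's real scheduler. The node names, the `slots_per_node` parameter, and both placement functions are hypothetical, chosen to contrast filling one node first with spreading reads round-robin.

```python
from typing import Dict, List


def place_packed(num_tasks: int, nodes: List[str], slots_per_node: int) -> Dict[str, int]:
    """Toy 'pack' policy: fill every slot on one node before using the next."""
    placement = {node: 0 for node in nodes}
    for i in range(num_tasks):
        node = nodes[(i // slots_per_node) % len(nodes)]
        placement[node] += 1
    return placement


def place_spread(num_tasks: int, nodes: List[str]) -> Dict[str, int]:
    """Toy 'spread' policy: assign tasks to nodes round-robin."""
    placement = {node: 0 for node in nodes}
    for i in range(num_tasks):
        placement[nodes[i % len(nodes)]] += 1
    return placement


# Hypothetical 4-node cluster with 8 task slots per node and 8 read tasks.
nodes = ["node-a", "node-b", "node-c", "node-d"]
packed = place_packed(8, nodes, slots_per_node=8)
spread = place_spread(8, nodes)

print("packed:", packed)
print("spread:", spread)
```

In this model the packed policy lands all eight reads on one node, so a single network link and a single object store absorb the whole read, while the spread policy gives each node two reads and leaves the resulting data evenly distributed for downstream operations.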
This should be a pretty simple fix:
When scheduling any task that involves a read, we should spread it across the cluster (similar to what we do for reduces).
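A minimal sketch of how such a rule might be expressed, assuming the runner can tell whether a task involves a read. The helper name `remote_options_for` and the `involves_read` flag are hypothetical and not taken from the actual change; the only Ray-specific detail is the `scheduling_strategy="SPREAD"` option, which Ray accepts on remote functions.

```python
from typing import Any, Dict


def remote_options_for(involves_read: bool) -> Dict[str, Any]:
    """Return the per-task Ray options for a task (hypothetical helper).

    Tasks that involve a read get the SPREAD strategy; everything else
    keeps Ray's default scheduling by passing no override.
    """
    if involves_read:
        return {"scheduling_strategy": "SPREAD"}
    return {}


# With Ray installed, the options could be applied per task roughly like this
# (run_partition_task is a hypothetical @ray.remote function):
#
#   run_partition_task.options(**remote_options_for(True)).remote(task)

print(remote_options_for(True))
print(remote_options_for(False))
```

Keeping the decision in one small function means the spread rule is applied uniformly wherever read tasks are submitted, and non-read tasks are left on the default strategy.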