[PERF] Spread scan tasks over Ray cluster. #1950
Decided to move these to `execution_step.py` since it's a more sensible location for them, and to avoid introducing an otherwise unnecessary `ray_runner.py` -> `rust_physical_plan_shim.py` dependency.
Codecov Report
Attention: Patch coverage is
Additional details and impacted files

@@           Coverage Diff           @@
##             main    #1950   +/-   ##
=======================================
  Coverage   83.93%   83.93%
=======================================
  Files          55       55
  Lines        6111     6112    +1
=======================================
+ Hits         5129     5130    +1
  Misses        982      982
Benchmarking

Setup: 28 GiB Parquet data, 32 partitions, 8 i3.2xlarge Ray nodes, each with 61 GiB RAM (18.3 GiB object store)

Results

Before PR: [figure not available]
After PR: [figure not available]
I'm considering this change to be validated, merging now!
This PR forces a `SPREAD` scheduling strategy for scan tasks when using the Ray runner. This should result in better load balancing of read tasks across the Ray cluster, yielding:

- better utilization of the aggregate network bandwidth of the cluster,
- better memory stability due to a more even post-read object distribution,
- better performance of downstream parallel compute operations due to a more even distribution of data over the compute bandwidth of the cluster.

Closes #1940
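For readers unfamiliar with how a scheduling strategy is attached to a Ray task, here is a minimal sketch of the idea described above. It is not the code from this PR: the helper name `ray_options_for_task` and its `is_scan_task` and `num_cpus` parameters are hypothetical, chosen only for illustration. The one piece taken from Ray itself is the `scheduling_strategy="SPREAD"` option, which Ray accepts on remote functions.

```python
from typing import Any, Dict


def ray_options_for_task(is_scan_task: bool, num_cpus: float = 1.0) -> Dict[str, Any]:
    """Build keyword arguments for a Ray remote function's .options() call.

    Hypothetical helper: scan (read) tasks get the SPREAD strategy so that
    Ray tries to place them on different nodes; other tasks keep Ray's
    default scheduling behaviour.
    """
    opts: Dict[str, Any] = {"num_cpus": num_cpus}
    if is_scan_task:
        # "SPREAD" asks Ray to distribute tasks across available nodes
        # on a best-effort basis rather than packing them onto a few nodes.
        opts["scheduling_strategy"] = "SPREAD"
    return opts
```

With Ray installed, the resulting dictionary could be applied as `read_partition.options(**ray_options_for_task(True)).remote(path)`, where `read_partition` stands in for some `@ray.remote`-decorated read function. Keeping the decision in one helper means non-scan tasks are left on the default strategy.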