Performance benchmark on RayDP v.s. Spark #340

chenya-zhang · 2023-05-05T21:13:11Z

Hi there,

In the talk "RayDP: Build Large-scale End-to-end Data Analytics and AI Pipelines Using Spark and Ray" https://youtu.be/ELSrR1Geqg4?t=819, @carsonwang mentioned that RayDP would perform better than Spark.

We are curious which types of queries or workflows you ran, and what your analysis of the performance differences showed.

Thanks a lot!

carsonwang · 2023-05-06T06:45:29Z

Hi @chenya-zhang, there is a plan to integrate RayDP with Gluten, which offloads SQL operations to native engines such as Velox. On TPC-H- and TPC-DS-like benchmarks, we observed more than a 2x speedup. You can find more details in the Gluten project: https://github.com/oap-project/gluten.

We are also running RayDP + XGBoost on Ray workflows and have observed a performance advantage over running XGBoost on Spark. We will share more once the data is ready to publish.

rishabh-dream11 · 2024-03-01T20:47:07Z

Hi @carsonwang, can you please share the performance benchmark numbers for Ray + XGBoost vs. XGBoost on Spark?

rishabh-dream11 · 2024-03-01T20:48:45Z

@carsonwang Did the plan to integrate RayDP with Gluten materialize?
