refactor: Bump OpenDAL 0.46, arrow 51, tonic 0.11, reqwest 0.12, hyper 1, http 1 #15442
Signed-off-by: Xuanwo <[email protected]>
I'm waiting for the benchmark results.
The benchmark seems quite slow; I'm working on this.
…to upgrade-opendal
Hi, @youngsofun, this PR is ready for review now!
OpenDAL v0.46 introduces a new feature known as concurrent read, also referred to as auto ranged read. This feature allows us to read large files with concurrency. However, I've only enabled concurrency in a few continuous reading areas like the bytes reader. Other areas remain unchanged, and I'm satisfied with the current performance since there's no significant regression. I plan to optimize additional areas in upcoming PRs.
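To make the idea of a concurrent (auto ranged) read concrete, here is a minimal std-only sketch. It is not OpenDAL's API or implementation: `split_ranges` and `concurrent_read` are hypothetical names, and an in-memory buffer stands in for a remote object. It only illustrates splitting one large read into byte ranges, fetching them in parallel, and reassembling them in order.

```rust
use std::ops::Range;
use std::sync::Arc;
use std::thread;

/// Split `0..len` into consecutive ranges of at most `chunk` bytes.
/// (Hypothetical helper, not part of OpenDAL.)
fn split_ranges(len: u64, chunk: u64) -> Vec<Range<u64>> {
    assert!(chunk > 0, "chunk size must be non-zero");
    let mut ranges = Vec::new();
    let mut start = 0;
    while start < len {
        let end = (start + chunk).min(len);
        ranges.push(start..end);
        start = end;
    }
    ranges
}

/// Read every range on its own thread, then concatenate the parts in order.
/// `data` stands in for a remote object; a real reader would issue one
/// ranged request per chunk and would cap the number of in-flight requests
/// instead of spawning one thread per chunk.
fn concurrent_read(data: Arc<Vec<u8>>, chunk: u64) -> Vec<u8> {
    let handles: Vec<_> = split_ranges(data.len() as u64, chunk)
        .into_iter()
        .map(|r| {
            let data = Arc::clone(&data);
            thread::spawn(move || data[r.start as usize..r.end as usize].to_vec())
        })
        .collect();

    // Joining in spawn order preserves the byte order of the original object.
    let mut out = Vec::with_capacity(data.len());
    for handle in handles {
        out.extend(handle.join().expect("reader thread panicked"));
    }
    out
}
```

Sequential consumers such as a bytes reader benefit most from this pattern, because the ranges are known up front and the results are consumed in order.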
I hereby agree to the terms of the CLA available at: https://docs.databend.com/dev/policies/cla/
Summary
This PR will bump the following packages: OpenDAL 0.46, arrow 51, tonic 0.11, reqwest 0.12, hyper 1, and http 1.
I have to submit them all in one PR to make sure databend can build.
Changes worth mentioning
OpenDAL v0.46 has transitioned from AsyncRead-based IO to Range-based IO. While some internal implementations have been refactored, not all are performing optimally. We need to address this regression.
We have transitioned to range-based IO for both Parquet and native formats; please give this area extra attention.
To minimize the number of changes in the PR, I refactored directly without robust abstraction, such as using opendal::Reader directly in the native format API. I intend to refine this aspect in upcoming PRs.
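As a rough illustration of why range-based IO suits Parquet, the sketch below locates the footer metadata with two stateless ranged reads and no seekable cursor. The `RangeRead` trait, `MemObject`, and `read_parquet_metadata` are hypothetical names for this example only; they are not the opendal::Reader API or Databend's code. The only format fact relied on is that a Parquet file ends with a 4-byte little-endian metadata length followed by the magic bytes `PAR1`.

```rust
use std::io;
use std::ops::Range;

/// Hypothetical range-based reader: each call names the exact bytes it
/// wants, so there is no cursor position to maintain or seek.
trait RangeRead {
    fn len(&self) -> u64;
    fn read_range(&self, range: Range<u64>) -> io::Result<Vec<u8>>;
}

/// In-memory stand-in for an object in remote storage.
struct MemObject(Vec<u8>);

impl RangeRead for MemObject {
    fn len(&self) -> u64 {
        self.0.len() as u64
    }

    fn read_range(&self, range: Range<u64>) -> io::Result<Vec<u8>> {
        if range.start > range.end || range.end > self.len() {
            return Err(io::Error::new(
                io::ErrorKind::UnexpectedEof,
                "range out of bounds",
            ));
        }
        Ok(self.0[range.start as usize..range.end as usize].to_vec())
    }
}

fn invalid(msg: &str) -> io::Error {
    io::Error::new(io::ErrorKind::InvalidData, msg.to_string())
}

/// Fetch the raw Parquet footer metadata with two ranged reads:
/// first the 8-byte tail (length + magic), then the metadata bytes.
fn read_parquet_metadata<R: RangeRead>(reader: &R) -> io::Result<Vec<u8>> {
    let len = reader.len();
    if len < 8 {
        return Err(invalid("file too small to be Parquet"));
    }
    let tail = reader.read_range(len - 8..len)?;
    if &tail[4..] != b"PAR1" {
        return Err(invalid("missing PAR1 magic"));
    }
    let meta_len = u32::from_le_bytes([tail[0], tail[1], tail[2], tail[3]]) as u64;
    let meta_end = len - 8;
    if meta_len > meta_end {
        return Err(invalid("metadata length exceeds file size"));
    }
    reader.read_range(meta_end - meta_len..meta_end)
}
```

With an AsyncRead-style stream the same task needs a seek to the end of the file followed by a second seek backwards; with ranges, both requests are independent and could be issued against any backend that supports ranged reads.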
Benchmarks
Concurrent reading is enabled only for loading CSV/TSV files; other queries remain unaffected. I will first ensure there are no regressions, and then I'll optimize Parquet reading.
Query Performance
hits
tpch
Load (with 2 concurrent)
Load (with 4 concurrent)
Tests
Type of change