Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

refactor: Bump OpenDAL 0.46, arrow 51, tonic 0.11, reqwest 0.12, hyper 1, http 1 #15442

Merged
merged 28 commits into from
May 11, 2024

Conversation

Xuanwo
Copy link
Member

@Xuanwo Xuanwo commented May 8, 2024

I hereby agree to the terms of the CLA available at: https://docs.databend.com/dev/policies/cla/

Summary

This PR will bump the following pacakges:

  • OpenDAL 0.46
  • arrow 51
  • tonic 0.11
  • reqwest 0.12
  • hyper 1
  • http 1

I have to submit them all in one PR to make sure databend can build.

Changes that worth of mention

OpenDAL v0.46 has transitioned from AsyncRead-based IO to Range-based IO. While some internal implementations have been refactored, not all are performing optimally. We need to address this regression.

We have transitioned to range-based IO for both Parquet and native formats; please give this area extra attention.

To minimize the number of changes in the PR, I refactored directly without robust abstraction, such as using opendal::Reader directly in the native format API. I intend to refine this aspect in upcoming PRs.

Benchmarks

Concurrent reading is enabled only for loading CSV/TSV files; other queries remain unaffected. I will first ensure there are no regressions, and then I'll optimize Parquet reading.

Qeury Performance

hits

image

tpch

image

Load (with 2 concurrent)

image

Load (with 4 concurrent)

image

Tests

  • Unit Test
  • Logic Test
  • Benchmark Test
  • No Test - Explain why

Type of change

  • Bug Fix (non-breaking change which fixes an issue)
  • New Feature (non-breaking change which adds functionality)
  • Breaking Change (fix or feature that could cause existing functionality not to work as expected)
  • Documentation Update
  • Refactoring
  • Performance Improvement
  • Other (please describe):

This change is Reviewable

@Xuanwo Xuanwo changed the title Upgrade opendal refactor: Bump OpenDAL to 0.46, arrow to 51, tonic to 0.11, reqwest to 0.12, hyper to 1 May 8, 2024
@github-actions github-actions bot added the pr-refactor this PR changes the code base without new features or bugfix label May 8, 2024
@Xuanwo Xuanwo changed the title refactor: Bump OpenDAL to 0.46, arrow to 51, tonic to 0.11, reqwest to 0.12, hyper to 1 refactor: Bump OpenDAL 0.46, arrow 51, tonic 0.11, reqwest 0.12, hyper 1, http 1 May 8, 2024
@Xuanwo

This comment was marked as resolved.

Signed-off-by: Xuanwo <[email protected]>
@Xuanwo Xuanwo added the ci-benchmark Benchmark: run all test label May 9, 2024
@Xuanwo Xuanwo marked this pull request as ready for review May 9, 2024 14:15
@Xuanwo
Copy link
Member Author

Xuanwo commented May 9, 2024

I'm waiting for the benchmark results.

Copy link
Contributor

github-actions bot commented May 9, 2024

Docker Image for PR

  • tag: pr-15442-9e07209

note: this image tag is only available for internal use,
please check the internal doc for more details.

@Xuanwo Xuanwo marked this pull request as draft May 9, 2024 15:12
@Xuanwo
Copy link
Member Author

Xuanwo commented May 9, 2024

The benchmark seems quiet slow, I'm working on this.

@Xuanwo Xuanwo removed the ci-benchmark Benchmark: run all test label May 9, 2024
@Xuanwo Xuanwo added ci-benchmark Benchmark: run all test and removed ci-benchmark Benchmark: run all test labels May 10, 2024
Copy link
Contributor

Docker Image for PR

  • tag: pr-15442-3c97076

note: this image tag is only available for internal use,
please check the internal doc for more details.

@Xuanwo Xuanwo added ci-benchmark Benchmark: run all test and removed ci-benchmark Benchmark: run all test labels May 10, 2024
Copy link
Contributor

Docker Image for PR

  • tag: pr-15442-9013fda

note: this image tag is only available for internal use,
please check the internal doc for more details.

@Xuanwo Xuanwo added ci-benchmark Benchmark: run all test and removed ci-benchmark Benchmark: run all test labels May 10, 2024
@Xuanwo Xuanwo marked this pull request as ready for review May 10, 2024 17:21
Copy link
Contributor

Docker Image for PR

  • tag: pr-15442-41987a4

note: this image tag is only available for internal use,
please check the internal doc for more details.

@Xuanwo
Copy link
Member Author

Xuanwo commented May 11, 2024

Hi, @youngsofun, this PR is ready for review now!

Signed-off-by: Xuanwo <[email protected]>
@BohuTANG
Copy link
Member

image The data load times improved significantly in OpenDAL 0.46, but query performance as measured by the hits benchmark seems unchanged. What specific optimizations did 0.46 include that sped up loading but not queries?

@Xuanwo
Copy link
Member Author

Xuanwo commented May 11, 2024

What specific optimizations did 0.46 include that sped up loading but not queries?

OpenDAL v0.46 introduces a new feature known as concurrent read, also referred to as auto ranged read. This feature allows us to read large files with concurrency.

However, I've only enabled concurrency in a few continuous reading areas like the bytes reader. Other areas remain unchanged, and I'm satisfied with the current performance since there's no significant regression. I plan to optimize additional areas in upcoming PRs.

@Xuanwo Xuanwo added this pull request to the merge queue May 11, 2024
Merged via the queue into databendlabs:main with commit 220787d May 11, 2024
78 checks passed
@Xuanwo Xuanwo deleted the upgrade-opendal branch May 11, 2024 07:44
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
ci-benchmark Benchmark: run all test pr-refactor this PR changes the code base without new features or bugfix
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants