feat: Implements row group level parallel unordered scanner #3992

Merged: 18 commits, May 29, 2024

Conversation

evenyag
Contributor

@evenyag evenyag commented May 20, 2024

I hereby agree to the terms of the GreptimeDB CLA.

Refer to a related PR or issue link (optional)

What's changed and what's your intention?

This PR changes UnorderedScan to read Parquet row groups in parallel to improve scan performance. It also implements the RegionScanner trait for UnorderedScan.

  • Adds a build_parts method to ScanInput that collects all FileRanges (row groups) and memtables
    • File ranges and memtables are distributed across parts according to the parallelism
  • UnorderedScan can return as many streams as the configured parallelism
  • Adds CompatBatch to adapt the batch schema
    • We might remove CompatReader in the future, as CompatBatch doesn't require implementing the BatchReader trait

It also defines a ScanPartBuilder trait to allow different parallel scan strategies.

pub(crate) trait ScanPartBuilder {
    fn set_parallelism(&mut self, parallelism: usize);

    fn append_file_ranges(&mut self, file_ranges: impl Iterator<Item = FileRange>);

    fn build_parts(self, memtables: &[MemtableRef]) -> Vec<ScanPart>;
}
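
To illustrate how a strategy could plug into a trait of this shape, here is a minimal sketch, not the implementation from this PR. It assumes a simple round-robin strategy. `FileRange`, `MemtableRef` and `ScanPart` are simplified, hypothetical stand-ins for the real mito2 types, and `RoundRobinBuilder` is a made-up name used only for illustration.

```rust
// Hypothetical stand-ins for the real mito2 types, which are not shown here.
#[derive(Debug, Clone, PartialEq)]
struct FileRange {
    file_id: u32,
    row_group: usize,
}

// Assumption: a memtable handle is modeled as a plain name for this sketch.
type MemtableRef = String;

#[derive(Debug, Default, PartialEq)]
struct ScanPart {
    memtables: Vec<MemtableRef>,
    file_ranges: Vec<FileRange>,
}

// Same method signatures as the trait in the PR description.
trait ScanPartBuilder {
    fn set_parallelism(&mut self, parallelism: usize);

    fn append_file_ranges(&mut self, file_ranges: impl Iterator<Item = FileRange>);

    fn build_parts(self, memtables: &[MemtableRef]) -> Vec<ScanPart>;
}

// Hypothetical strategy: deal memtables and file ranges out in round-robin order.
#[derive(Default)]
struct RoundRobinBuilder {
    parallelism: usize,
    file_ranges: Vec<FileRange>,
}

impl ScanPartBuilder for RoundRobinBuilder {
    fn set_parallelism(&mut self, parallelism: usize) {
        // Treat 0 as 1 so that at least one part is always built.
        self.parallelism = parallelism.max(1);
    }

    fn append_file_ranges(&mut self, file_ranges: impl Iterator<Item = FileRange>) {
        self.file_ranges.extend(file_ranges);
    }

    fn build_parts(self, memtables: &[MemtableRef]) -> Vec<ScanPart> {
        // Never build more parts than there are sources to scan.
        let total = memtables.len() + self.file_ranges.len();
        let num_parts = self.parallelism.max(1).min(total.max(1));
        let mut parts: Vec<ScanPart> = (0..num_parts).map(|_| ScanPart::default()).collect();

        for (i, memtable) in memtables.iter().enumerate() {
            parts[i % num_parts].memtables.push(memtable.clone());
        }
        // Continue the rotation where the memtables stopped.
        let offset = memtables.len();
        for (i, range) in self.file_ranges.into_iter().enumerate() {
            parts[(offset + i) % num_parts].file_ranges.push(range);
        }
        parts
    }
}
```

With a parallelism of 2, one memtable and three row groups, this sketch yields two parts: the first holds the memtable and one file range, the second holds the other two file ranges. A different strategy, for example one that balances parts by row count, would only need another implementation of the same three methods.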

Checklist

  • I have written the necessary rustdoc comments.
  • I have added the necessary unit tests and integration tests.
  • This PR requires documentation updates.

@github-actions github-actions bot added the docs-not-required This change does not impact docs. label May 20, 2024
@evenyag evenyag changed the title feat: Implements row group level unordered scanner feat: Implements row group level parallel unordered scanner May 20, 2024
@evenyag evenyag force-pushed the feat/unordered-scanner branch 5 times, most recently from 85e9cf2 to 209cce9 Compare May 21, 2024 09:03
@evenyag evenyag marked this pull request as ready for review May 21, 2024 09:11

codecov bot commented May 21, 2024

Codecov Report

Attention: Patch coverage is 80.85938%, with 49 lines in your changes missing coverage. Please review.

Project coverage is 85.01%. Comparing base (dfc1acb) to head (c10f839).
Report is 23 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #3992      +/-   ##
==========================================
- Coverage   85.44%   85.01%   -0.43%     
==========================================
  Files         980      987       +7     
  Lines      170140   171396    +1256     
==========================================
+ Hits       145379   145716     +337     
- Misses      24761    25680     +919     

@evenyag evenyag requested a review from fengjiachun May 23, 2024 12:34
src/mito2/src/read/unordered_scan.rs (outdated, resolved)
src/mito2/src/sst/file.rs (outdated, resolved)
src/mito2/src/read/scan_region.rs (outdated, resolved)
src/mito2/src/read/scan_region.rs (outdated, resolved)
src/mito2/src/read/scan_region.rs (outdated, resolved)
Replaces ScanPartBuilder with FileRangeCollector, which only collects file ranges
@evenyag evenyag requested a review from waynexia May 24, 2024 13:54
Collaborator

@fengjiachun fengjiachun left a comment

LGTM

@killme2008 killme2008 mentioned this pull request May 27, 2024
8 tasks
@evenyag evenyag requested a review from a team as a code owner May 28, 2024 11:38
@waynexia waynexia enabled auto-merge May 29, 2024 10:54
@waynexia
Member

🛫

@waynexia waynexia added this pull request to the merge queue May 29, 2024
Merged via the queue into GreptimeTeam:main with commit 848bd7e May 29, 2024
28 checks passed
3 participants