feat: Implement RegionScanner for SeqScan #4060

Merged: 39 commits, merged Jun 12, 2024

@evenyag (Contributor) commented May 28, 2024

I hereby agree to the terms of the GreptimeDB CLA.

What's changed and what's your intention?

This PR implements RegionScanner for SeqScan.

  • It changes the definition of ScanPart so that it represents the time range of the part and keeps the ranges from different files separate.
  • It moves StreamContext and ScanPartList to the scan_region mod so that SeqScan can reuse them.
  • It adds SeqDistributor to organize parts for parallel scans.
    • It groups parts by their time ranges and yields parts with non-overlapping ranges.
    • If the number of ranges to scan is greater than the parallelism, it merges small ranges.
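The distribution steps above can be sketched roughly as follows. This is a minimal, std-only sketch under stated assumptions, not the SeqDistributor code from this PR: `Part`, `group_by_time_range`, `merge_small_groups` and `total_rows` are hypothetical names, a part is reduced to an inclusive time range plus a row count, and "small" is assumed to mean "fewest rows".

```rust
/// Hypothetical, simplified stand-in for a scan part: an inclusive time range
/// [start, end] plus a row count (a real ScanPart holds per-file ranges).
#[derive(Debug, Clone, PartialEq)]
struct Part {
    start: i64,
    end: i64,
    rows: usize,
}

/// Total rows in a group (assumed size metric for "small").
fn total_rows(group: &[Part]) -> usize {
    group.iter().map(|p| p.rows).sum()
}

/// Groups parts whose time ranges overlap, so the returned groups have
/// pairwise non-overlapping time spans, ordered by time.
fn group_by_time_range(mut parts: Vec<Part>) -> Vec<Vec<Part>> {
    parts.sort_by_key(|p| (p.start, p.end));
    let mut groups: Vec<Vec<Part>> = Vec::new();
    // Largest end timestamp seen in the current (last) group.
    let mut group_end = i64::MIN;
    for part in parts {
        if !groups.is_empty() && part.start <= group_end {
            // Overlaps the current group: extend it.
            group_end = group_end.max(part.end);
            groups.last_mut().unwrap().push(part);
        } else {
            // Starts after the current group ends: open a new group.
            group_end = part.end;
            groups.push(vec![part]);
        }
    }
    groups
}

/// While there are more groups than `parallelism`, merges the group with the
/// fewest rows into its smaller adjacent neighbor.
fn merge_small_groups(mut groups: Vec<Vec<Part>>, parallelism: usize) -> Vec<Vec<Part>> {
    let parallelism = parallelism.max(1);
    while groups.len() > parallelism {
        // Index of the smallest group (the first one on ties).
        let smallest = groups
            .iter()
            .enumerate()
            .min_by_key(|(_, g)| total_rows(g))
            .map(|(i, _)| i)
            .unwrap();
        let last = groups.len() - 1;
        let neighbor = if smallest == 0 {
            1
        } else if smallest == last {
            smallest - 1
        } else if total_rows(&groups[smallest - 1]) <= total_rows(&groups[smallest + 1]) {
            smallest - 1
        } else {
            smallest + 1
        };
        // Merge the later group into the earlier one to keep time order.
        let (keep, absorb) = (smallest.min(neighbor), smallest.max(neighbor));
        let absorbed = groups.remove(absorb);
        groups[keep].extend(absorbed);
    }
    groups
}

fn main() {
    let parts = vec![
        Part { start: 0, end: 10, rows: 100 },
        Part { start: 5, end: 15, rows: 50 },
        Part { start: 20, end: 30, rows: 10 },
        Part { start: 40, end: 50, rows: 20 },
        Part { start: 60, end: 70, rows: 200 },
    ];
    let groups = merge_small_groups(group_by_time_range(parts), 2);
    for group in &groups {
        println!("{} rows: {:?}", total_rows(group), group);
    }
}
```

Merging only adjacent groups is one plausible design choice: because the groups are time-ordered and non-overlapping, merging neighbors keeps every merged group's span disjoint from the others. The actual distributor may use a different size metric or merge policy.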

If there are multiple files to read in a time range, the scanner spawns a task to read each file. The parallelism is limited by a semaphore in the scanner.
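The semaphore-bounded fan-out described above can be illustrated with a std-only sketch. It is an illustration under assumptions, not GreptimeDB's scanner: it uses OS threads and a hand-rolled `Semaphore` (Mutex + Condvar) instead of an async runtime, and `read_files` and the numeric file ids are hypothetical. One task is spawned per file, but each task must hold a permit while reading, so concurrent reads never exceed `parallelism`.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::{Arc, Condvar, Mutex};
use std::thread;
use std::time::Duration;

/// Minimal counting semaphore (std has no semaphore type).
struct Semaphore {
    permits: Mutex<usize>,
    cvar: Condvar,
}

impl Semaphore {
    fn new(permits: usize) -> Self {
        Semaphore { permits: Mutex::new(permits), cvar: Condvar::new() }
    }

    /// Blocks until a permit is available, then takes it.
    fn acquire(&self) -> Permit<'_> {
        let mut permits = self.permits.lock().unwrap();
        while *permits == 0 {
            permits = self.cvar.wait(permits).unwrap();
        }
        *permits -= 1;
        Permit { sem: self }
    }
}

/// Returns its permit to the semaphore when dropped.
struct Permit<'a> {
    sem: &'a Semaphore,
}

impl Drop for Permit<'_> {
    fn drop(&mut self) {
        *self.sem.permits.lock().unwrap() += 1;
        self.sem.cvar.notify_one();
    }
}

/// Spawns one task per (hypothetical) file id; each task holds a permit while
/// "reading". Returns the peak number of reads observed running at once.
fn read_files(files: Vec<u32>, parallelism: usize) -> usize {
    let sem = Arc::new(Semaphore::new(parallelism));
    let active = Arc::new(AtomicUsize::new(0));
    let peak = Arc::new(AtomicUsize::new(0));
    let handles: Vec<_> = files
        .into_iter()
        .map(|_file_id| {
            let (sem, active, peak) = (Arc::clone(&sem), Arc::clone(&active), Arc::clone(&peak));
            thread::spawn(move || {
                let _permit = sem.acquire();
                let now = active.fetch_add(1, Ordering::SeqCst) + 1;
                peak.fetch_max(now, Ordering::SeqCst);
                thread::sleep(Duration::from_millis(10)); // stand-in for reading the file
                active.fetch_sub(1, Ordering::SeqCst);
                // `_permit` is dropped here, releasing the slot after the read.
            })
        })
        .collect();
    for handle in handles {
        handle.join().unwrap();
    }
    peak.load(Ordering::SeqCst)
}

fn main() {
    let peak = read_files((0..8).collect(), 3);
    println!("peak concurrent reads: {peak}");
}
```

Tasks beyond the permit count simply wait in `acquire` until a running read finishes, so spawning more tasks than the parallelism does not by itself increase the number of concurrent reads.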

Checklist

  • I have written the necessary rustdoc comments.
  • I have added the necessary unit tests and integration tests.
  • This PR requires documentation updates.

@github-actions bot added the docs-not-required label (This change does not impact docs.) May 28, 2024
@evenyag force-pushed the feat/seq-scanner branch 3 times, most recently from 3e15ab6 to 6a04fa5, June 5, 2024 10:14
@evenyag marked this pull request as ready for review June 5, 2024 13:01
@evenyag requested review from v0y4g3r, waynexia and a team as code owners June 5, 2024 13:01
@evenyag mentioned this pull request Jun 6, 2024
@evenyag requested a review from fengjiachun June 6, 2024 03:29

codecov bot commented Jun 6, 2024

Codecov Report

Attention: Patch coverage is 91.52542% with 45 lines in your changes missing coverage. Please review.

Project coverage is 85.17%. Comparing base (587e99d) to head (09ad54a).
Report is 4 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #4060      +/-   ##
==========================================
- Coverage   85.46%   85.17%   -0.30%     
==========================================
  Files         994      994              
  Lines      174348   174947     +599     
==========================================
  Hits       149005   149005              
- Misses      25343    25942     +599     
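As a reading aid, and assuming Codecov computes the project percentage as hits divided by lines, the project figures above follow directly from the table, and the patch figure implies roughly how many changed lines were coverable ($N$ is that count):

$$
\frac{149005}{174348} \approx 85.46\%\ (\text{base}), \qquad \frac{149005}{174947} \approx 85.17\%\ (\text{head})
$$

$$
\frac{N - 45}{N} = 0.9152542 \;\Rightarrow\; N = \frac{45}{1 - 0.9152542} \approx 531
$$

So about 486 of roughly 531 coverable changed lines are hit. The project ratio drops because Lines grows by 599 while Hits stays at 149005.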

@fengjiachun (Collaborator) left a comment

LGTM

src/mito2/src/read/scan_region.rs (resolved)
@waynexia (Member) commented Jun 6, 2024

> If there are multiple files to read in a time range, it will spawn a task to read each file. So the number of parallel tasks may be greater than parallelism.

This seems to be a risky thing. Can we add some indirect layer like a queue or something else to take back the control of parallelism?

@evenyag marked this pull request as draft June 6, 2024 12:58
@evenyag (Contributor, Author) commented Jun 6, 2024

> This seems to be a risky thing. Can we add some indirect layer like a queue or something else to take back the control of parallelism?

There is already a semaphore to control the parallelism of reading files.

@evenyag marked this pull request as ready for review June 7, 2024 02:56
@evenyag requested a review from waynexia June 11, 2024 06:11
@evenyag added this pull request to the merge queue Jun 12, 2024
Merged via the queue into GreptimeTeam:main with commit 65f8b72 Jun 12, 2024
42 checks passed
@evenyag deleted the feat/seq-scanner branch June 12, 2024 08:42
Labels: docs-not-required (This change does not impact docs.)
3 participants