storage: introduce CasManager to support chunk dedup at runtime #1626

Open
wants to merge 7 commits into
base: master
from

Conversation

Desiki-high
Member

@Desiki-high Desiki-high commented Sep 21, 2024

Relevant Issue (if applicable)

If there are Issues related to this PullRequest, please list it.

Details

Based on #1507; completes the implementation and testing.

Types of changes

What types of changes does your PullRequest introduce? Put an x in all the boxes that apply:

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Documentation Update (if none of the other choices apply)

Checklist

Go over all the following points, and put an x in all the boxes that apply.

  • I have updated the documentation accordingly.
  • I have added tests to cover my changes.

@Desiki-high Desiki-high requested a review from a team as a code owner September 21, 2024 07:52
@Desiki-high Desiki-high requested review from imeoer, hsiangkao and power-more and removed request for a team September 21, 2024 07:52

codecov bot commented Sep 21, 2024

Codecov Report

Attention: Patch coverage is 72.96512% with 93 lines in your changes missing coverage. Please review.

Project coverage is 61.35%. Comparing base (a4683ba) to head (a9b8fe4).

| Files with missing lines | Patch % | Lines |
|---|---|---|
| storage/src/cache/dedup/mod.rs | 83.90% | 25 Missing and 8 partials ⚠️ |
| storage/src/cache/cachedfile.rs | 10.00% | 25 Missing and 2 partials ⚠️ |
| src/bin/nydusd/main.rs | 0.00% | 18 Missing ⚠️ |
| storage/src/cache/filecache/mod.rs | 62.50% | 6 Missing ⚠️ |
| storage/src/cache/fscache/mod.rs | 60.00% | 6 Missing ⚠️ |
| storage/src/utils.rs | 96.42% | 0 Missing and 2 partials ⚠️ |
| utils/src/digest.rs | 0.00% | 0 Missing and 1 partial ⚠️ |
Additional details and impacted files


@@            Coverage Diff             @@
##           master    #1626      +/-   ##
==========================================
+ Coverage   61.28%   61.35%   +0.06%     
==========================================
  Files         146      146              
  Lines       48156    48491     +335     
  Branches    46123    46458     +335     
==========================================
+ Hits        29514    29752     +238     
- Misses      17084    17165      +81     
- Partials     1558     1574      +16     
| Files with missing lines | Coverage Δ |
|---|---|
| storage/src/cache/dedup/db.rs | 79.09% <100.00%> (+0.08%) ⬆️ |
| storage/src/cache/mod.rs | 57.95% <ø> (ø) |
| utils/src/digest.rs | 87.81% <0.00%> (-0.51%) ⬇️ |
| storage/src/utils.rs | 95.98% <96.42%> (+0.10%) ⬆️ |
| storage/src/cache/filecache/mod.rs | 66.77% <62.50%> (-0.13%) ⬇️ |
| storage/src/cache/fscache/mod.rs | 75.63% <60.00%> (-0.76%) ⬇️ |
| src/bin/nydusd/main.rs | 0.00% <0.00%> (ø) |
| storage/src/cache/cachedfile.rs | 37.58% <10.00%> (-0.58%) ⬇️ |
| storage/src/cache/dedup/mod.rs | 77.82% <83.90%> (+77.82%) ⬆️ |

... and 3 files with indirect coverage changes

@Desiki-high Desiki-high force-pushed the storage/copy-range branch 8 times, most recently from a1abc57 to b386bde Compare September 27, 2024 10:14
jiangliu and others added 6 commits September 27, 2024 18:15
Add helper copy_file_range() which:
- avoids copying data into userspace
- may support reflink on XFS, etc.

Signed-off-by: Jiang Liu <[email protected]>
- improve copy_file_range when the target OS is not Linux
- add more comprehensive tests

Signed-off-by: Yadong Ding <[email protected]>
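
For readers following the two commits above: on platforms without `copy_file_range()`, a range copy has to go through a userspace buffer. The following is only a minimal sketch of such a fallback, not the code in this PR. The function name `copy_range_fallback` and the 64 KiB buffer size are assumptions made for illustration; it uses only `std::os::unix::fs::FileExt`.

```rust
use std::fs::File;
use std::io;
use std::os::unix::fs::FileExt;

/// Hypothetical userspace fallback (not the PR's helper): copy up to `len`
/// bytes from `src` at `src_off` into `dst` at `dst_off`.
/// Returns the number of bytes copied, which is short if `src` hits EOF.
pub fn copy_range_fallback(
    src: &File,
    src_off: u64,
    dst: &File,
    dst_off: u64,
    len: u64,
) -> io::Result<u64> {
    // Assumed buffer size; any reasonable size works.
    let mut buf = vec![0u8; 64 * 1024];
    let mut copied = 0u64;
    while copied < len {
        let want = std::cmp::min(buf.len() as u64, len - copied) as usize;
        // Positional read: does not move the file cursor of `src`.
        let n = match src.read_at(&mut buf[..want], src_off + copied) {
            Ok(n) => n,
            Err(e) if e.kind() == io::ErrorKind::Interrupted => continue,
            Err(e) => return Err(e),
        };
        if n == 0 {
            break; // reached end of the source file
        }
        // Positional write of exactly the bytes that were read.
        dst.write_all_at(&buf[..n], dst_off + copied)?;
        copied += n as u64;
    }
    Ok(copied)
}
```

Unlike the kernel path, this version always moves the data through userspace and cannot benefit from reflink, which is why it only makes sense as a fallback.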
Implement CasManager to support chunk dedup at runtime.
The manager provides two major interfaces:
- add chunk data to the CAS database
- check whether a chunk exists in CAS database and copy it to blob file
  by copy_file_range() if the chunk exists.

Signed-off-by: Jiang Liu <[email protected]>
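
The two interfaces described in the commit above can be pictured with a small model. This is a sketch under stated assumptions, not the `CasMgr` API: `ChunkIndex`, `add_chunk`, and the `dedup_chunk` signature shown here are hypothetical, the database is replaced by an in-memory `HashMap`, and the copy is done with positional reads and writes instead of `copy_file_range()`.

```rust
use std::collections::HashMap;
use std::fs::File;
use std::io;
use std::os::unix::fs::FileExt;
use std::path::{Path, PathBuf};

/// Hypothetical in-memory stand-in for the CAS database:
/// chunk id -> (blob file path, offset of the chunk in that file).
#[derive(Default)]
pub struct ChunkIndex {
    chunks: HashMap<String, (PathBuf, u64)>,
}

impl ChunkIndex {
    /// Interface 1: record where a chunk's data lives.
    /// The first recorded location wins; later duplicates are ignored.
    pub fn add_chunk(&mut self, chunk_id: &str, blob: &Path, offset: u64) {
        self.chunks
            .entry(chunk_id.to_string())
            .or_insert_with(|| (blob.to_path_buf(), offset));
    }

    /// Interface 2: if the chunk is known, copy `len` bytes of it into
    /// `dst` at `dst_off`. Returns Ok(true) when the chunk was copied and
    /// Ok(false) when the index has no entry for it.
    pub fn dedup_chunk(
        &self,
        chunk_id: &str,
        len: usize,
        dst: &File,
        dst_off: u64,
    ) -> io::Result<bool> {
        let (path, src_off) = match self.chunks.get(chunk_id) {
            Some(entry) => entry,
            None => return Ok(false),
        };
        let src = File::open(path)?;
        // Userspace copy used here for simplicity (assumption of this sketch).
        let mut buf = vec![0u8; len];
        src.read_exact_at(&mut buf, *src_off)?;
        dst.write_all_at(&buf, dst_off)?;
        Ok(true)
    }
}
```

In this sketch a missing source file surfaces as an I/O error from `File::open`; how the real manager reacts to that case is described in the next commit.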
- Changed `delete_blobs` method in `CasDb` to take an immutable reference (`&self`) instead of a mutable reference (`&mut self`).
- Updated `dedup_chunk` method in `CasMgr` to correctly handle the deletion of non-existent blob files from both the file descriptor cache and the database.
- Implemented the `gc` (garbage collection) method in `CasMgr` to identify and remove blobs that no longer exist on the filesystem, ensuring the database and cache remain consistent.

Signed-off-by: Yadong Ding <[email protected]>
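
The garbage-collection idea in the commit above (drop records for blobs that no longer exist on the filesystem) can be illustrated as follows. This is a simplified sketch, assuming the same hypothetical in-memory map of chunk id to (blob path, offset) rather than the real `CasDb` and file descriptor cache; `gc_missing_blobs` is an illustrative name.

```rust
use std::collections::HashMap;
use std::path::PathBuf;

/// Hypothetical GC pass: remove every index entry whose blob file is gone.
/// Returns how many entries were removed.
pub fn gc_missing_blobs(index: &mut HashMap<String, (PathBuf, u64)>) -> usize {
    let before = index.len();
    // Keep only entries whose backing file still exists on disk.
    index.retain(|_, entry| entry.0.exists());
    before - index.len()
}
```

A real implementation would also need to evict cached file descriptors for the removed blobs, as the commit message notes.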
Enable chunk deduplication for file cache. It works in this way:
- When a chunk is not in the blob cache file yet, query the CAS database
  to check whether other blob data files have the required chunk. If there
  is a duplicate data chunk in another data file, copy the chunk data
  into the current blob cache file by using copy_file_range().
- After downloading a data chunk from remote, save file/offset/chunk-id
  into CAS database, so it can be reused later.

Co-authored-by: Jiang Liu <[email protected]>
Co-authored-by: Yading Ding <[email protected]>
Signed-off-by: Yadong Ding <[email protected]>
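
The lookup order described in the commit above (local cache, then CAS, then remote download followed by a CAS record) can be modeled in a few lines. This is a toy in-memory model, not the file cache code: `Cache`, `Source`, and `read_chunk` are hypothetical names, and byte vectors stand in for blob cache files and the CAS database.

```rust
use std::collections::HashMap;

/// Where a chunk's data came from on this read.
#[derive(Debug, PartialEq)]
pub enum Source {
    Cache,
    Dedup,
    Remote,
}

/// Toy model: `ready` is the current blob cache file, `cas` stands in for
/// chunks that the CAS database knows about in other blob data files.
pub struct Cache {
    pub ready: HashMap<String, Vec<u8>>,
    pub cas: HashMap<String, Vec<u8>>,
}

impl Cache {
    pub fn read_chunk<F>(&mut self, id: &str, fetch_remote: F) -> (Vec<u8>, Source)
    where
        F: FnOnce(&str) -> Vec<u8>,
    {
        // 1. The chunk is already in the current blob cache file.
        if let Some(data) = self.ready.get(id) {
            return (data.clone(), Source::Cache);
        }
        // 2. Cache miss: ask the CAS whether another blob has this chunk,
        //    and copy it locally instead of downloading.
        if let Some(data) = self.cas.get(id).cloned() {
            self.ready.insert(id.to_string(), data.clone());
            return (data, Source::Dedup);
        }
        // 3. Download from remote, then record the chunk in the CAS so
        //    later reads (from this or other blobs) can reuse it.
        let data = fetch_remote(id);
        self.ready.insert(id.to_string(), data.clone());
        self.cas.insert(id.to_string(), data.clone());
        (data, Source::Remote)
    }
}
```

The point of the ordering is that the remote fetch closure is only invoked when both the local cache and the CAS lookup miss.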
Add documentation for cas.

Signed-off-by: Jiang Liu <[email protected]>
@Desiki-high Desiki-high force-pushed the storage/copy-range branch 4 times, most recently from a2918de to 60478a0 Compare October 1, 2024 06:08
Add smoke test case for CAS and chunk dedup.

Signed-off-by: Yadong Ding <[email protected]>

2 participants