storage: introduce CasManager to support chunk dedup at runtime #1626

Status: Open, 7 commits to merge into base branch master

Commits on Sep 27, 2024

  1. storage: add helper copy_file_range

    Add helper copy_file_range() which:
    - avoids copying data into userspace
    - may support reflink on filesystems such as XFS
    
    Signed-off-by: Jiang Liu <[email protected]>
    jiangliu authored and Desiki-high committed Sep 27, 2024
    beb5cfc
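
A minimal sketch of a range-copy helper in the spirit of commit 1. It is not the helper from this PR: the name `copy_range` and its signature are assumptions for illustration. It relies on `std::io::copy`, which on Linux may use `copy_file_range(2)` internally for file-to-file copies and otherwise falls back to a buffered copy.

```rust
use std::fs::File;
use std::io::{self, Read, Seek, SeekFrom};

/// Copy up to `len` bytes from `src` at `src_off` into `dst` at `dst_off`.
///
/// Hypothetical helper: it moves the cursors of both files, unlike a raw
/// copy_file_range(2) call with explicit offsets.
pub fn copy_range(
    src: &File,
    src_off: u64,
    dst: &File,
    dst_off: u64,
    len: u64,
) -> io::Result<u64> {
    // `&File` implements Read, Write and Seek, so shared handles are enough.
    let mut reader = src;
    let mut writer = dst;
    reader.seek(SeekFrom::Start(src_off))?;
    writer.seek(SeekFrom::Start(dst_off))?;
    // `take(len)` bounds the copy to the requested range; the standard
    // library chooses an in-kernel copy path when the platform offers one.
    io::copy(&mut reader.take(len), &mut writer)
}
```

The return value is the number of bytes actually copied, which can be smaller than `len` if the source ends early.
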
  2. storage: improve copy_file_range

    - improve copy_file_range when the target OS is not Linux
    - add more comprehensive tests
    
    Signed-off-by: Yadong Ding <[email protected]>
    Desiki-high committed Sep 27, 2024
    38b5708
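
On targets without `copy_file_range(2)`, one plausible fallback is a userspace copy through a buffer. The sketch below is an assumption about what such a path could look like, not the code in commit 2; `copy_range_userspace` is a hypothetical name. It uses positional I/O from `std::os::unix::fs::FileExt`, so it leaves both file cursors untouched.

```rust
use std::fs::File;
use std::io;
use std::os::unix::fs::FileExt;

/// Userspace fallback: copy up to `len` bytes between two files using
/// positional reads and writes. Returns the number of bytes copied.
pub fn copy_range_userspace(
    src: &File,
    mut src_off: u64,
    dst: &File,
    mut dst_off: u64,
    len: u64,
) -> io::Result<u64> {
    let mut buf = vec![0u8; 64 * 1024];
    let mut copied = 0u64;
    while copied < len {
        // Never read more than what is left of the requested range.
        let want = std::cmp::min(buf.len() as u64, len - copied) as usize;
        let n = match src.read_at(&mut buf[..want], src_off) {
            Ok(0) => break, // end of source file
            Ok(n) => n,
            Err(e) if e.kind() == io::ErrorKind::Interrupted => continue,
            Err(e) => return Err(e),
        };
        dst.write_all_at(&buf[..n], dst_off)?;
        src_off += n as u64;
        dst_off += n as u64;
        copied += n as u64;
    }
    Ok(copied)
}
```

A build could select this variant with `#[cfg(not(target_os = "linux"))]` and keep the syscall-based path for Linux.
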
  3. storage: implement CasManager to support chunk dedup at runtime

    Implement CasManager to support chunk dedup at runtime.
    The manager provides two major interfaces:
    - add chunk data to the CAS database
    - check whether a chunk exists in the CAS database and, if it does,
      copy it to the blob file by copy_file_range().
    
    Signed-off-by: Jiang Liu <[email protected]>
    jiangliu authored and Desiki-high committed Sep 27, 2024
    57985b8
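
The two interfaces described in commit 3 can be sketched with an in-memory map standing in for the CAS database. Everything here is illustrative: `CasIndex`, `ChunkLocation`, and the method signatures are assumptions, and the real manager persists its records rather than keeping them in a `HashMap`.

```rust
use std::collections::HashMap;
use std::fs::File;
use std::io::{self, Read, Seek, SeekFrom};
use std::path::PathBuf;

/// Where a chunk's bytes live: blob file, offset, and length.
struct ChunkLocation {
    blob: PathBuf,
    offset: u64,
    len: u64,
}

/// Toy stand-in for the CAS database, keyed by chunk id.
#[derive(Default)]
pub struct CasIndex {
    chunks: HashMap<String, ChunkLocation>,
}

impl CasIndex {
    /// Interface 1: record that `chunk_id` is stored in `blob` at `offset`.
    pub fn add_chunk(&mut self, chunk_id: &str, blob: PathBuf, offset: u64, len: u64) {
        self.chunks
            .entry(chunk_id.to_string())
            .or_insert(ChunkLocation { blob, offset, len });
    }

    /// Interface 2: if `chunk_id` is known, copy its bytes into `dst` at
    /// `dst_off` and return true; return false when the chunk is unknown.
    pub fn dedup_chunk(&self, chunk_id: &str, dst: &File, dst_off: u64) -> io::Result<bool> {
        let loc = match self.chunks.get(chunk_id) {
            Some(loc) => loc,
            None => return Ok(false),
        };
        let mut src = File::open(&loc.blob)?;
        src.seek(SeekFrom::Start(loc.offset))?;
        let mut out = dst;
        out.seek(SeekFrom::Start(dst_off))?;
        // std::io::copy may use an in-kernel copy on Linux.
        let copied = io::copy(&mut src.take(loc.len), &mut out)?;
        Ok(copied == loc.len)
    }
}
```

Keeping the first recorded location for a chunk id (`or_insert`) is a design choice of this sketch, not something the commit message specifies.
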
  4. storage: add garbage collection in CasMgr

    - Changed `delete_blobs` method in `CasDb` to take an immutable reference (`&self`) instead of a mutable reference (`&mut self`).
    - Updated `dedup_chunk` method in `CasMgr` to correctly handle the deletion of non-existent blob files from both the file descriptor cache and the database.
    - Implemented the `gc` (garbage collection) method in `CasMgr` to identify and remove blobs that no longer exist on the filesystem, ensuring the database and cache remain consistent.
    
    Signed-off-by: Yadong Ding <[email protected]>
    Desiki-high committed Sep 27, 2024
    f737a3c
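
The garbage-collection idea from commit 4 (drop records whose blob file no longer exists on the filesystem) can be sketched as follows. `BlobIndex` and its layout are hypothetical; the real `gc` also has to keep a file descriptor cache consistent, which this sketch leaves out.

```rust
use std::collections::{HashMap, HashSet};
use std::path::PathBuf;

/// Toy index: chunk id -> (blob file path, offset inside that blob).
#[derive(Default)]
pub struct BlobIndex {
    pub chunks: HashMap<String, (PathBuf, u64)>,
}

impl BlobIndex {
    /// Remove every chunk record that points at a blob file which is no
    /// longer present on disk. Returns the number of records removed.
    pub fn gc(&mut self) -> usize {
        // Check each distinct blob path once, then filter by the result.
        let dead: HashSet<PathBuf> = self
            .chunks
            .values()
            .map(|(blob, _)| blob)
            .filter(|blob| !blob.exists())
            .cloned()
            .collect();
        let before = self.chunks.len();
        self.chunks.retain(|_, loc| !dead.contains(&loc.0));
        before - self.chunks.len()
    }
}
```
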
  5. storage: enable chunk deduplication for file cache

    Enable chunk deduplication for the file cache. It works as follows:
    - When a chunk is not yet in the blob cache file, query the CAS database
      to check whether other blob data files contain the required chunk. If
      a duplicate chunk exists in another data file, copy the chunk data
      into the current blob cache file by using copy_file_range().
    - After downloading a data chunk from the remote, save file/offset/chunk-id
      into the CAS database so that it can be reused later.
    
    Co-authored-by: Jiang Liu <[email protected]>
    Co-authored-by: Yadong Ding <[email protected]>
    Signed-off-by: Yadong Ding <[email protected]>
    Desiki-high and jiangliu committed Sep 27, 2024
    c2e5bfa
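
The lookup order described in commit 5 can be modelled without any I/O. The sketch below only tracks where a chunk would come from. The types `ToyBlobCache` and `ToyCasDb` and the function `ensure_chunk` are hypothetical, and the actual data copy and download steps are reduced to comments.

```rust
use std::collections::{HashMap, HashSet};

/// Where the chunk data came from for a given request.
#[derive(Debug, PartialEq)]
pub enum ChunkSource {
    LocalCache,
    Deduplicated,
    Remote,
}

/// Chunk ids already present in one blob cache file.
#[derive(Default)]
pub struct ToyBlobCache {
    ready: HashSet<String>,
}

/// Shared records: chunk id -> (blob cache file, offset).
#[derive(Default)]
pub struct ToyCasDb {
    known: HashMap<String, (String, u64)>,
}

pub fn ensure_chunk(
    cache: &mut ToyBlobCache,
    cas: &mut ToyCasDb,
    blob_file: &str,
    offset: u64,
    chunk_id: &str,
) -> ChunkSource {
    // 1. The chunk is already in this blob cache file.
    if cache.ready.contains(chunk_id) {
        return ChunkSource::LocalCache;
    }
    // 2. Another blob file has the chunk: a real implementation would
    //    copy it here with copy_file_range().
    if cas.known.contains_key(chunk_id) {
        cache.ready.insert(chunk_id.to_string());
        return ChunkSource::Deduplicated;
    }
    // 3. Download from the remote (elided), then record file/offset/chunk-id
    //    so later requests can reuse the data.
    cache.ready.insert(chunk_id.to_string());
    cas.known
        .insert(chunk_id.to_string(), (blob_file.to_string(), offset));
    ChunkSource::Remote
}
```

With one shared `ToyCasDb`, the first request for a chunk is served from the remote, and a later request for the same chunk id from a different blob cache is served by deduplication.
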
  6. docs: add documentation for cas

    Add documentation for cas.
    
    Signed-off-by: Jiang Liu <[email protected]>
    jiangliu authored and Desiki-high committed Sep 27, 2024
    a938849

Commits on Oct 1, 2024

  1. smoke: add smoke test for cas and chunk dedup

    Add smoke test case for cas and chunk dedup.
    
    Signed-off-by: Yadong Ding <[email protected]>
    Desiki-high committed Oct 1, 2024
    a9b8fe4