cdelorme/level

A cross-platform FOSS library, written in Go, that scans for duplicate files.

The title is inspired by the Japanese anime Toaru Kagaku no Railgun.

Documentation

design

All processing is done synchronously, because persistent storage I/O will always be the bottleneck.

The primary operation (LastOrder) scans a folder for files, discards every file that is empty or whose path contains an excluded segment, and groups the rest by size. It then iterates over the groups, checking every file except the first with os.SameFile so that hard links are ignored, and reads the remaining files two at a time in 4K chunks, comparing them byte by byte.

When run in test mode, no further action is taken; the caller is expected to print the collected metrics and the groups of duplicates so the user can act on them.

Otherwise, it performs a weighted sort of each group, favoring directory depth first and directory frequency second, then removes the first record (the one with the lowest score) from the group so that it is kept while the rest are deleted.

If the file system uses a block size larger than the 4K read buffer, performance may suffer.

usage

Import the library:

import "github.com/cdelorme/level"

Please refer to the godoc documentation for further instructions.

Installation process:

go get github.com/cdelorme/level/...

tests

Tests can be run via:

go test -v -cover -race

future

  • add an intelligent buffer size that detects disk block sizes and uses the lowest common denominator

About

A file deduplication program for Linux.
