Releases · markfasheh/duperemove

25 Nov 11:29

JackSlateur

v0.14.1

9912c03

v0.14.1 Latest


This release includes a couple of contributions that fix some bugs introduced in v0.14.


20 Nov 18:49

JackSlateur

v0.14

ebc8c1f

v0.14

Notable changes:

Batching has been reimplemented on top of the dedupe_seq.
The "scan" phase has been reimplemented (see 8264336 for details).
Filesystem locking has been implemented. See f3947e9 for details.

This release focuses on improving the "scan" phase: restructuring it, removing bugs, and improving its performance, as well as paving the way for future features. More bugs probably joined the party in the meantime, sadly.

Special thanks to Sergei Trofimovich for his kind insights, help, and contributions.


29 Sep 10:07

JackSlateur

v0.13

9996a96

v0.13

Notable changes:

Add a new dedupe option: [no]rescan_files. It will increase performance in some use cases.
New behaviors from v0.12 have been consolidated. Extent-based lookup is always enabled, as is fiemap. The v2 hashfile is no longer supported.
Hashfiles are now updated after deduplication to reflect the new physical offsets. This avoids (re)deduplicating extents in some cases.
Partial mode has been enhanced to support batching. The overall performance of this mode (which was previously known as "block-based mode") has been improved.
All files are now opened in read-only mode.
Hashfile version has been increased to reflect the new database behaviors. Previous hashfiles are not compatible.
A hash is now always computed for the entire file. This lets us deduplicate identical files easily, regardless of their extent mappings.
Deduplicating only parts of a file can be disabled using the [no]only_whole_files dedupe option.
Hashfiles with unsupported features or hash algorithms are now recreated transparently. Migration of the old content is not implemented.
Relative exclude patterns are no longer silently ingested. Such patterns are now rebuilt on top of the current working directory.
Batching is now set to 1024 by default.
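
As a rough illustration of the whole-file hashing mentioned in the list above, the following Python sketch groups files by a digest of their entire content. It is not duperemove's implementation: `file_digest` and `group_identical_files` are hypothetical helpers, and `hashlib.blake2b` is used as a stand-in for xxh128, which is not part of the Python standard library.

```python
import hashlib
from collections import defaultdict


def file_digest(path, chunk_size=1 << 20):
    """Hash the entire content of a file, reading it chunk by chunk."""
    # blake2b with a 16-byte digest is a stand-in for xxh128 (assumption).
    h = hashlib.blake2b(digest_size=16)
    with open(path, "rb") as f:  # files are only ever opened read-only
        while chunk := f.read(chunk_size):
            h.update(chunk)
    return h.hexdigest()


def group_identical_files(paths):
    """Return sorted lists of paths whose whole-file digests match."""
    groups = defaultdict(list)
    for path in paths:
        groups[file_digest(path)].append(path)
    # Only groups with at least two members are dedupe candidates.
    return [sorted(group) for group in groups.values() if len(group) > 1]
```

Because the digest covers the whole file, two identical files end up in the same group even if the filesystem laid them out with different extent boundaries.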


15 Jul 09:25

JackSlateur

v0.12

58ab87f

v0.12

Notable changes:

Duplication lookup is now based on extents. This leads to a massive performance increase. Block-based lookup is still possible via --dedupe-options=partial.
Following that change, a new hashfile format has been introduced. The previous hashfile format is still supported when extent lookup is disabled, but this is not recommended.
Batching has been implemented. When enabled with the -B <batchsize> option, duperemove will run the deduplication phase every <batchsize> scanned files. This is meant to help when running duperemove on large datasets, with a small blocksize, or on memory-constrained systems.
All hash algorithms have been removed and replaced by xxh128. This variant is as robust as murmur3 while being faster. Choosing a hash function via the --hash option is no longer possible. Hashfiles built with other algorithms must be removed.
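
The batching behavior described above can be pictured with a minimal Python sketch. It only models the control flow (run a dedupe pass every `batch_size` scanned files) and makes no claim about how duperemove implements it; `scan_in_batches`, `scan`, and `dedupe` are hypothetical names introduced for illustration.

```python
def scan_in_batches(files, batch_size, scan, dedupe):
    """Scan files one by one and run dedupe after every batch_size files.

    scan(path) returns whatever per-file result the caller wants to keep;
    dedupe(results) receives the results accumulated for one batch.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    pending = []
    for path in files:
        pending.append(scan(path))
        if len(pending) == batch_size:
            dedupe(pending)
            pending = []  # start a fresh batch, keeping memory use bounded
    if pending:  # flush the final, possibly smaller, batch
        dedupe(pending)
```

Only one batch of scan results is held at a time, which is why this kind of scheme helps on memory-constrained systems.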


10 Aug 06:34

lorddoskias

v0.11.3

21b03c3

Duperemove v0.11.3

Increase open file limit. (#269)
Create hash database file with 600 permission for improved security. (#262)
Read more data per pread. For the v2 hashfile format, this reduces the overall number of syscalls made, which in turn results in better performance.
Fix truncated file handling, eliminating an infinite loop case. (#255)
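
To illustrate what creating a file with 600 permissions means in practice, here is a small Python sketch for POSIX systems. It is not taken from duperemove (which is written in C); `create_private_file` is a hypothetical helper, and the use of `O_EXCL` is a choice made for this example only.

```python
import os


def create_private_file(path):
    """Create path with mode 0600 and return an open file descriptor.

    The mode is passed at creation time, so the file is never visible
    with wider permissions, not even briefly.
    """
    # O_EXCL makes creation fail if the file already exists (example choice).
    flags = os.O_RDWR | os.O_CREAT | os.O_EXCL
    return os.open(path, flags, 0o600)
```

Setting the mode in the same call that creates the file avoids the window that would exist if the file were created first and restricted with a later chmod.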
