Welcome! spacesavers2:
- crawls through the provided folder (and its subfolders),
- gathers stats for each file, such as size, inode, and user/group information,
- calculates unique hashes for each file,
- uses the gathered information to determine "duplicates" (see the sketch after this list),
- reports "high-value" duplicates, i.e., the ones that will recover the most disk space if deleted, and
- builds a "counts-matrix"-style matrix with folders as row names and users as column names, where each cell holds the number of duplicate bytes.
spacesavers2 is a new, improved, parallel implementation of spacesavers. spacesavers is soon to be decommissioned!
Note: spacesavers2 requires Python 3.11 or later and the xxhash library. These dependencies are already installed on Biowulf (as a conda env).
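On other systems, a quick way to confirm that a local environment meets these requirements is a check along the following lines (an illustrative convenience snippet, not part of spacesavers2):

```python
# Quick check that the local environment meets the stated requirements
# (illustrative only; not shipped with spacesavers2).
import sys

if sys.version_info < (3, 11):
    raise SystemExit(f"Python 3.11 or later is required; found {sys.version.split()[0]}")

try:
    import xxhash  # noqa: F401
except ImportError as err:
    raise SystemExit("The xxhash library is missing (install it, e.g., with `pip install xxhash`)") from err

print(f"OK: Python {sys.version.split()[0]} and the xxhash library are available")
```

spacesavers2 provides the following commands: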
- spacesavers2_catalog
- spacesavers2_mimeo
- spacesavers2_grubbers
- spacesavers2_e2e
- spacesavers2_usurp
- spacesavers2_pdq
Check out the documentation for more details. Please reach out to Vishal Koparde with queries or comments.