allow for '-checksum none' to disable checksum filtering. #118 #119

Open: wants to merge 4 commits into base: devel

@szabgab szabgab commented Sep 15, 2022

Let's give this a try.

szabgab commented Sep 19, 2022

I've also blogged about it and made a YouTube video explaining the change: https://code-maven.com/patch-rdfind-cpp-checksum

@pauldreik pauldreik force-pushed the devel branch 4 times, most recently from 40f2069 to b9f43aa Compare June 16, 2023 06:49

einsteinx2 commented Mar 17, 2024

Funny, I actually just did this myself and then saw this when I went to open the PR :P

For anyone who may be interested, my implementation makes fewer changes to the code and is based on the latest release (1.6.0), as this PR has conflicts due to its age.

Here's the commit in my fork: einsteinx2@1b591d0

To the maintainer(s), I think there is a real use case for this:

In my example, I have a NAS with dozens of TB of storage. I have a lot of files that are significantly large (dozens to hundreds of GB). I've been doing some file reorganization and my goal is just to get a short list of files that might be duplicates that I can then process later.

In my case, if these large files have the same name, size, and first/last bytes, I can be basically 100% sure they're the same file. It would take far longer to checksum every file at discovery time, only to find out what I already know: they're the same.
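To illustrate the idea above, here is a minimal Python sketch of grouping files by name, size, and first/last bytes without reading the full contents. It is not rdfind's code: the names `cheap_key`, `likely_duplicates`, and `EDGE_BYTES`, and the 64-byte sample size, are assumptions made up for this example.

```python
import os
from collections import defaultdict

EDGE_BYTES = 64  # assumed sample size for this sketch, not taken from rdfind


def cheap_key(path, edge_bytes=EDGE_BYTES):
    """Return (basename, size, first bytes, last bytes) without reading the whole file."""
    size = os.path.getsize(path)
    with open(path, "rb") as f:
        first = f.read(edge_bytes)
        # Jump to the last edge_bytes bytes (or to the start for small files).
        f.seek(max(size - edge_bytes, 0))
        last = f.read(edge_bytes)
    return (os.path.basename(path), size, first, last)


def likely_duplicates(paths):
    """Group paths whose cheap keys match; return only groups of two or more."""
    groups = defaultdict(list)
    for path in paths:
        groups[cheap_key(path)].append(path)
    return [sorted(group) for group in groups.values() if len(group) > 1]
```

Each file costs two small reads regardless of its size, which is the whole point for files in the hundreds of GB. The trade-off is that files differing only in the middle would still land in the same group, so the result is a list of candidates rather than proof.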

Later if needed I can manually checksum files in the results list, or even do something like test/checksum random byte ranges in the files rather than checksumming the whole thing to save a ton of time, but in my case (and I assume others' as well) I already know they're the same so even that isn't needed.
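As a rough sketch of the "checksum random byte ranges" idea, the Python below hashes a fixed number of chunks at pseudo-random offsets instead of the whole file. The function name `sampled_digest` and its parameters (`samples`, `chunk`, `seed`) are hypothetical and chosen only for illustration; nothing here comes from rdfind itself.

```python
import hashlib
import os
import random


def sampled_digest(path, samples=16, chunk=1 << 20, seed=0):
    """SHA-256 over the file size plus `samples` chunks at pseudo-random offsets."""
    size = os.path.getsize(path)
    # A fixed seed gives the same offsets for any two files of equal size,
    # so their digests are directly comparable.
    rng = random.Random(seed)
    last_start = max(size - chunk, 0)
    offsets = sorted(rng.randint(0, last_start) for _ in range(samples))
    h = hashlib.sha256()
    h.update(str(size).encode())  # files of different sizes never match
    with open(path, "rb") as f:
        for off in offsets:
            f.seek(off)
            h.update(f.read(chunk))
    return h.hexdigest()
```

With the defaults this reads at most 16 MiB per file. Matching digests only mean the sampled ranges agree, so a full checksum is still needed where certainty matters.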

Anyway I won't open another PR for this as this one already exists, but wanted to link my code and explain why it might be good to merge. To be clear, I think the default should stay the same (or even go up to sha256/512), but at least having the option to disable checksumming can be very useful and the change is only a few lines.

szabgab commented Mar 17, 2024

Thanks for the comment. I'd recommend you open a separate PR with your code, especially if you feel it is a better solution.

@einsteinx2

I've pushed my PR here: #153
