Disable checksum filtering #118

Open
szabgab opened this issue Sep 15, 2022 · 3 comments · May be fixed by #153 or robfrawley/rdfind#4
Comments

@szabgab

szabgab commented Sep 15, 2022

I have slow external disks connected to my computer with thousands of large files (videos). Comparing the checksums takes a lot of time. Would it be possible to disable it?

@fire-eggs

You might try the -maxsize option to ignore files larger than a given size.

@szabgab
Author

szabgab commented Sep 19, 2022

Thanks for the suggestion, but I think the -maxsize option would mean I don't get a report about the large files. I would like to see whether I have 2-3 copies of the same 1 GB file so I can get rid of them.

@einsteinx2 einsteinx2 linked a pull request Mar 17, 2024 that will close this issue
@einsteinx2

I have this text in my PR, but figured it would be good to comment with it here as well for discussion.

To the maintainer(s), I think there is a real use case for this:

In my example, I have a NAS with dozens of TB of storage. I have a lot of files that are significantly large (dozens to hundreds of GB). I've been doing some file reorganization and my goal is just to get a short list of files that might be duplicates that I can then process later.

In my case, if these large files have the same name, size, and first/last bytes, I can be basically 100% sure they're the same file. It would take far longer to checksum every file at discovery time, only to find out what I already know: they're the same.
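To make the idea concrete, here is a minimal Python sketch of that kind of checksum-free shortlist: it groups files by name, size, and the first/last bytes, and never reads the middle of a file. It is not rdfind's code or the code from the PR; `edge_signature`, `likely_duplicates`, and the 64-byte `EDGE_BYTES` value are hypothetical names and values chosen for illustration.

```python
import os
from collections import defaultdict

EDGE_BYTES = 64  # hypothetical value: bytes read from each end of a file


def edge_signature(path, edge_bytes=EDGE_BYTES):
    """Return (name, size, first bytes, last bytes) for one file."""
    size = os.path.getsize(path)
    with open(path, "rb") as f:
        head = f.read(edge_bytes)
        # Jump to the final edge_bytes (or to the start for small files).
        f.seek(max(size - edge_bytes, 0))
        tail = f.read(edge_bytes)
    return (os.path.basename(path), size, head, tail)


def likely_duplicates(root):
    """Group regular files under root by signature; keep groups of 2+."""
    groups = defaultdict(list)
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(dirpath, name)
            # Skip symlinks and anything that is not a regular file.
            if os.path.isfile(path) and not os.path.islink(path):
                groups[edge_signature(path)].append(path)
    return [sorted(paths) for paths in groups.values() if len(paths) > 1]
```

Each file costs two small reads regardless of its size, which is the point on a slow disk or NAS. The trade-off is that two files that differ only somewhere in the middle would still land in the same group, so the output is a list of candidates, not proof of equality.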

Later if needed I can manually checksum files in the results list, or even do something like test/checksum random byte ranges in the files rather than checksumming the whole thing to save a ton of time, but in my case (and I assume others' as well) I already know they're the same so even that isn't needed.
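The "checksum random byte ranges" idea could look roughly like the Python sketch below. It is only an illustration under stated assumptions, not something rdfind provides: `sampled_digest` and its parameters (`samples`, `chunk_size`, `seed`) are hypothetical, and SHA-256 is just one reasonable hash choice. The sample offsets are derived from the seed and the file size, so two files of the same size are always sampled at the same positions.

```python
import hashlib
import os
import random


def sampled_digest(path, samples=16, chunk_size=1 << 20, seed=0):
    """Hash the file size plus a fixed set of pseudo-random byte ranges."""
    size = os.path.getsize(path)
    digest = hashlib.sha256()
    digest.update(str(size).encode())  # files of different size never match

    with open(path, "rb") as f:
        if size <= samples * chunk_size:
            # Small file: sampling saves nothing, so hash all of it.
            for block in iter(lambda: f.read(chunk_size), b""):
                digest.update(block)
            return digest.hexdigest()

        # Same seed and size give the same offsets for every file.
        rng = random.Random(f"{seed}:{size}")
        offsets = sorted(
            rng.randrange(0, size - chunk_size + 1) for _ in range(samples)
        )
        for offset in offsets:
            f.seek(offset)
            digest.update(f.read(chunk_size))
    return digest.hexdigest()
```

With the defaults this reads at most 16 MiB per large file instead of the whole thing. Matching digests only mean the sampled ranges match, so a full checksum is still the way to be certain before deleting anything.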

To be clear, I think the default should stay the same (or even go up to sha256/512), but at least having the option to disable checksumming can be very useful and the change is only a few lines.
