[Feature Request] Skip already downloaded files by (optionally?) ignoring file endings #5505

Open
Auravendill opened this issue Apr 24, 2024 · 4 comments

Comments

@Auravendill

I have a tool running that often corrects or changes file endings. gallery-dl is generally very good at giving files the correct ending, but I noticed that my tool often disagrees with it on the distinction between mp4 and m4v.

So when I want to check for new files in an already downloaded folder, gallery-dl thinks that those mp4 files are missing, while they may just have a different ending. This is a bit annoying, because I might end up with a bunch of duplicated files, which then need to be removed again. But having a file that keeps track of everything that has been downloaded would cause different issues, including the case where files get deleted accidentally and I just want to "restore" the missing ones.

Therefore I would like an option that makes the check for whether a file already exists ignore the file ending completely.

I assume that gallery-dl currently generates the filename of the next file, checks whether that file already exists, and skips it if it does, right? In that case my proposal would be to generate a list of all filenames already in the target folder, without their file endings, and then check each new filename, also without its ending, against that list. To my understanding this shouldn't have any adverse effect on performance, and may even reduce the number of calls made to the HDD/SSD.
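The proposal above can be sketched in a few lines of Python. This is only an illustration of the idea, not gallery-dl's actual skip logic; the helper names `existing_stems` and `should_skip` are made up for this example.

```python
from pathlib import Path


def existing_stems(folder):
    """Collect the names of all files in `folder` with the final extension removed."""
    return {p.stem for p in Path(folder).iterdir() if p.is_file()}


def should_skip(filename, stems):
    """Return True if a file with the same name, ignoring the extension, already exists."""
    return Path(filename).stem in stems
```

The folder is listed once and every later check is a set lookup, which is where the reduced number of disk calls would come from. Note that `Path.stem` only strips the last suffix, so `a.tar.gz` becomes `a.tar`.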

In case anyone is curious how the different endings get detected, the script simply uses the output of
import subprocess
# ask `file` for the bare MIME type of the file at `filepath`
temp = subprocess.run(['file', '--mime-type', '-b', filepath], stdout=subprocess.PIPE)
info = temp.stdout.decode('utf-8').strip()
# keep the part after the last "/" and after the last "-"
suggested_ending = info.split("/")[-1].split("-")[-1].strip().lower()
for most files (with some special cases for specific files, but that is effectively how mp4 and m4v get differentiated).
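To make the string handling easier to follow, here is the same split logic as a standalone function applied to a few sample MIME strings. The function name `ending_from_mime` and the sample inputs are chosen for illustration only; they are not taken from the actual script.

```python
def ending_from_mime(mime_type):
    """Derive a file ending from a MIME type using the split logic shown above."""
    return mime_type.split("/")[-1].split("-")[-1].strip().lower()


# Sample inputs (illustrative), including one where the simple rule is not enough.
samples = ["video/mp4", "video/x-m4v", "image/jpeg", "application/x-7z-compressed"]
results = {mime: ending_from_mime(mime) for mime in samples}
```

The last sample yields `compressed` rather than a usable ending, which is presumably the kind of input that needs one of the special cases mentioned above.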

@kattjevfel
Contributor

This is almost exactly what the database function is for. Does that not work for you?

@Auravendill
Author

The last time I experimented with it, I had some serious issues due to using a NAS as the destination.

I also don't quite see how a database would react correctly to the different kinds of changes that are unfortunately not uncommon on my NAS:

  1. Files "missing" due to changed file endings -> to correct for that, it could just record already downloaded files and not download them again
  2. Files actually missing due to human error, a malfunctioning script (which is also a human error, but I like to pretend the script is at fault) or similar -> in this case the "fix" for 1 would turn 2 into the problem instead, wouldn't it? If a file isn't on the HDD, but was downloaded according to the DB, what is the program supposed to do?
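The two cases can be told apart by comparing a record of downloaded names against what is actually on disk. The sketch below is hypothetical: `recorded_names` stands in for whatever a download record would hold and is not gallery-dl's archive format, and `classify` is a made-up helper.

```python
from pathlib import Path


def classify(recorded_names, folder):
    """Sort recorded file names into present, renamed (ending changed), and missing."""
    on_disk = {p.name for p in Path(folder).iterdir() if p.is_file()}
    stems = {Path(name).stem for name in on_disk}
    result = {"present": [], "renamed": [], "missing": []}
    for name in sorted(recorded_names):
        if name in on_disk:
            result["present"].append(name)
        elif Path(name).stem in stems:
            # Case 1: same name, different ending, so no new download is needed.
            result["renamed"].append(name)
        else:
            # Case 2: nothing on disk matches, so the file should be restored.
            result["missing"].append(name)
    return result
```

Only the entries under `missing` would need to be downloaded again. A record on its own cannot make this distinction without also looking at the folder.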

@Hrxn
Contributor

Hrxn commented Apr 24, 2024

There's no need to store the archive database on a NAS, except for long-term backup.
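As a rough sketch of that setup, the archive file can live on a local disk while the downloads still go to the NAS. The example only builds the command line using gallery-dl's `--download-archive` and `--destination` options; the function name `build_command`, the URL, and both paths are placeholders.

```python
def build_command(url, nas_dir, archive_file):
    """Build a gallery-dl call that keeps the archive file separate from the download target."""
    return [
        "gallery-dl",
        "--download-archive", str(archive_file),  # archive on a local disk
        "--destination", str(nas_dir),            # downloads go to the NAS mount
        url,
    ]


# Placeholder values; adjust to the actual mount point and archive location.
cmd = build_command(
    "https://example.org/gallery",
    "/mnt/nas/downloads",
    "/home/user/gallery-dl-archive.sqlite3",
)
```

The resulting list can be passed to `subprocess.run(cmd)`.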

@mikf
Owner

mikf commented Jun 1, 2024

> that my tool will often disagrees on the differentiation between mp4 and m4v.

Should be fixed in v1.27.0 (cd241be)


4 participants