[Feature Request] Skip already downloaded files by (optionally?) ignoring file endings #5505

Auravendill · 2024-04-24T13:20:38Z

I have a tool running that often corrects or changes file endings. gallery-dl is generally very good at giving files the correct ending, but I noticed that my tool often disagrees with it on the distinction between mp4 and m4v.

So when I want to check for new files in an already downloaded folder, gallery-dl thinks that those mp4 files are missing, while they may just have a different ending. This is a bit annoying, because I might end up with a bunch of duplicated files, which then need to be removed again. But having a file that keeps track of everything that has been downloaded would cause different issues, including that files may get deleted accidentally and I just want to "restore" the missing ones.

Therefore I would like an option that makes the check for whether a file already exists ignore the file ending completely.

I assume that gallery-dl currently generates the filename of the next file, checks whether that file already exists, and skips it if it does, right? In that case my proposal would be to generate a list of all filenames already in the target folder, but without the file ending, and then check each new filename (also without its ending) against that list. To my understanding this shouldn't have any adverse effect on performance, and may even reduce the number of calls made to the HDD/SSD.
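A rough sketch of the proposed check, to make the idea concrete. This is not gallery-dl code: the helper names `existing_stems` and `should_skip` and the sample filenames are made up for illustration, and it assumes that "file ending" means only the last extension.

```python
import os
import tempfile


def existing_stems(folder):
    """Collect the names of all files in `folder`, minus their last extension."""
    stems = set()
    with os.scandir(folder) as entries:  # a single directory listing
        for entry in entries:
            if entry.is_file():
                stems.add(os.path.splitext(entry.name)[0])
    return stems


def should_skip(filename, stems):
    """True if a file with the same name (ignoring the ending) already exists."""
    return os.path.splitext(filename)[0] in stems


# Demo in a throwaway folder (hypothetical filenames).
with tempfile.TemporaryDirectory() as folder:
    for name in ("clip_001.m4v", "image_002.jpg"):
        open(os.path.join(folder, name), "wb").close()

    stems = existing_stems(folder)
    print(should_skip("clip_001.mp4", stems))  # True: only the ending differs
    print(should_skip("clip_003.mp4", stems))  # False: not downloaded yet
```

One caveat of this approach: two genuinely different files that share a name but not an ending (say `a.jpg` and `a.mp4`) would be treated as the same file, which is probably why this would have to be optional.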

In case anyone is curious how my different endings get detected, the script simply uses the output of
import subprocess
# Ask `file` for the bare MIME type, then keep what follows the last "/" and the last "-"
temp = subprocess.run(['file', '--mime-type', '-b', filepath], stdout=subprocess.PIPE)
info = temp.stdout.decode('utf-8').strip()
suggested_ending = info.split("/")[-1].split("-")[-1].strip().lower()
for most files (with some special cases for specific files, but that's effectively how mp4 and m4v get differentiated).
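To show what that string handling does, here is the same expression applied to a few MIME strings. The function name `suggested_ending` is just a wrapper for illustration, and the sample strings are typed in by hand rather than captured from `file`, so treat them as assumptions about what it might report.

```python
def suggested_ending(mime_type):
    """Same parsing as in the snippet above, applied to a MIME type string."""
    return mime_type.split("/")[-1].split("-")[-1].strip().lower()


# Hand-written sample inputs (assumed, not real `file` output).
samples = [
    "video/mp4",
    "video/x-m4v",
    "image/jpeg",
    "application/x-7z-compressed",
]

for mime in samples:
    print(mime, "->", suggested_ending(mime))
# video/mp4 -> mp4
# video/x-m4v -> m4v
# image/jpeg -> jpeg
# application/x-7z-compressed -> compressed
```

The last line shows why special cases are needed: keeping only the part after the final "-" does not always yield a usable ending.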


kattjevfel · 2024-04-24T13:24:25Z

This is just about exactly what the database function is for. Does that not work for you?

Auravendill · 2024-04-24T13:41:25Z

The last time I experimented with it, I had some serious issues due to using a NAS as the destination.

I also don't quite see how a database would react correctly to the different kinds of changes that are unfortunately not uncommon on my NAS:

1. Files "missing" due to changed file endings -> to correct for that, it could just record already downloaded files and not download them again.
2. Files actually missing due to human error, a malfunctioning script (which is also a human error, but I like to pretend the script is at fault) or similar -> in this case the "fix" for 1 would cause 2 to become the problem instead, wouldn't it? If a file isn't on the HDD, but was downloaded according to the DB, what is the program supposed to do?
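The two cases above can be put side by side in a toy model. Everything here is hypothetical: a plain Python set stands in for the download database (the real archive's key format is not modelled), and `should_download` is an invented helper, not anything from gallery-dl.

```python
import os


def should_download(filename, archive, stems_on_disk, use_archive):
    """Decide whether to fetch `filename` under one of two strategies.

    use_archive=True  -> trust only the record of past downloads
    use_archive=False -> trust only what is on disk, ignoring endings
    """
    stem = os.path.splitext(filename)[0]
    if use_archive:
        return stem not in archive
    return stem not in stems_on_disk


# Hypothetical state: both clips were downloaded once.
archive = {"clip_001", "clip_002"}
# clip_001 was renamed to .m4v (case 1), clip_002 was deleted by accident (case 2).
stems_on_disk = {"clip_001"}

for name in ("clip_001.mp4", "clip_002.mp4"):
    by_archive = should_download(name, archive, stems_on_disk, use_archive=True)
    by_disk = should_download(name, archive, stems_on_disk, use_archive=False)
    print(name, "archive:", by_archive, "disk:", by_disk)
# clip_001.mp4 archive: False disk: False
# clip_002.mp4 archive: False disk: True
```

In this model the archive-only strategy never restores the deleted file, while the on-disk check that ignores endings handles both cases.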

Hrxn · 2024-04-24T14:59:26Z

There's no need to store the archive database on a NAS. Only for long-term backup.

mikf · 2024-06-01T19:19:15Z

> that my tool will often disagrees on the differentiation between mp4 and m4v.

Should be fixed in v1.27.0 (cd241be)

mikf added a commit that referenced this issue Apr 27, 2024

[downloader:http] add MIME type and signature for .m4v files (#5505)

cd241be

mikf added enhancement feature-request labels Jun 1, 2024
