Fix rare ExHentai duplicated metadata bug #3033
Merged
I'm using gallery-dl as a library and I've created a custom job for metadata extraction, similar to `DataJob`. But I was getting identical metadata for different pages on ExHentai. Digging a bit deeper, here's what I've found.

`ExhentaiGalleryExtractor.items` yields the same object for each image in a gallery. This causes problems when the object is not used immediately.

To reproduce:
`-o output.private=true` causes the image `kwdict` to be passed through `util.identity` in `DataJob.handle_url`:

gallery-dl/gallery_dl/job.py
Line 700 in 560f7b4

gallery-dl/gallery_dl/job.py
Lines 734 to 735 in 560f7b4

Since `util.identity` doesn't have the side effect of creating a new dictionary like `util.filter_dict`, and since `DataJob` doesn't use the objects immediately, this triggers the bug.
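
For illustration, here is a minimal standalone sketch of the aliasing problem described above. It does not use gallery-dl itself: `items`, `identity`, `filter_dict`, and `collect` are simplified, hypothetical stand-ins written for this example, and the real implementations may differ in detail.

```python
def items():
    """Hypothetical extractor that reuses one dict for every image."""
    data = {"gallery_id": 1, "_private": "x"}
    for num in range(1, 4):
        data["num"] = num  # mutate the shared dict in place
        yield data         # the same object is yielded each time


def identity(obj):
    """Stand-in for an identity function: returns its argument unchanged."""
    return obj


def filter_dict(obj):
    """Stand-in for a filter that builds a NEW dict without '_' keys."""
    return {k: v for k, v in obj.items() if not k.startswith("_")}


def collect(filter_func):
    """Store every yielded dict for later use, as a deferred job would."""
    return [filter_func(kwdict) for kwdict in items()]


shared = collect(identity)     # three references to one dict
copied = collect(filter_dict)  # three independent dicts

print([d["num"] for d in shared])  # [3, 3, 3]
print([d["num"] for d in copied])  # [1, 2, 3]
```

With the identity stand-in, every stored entry is the same object, so all of them show the last image's values. The filtering stand-in happens to hide the problem only because it copies the dict as a side effect.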