You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
When I'm making zip files using the postprocessor, upon resumption of some interrupted gallery-dl process, I can only view the files added after restarting gallery-dl in my zips with comic viewers because the filemodes/attributes for older files were not set before the interruption. See the end for what I presume is a fix to what is less a bug and more inconvenient and work-aroundable behavior.
I have the following postprocessor my config to create zip files and I am using the Windows binary release of gallery-dl.
I find a gallery and run the command, gallery-dl <gallery-url>, which produces <gallery>.zip. I allow the command to execute. Looking at the zip file's NameToInfo in a python REPL shows me everything is okay:
Next, I delete the output from my previous run and repeat the command. However, I interrupt partway with a C-c keyboard interrupt (page 6 completes, but pages 7 and 8 are absent as expected). The zip file now has no filemode/permissions! Furthermore, the file's NameToInfo is empty at {}! A comic viewer can see everything just fine, but that is odd! I suppose something magical happens in your finalize() function in postprocessor/zip.py, but I don't know very much python so this is a mystery to me. All I know is when I do what you do in your run() function to append a file, everything works just fine; furthermore, if I make a zip file from scratch, then NameToInfo will show all its contents with appropriate filemode/attributes:
zfile=zipfile.ZipFile("comic.zip", "a", zipfile.ZIP_STORED, True)
zfile.write("p0007.png) # this will have rw filemode, unlike pages 1-6zfile.NameToInfo# Only contains { 'p0007.png': <...> } and nothing elsezfile.close()
So, to reproduce this:
gallery-dl <gallery-url>
C-c partway through
gallery-dl <gallery-url>
The result is not the NameToInfo structure above, but only the subset containing the newer files for pages 7 and 8. Furthermore, comic viewers will only show/display the new files because their filemode is set but pages 1 through 6 have no filemode.
I've attached an image as a visual summary of what happened to me. In practice, substitute C-c for a loss of power, an application crash or something like that.
I suppose in practice this would not normally be a real issue. If someone is scraping with gallery-dl then they would probably be interrupted via their config (in which case zfile.close() occurs and this issue does not apply to them), but if they did suffer some interruption otherwise, the fix is to simply manually fix the zip contents' permissions.
However, this manual intervention need could be precluded by opening and closing the zip file on every single instance that zfile.run() is called, which might seem bananas but considering the lion's share of each iteration is not spent calculating file opens and file closes, doing so would make everything much more idiot-proof and allow us to assume no manual intervention is ever required on interruptions. At least, that how things seem to me.
An example of a fix, I suppose, would be the following change in postprocessor/zip.py, or to do something like with ZipFile('spam.zip', 'a') as zfile: ... ![gallery-dl zip bug demo](https://user-images.githubusercontent.com/4463623/61985424-0b50c300-afbe-11e9-9593-0fb3656f8ae0.png) or whatever, python is really something I'm rather awful at... mode 'a' seems to work even if file doesn't exist, not that I know what I'm talking about):
def__init__(self, pathfmt, options):
PostProcessor.__init__(self)
self.delete=notoptions.get("keep-files", False)
self.ext="."+options.get("extension", "zip")
algorithm=options.get("compression", "store")
ifalgorithmnotinself.COMPRESSION_ALGORITHMS:
self.log.warning(
"unknown compression algorithm '%s'; falling back to 'store'",
algorithm)
algorithm="store"self.path=pathfmt.realdirectory# DELETE THIS:self.zfile=zipfile.ZipFile(
self.path+self.ext, "a",
self.COMPRESSION_ALGORITHMS[algorithm], True)
defrun(self, pathfmt):
# MOVE FROM ABOVE:#self.zfile = zipfile.ZipFile(# self.path + self.ext, "a",# self.COMPRESSION_ALGORITHMS[algorithm], True)# 'NameToInfo' is not officially documented, but it's available# for all supported Python versions and using it directly is a lot# better than calling getinfo()ifpathfmt.filenamenotinself.zfile.NameToInfo:
self.zfile.write(pathfmt.temppath, pathfmt.filename)
pathfmt.delete=self.delete# MOVE FROM BELOW:self.zfile.close()
deffinalize(self):
# DELETE THIS:#self.zfile.close()ifself.delete:
try:
os.rmdir(self.path)
exceptOSError:
passifnotself.zfile.NameToInfo:
try:
os.unlink(self.zfile.filename)
exceptOSError:
pass
The text was updated successfully, but these errors were encountered:
First of all, thank you for all the work you've already put into this. Having detailed error descriptions is always nice.
The crux of the issue appears to be that the zip file index (?) only get completely written when ZipFile.close() is called and is left in an "intermediate" state between having written a file to it and actually closing it. 20f7b07 now makes sure that ZipFile.close() always get called during exceptions, KeyboardInterrupt included, but there is no way to have this happen when the Python interpreter itself is killed or a power outage happens.
Of course your with ZipFile('spam.zip', 'a') as zfile: ... suggestion would drastically reduce the chance of the program unexpectedly dying while the zip file is in an "intermediate" state, but I generally dislike opening and parsing the zip archive over and over again when doing this once is more than enough - except in this case.
There is also the possibility that the process is interrupted during a ZipFile.write() and a file is only partially written. Detecting and fixing such a case also needs to happen somehow ...
When I'm making zip files using the postprocessor, upon resumption of some interrupted
gallery-dl
process, I can only view the files added after restartinggallery-dl
in my zips with comic viewers because the filemodes/attributes for older files were not set before the interruption. See the end for what I presume is a fix to what is less a bug and more inconvenient and work-aroundable behavior.I have the following postprocessor my config to create zip files and I am using the Windows binary release of gallery-dl.
I find a gallery and run the command,
gallery-dl <gallery-url>
, which produces<gallery>.zip
. I allow the command to execute. Looking at the zip file'sNameToInfo
in a python REPL shows me everything is okay:Next, I delete the output from my previous run and repeat the command. However, I interrupt partway with a C-c keyboard interrupt (page 6 completes, but pages 7 and 8 are absent as expected). The zip file now has no filemode/permissions! Furthermore, the file's
NameToInfo
is empty at{}
! A comic viewer can see everything just fine, but that is odd! I suppose something magical happens in yourfinalize()
function inpostprocessor/zip.py
, but I don't know very much python so this is a mystery to me. All I know is when I do what you do in yourrun()
function to append a file, everything works just fine; furthermore, if I make a zip file from scratch, thenNameToInfo
will show all its contents with appropriate filemode/attributes:So, to reproduce this:
gallery-dl <gallery-url>
C-c
partway throughgallery-dl <gallery-url>
The result is not the
NameToInfo
structure above, but only the subset containing the newer files for pages 7 and 8. Furthermore, comic viewers will only show/display the new files because their filemode is set but pages 1 through 6 have no filemode.I've attached an image as a visual summary of what happened to me. In practice, substitute
C-c
for a loss of power, an application crash or something like that.I suppose in practice this would not normally be a real issue. If someone is scraping with
gallery-dl
then they would probably be interrupted via their config (in which casezfile.close()
occurs and this issue does not apply to them), but if they did suffer some interruption otherwise, the fix is to simply manually fix the zip contents' permissions.However, this manual intervention need could be precluded by opening and closing the zip file on every single instance that
zfile.run()
is called, which might seem bananas but considering the lion's share of each iteration is not spent calculating file opens and file closes, doing so would make everything much more idiot-proof and allow us to assume no manual intervention is ever required on interruptions. At least, that how things seem to me.An example of a fix, I suppose, would be the following change in
postprocessor/zip.py
, or to do something likewith ZipFile('spam.zip', 'a') as zfile: ... ![gallery-dl zip bug demo](https://user-images.githubusercontent.com/4463623/61985424-0b50c300-afbe-11e9-9593-0fb3656f8ae0.png)
or whatever, python is really something I'm rather awful at... mode 'a' seems to work even if file doesn't exist, not that I know what I'm talking about):The text was updated successfully, but these errors were encountered: