Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ZipFile.extractall fails if zipfile.Path object is created based on the ZipFile object #101566

Closed
mmohrhard opened this issue Feb 4, 2023 · 6 comments · Fixed by jaraco/zipp#90
Assignees
Labels
3.10 only security fixes 3.11 only security fixes 3.12 bugs and security fixes stdlib Python modules in the Lib dir type-bug An unexpected behavior, bug, or error

Comments

@mmohrhard
Copy link
Contributor

mmohrhard commented Feb 4, 2023

Bug report

The following code works fine with Python 3.8 and 3.9 but starts failing with Python 3.10.

It seems that zipfile.Path creates an entry in the zipfile.ZipFile object that does not exist in the underlying file and therefore makes zipfile.ZipFile.extractall fail.

import zipfile
import tempfile

path = tempfile.mktemp()

f = zipfile.ZipFile(path, "w")
f.writestr("data/test.txt", "test1")
f.writestr("data/test2.txt", "test2")
f.close()

dest = tempfile.mkdtemp()

p = zipfile.ZipFile(path)
src_zipfile_path = zipfile.Path(p, "data")
p.extractall(dest)

Your environment

Working version:

(python39) [moggi@mars cloudfluid]$ python --version
Python 3.9.16
(python39) [moggi@mars cloudfluid]$ python test.py
(python39) [moggi@mars cloudfluid]$ 

Failing versions:

3.10.9:

(python_310_2) [moggi@mars cloudfluid]$ python --version
Python 3.10.9
(python_310_2) [moggi@mars cloudfluid]$ python test.py
Traceback (most recent call last):
  File "/home/moggi/devel/cloudfluid/test.py", line 15, in <module>
    p.extractall(dest)
  File "/home/moggi/devel/envs/python_310_2/lib/python3.10/zipfile.py", line 1645, in extractall
    self._extract_member(zipinfo, path, pwd)
  File "/home/moggi/devel/envs/python_310_2/lib/python3.10/zipfile.py", line 1667, in _extract_member
    member = self.getinfo(member)
  File "/home/moggi/devel/envs/python_310_2/lib/python3.10/zipfile.py", line 1441, in getinfo
    raise KeyError(
KeyError: "There is no item named 'data/' in the archive"

and 3.11.0

(python311) [moggi@mars cloudfluid]$ python --version
Python 3.11.0
(python311) [moggi@mars cloudfluid]$ python test.py
Traceback (most recent call last):
  File "/home/moggi/devel/cloudfluid/test.py", line 15, in <module>
    p.extractall(dest)
  File "/home/moggi/devel/envs/python311/lib/python3.11/zipfile.py", line 1677, in extractall
    self._extract_member(zipinfo, path, pwd)
  File "/home/moggi/devel/envs/python311/lib/python3.11/zipfile.py", line 1699, in _extract_member
    member = self.getinfo(member)
             ^^^^^^^^^^^^^^^^^^^^
  File "/home/moggi/devel/envs/python311/lib/python3.11/zipfile.py", line 1473, in getinfo
    raise KeyError(
KeyError: "There is no item named 'data/' in the archive"

If this is the expected new behavior I think the documentation for 3.10 should mention this breaking change and the zipfile.Path documentation might need an entry about this problem.

The zipfile.Path object is used in the original code to check if the "data/" directory exists in the zip file.

Linked PRs

@mmohrhard mmohrhard added the type-bug An unexpected behavior, bug, or error label Feb 4, 2023
@arhadthedev arhadthedev added the stdlib Python modules in the Lib dir label Feb 4, 2023
@arhadthedev
Copy link
Member

arhadthedev commented Feb 4, 2023

@serhiy-storchaka, @Yhg1s, @gpshead as zipfile experts.

@gpshead gpshead added 3.11 only security fixes 3.10 only security fixes 3.12 bugs and security fixes labels Feb 4, 2023
@gpshead
Copy link
Member

gpshead commented Feb 4, 2023

@jaraco

@jaraco
Copy link
Member

jaraco commented Feb 5, 2023

This issue occurs because of the optimizations / tradeoffs made in attempting to extend an open file object. Here's what's going on:

  • zipfile.Path needs to be able to infer directories when there are none in the zipfile (i.e. the presence of data/test.txt implies the presence of data/ even though it's not explicitly a member of the zipfile).
  • To accomplish this goal, the CompleteDirs wraps the original zipfile.
  • Due to challenges with the lifecycle of a zipfile (which may have been created in memory or exist on disk as an open file), wrapping a Zipfile in a Path actually mutates its class, giving the original Zipfile object the CompleteDirs behavior.
  • In extractall, it's assumed that every item in the namelist is also a member, but with CompleteDirs, that's no longer true.

This report is the first I've seen that this (admittedly impure) behavior has caused any trouble.

I agree that at the very least, it's worthwhile documenting these limitations.

I did try to track down the origins of the change that led to this regression in behavior. I took a brief look at the readme for zipp, the forward/backport, and it doesn't mention any changes for Python 3.10 (now updated to reflect new findings).

Okay. I see the change occurred in ebbe803, bpo-40564/#84744. That bug pretty thoroughly describes the thought process and the tradeoffs considered.

I can think of a few ways to address the issue.

  • document it as unsupported behavior
  • document it as unsupported behavior, but provide a mechanism to restore the original object
  • add support for zipfile.Zipfile to be lenient to dirs not present
  • add support to CompleteDirs to return a virtual member info for an implied dir
  • CompleteDirs to override extractall to bypass the issue

I'm leaning toward (3) or (4). (4) has an advantage over all of the other options in that, if it can be implemented, it provides better compatibility for more use cases (not just ZipFile.extractall).

@jaraco
Copy link
Member

jaraco commented Feb 5, 2023

In jaraco/zipp#90, I've drafted a patch implementing option 4, to be released as zipp 3.12.1. I'd like to get some feedback on the concept before porting that to cpython.

@jaraco
Copy link
Member

jaraco commented Feb 5, 2023

(re-opened as merging the fix marked this issue as closed, but it's still open with cpython)

clrpackages pushed a commit to clearlinux-pkgs/pypi-zipp that referenced this issue Feb 7, 2023
…n 3.12.1

Jason R. Coombs (8):
      ALLOW_UNICODE no longer needed on Python 3. As a result, ELLIPSES is also now enabled by default.
      Enable default encoding warning where available. See PEP 597.
      Suppress EncodingWarning in pytest_black. Workaround for shopkeep/pytest-black#67.
      Exempt warning. Workaround for realpython/pytest-mypy#152
      Update README to reflect finding that 3.2 incorporated into Python 3.10, but probably not 3.3 into 3.9.
      Add xfail test capturing missed expectation. Ref python/cpython#101566.
      Override ZipFile.getinfo to supply a ZipInfo for implied dirs. Fixes python/cpython#101566.
      Update changelog. Ref python/cpython#101566.
jaraco added a commit to jaraco/cpython that referenced this issue Feb 18, 2023
jaraco added a commit to jaraco/cpython that referenced this issue Feb 18, 2023
jaraco added a commit to jaraco/cpython that referenced this issue Feb 20, 2023
(cherry picked from commit 36854bb)

Co-authored-by: Jason R. Coombs <[email protected]>
jaraco added a commit to jaraco/cpython that referenced this issue Feb 20, 2023
(cherry picked from commit 36854bb)

Co-authored-by: Jason R. Coombs <[email protected]>
jaraco added a commit to jaraco/cpython that referenced this issue Feb 20, 2023
(cherry picked from commit 36854bb)

Co-authored-by: Jason R. Coombs <[email protected]>
miss-islington pushed a commit that referenced this issue Feb 20, 2023
(cherry picked from commit 36854bb)

Includes the bugfix only.

Automerge-Triggered-By: GH:jaraco
miss-islington pushed a commit that referenced this issue Feb 20, 2023
(cherry picked from commit 36854bb)

Backport of bugfix only.

Automerge-Triggered-By: GH:jaraco
carljm added a commit to carljm/cpython that referenced this issue Feb 20, 2023
* main: (60 commits)
  pythongh-102056: Fix a few bugs in error handling of exception printing code (python#102078)
  pythongh-102011: use sys.exception() instead of sys.exc_info() in docs where possible (python#102012)
  pythongh-101566: Sync with zipp 3.14. (pythonGH-102018)
  pythonGH-99818: improve the documentation for zipfile.Path and Traversable (pythonGH-101589)
  pythongh-88233: zipfile: handle extras after a zip64 extra (pythonGH-96161)
  pythongh-101981: Apply HOMEBREW related environment variables (pythongh-102074)
  pythongh-101907: Stop using `_Py_OPCODE` and `_Py_OPARG` macros (pythonGH-101912)
  pythongh-101819: Adapt _io types to heap types, batch 1 (pythonGH-101949)
  pythongh-101981: Build macOS as recommended by the devguide (pythonGH-102070)
  pythongh-97786: Fix compiler warnings in pytime.c (python#101826)
  pythongh-101578: Amend PyErr_{Set,Get}RaisedException docs (python#101962)
  Misc improvements to the float tutorial (pythonGH-102052)
  pythongh-85417: Clarify behaviour on branch cuts in cmath module (python#102046)
  pythongh-100425: Update tutorial docs related to sum() accuracy (FH-101854)
  Add missing 'is' to `cmath.log()` docstring (python#102049)
  pythongh-100210: Correct the comment link for unescaping HTML (python#100212)
  pythongh-97930: Also include subdirectory in makefile. (python#102030)
  pythongh-99735: Use required=True in argparse subparsers example (python#100927)
  Fix incorrectly documented attribute in csv docs (python#101250)
  pythonGH-84783: Make the slice object hashable (pythonGH-101264)
  ...
carljm added a commit to carljm/cpython that referenced this issue Feb 22, 2023
* main: (225 commits)
  pythongh-102056: Fix a few bugs in error handling of exception printing code (python#102078)
  pythongh-102011: use sys.exception() instead of sys.exc_info() in docs where possible (python#102012)
  pythongh-101566: Sync with zipp 3.14. (pythonGH-102018)
  pythonGH-99818: improve the documentation for zipfile.Path and Traversable (pythonGH-101589)
  pythongh-88233: zipfile: handle extras after a zip64 extra (pythonGH-96161)
  pythongh-101981: Apply HOMEBREW related environment variables (pythongh-102074)
  pythongh-101907: Stop using `_Py_OPCODE` and `_Py_OPARG` macros (pythonGH-101912)
  pythongh-101819: Adapt _io types to heap types, batch 1 (pythonGH-101949)
  pythongh-101981: Build macOS as recommended by the devguide (pythonGH-102070)
  pythongh-97786: Fix compiler warnings in pytime.c (python#101826)
  pythongh-101578: Amend PyErr_{Set,Get}RaisedException docs (python#101962)
  Misc improvements to the float tutorial (pythonGH-102052)
  pythongh-85417: Clarify behaviour on branch cuts in cmath module (python#102046)
  pythongh-100425: Update tutorial docs related to sum() accuracy (FH-101854)
  Add missing 'is' to `cmath.log()` docstring (python#102049)
  pythongh-100210: Correct the comment link for unescaping HTML (python#100212)
  pythongh-97930: Also include subdirectory in makefile. (python#102030)
  pythongh-99735: Use required=True in argparse subparsers example (python#100927)
  Fix incorrectly documented attribute in csv docs (python#101250)
  pythonGH-84783: Make the slice object hashable (pythonGH-101264)
  ...
carljm added a commit to carljm/cpython that referenced this issue Feb 23, 2023
* main: (76 commits)
  Fix syntax error in struct doc example (python#102160)
  pythongh-99108: Import MD5 and SHA1 from HACL* (python#102089)
  pythonGH-101777: `queue.rst`: use 2 spaces after a period to be consistent. (python#102143)
  Few coverage nitpicks for the cmath module (python#102067)
  pythonGH-100982: Restrict `FOR_ITER_RANGE` to a single instruction to allow instrumentation. (pythonGH-101985)
  pythongh-102135: Update turtle docs to rename wikipedia demo to rosette (python#102137)
  pythongh-99942: python.pc on android/cygwin should link to libpython per configure.ac (pythonGH-100356)
  pythongh-95672 fix typo SkitTest to SkipTest (pythongh-102119)
  pythongh-101936: Update the default value of fp from io.StringIO to io.BytesIO (pythongh-102100)
  pythongh-102008: simplify test_except_star by using sys.exception() instead of sys.exc_info() (python#102009)
  pythongh-101903: Remove obsolete undefs for previously removed macros Py_EnterRecursiveCall and Py_LeaveRecursiveCall (python#101923)
  pythongh-100556: Improve clarity of `or` docs (python#100589)
  pythongh-101777: Make `PriorityQueue` docs slightly clearer (python#102026)
  pythongh-101965: Fix usage of Py_EnterRecursiveCall return value in _bisectmodule.c (pythonGH-101966)
  pythongh-101578: Amend exception docs (python#102057)
  pythongh-101961 fileinput.hookcompressed should not set the encoding value for the binary mode (pythongh-102068)
  pythongh-102056: Fix a few bugs in error handling of exception printing code (python#102078)
  pythongh-102011: use sys.exception() instead of sys.exc_info() in docs where possible (python#102012)
  pythongh-101566: Sync with zipp 3.14. (pythonGH-102018)
  pythonGH-99818: improve the documentation for zipfile.Path and Traversable (pythonGH-101589)
  ...
@jaraco
Copy link
Member

jaraco commented Jul 14, 2023

This issue was closed back in Feb.

@jaraco jaraco closed this as completed Jul 14, 2023
python-sidebar pushed a commit to python-sidebar/Python-Documentation-Fork-With-TOC that referenced this issue Sep 1, 2024
python-sidebar pushed a commit to python-sidebar/Python-Documentation-Fork-With-TOC that referenced this issue Sep 1, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
3.10 only security fixes 3.11 only security fixes 3.12 bugs and security fixes stdlib Python modules in the Lib dir type-bug An unexpected behavior, bug, or error
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants