bpo-42369: Fix thread safety of zipfile._SharedFile.tell #26974
The `_SharedFile` tracks its own virtual position into the file as `self._pos` and updates it after reading or seeking. `tell()` should return this position instead of calling into the underlying file object, since if multiple `_SharedFile` instances are being used concurrently on the same file, another one may have moved the real file position. Additionally, calling into the underlying `tell` may expose thread safety issues in the underlying file object because it was called without taking the lock. Prior to this fix, the test case in https://bugs.python.org/issue42369#msg381212 reliably caused a `zipfile.BadZipFile: Bad CRC-32 for file 'file1'` after a few dozen reads; with this fix I have not seen this error.
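To make the description above concrete, here is a minimal sketch of the idea, not the CPython source. The class name `SharedFileSketch` and the demo variables are hypothetical; only `_pos`, `_file`, `_lock`, and `tell()` mirror names mentioned in the description. It assumes absolute seeks only and omits the write-handle checks the real class needs.

```python
import io
import threading


class SharedFileSketch:
    """Simplified stand-in for zipfile._SharedFile (hypothetical, for illustration)."""

    def __init__(self, file, pos, lock):
        self._file = file  # underlying file object shared by all views
        self._pos = pos    # this view's own virtual position
        self._lock = lock  # guards every access to the shared file

    def seek(self, offset):
        # Absolute seek only, to keep the sketch short.
        with self._lock:
            self._file.seek(offset)
            self._pos = self._file.tell()
            return self._pos

    def read(self, n=-1):
        with self._lock:
            # Another view may have moved the real position, so restore ours first.
            self._file.seek(self._pos)
            data = self._file.read(n)
            self._pos = self._file.tell()
            return data

    def tell(self):
        # Return the tracked position: no call into the shared file, so no lock is needed.
        return self._pos


raw = io.BytesIO(b"0123456789")
lock = threading.Lock()
a = SharedFileSketch(raw, 0, lock)
b = SharedFileSketch(raw, 0, lock)

first = a.read(3)     # view a is now at virtual position 3
b.seek(8)             # view b moves the real position of the shared file
pos_a = a.tell()      # still a's own position, unaffected by b
pos_raw = raw.tell()  # the real position, which reflects b's seek
second = a.read(2)    # a resumes from its own position
```

In this sketch, a `tell()` that delegated to `raw.tell()` would report view `b`'s position to view `a`, which is the kind of mismatch the description attributes to the old behaviour.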
Nice! This could certainly help me solve the issue in machine learning when I want to load data from a zip file.
This PR is stale because it has been open for 30 days with no activity.
LGTM, but please add a NEWS file.
A Python core developer has requested some changes be made to your pull request before we can consider merging it. If you could please address their requests along with any other requests in other reviews from core developers that would be appreciated. Once you have made the requested changes, please leave a comment on this pull request containing the phrase `I have made the requested changes; please review again`. And if you don't make the requested changes, you will be poked with soft cushions!
I have made the requested changes; please review again (hopefully got the rst formatting right)
Thanks for making the requested changes! @serhiy-storchaka: please review the changes made to this pull request.
Thanks @kevinmehall for the PR, and @serhiy-storchaka for merging it 🌮🎉. I'm working now to backport this PR to: 3.9.
Thanks @kevinmehall for the PR, and @serhiy-storchaka for merging it 🌮🎉. I'm working now to backport this PR to: 3.10.
GH-32008 is a backport of this pull request to the 3.9 branch.
GH-32009 is a backport of this pull request to the 3.10 branch.
Thank you @kevinmehall.
The `_SharedFile` tracks its own virtual position into the file as `self._pos` and updates it after reading or seeking. `tell()` should return this position instead of calling into the underlying file object, since if multiple `_SharedFile` instances are being used concurrently on the same file, another one may have moved the real file position. Additionally, calling into the underlying `tell` may expose thread safety issues in the underlying file object because it was called without taking the lock.
(cherry picked from commit e730ae7)
Co-authored-by: Kevin Mehall
The `_SharedFile` tracks its own virtual position into the file as `self._pos` and updates it after reading or seeking. `tell()` should return this position instead of calling into the underlying file object, since if multiple `_SharedFile` instances are being used concurrently on the same file, another one may have moved the real file position. Additionally, calling into the underlying `tell` may expose thread safety issues in the underlying file object because it was called without taking the lock.
Prior to this fix, the test case in https://bugs.python.org/issue42369#msg381212 reliably caused a `zipfile.BadZipFile: Bad CRC-32 for file 'file1'` in less than a second; with this fix I have not seen this error.
https://bugs.python.org/issue42369