Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use better temporary files mechanism in install_unpacked_wheel #7703

Closed
wants to merge 1 commit into from

Conversation

rudolfovic
Copy link

Fixes #7699

@rudolfovic
Copy link
Author

The documentation talks about three types of news:

removal, feature, bugfix

The issue has type refactor so I am not sure what the protocol is.

@xavfernandez xavfernandez added the skip news Does not need a NEWS file entry (eg: trivial changes) label Feb 5, 2020
Copy link
Member

@sbidoul sbidoul left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rudolfovic thank you for picking this up.

Two remarks:

  • there are changes to various NEWS file that are unrelated and should not be part of this PR
  • there are other place in wheel.py where the same pattern is used that need to be updated too

with open_for_csv(record, 'r') as record_in:
with open_for_csv(temp_record, 'w+') as record_out:
with adjacent_tmp_file(record, mode='w') as record_out:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any particular reason to change w+ to w?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, are we sure that adjacent_tmp_file is equivalent to open_for_csv for our purpose here? (I've not investigated myself yet)

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any particular reason to change w+ to w?

We are not reading anything from this file so why request the extra permissions?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, are we sure that adjacent_tmp_file is equivalent to open_for_csv for our purpose here? (I've not investigated myself yet)

All open_for_csv does is append some parameters to open() to account for differences between different versions of Python.

adjacent_tmp_file calls NamedTemporaryFile which in turn calls _sanitize_params, calls close() on exit (in addition to that of file) and ensures file is written to disk before it's closed.

Tests would not pass if binary mode was used even with Python 2 but they do otherwise (on both versions). So I do not see any obstacles to replacing open_for_csv.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Getting the open parameters correct for CSV files is fairly subtle, and having open_for_csv is important to make sure it's easy for people to get it correct even if they aren't aware of platform differences. Having text mode and newline='' is crucial for correct handling on Windows under Python 3 (and having binary mode is crucial on Python 2). So I'd be concerned that the switch to adjacent_tmp_file is generating a CSV file with incorrect line endings. Do we know for sure that we have a test that checks this?

Note: I don't actually know (without doing some research) the precise "rules"1 for the format - but I do know that the docs in the CSV module (in particular the footnote here) were arrived at after quite a lot of frustrating investigation into weird bugs.

1 CSV is not a particularly well defined format, so they aren't actually so much rules as "hard-won knowledge about what works" 🙄

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see the link to the newline comment and this is addressed by the io module internally. I do not see where it says Python 2 insists on binary mode. A simple test shows:

Python 2.7.10 (default, Feb 22 2019, 21:17:52) 
[GCC 4.2.1 Compatible Apple LLVM 10.0.1 (clang-1001.0.37.14)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import csv
>>> with open('/tmp/bla', 'w') as f:
...     c = csv.writer(f)
...     c.writerow(['hello'])
... 
>>> with open('/tmp/bla', 'r') as f:
...     c = csv.reader(f)
...     print '\n'.join(','.join(l) for l in c)
... 
hello
>>> 

Binary also works:

>>> with open('/tmp/bla', 'rb') as f:
...     c = csv.reader(f)
...     print '\n'.join(','.join(l) for l in c)
... 
hello

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As per documentation for open():

Thus, when opening a binary file, you should append 'b' to the mode value to open the file in binary mode, which will improve portability. (Appending 'b' is useful even on systems that don’t treat binary and text files differently, where it serves as documentation.) See below for more possible values of mode.

Since we only ever deal with text data in CSVs there is no reason why it should matter. All Python documentation uses b everywhere because in Python 2 it essentially had no adverse effect. It was required for some systems and harmless for others.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

https://docs.python.org/2/library/csv.html#csv.reader - "If csvfile is a file object, it must be opened with the ‘b’ flag on platforms where that makes a difference". You're running on MacOS, which is a platform where it doesn't make a difference. SO it's not surprising you can't demonstrate an issue (your specific test does work on Windows, but again, that just means I suspect your test is incomplete - I have had actual, real-life errors caused by failing to use binary mode on Windows).

Anyhow, regardless of how much we debate test cases that demonstrate one thing or another, that's irrelevant, We should follow the documented requirements of the CSV module, not waste time debating whether they are needed. If you feel sufficiently strongly, by all means raise a CPython bug that claims the requirements in the docs are obsolete and should be removed.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So then I can extract the version -> mode part from open_for_csv and call it there in addition to adjacent_tmp_file(file, 'w' + get_mode()).

Copy link
Member

@chrahunt chrahunt Feb 8, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For what it's worth I did this in our recent wheel builder helper test code, which was the easiest way I could find that was compatible.

This would also avoid needing to make changes to adjacent_tmp_file.

Copy link
Member

@pfmoore pfmoore left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My comment about open_for_csv doesn't appear to have registered in github as a review, so referencing it here.

@pradyunsg
Copy link
Member

pradyunsg commented Mar 19, 2020

My suggestion is to use pip._internal.utils.temp_dir.TempDirectory for this, like so:

record = os.path.join(dest_info_dir, 'RECORD')
temp_dir = TempDirectory(kind="record", globally_managed=True)
temp_record = os.path.join(temp_dir.path, 'RECORD') 

And let the rest of the code stay as-is for now.

@BrownTruck
Copy link
Contributor

Hello!

I am an automated bot and I have noticed that this pull request is not currently able to be merged. If you are able to either merge the master branch into this pull request or rebase this pull request against master then it will be eligible for code review and hopefully merging!

@BrownTruck BrownTruck added the needs rebase or merge PR has conflicts with current master label Apr 2, 2020
@pradyunsg
Copy link
Member

This PR was superceded by #7929. Many thanks for filing this PR @rudolfovic!

Let us know if you'd like a different issue to contribute on! ^>^

@pradyunsg pradyunsg closed this Apr 2, 2020
@lock lock bot added the auto-locked Outdated issues that have been locked by automation label May 5, 2020
@lock lock bot locked as resolved and limited conversation to collaborators May 5, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
auto-locked Outdated issues that have been locked by automation needs rebase or merge PR has conflicts with current master skip news Does not need a NEWS file entry (eg: trivial changes)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Use better temporary files mechanism in install_unpacked_wheel
7 participants