Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Run pip in isolated env by building zip #9689

Merged
merged 10 commits into from
Apr 3, 2021
Merged

Conversation

uranusjr
Copy link
Member

@uranusjr uranusjr commented Mar 5, 2021

This puts pip into a zip file, and change the isolated environment to run that instead of the pip installation in site-packages. This avoids other packages in site-packages to be injected into the isolated environment.

Fix #8214.

As discussed in #8214 (comment), this has a ~1s overhead for each call, so an sdist build can be slowed down for up to 2s (one for pyproject.toml and another for get_requires_for_build_wheel). There are possible optimisations, e.g. creating only one pip.zip for each pip invocation, but the globally managed state would introduce much complexity, so I’m not getting into them unless we decide this is the way to go.

@uranusjr uranusjr force-pushed the isolated-pip branch 2 times, most recently from 96c02ff to f53ccfb Compare March 5, 2021 09:36
@pfmoore
Copy link
Member

pfmoore commented Mar 5, 2021

In get-pip, we do some clever stuff to extract the certificate bundle, because openssl needs it to be an actual file. Is that likely to be a problem here? I've never actually had an issue where that was necessary myself, so I don't know why it's important, maybe someone who worked on get-pip would know?

@uranusjr
Copy link
Member Author

uranusjr commented Mar 5, 2021

There are some tests failing due to unexpected temp files at $TEMPDIR/pip/_vendor/certifi/cacert.pem, which seems related.

@pfmoore
Copy link
Member

pfmoore commented Mar 5, 2021

Ah, looks like certifi these days uses importlib.resources to get the cacert.pem file, which means it will unpack it on demand if necessary...

Which probably also means we can simplify the get-pip bootstrap, and maybe even just ship pip as a pyz for people who want that...

@uranusjr
Copy link
Member Author

uranusjr commented Mar 5, 2021

I read the implementation and unfortunately the logic only works on Python 3.7. So unless we add importlib-resources to the vendor list, we’ll still need to do the get-pip trick for this to work.

@uranusjr uranusjr marked this pull request as draft March 5, 2021 14:06
@uranusjr
Copy link
Member Author

uranusjr commented Mar 5, 2021

I tried simply copying the whole tree into a temp directory, the overhead is about twice as much (slightly under 2s per distribution), which could be a bit much?

Since we already have a real-file certificate bundle (in the original installation), I’m thinking we probably can just point --cert to that file to work around the issue.

@pfmoore
Copy link
Member

pfmoore commented Mar 5, 2021

Since we already have a real-file certificate bundle (in the original installation), I’m thinking we probably can just point --cert to that file to work around the issue.

That sounds like a better idea. I'm not keen on a 2s overhead, by the time you start adding in things like virus checkers, slow non-SSD disks, temp on NFS disks, old hardware, etc, these things get a lot worse than we expect, very fast 🙁

@uranusjr
Copy link
Member Author

uranusjr commented Mar 5, 2021

Huh, the test failures are about forking bomb pervention. It seems to me (I’m not very familiar with this part of pip and may be wrong) the requirement tracker is not correctly communicated between the parent and child pip, but can’t tell why making the child a zip would affect this 😞

@uranusjr
Copy link
Member Author

uranusjr commented Mar 5, 2021

I managed to fix the fork bomb, but the tests are still failing due to unexpected temp files. This is because requests calls certifi.where() unconditionally, and a temporary file is created to read the cert bundle even if we pass in --cert explicitly.

I think the only way to work around this is to patch certifi or requests to make where() always return parent’s bundle without triggering importlib.resources. But that is very hacky.

@uranusjr
Copy link
Member Author

uranusjr commented Mar 5, 2021

It’s ugly, but it’s working.

@uranusjr uranusjr marked this pull request as ready for review March 5, 2021 20:53
@pradyunsg pradyunsg closed this Mar 5, 2021
@pradyunsg pradyunsg reopened this Mar 5, 2021
@uranusjr uranusjr closed this Mar 6, 2021
@uranusjr uranusjr reopened this Mar 6, 2021
@pradyunsg
Copy link
Member

pradyunsg commented Mar 6, 2021

Hmm... I wonder if a better mechanism would be to have a variable somewhere that we patch for the child process?

That way we can have a branch that does things only in the subprocess.

@pfmoore
Copy link
Member

pfmoore commented Mar 6, 2021

So you're suggesting that the actual pip code carries a vendor patch here? That sounds reasonable, we could patch certifi to recognise when it's being run under an isolated pip. Maybe something like replacing

try:
    from importlib.resources import path as get_path, read_text
    ...
except ImportError:
    ...

with

class EarlyExit(Exception):
    """Hack to do an early exit from the try-except"""

try:
    if "PIP__ISOLATED__CERT__" in os.environ:
        def where():
            return os.environ["PIP__ISOLATED__CERT__"]
        raise EarlyExit

    from importlib.resources import path as get_path, read_text
    ...
except ImportError:
    ...
except EarlyExit:
    pass

The EarlyExit hack avoids us needing to re-indent basically the whole of the file, which would make the patch much harder to maintain...

Edit: Use an environment variable, so we don't need to hack pip's main function.

@pradyunsg
Copy link
Member

    if "PIP__ISOLATED__CERT__" in os.environ:

I'd even start the environment variable with an underscore, to make it clearer that it's not meant for "external users".

@uranusjr
Copy link
Member Author

uranusjr commented Mar 6, 2021

Sounds like a good idea, let me try this out.

@uranusjr uranusjr closed this Mar 10, 2021
@uranusjr uranusjr reopened this Mar 10, 2021
@pradyunsg pradyunsg closed this Mar 10, 2021
@pradyunsg pradyunsg reopened this Mar 10, 2021
@uranusjr
Copy link
Member Author

Tests seem to pass (finally).

@@ -190,8 +234,9 @@ def install_requirements(
args.append('--prefer-binary')
args.append('--')
args.extend(requirements)
extra_environ = {"_PIP_STANDALONE_CERT": where()}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This works, and I'm not suggesting that you try updating this PR, but I do wonder: can we use the --cert option that we have on the CLI to achieve this?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It works in practice, but some tests will fail because requests always triggers temp file creation even when the default cert is never used. The environ + certifi patch solution is the only way I can think of to prevent that temp file from being created entirely.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I managed to fix the fork bomb, but the tests are still failing due to unexpected temp files. This is because requests calls certifi.where() unconditionally, and a temporary file is created to read the cert bundle even if we pass in --cert explicitly.

From #9689 (comment) in this thread

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm assuming that there's a reason we can't just mark these as "expected temp files" in the test suite, or a reason why the temp files are bad in real usage - and we're not introducing a complicated hack just to cover over a test issue?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are tests on temp files being correctly cleaned up after build, and allowing temp files would defeat their purpose. Another way to do this is to ensure the copied cert file is also cleaned up by the PEP 517 process, but that would require even more patching in certifi.

@BrownTruck
Copy link
Contributor

Hello!

I am an automated bot and I have noticed that this pull request is not currently able to be merged. If you are able to either merge the master branch into this pull request or rebase this pull request against master then it will be eligible for code review and hopefully merging!

@BrownTruck BrownTruck added the needs rebase or merge PR has conflicts with current master label Apr 2, 2021
The standalone pip doesn't have a correct sys.path when __main__.py
is invoked, and we'd be patching the wrong certifi at that point.
The sys.path is guaranteed to be correct when __init__.py is loaded
(otherwise we wouldn't be able to import it in the first place).
@pypa-bot pypa-bot removed the needs rebase or merge PR has conflicts with current master label Apr 2, 2021
@uranusjr uranusjr added this to the 21.1 milestone Apr 2, 2021
@uranusjr uranusjr merged commit 9e8a544 into pypa:main Apr 3, 2021
@uranusjr uranusjr deleted the isolated-pip branch April 3, 2021 15:36
chia7712 added a commit to apache/kafka that referenced this pull request May 29, 2021
The following error happens on my mac m1 when building docker image for system tests.

Collecting pynacl
  Using cached PyNaCl-1.4.0.tar.gz (3.4 MB)
  Installing build dependencies ... error
  ERROR: Command errored out with exit status 1:
   command: /usr/bin/python3 /usr/local/lib/python3.8/dist-packages/pip install --ignore-installed --no-user --prefix /tmp/pip-build-env-k867aac0/overlay --no-warn-script-location --no-binary :none: --only-binary :none: -i https://pypi.org/simple -- 'setuptools>=40.8.0' wheel 'cffi>=1.4.1; python_implementation != '"'"'PyPy'"'"''
       cwd: None
  Complete output (14 lines):
  Traceback (most recent call last):
    File "/usr/lib/python3.8/runpy.py", line 194, in _run_module_as_main
      return _run_code(code, main_globals, None,
    File "/usr/lib/python3.8/runpy.py", line 87, in _run_code
      exec(code, run_globals)
    File "/usr/local/lib/python3.8/dist-packages/pip/__main__.py", line 23, in <module>
      from pip._internal.cli.main import main as _main  # isort:skip # noqa
    File "/usr/local/lib/python3.8/dist-packages/pip/_internal/cli/main.py", line 5, in <module>
      import locale
    File "/usr/lib/python3.8/locale.py", line 16, in <module>
      import re
    File "/usr/lib/python3.8/re.py", line 145, in <module>
      class RegexFlag(enum.IntFlag):
  AttributeError: module 'enum' has no attribute 'IntFlag'
  ----------------------------------------
ERROR: Command errored out with exit status 1: /usr/bin/python3 /usr/local/lib/python3.8/dist-packages/pip install --ignore-installed --no-user --prefix /tmp/pip-build-env-k867aac0/overlay --no-warn-script-location --no-binary :none: --only-binary :none: -i https://pypi.org/simple -- 'setuptools>=40.8.0' wheel 'cffi>=1.4.1; python_implementation != '"'"'PyPy'"'"'' Check the logs for full command output.

There was a related issue: pypa/pip#9689 and it is already fixed by pypa/pip#9689 (included by pip 21.1.1). I test the pip 21.1.1 and it works well on mac m1.

Reviewers: Ismael Juma <[email protected]>
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Sep 30, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

With Enum34 installed on python3.6+, pip picks up enum34 instead of the stdlib enum package
6 participants