Switch to using a virtual environment for app dependencies #253

Closed
edmorley opened this issue Aug 29, 2024 · 1 comment · Fixed by #257

edmorley commented Aug 29, 2024

When using a Dockerfile, the file content contributed by different steps in the build is split into different layers, which are then combined via use of an overlay filesystem. In this model, it's possible for multiple steps of the build to write to the same directory locations - albeit at the cost of changes in earlier layers triggering cache invalidation of later layers.

With CNBs, the file content contributed by different steps in the build (whether from separate buildpacks, or from steps within the same buildpack) is kept separate via the concept of CNB layers:
https://buildpacks.io/docs/for-buildpack-authors/concepts/layer/

This provides several advantages (finer-grained caching, easier multi-language images, etc). However, to take full advantage of them, we have to write the build content to separate layer directories.

For Python, this means we cannot simply install everything into the system site-packages directory (which lives inside the Python installation directory).

Until now, the way we've handled this is by:

  • Installing pip (and formerly setuptools/wheel) into the system site-packages directory
  • Installing the app dependencies into the user site-packages directory using pip install --user combined with PYTHONUSERBASE (which changes the user site-packages location from its default of the user home directory, to the location of the dependencies layer)
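As a rough illustration of the second point, the sketch below asks a fresh interpreter where its user site-packages directory is once PYTHONUSERBASE has been set. The temporary directory is only a stand-in for the dependencies layer, and the helper name `user_site_for` is made up for this example; it is not taken from the buildpack.

```python
import os
import subprocess
import sys
import tempfile


def user_site_for(user_base: str) -> str:
    """Ask a fresh interpreter where user site-packages lives for a given PYTHONUSERBASE."""
    env = {**os.environ, "PYTHONUSERBASE": user_base}
    result = subprocess.run(
        [sys.executable, "-c", "import site; print(site.getusersitepackages())"],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


# Hypothetical stand-in for the dependencies layer directory.
layer_dir = tempfile.mkdtemp(prefix="dependencies-layer-")
relocated_user_site = user_site_for(layer_dir)

# The user site-packages directory now lives under the layer, not the home directory.
print(relocated_user_site)
```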

However, this has a number of downsides:

  1. Some packages are broken with --user installs when using relocated Python, and otherwise require other workarounds (such as setting PYTHONHOME). For example, see unbit/uwsgi#2525 ("Incorrect stdlib path for relocated Python installs, resulting in ModuleNotFoundError: No module named 'encodings'").
  2. Several package managers don't support the equivalent of --user installs (such as Poetry or uv), meaning when we add support for them, we would have to use a different approach for them - which would then mean app dependency environments are set up differently depending on what package manager an app uses, which doesn't seem ideal.
  3. Python and pip have to exist in the same layer, which has a number of disadvantages (see #254, "Move pip into its own layer").

Given that PEP-405 style virtual environments (venvs) are:

  • very lightweight (they are only a few binary symlinks, activation scripts and empty directories vs the old style virtualenvs)
  • much more frequently used in the wild compared to say --user (and therefore the better tested path)

...then it makes more sense to use a venv for the app dependencies instead of a user install.
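To give a feel for how little a PEP-405 venv contains, here is a minimal sketch using the standard library `venv` module. It skips bootstrapping pip (`with_pip=False`), which matches the idea of keeping pip outside the environment; the temporary location is an assumption made for the example.

```python
import os
import tempfile
import venv
from pathlib import Path

# Hypothetical location; a real build would use a layer directory instead.
venv_dir = Path(tempfile.mkdtemp(prefix="venv-demo-")) / "venv"

# with_pip=False creates only the venv skeleton, without installing pip into it.
# Symlinking the interpreter is the usual choice on POSIX; Windows copies it.
venv.create(venv_dir, with_pip=False, symlinks=(os.name != "nt"))

# List the top-level entries: a config file plus a few mostly empty directories.
top_level_entries = sorted(entry.name for entry in venv_dir.iterdir())
print(top_level_entries)

# pyvenv.cfg is the marker file that makes the interpreter treat this as a venv.
print((venv_dir / "pyvenv.cfg").read_text())
```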

Note: We can't use PYTHONPATH instead of a user site-packages install, since any directories specified via PYTHONPATH are given a higher precedence in Python's sys.path than the Python stdlib (unlike system and user site-packages, which are added to sys.path after the Python stdlib). This can then cause hard-to-debug issues if apps use outdated backport libraries (which can often happen unintentionally via broken/suboptimal packages in their transitive dependency tree).
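The precedence problem can be reproduced with a small sketch. Here a dummy `colorsys.py` plays the role of an outdated backport that shares its name with a stdlib module; the choice of `colorsys` and the temporary directory are arbitrary assumptions for the demonstration.

```python
import os
import subprocess
import sys
import tempfile
from pathlib import Path

# Directory that will be exposed via PYTHONPATH (hypothetical stand-in).
shadow_dir = Path(tempfile.mkdtemp(prefix="pythonpath-demo-"))

# A fake module with the same name as a stdlib module, standing in for a backport.
(shadow_dir / "colorsys.py").write_text("SHADOWED = True\n")

env = {**os.environ, "PYTHONPATH": str(shadow_dir)}
result = subprocess.run(
    [sys.executable, "-c", "import colorsys; print(colorsys.__file__)"],
    env=env,
    capture_output=True,
    text=True,
    check=True,
)

# Because PYTHONPATH entries come before the stdlib in sys.path,
# the import resolves to the fake module rather than the real one.
imported_from = Path(result.stdout.strip())
print(imported_from)
```

The same file placed in system or user site-packages would not win the import, since those directories are searched after the stdlib.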

GUS-W-16616226.

edmorley added the enhancement and semver: major labels Aug 29, 2024
edmorley self-assigned this Aug 29, 2024
edmorley commented Aug 30, 2024

One thing I forgot to add: switching to venvs is only now possible because pip 22.3 added support for a new --python option (also usable via the PIP_PYTHON env var), which allows pip to manage an environment other than the one into which it was installed. Before that option existed, using a venv would have meant installing pip into the same venv as the app dependencies, so pip couldn't be cached (we can't cache the app dependencies layer, since installs without a lockfile are non-deterministic and don't handle package removals etc).
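A minimal sketch of what such an invocation might look like is below. It only builds the command line rather than running it, and the paths (`/layers/venv`, `requirements.txt`) and helper names are hypothetical, not taken from the buildpack.

```python
import os
import sys
from pathlib import Path


def venv_python(venv_dir: Path) -> Path:
    """Return the interpreter path inside a venv (the layout differs on Windows)."""
    if os.name == "nt":
        return venv_dir / "Scripts" / "python.exe"
    return venv_dir / "bin" / "python"


def pip_install_command(venv_dir: Path, requirements_file: Path) -> list[str]:
    """Build a pip invocation that installs into `venv_dir` using a pip installed elsewhere."""
    return [
        sys.executable, "-m", "pip",
        # Target environment, given before the subcommand (requires pip 22.3+).
        "--python", str(venv_python(venv_dir)),
        "install", "--requirement", str(requirements_file),
    ]


# Hypothetical paths, for illustration only.
command = pip_install_command(Path("/layers/venv"), Path("requirements.txt"))
print(" ".join(command))
```

Setting the PIP_PYTHON env var to the same interpreter path should have the equivalent effect without changing the arguments.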

See:

edmorley added a commit that referenced this issue Aug 30, 2024
App dependencies are now installed into a virtual environment (aka venv
or virtualenv) instead of into a custom user site-packages location.

This:
1. Avoids user site-packages compatibility issues with some packages
   when using relocated Python (see #253)
2. Improves parity with how dependencies will be installed when using
   Poetry in the future (since Poetry doesn't support `--user`)
3. Unblocks being able to move pip into its own layer (see #254)

This approach is possible since pip 22.3+ supports a new `--python`
/ `PIP_PYTHON` option which can be used to make pip operate against
a different environment to the one in which it is installed. This
allows us to continue keeping pip in a separate layer to the app
dependencies (currently the Python layer, but in a later PR pip will
be moved to its own layer).

Now that app dependencies are installed into a venv, we no longer need
to make the system site-packages directory read-only to protect against
later buildpacks installing into the wrong location.

This has been split out of the Poetry PR for easier review.

See also:
- https://docs.python.org/3/library/venv.html
- https://pip.pypa.io/en/stable/cli/pip/#cmdoption-python

Closes #253.
GUS-W-16616226.
edmorley added a commit that referenced this issue Sep 4, 2024
The Python package manager Poetry is now supported for installing app
dependencies:
https://python-poetry.org

To use Poetry, apps must have a `poetry.lock` lockfile, which can be
created by running `poetry lock` locally, after adding Poetry config to
`pyproject.toml` (which can be done either manually or by using
`poetry init`). Apps must only have one package manager file (either
`requirements.txt` or `poetry.lock`, but not both), otherwise the
buildpack will abort the build with an error (which will help prevent
some of the types of support tickets we see in the classic buildpack
with users unknowingly mixing and matching pip + Pipenv).
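
The "only one package manager file" rule could be sketched roughly as follows. This is an illustration in Python rather than the buildpack's actual implementation; the function and exception names are hypothetical, and the behaviour when neither file exists is an assumption.

```python
import tempfile
from pathlib import Path


class PackageManagerError(Exception):
    """Raised when the package manager cannot be determined unambiguously."""


def detect_package_manager(app_dir: Path) -> str:
    """Return "pip" or "poetry" based on which package manager file is present."""
    has_pip = (app_dir / "requirements.txt").is_file()
    has_poetry = (app_dir / "poetry.lock").is_file()
    if has_pip and has_poetry:
        raise PackageManagerError(
            "Found both requirements.txt and poetry.lock; delete one of them."
        )
    if has_pip:
        return "pip"
    if has_poetry:
        return "poetry"
    # Assumption: a missing package manager file is also treated as an error.
    raise PackageManagerError("No requirements.txt or poetry.lock found.")


demo_dir = Path(tempfile.mkdtemp(prefix="app-"))
(demo_dir / "poetry.lock").touch()
selected = detect_package_manager(demo_dir)

# Adding the second file makes the choice ambiguous, so detection fails.
(demo_dir / "requirements.txt").touch()
try:
    detect_package_manager(demo_dir)
    conflict_detected = False
except PackageManagerError:
    conflict_detected = True
```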

Poetry is installed into a build-only layer (to reduce the final app
image size), so is not available at run-time. The app dependencies are
installed into a virtual environment (the same as for pip after #257,
for the reasons described in #253), which is on `PATH` so does not need
explicit activation when using the app image. As such, use of
`poetry run` or `poetry shell` is not required at run-time to use
dependencies in the environment.
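
To illustrate why no activation is needed, the sketch below creates a throwaway venv, puts its scripts directory at the front of `PATH`, and checks which environment the resolved `python` reports. The temporary location is an assumption; in the app image the venv lives in a layer directory.

```python
import os
import shutil
import subprocess
import tempfile
import venv
from pathlib import Path

venv_dir = Path(tempfile.mkdtemp(prefix="venv-path-demo-")) / "venv"
venv.create(venv_dir, with_pip=False, symlinks=(os.name != "nt"))
bin_dir = venv_dir / ("Scripts" if os.name == "nt" else "bin")

# Prepend the venv's scripts directory to PATH; no activate script is sourced.
search_path = f"{bin_dir}{os.pathsep}{os.environ.get('PATH', '')}"
env = {**os.environ, "PATH": search_path}
env.pop("VIRTUAL_ENV", None)

# Resolve `python` the way a shell would, using the modified PATH.
python_on_path = shutil.which("python", path=search_path)
result = subprocess.run(
    [python_on_path, "-c", "import sys; print(sys.prefix)"],
    env=env,
    capture_output=True,
    text=True,
    check=True,
)

# The interpreter found via PATH reports the venv as its prefix.
reported_prefix = result.stdout.strip()
print(reported_prefix)
```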

When using Poetry, pip is not installed (possible thanks to #258), since
Poetry includes its own internal vendored copy that it will use instead
(for the small number of Poetry operations for which it still calls out
to pip, such as package uninstalls).

Both the Poetry and app dependencies layers are cached; however, the
Poetry download/wheel cache is not cached, since using it is slower than
caching the dependencies layer (for more details see the comments on
`poetry_dependencies::install_dependencies`).

The `poetry install --sync` command is run using `--only main` so as to
only install the main `[tool.poetry.dependencies]` dependencies group
from `pyproject.toml`, and not any of the app's other dependency groups
(such as test/dev groups, eg `[tool.poetry.group.test.dependencies]`).

I've marked this `semver: major` since in the (probably unlikely) event
there are any early-adopter projects using this CNB that have both a
`requirements.txt` and `poetry.lock` then this change will cause them to
error (until one of the files is deleted).

Relevant Poetry docs:
- https://python-poetry.org/docs/cli/#install
- https://python-poetry.org/docs/configuration/
- https://python-poetry.org/docs/managing-dependencies/#dependency-groups

Work that will be handled later:
- Support for selecting Python version via `tool.poetry.dependencies.python`:
  #260
- Build output and error messages polish/CX review (this will be performed
  when switching the buildpack to the new logging style).
- More detailed user-facing docs:
  #11

Closes #7.
GUS-W-9607867.
GUS-W-9608286.
GUS-W-9608295.