Pip

⚠️ Cachito's way of supporting Git and HTTP(S) dependencies is currently only compatible with pip >= 10.0

This document describes some of the more intricate details of Cachito's support for pip. For a high-level overview, see the README.

Cachito has a number of specific requirements when it comes to pip packages. Some of those stem from the general ideas behind Cachito (e.g. reproducibility), some from the technical challenges of supporting a packaging system which defines most metadata through a Python executable. Read on for more details.

requirements.txt

One of the main components of a pip package is the requirements.txt file. A typical file might look like this:

requests
git+https://github.com/containerbuildsystem/dockerfile-parse
https://github.com/containerbuildsystem/operator-manifest/archive/v0.0.3.zip

The dependencies in this file are:

  • requests, a PyPI dependency
  • dockerfile-parse, a Git dependency
  • operator-manifest, an HTTPS dependency

Git and HTTP(S) dependencies will henceforth collectively be referred to as "external."

Pinning versions

To make sure builds are reproducible, Cachito will require that all dependencies be pinned to a specific version.

For PyPI dependencies, use the == operator:

requests==2.24.0

For Git dependencies, specify the commit hash:

git+https://github.com/containerbuildsystem/dockerfile-parse@<full-commit-hash>

For HTTP(S) dependencies, include the hash of the source archive using #cachito_hash:

https://github.com/containerbuildsystem/operator-manifest/archive/v0.0.3.zip#cachito_hash=sha256:<full-sha-digest>
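The digest for the #cachito_hash fragment can be computed locally from a downloaded copy of the archive. The following is a minimal sketch using only the Python standard library. The helper names (sha256_hexdigest, with_cachito_hash) and the example.com URL are illustrative assumptions and are not part of Cachito.

```python
import hashlib


def sha256_hexdigest(data: bytes) -> str:
    """Return the hex-encoded SHA-256 digest of the given bytes."""
    return hashlib.sha256(data).hexdigest()


def with_cachito_hash(url: str, archive: bytes) -> str:
    """Append a #cachito_hash fragment to a URL that has no fragment yet."""
    if "#" in url:
        # Hypothetical policy for this sketch: do not try to merge fragments.
        raise ValueError("URL already has a fragment; edit it by hand")
    return f"{url}#cachito_hash=sha256:{sha256_hexdigest(archive)}"


if __name__ == "__main__":
    # Stand-in bytes; in practice, read the downloaded archive from disk.
    print(with_cachito_hash("https://example.com/pkg/archive/v1.0.0.zip", b"example"))
```

The printed line can be pasted into the requirements file in place of the unpinned URL.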

Specifying dependencies

In addition to specifying direct dependencies, recursive dependencies also need to be explicitly defined for two reasons:

  1. Further enable reproducibility by explicitly specifying every needed package
  2. Prevent the need for remote execution of setup.py

pip-compile

While this might be onerous to manually maintain, pip-compile from pip-tools can be used to automate this process for you using the following procedure.

  1. rename requirements.txt to requirements.in (by convention)
  2. run pip-compile requirements.in -o requirements.txt

This is the output of the above command:

#
# This file is autogenerated by pip-compile
# To update, run:
#
#    pip-compile --output-file=requirements.txt requirements.in
#
certifi==2020.6.20        # via requests
chardet==3.0.4            # via requests
git+https://github.com/containerbuildsystem/dockerfile-parse  # via -r requirements.in
idna==2.10                # via requests
https://github.com/containerbuildsystem/operator-manifest/archive/v0.0.3.zip  # via -r requirements.in
requests==2.24.0          # via -r requirements.in
ruamel.yaml.clib==0.2.2   # via ruamel.yaml
ruamel.yaml==0.16.12      # via operator-manifest
six==1.15.0               # via dockerfile-parse
urllib3==1.25.10          # via requests

As you can see, pip-compile gathered all the recursive dependencies and pinned all PyPI packages. It did not pin the external dependencies, because the mechanism for doing so is specific to Cachito. You can pin these beforehand in the requirements.in file, but if any of the recursive dependencies are external, you may need to edit the generated file anyway.

Note that pip-compile considers some packages "unsafe" in a requirements file (e.g. setuptools). If you do use these packages as runtime dependencies, you will need to pass the --allow-unsafe flag to pip-compile. If you only use them as build time dependencies, you will need to put them in a separate requirements file as described in Build dependencies.

Explicit package names

Cachito needs to know the package name for all of your dependencies. For PyPI dependencies, this is trivial, as the name is already present in the requirements file. For external dependencies, resolving the name may require executing the setup.py file. Cachito does have a mechanism for extracting package metadata from setup.py, but it is very limited. That is why, for external dependencies, you will need to explicitly specify package names using one of the mechanisms that pip supports.

a) use @:

<package-name> @ git+https://github.com/namespace/repo

b) use #egg:

git+https://github.com/namespace/repo#egg=<package-name>

As with pinning external dependency versions, you can specify explicit package names in requirements.in to avoid having to edit the file generated by pip-compile. However, pip-compile seems to ignore the @ mechanism, so using #egg may be preferable.

After pinning versions and specifying package names for external dependencies, the requirements.in file at the top of this section would look like this:

requests==2.24.0
git+https://github.com/containerbuildsystem/dockerfile-parse@<full-commit-hash>#egg=dockerfile-parse
https://github.com/containerbuildsystem/operator-manifest/archive/v0.0.3.zip#egg=operator-manifest&cachito_hash=sha256:<full-sha-digest>
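Before submitting a request, it can help to check that every external dependency line carries both a name and a pin. The sketch below is one way to do that for the #egg form only (it does not handle the `<package-name> @ <url>` form). The function name external_dependency_info is hypothetical, and this is not how Cachito itself parses requirements.

```python
from urllib.parse import parse_qs, urlsplit


def external_dependency_info(requirement: str) -> dict:
    """Extract kind, name and pin from a '#egg='-style Git or HTTP(S) requirement."""
    url = urlsplit(requirement.strip())
    # The fragment looks like 'egg=name&cachito_hash=sha256:...'
    fragment = parse_qs(url.fragment)
    name = fragment.get("egg", [None])[0]

    if url.scheme.startswith("git+"):
        # For Git URLs, the pinned ref follows the last '@' in the path.
        _, at, ref = url.path.rpartition("@")
        return {"kind": "git", "name": name, "pin": ref if at else None}

    # For plain HTTP(S) URLs, the pin is the Cachito-specific hash fragment.
    pin = fragment.get("cachito_hash", [None])[0]
    return {"kind": "https", "name": name, "pin": pin}


if __name__ == "__main__":
    for line in (
        "git+https://github.com/namespace/repo@0123abc#egg=repo",
        "https://example.com/archive/v0.0.3.zip#egg=pkg&cachito_hash=sha256:abcd",
        "git+https://github.com/namespace/repo",
    ):
        print(external_dependency_info(line))
```

A line whose name or pin comes back as None still needs editing by hand.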

Hash checking

In general, Cachito handles hash checking the same way that pip does. If --require-hashes is present in the requirements file, or if any dependency uses the --hash option, Cachito will require that all dependencies specify a hash and will check that the hashes are valid.

For HTTP(S) dependencies, Cachito will always require a hash and will always validate it. You can provide it using --hash, but as mentioned above, that will turn on hash checking for all your dependencies. If that is not desirable, use the Cachito-specific #cachito_hash URL fragment as shown in the HTTP(S) dependencies example in the Pinning versions section.
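The rule for when hash checking applies can be sketched as a small check over the lines of a requirements file. This is an illustration of the rule as described above, not Cachito's or pip's implementation. It assumes one requirement per line with no backslash continuations, and the function names are hypothetical.

```python
def _clean(raw: str) -> str:
    """Strip whitespace and drop full-line or trailing ' # ...' comments."""
    line = raw.strip()
    if line.startswith("#"):
        return ""
    return line.split(" #", 1)[0].strip()


def _has_hash(line: str) -> bool:
    """True if the line carries a --hash option in either spelling."""
    return any(tok == "--hash" or tok.startswith("--hash=") for tok in line.split())


def hash_checking_enabled(lines) -> bool:
    """True if --require-hashes appears or any requirement uses --hash."""
    cleaned = [_clean(raw) for raw in lines]
    return any("--require-hashes" in l.split() or _has_hash(l) for l in cleaned if l)


def requirements_missing_hash(lines) -> list:
    """Return requirement lines (not option lines) that have no --hash."""
    cleaned = [_clean(raw) for raw in lines]
    return [l for l in cleaned if l and not l.startswith("-") and not _has_hash(l)]


if __name__ == "__main__":
    example = ["requests==2.24.0 --hash=sha256:abc", "idna==2.10  # via requests"]
    if hash_checking_enabled(example):
        print("missing hashes:", requirements_missing_hash(example))
```

In this example a single --hash switches hash checking on, so the second line is reported as missing a hash.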

Build dependencies

Setuptools provides a way to specify build dependencies via the setup_requires keyword argument. It is deprecated in favor of the PEP 518 approach but, for reasons similar to those mentioned in the sections above, Cachito supports neither. If you have any build-only dependencies, you will need to put them in a requirements-build.txt file, which follows the same rules as requirements.txt.

There are two implications which may not be immediately obvious for build requirements files:

  1. you need to specify all the runtime and build dependencies for each direct build dependency (recursively)
  2. you need to repeat the above for all your recursive runtime dependencies

You can use the pip_find_builddeps.py script to find all the build dependencies you will need. Here is how you would use it:

  1. set up requirements.txt as described above
  2. if you have any direct build dependencies, put them in requirements-build.in
  3. run pip_find_builddeps.py requirements.txt -o requirements-build.in --append
  4. run pip-compile requirements-build.in -o requirements-build.txt --allow-unsafe

You can also run this script as a pre-commit hook. To do so, copy pip_find_builddeps.py and create a .pre-commit-hooks.yaml with the following:

- id: update-build-requirements
  name: update-build-requirements
  description: find build dependencies with cachito's pip_find_builddeps.py script
  entry: path/to/pip_find_builddeps.py
  language: python
  language_version: python3
  pass_filenames: false
  files: ^requirements.txt$
  args: ["requirements.txt", "-o", "requirements-build.in", "-a", "--only-write-on-update"]

...then add the following lines to .pre-commit-config.yaml:

repos:
  - repo: https://github.com/containerbuildsystem/cachito.git
    rev: ... # a sha or tag from cachito that contains the .pre-commit-hooks.yaml file
    hooks:
      - id: update-build-requirements
  - repo: https://github.com/jazzband/pip-tools
    rev: 6.8.0 # or whichever version you prefer
    hooks:
      - id: pip-compile
        name: pip-compile requirements-build.in
        args: [requirements-build.in, -o, requirements-build.txt, --allow-unsafe]

When building your app using the Cachito-provided content, you will need to make sure build dependencies are installed before runtime dependencies. If you use a packaging system, specify all the build dependencies in the proper location (e.g. options.setup_requires in setup.cfg or build-system.requires in pyproject.toml). If you do not, make sure to pip install the build requirements file(s) before the runtime requirements file(s).
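For the case without a packaging system, the required install order can be sketched as follows. The function name install_commands is hypothetical, the file names follow the conventions used in this document, and any environment configuration that accompanies the Cachito-provided content is outside the scope of this sketch.

```python
import sys


def install_commands(build_files, runtime_files):
    """Return pip commands that install build requirements before runtime ones."""
    return [
        [sys.executable, "-m", "pip", "install", "-r", path]
        for path in [*build_files, *runtime_files]
    ]


if __name__ == "__main__":
    # Print the commands instead of running them; to execute, pass each
    # command to subprocess.run(cmd, check=True).
    for cmd in install_commands(["requirements-build.txt"], ["requirements.txt"]):
        print(" ".join(cmd))
```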

setup.py, setup.cfg

Pip packages can define their metadata in two files -- setup.py or setup.cfg (or a combination of the two). Cachito will scan both of these files (if present) for the name and version of your package. If Cachito fails to resolve either of those values, the request will fail. More details about how (and to what extent) Cachito supports setup files can be found in the docstrings of the corresponding classes in pip.py: SetupPY, SetupCFG.

Support for setup.cfg is more complete and allows greater flexibility when defining the package version compared to setup.py. Nevertheless, both approaches are subject to some compromises on the Cachito side. If Cachito cannot resolve the metadata it needs, you may unfortunately need to make changes in your packaging code.

User configuration

Cachito allows you to configure some aspects of a request that uses the pip package manager. You can specify multiple subpackages within the source repository. For each subpackage, you can specify custom locations for your requirements and build requirements file(s). Below is an example request that uses all the available configuration options.

{
  "repo": "https://github.com/example/repo.git",
  "ref": "8adec82cf2fc557d23a6dac2563ed25bb0f46b72",
  "pkg_managers": ["pip"],
  "packages": {
    "pip": [
      {
        "path": ".",
        "requirements_files": ["requirements.txt", "requirements-extras.txt"]
      },
      {
        "path": "some/subpackage",
        "requirements_build_files": ["requirements-build-only.txt"]
      }
    ]
  }
}
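A request body like the one above can also be assembled programmatically. The sketch below only builds and prints the JSON and does not send it anywhere. The helper names pip_package and pip_request are illustrative assumptions; the keys are the ones shown in the example request.

```python
import json
from typing import Optional, Sequence


def pip_package(
    path: str,
    requirements_files: Optional[Sequence[str]] = None,
    requirements_build_files: Optional[Sequence[str]] = None,
) -> dict:
    """Describe one pip subpackage; optional keys are omitted when not given."""
    pkg = {"path": path}
    if requirements_files is not None:
        pkg["requirements_files"] = list(requirements_files)
    if requirements_build_files is not None:
        pkg["requirements_build_files"] = list(requirements_build_files)
    return pkg


def pip_request(repo: str, ref: str, packages) -> dict:
    """Build a request body that uses the pip package manager."""
    return {
        "repo": repo,
        "ref": ref,
        "pkg_managers": ["pip"],
        "packages": {"pip": list(packages)},
    }


if __name__ == "__main__":
    body = pip_request(
        "https://github.com/example/repo.git",
        "8adec82cf2fc557d23a6dac2563ed25bb0f46b72",
        [
            pip_package(".", requirements_files=["requirements.txt", "requirements-extras.txt"]),
            pip_package("some/subpackage", requirements_build_files=["requirements-build-only.txt"]),
        ],
    )
    print(json.dumps(body, indent=2))
```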