Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CEP for specifying the location of site-packages in repodata #90

Merged
merged 23 commits into from
Oct 22, 2024
Merged
Changes from 3 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
162 changes: 162 additions & 0 deletions cep-999.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,162 @@
<table>
<tr><td> Title </td><td> Optional python site-packages path in repodata </td>
<tr><td> Status </td><td> Proposed </td></tr>
<tr><td> Author(s) </td><td> Jonathan Helmus &lt;[email protected]&gt;</td></tr>
<tr><td> Created </td><td> Sep 13, 2024</td></tr>
<tr><td> Updated </td><td> Sep 13, 2024</td></tr>
<tr><td> Discussion </td><td> https://github.com/conda/ceps/pull/90 </td></tr>
<tr><td> Implementation </td><td> NA </td></tr>
</table>

To avoid confusion this document will use a stylized `python` to indicate a conda package with that name.
Other uses of “Python” will be followed by a qualifier, like "programming language" or "interpreter".
Specific implementations of the Python programming language will be referred to by name, such as CPython or PyPy.

## Abstract

We propose adding a new optional field in repodata.json that `python` packages can use to specify the destination path when installing `noarch: python` packages.
jjhelmus marked this conversation as resolved.
Show resolved Hide resolved

## Background

In the conda ecosystem the package named `python` provides, either directly or via a dependency, an interpreter for the Python programming language.
This interpreter is often CPython but can also be alternative implementations such as PyPy or GraalPy.

Tools like conda, mamba and pixi support installing `noarch: python` packages into conda environments.
Whereas the content of other conda packages are linked into the target environment based upon their path within the package, files within the "site-packages" directory of `noarch: python` packages are linked into the site-packages directory of the target environment.

In conda 24.7.1 and many other versions the location of the site-packages directory within the environment is "Lib/site-packages" on Windows and "lib/pythonX.Y/site-packages" on other platforms where X and Y and the major and minor versions of the `python` package installed in the environment.
jjhelmus marked this conversation as resolved.
Show resolved Hide resolved
These paths are the defaults for CPython but are not the correct paths for alternative implementation of the Python programming language or for some configurations of CPython.
For example PyPy uses "lib/pypyX.Y/site-packages" on POSIX systems.
jjhelmus marked this conversation as resolved.
Show resolved Hide resolved

In these cases kludgy workarounds, [often a symlink](https://github.com/conda-forge/pypy3.6-feedstock/blob/ac4201e80432469971626d277718edf80b22d6ef/recipe/build.sh#L102), are used in the `python` package to support the installation of `noarch: python` packages.
In the next section a new field will be proposed that allows `python` package to explicitly declare the location of the site-packages directory to avoid the need for these work-arounds.

## Specification

An optional field `python_site_packages_path` may be included in a conda package’s info/index.json file as a string and in the corresponding repodata.json entry for the package.
jjhelmus marked this conversation as resolved.
Show resolved Hide resolved
When defined, this field specifies the path of the Python interpreter’s site-packages directory relative to the root of the environment.
If a `python` package that includes this field is installed into an environment, all `noarch: python` packages that are installed into the environment will have the files in their "site-packages" directory linked into the path specified by this field.

A value of "null" can be used to indicate that this field is not specified.
It is acceptable to not include the field in repodata.json in this case.
jjhelmus marked this conversation as resolved.
Show resolved Hide resolved

If this field is missing or specified as "null" a default site-packages path will be used.
jjhelmus marked this conversation as resolved.
Show resolved Hide resolved
This path is "Lib/site-packages" if the environment targets a Windows platform and "lib/pythonX.Y/site-packages" on other platforms where X and Y and the major and minor versions of the `python` package installed in the environment.
jjhelmus marked this conversation as resolved.
Show resolved Hide resolved
This matches the current behavior of conda, mamba and pixi.

This field should only be included in the metadata for `python` packages but tools must not fail if packages with other names include this field, rather the entry should be ignored.

To avoid unnecessarily increasing the size of repodata.json, it is highly recommended that this field only be included in `python` packages where it is required, that is where the site-packages directory is not the default value.
jjhelmus marked this conversation as resolved.
Show resolved Hide resolved
Packages with other names or `python` packages that use the default site-packages path should not include this field.

This path must not point to a location outside of the root of the environment (e.g. it cannot navigate up a directory or be an absolute path).
If a package specifies a path which violates this requirement the package must not be installed and an appropriate error should be shown to the user.

A possible way to check the validity of the `python_site_packages_path` field is with this function:

``` Python
def is_valid(target_prefix: str, python_site_packages_path: str) -> bool:
target_prefix = os.path.realpath(target_prefix)
full_path = os.path.realpath(os.path.join(target_prefix, python_site_packages_path))
test_prefix = os.path.commonpath((target_prefix, full_path))
return test_prefix == target_prefix
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
def is_valid(target_prefix: str, python_site_packages_path: str) -> bool:
target_prefix = os.path.realpath(target_prefix)
full_path = os.path.realpath(os.path.join(target_prefix, python_site_packages_path))
test_prefix = os.path.commonpath((target_prefix, full_path))
return test_prefix == target_prefix
from pathlib import Path
def is_valid(target_prefix: str, python_site_packages_path: str) -> bool:
target_prefix = Path(target_prefix).resolve()
full_path = (target_prefix / python_site_packages_path).resolve()
return target_prefix in full_path.parents

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This style of check would require resolving target_prefix as well as it needs to take into account symbolic links and various re-directs on Windows file systems.
The function as written in the CEP closely matches the filter used by by tarfile module in the CPython standard library for validating paths, https://github.com/python/cpython/blob/v3.13.0/Lib/tarfile.py#L816-L818

```

When a package which includes this optional field is indexed the `python_site_packages_path` field will be included in the repodata entry for the package.
For example the repodata entry for a python-3.13.0rc1 package using the free-threading build configuration might look as follows:
jjhelmus marked this conversation as resolved.
Show resolved Hide resolved

```
"python-3.13.0rc1-haa6bb3f_0_cpython_cp313t.tar.bz2": {
"build": "haa6bb3f_0_cpython_cp313t",
"build_number": 0,
"depends": [
"bzip2 >=1.0.8,<2.0a0",
"expat >=2.6.2,<3.0a0",
"libffi >=3.4,<4.0a0",
"ncurses >=6.4,<7.0a0",
"openssl >=3.0.14,<4.0a0",
"readline >=8.1.2,<9.0a0",
"sqlite >=3.45.3,<4.0a0",
"tk >=8.6.14,<8.7.0a0",
"tzdata",
"xz >=5.4.6,<6.0a0",
"zlib >=1.2.13,<1.3.0a0"
],
"license": "PSF-2.0",
"license_family": "PSF",
"timestamp": 1722610680768,
"track_features": "free-threading",
"md5": "c09289eb86239e1221533457d861f1a3",
"name": "python",
"size": 16332720,
"subdir": osx-arm64,
"version": "3.13.0rc1",
"sha256": "fa0ae22c13450fe6c30c754ee5efbd7fe7e7533b878d7be96e74de56211d19df",
"python_site_packages_path": "lib/python3.13t/site-packages"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
"python_site_packages_path": "lib/python3.13t/site-packages"
"python_site_packages": "lib/python3.13t/site-packages"

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The field specified in the CEP is python_site_packages_path with _path

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, suggestion is to rename without _path

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm open to either name. Does anyone have a strong opinion one way or another?

},
```

This field will also be included in repodata derivatives, like current_repodata.json or sharded repodata when appropriate.
jjhelmus marked this conversation as resolved.
Show resolved Hide resolved
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

current_repodata.json is less important ever since the libmamba solver has been rolled out, do you think it would be a dealbreaker if we'd not support this in that file or am I jumping the gun on this one?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is not a dealbreaker to not include the field in current_repodata.json. The field should be included in a python shard of repodata if/when that becomes a reality. I don't see including this as a big lift in either case as the field is tied to the package entry from which I expect both to be derived.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Individual entries in current_repodata and sharded repodata will be the same as the ones in monolithic repodata.


Because this field is present in repodata.json it can be hot-fixed to correct mistakes or omissions.
jjhelmus marked this conversation as resolved.
Show resolved Hide resolved
This allows existing `python` packages to retroactively specify the locations of their site-packages directory.

## Motivation

This proposal was motivated by the free-threading configuration of CPython 3.13 which uses a different site-packages location on POSIX systems.
Specifically, the free-threading build uses "lib/python3.13t/site-packages".
jjhelmus marked this conversation as resolved.
Show resolved Hide resolved

A symlink between these paths or logic in [sitecustomize.py](https://docs.python.org/3/library/site.html#module-sitecustomize) allows packages installed into the incorrect site-packages directory to operate but this is less than ideal.

For additional details and discussion see [conda issue 14053](https://github.com/conda/conda/issues/14053).

Although this is motivated by the free-threading configuration of CPython, the same underlying issue occurs in many non-CPython implementations.
This includes both PyPy and GraalPy which use different paths for site-packages than CPython.
The conda packages in the conda-forge channel for both of these use symlinks to work around tools which link files into the incorrect site-packages directory.

## Security

Because this proposal can change the location where files are installed it is worth considering what, if any, security implementation this change entails.

A primary security concern is that by allowing the `python` package to specify where files will be installed, a malicious package could write or overwrite files with malicious content.

If files can only be installed into the environment where the package is installed, the damage from an attack is limited since a malicious package can already install files anywhere in the environment.

For example a malicious `python` package could specify an invalid site-packages path, in which case `noarch: python` packages installed in the environment would not work.
This is inconvenient but not a security concern and should be addressed by removing or disabling the package in question.

The malicious package could also specify a site-packages path that would cause files from `noarch: python` packages to populate or replace key files in the environment.
But this type of attack is already possible, in fact the `python` package itself could include these files.

Of more concern is the possibility of a package writing files outside of the environment.
This could replace critical system files with malicious content.
Because of this risk, paths which are outside of the environment are invalid as `python_site_packages_path` entries.
Tools must check that the field is valid and error out when it is invalid.

## Backwards Compatibility

This change maintains backwards compatibility.
All existing `python` packages do not include this field.
The behavior when this field is not specified matches what is currently done by conda, mamba and pixi.
The only behavior change occurs when this field is included.

Because existing releases of conda, mamba and pixi do not support this field, `python` packages should continue to provide work around to allow files to be installed into an incorrect site-packages directory until such time as this proposal has been implemented and available for a sufficient amount of time.

## Other sections

A number of alternatives were discussed and considered to address this problem. These include:

* Using symlinks or a `sitecustomize.py` file to allow linking into the incorrect site-packages directory.
This is a work-around for the problem, not a fix.
* Querying the interpreter to determine the site-packages directory.
This was rejected as it requires running the interpreter as part of the install step and requires that the `python` package be installed and operational before `noarch: python` packages can be linked.
* Using the build string of the `python` or `python_abi` package to determine the site-packages location.
This is an indirect method of obtaining information that can be more effectively specified explicitly.
* Storing this information is another metadata file, such as "about.json" or "paths.json".
jjhelmus marked this conversation as resolved.
Show resolved Hide resolved
The downside of this approach is that the data cannot be hotfixed and the path for the site-packages directory would not be known until the `python` package is downloaded and unpacked, potentially limiting the order in which packages can be linked.

For more alternatives and discussion see [conda issue 14053](https://github.com/conda/conda/issues/14053).

## Copyright

All CEPs are explicitly [CC0 1.0 Universal](https://creativecommons.org/publicdomain/zero/1.0/).