
allow binding between source index and individual dependencies #1645

Closed
AdamJel opened this issue Jan 25, 2023 · 19 comments · Fixed by #2323
Labels
⭐ enhancement Improvements for existing features

Comments

@AdamJel

AdamJel commented Jan 25, 2023

Scenario

It is already possible to define more than one PyPI-like index source. Since v2.4.0 there are two ways to do that (AFAIK): in pyproject.toml and in config.toml.

It is also possible to define the priority of these sources, to some extent. In the case of pyproject.toml, you can rely on the order in which the sources are listed. In the case of config.toml, you can invert the priority by defining the internal/alternative source in the [pypi] section and PyPI itself in the [pypi.extra] section.

This is insufficient, because the dependency resolution process still has a lot of deficiencies.

Consider this:

Let's use v2.4.0 and define the internal server in config.toml in the [pypi.extra] section. If an internal Python package shares its name with some public package on PyPI, pdm will resolve and download it from PyPI, which is wrong for the company. Choosing a different name is not the right approach, because it is not deterministic: even if the company chooses a unique name, there is no certainty that the same name will not be published to PyPI later.

So we swap the sections and define the internal server in [pypi] instead. This solves the priority problem, but introduces an HTTP request flood problem. If the project uses 100 public packages and only 1 internal one, then 99 packages will first be resolved against the internal server without success, and only then against public PyPI, which is a waste of network resources.

There are more subtle quirks as well. E.g. if the access token for the internal server expires, pdm's resolution against the internal server fails with a credentials error, but the resolution continues; if there is a public package on PyPI with the same name, pdm falls back to it, downloading and installing it, which is harmful.
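To make the two layouts concrete, here is a rough config.toml sketch of both variants described above. Treat them as alternatives, not one file; the internal URL is a placeholder.

# Variant 1: internal server as the extra source.
# Name clashes resolve against pypi.org, which is the wrong package for the company.
[pypi.extra]
url = "https://pypi.internal.example.com/simple"

# Variant 2: internal server as the primary source.
# Fixes the priority, but every public package is now queried against the internal server first.
[pypi]
url = "https://pypi.internal.example.com/simple"

[pypi.extra]
url = "https://pypi.org/simple"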

Describe the solution you'd like

All the problems mentioned above, and possibly more edge cases I haven't bumped into yet, would be solved if there were a way to bind individual dependencies to a specific index (or array of indexes).

E.g. pdm add <package> --only-source=internal

Such a flag would cause all related actions to be bound exclusively to that source. So pdm install, pdm lock, etc. would fail if that package was not found on that source, if the credentials didn't work, and so on.

@AdamJel AdamJel added the ⭐ enhancement Improvements for existing features label Jan 25, 2023
@baggiponte
Contributor

As a reference, Poetry has a poetry add --source <mysource> package syntax (see the full add API reference).

@frostming
Collaborator

frostming commented Feb 9, 2023

Supplying the index with a command line option is not sufficient; we must also encode that in pyproject.toml and pdm.lock for reproducibility.

The problem is that we are following PEP 621, and every dependency specification is a string, as opposed to a table, which is the case in Poetry. If we want to add a per-package index, it must reside in a separate table from the dependencies, which doesn't seem that user-friendly to me. To illustrate, the file would look like the following:

[project]
dependencies = [
    "flake8",
    "pytest",
]

[[tool.pdm.source]]
name = "private"
url = "https://private.pypi.com/simple"
packages = [
    "flake8",   # requested dependency
    "pyflakes",   # transitive dependency
]

Here we would have to specify the transitive dependencies explicitly, since we can't assume they should inherit the parent's config. If you are working on a large project with hundreds of dependencies and even more transitive dependencies, writing that list is a huge pain.

But since you can't achieve this with pip+requirements.txt either, I'd rather keep this open than hurry to finish an awkward implementation, until an elegant solution has been discussed thoroughly.

@baggiponte
Contributor

Is there no standard for specifying dependencies like you do with git+{https,ssh}, like so?

[project]
dependencies = [
    "flake8",
    "pytest @ ${USER}:${PASSWORD}@private/pypi",
]

My gut feeling is no, and I also suspect it would not be PEP compliant, because the fields in [project] must be static; am I wrong? (This issue makes me think this is something that should be regulated in a PEP, which I'm afraid would be a huge undertaking.)

@frostming
Collaborator

Yes, we can't put custom fields in the project table.

@baggiponte
Contributor

baggiponte commented Feb 14, 2023

What if pdm tried all indexes for the first round of resolution and then remembered where it collected each package from? In other words, if I install package x from my private index and y from PyPI:

  • run pdm add x y:
  • pdm tries installing x from the private index and succeeds
  • pdm will remember to scan just the private index for x.
  • pdm tries installing y from the private index and fails
  • pdm tries installing y from PyPI and succeeds
  • pdm will remember to scan just PyPI for y.

EDIT: this assumes that once a package is found in an index, all of its possible versions will reside in that index. That might not cover all edge cases, but it seems reasonable. Also, the first "check which index the package belongs to and cache it" step only applies to the command and is not stored anywhere.
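A minimal Python sketch of the "resolve once, then pin the index" heuristic described above; the cache, the lookup helper, and the index list are hypothetical and not pdm internals:

from urllib.error import HTTPError, URLError
from urllib.request import urlopen

index_cache: dict[str, str] = {}  # package name -> index URL that served it

def has_package(index_url: str, package: str) -> bool:
    # Hypothetical check: does the simple-index page for this package exist?
    try:
        with urlopen(f"{index_url.rstrip('/')}/{package}/") as resp:
            return resp.status == 200
    except (HTTPError, URLError):
        return False

def pick_index(package: str, indexes: list[str]) -> str:
    # Once a package has been located, only that index is queried again.
    if package in index_cache:
        return index_cache[package]
    for index in indexes:  # configured priority order
        if has_package(index, package):
            index_cache[package] = index
            return index
    raise LookupError(f"{package} was not found on any configured index")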

@mikenerone
Contributor

Just want to chime in that I kinda desperately want this feature, as well, for performance reasons similar to what's described in #1509, though I prefer explicit package->index binding as a solution. Also, in response to @frostming:

Here we have to specify the transitive dependencies explicitly since we can't assume it should inherit the parent's config. If you are working on a large project with hundreds of dependencies and more transitive dependencies, writing that list is a huge pain.

IMO, there's no need to assume anything about subdependencies. When adding an index-bound dep, subdependencies should not automatically be bound to the same index.

  1. 99% of subdependencies are public packages, so binding them is decidedly inconvenient.
  2. When a subdependency should be bound to an index, the user can always add it as a direct dependency to achieve that.
  3. As a data point, Poetry makes no such assumption, and people seem pretty happy with it. It has always worked for me (though I lean toward pdm for other reasons).

@sanmai-NL
Contributor

@mikenerone Specifying a transitive dependency as direct dependency muddies the dependency status. Maintainers and tools that detect unused dependencies would suffer.

@mikenerone
Contributor

mikenerone commented Mar 29, 2023

@mikenerone Specifying a transitive dependency as direct dependency muddies the dependency status. Maintainers and tools that detect unused dependencies would suffer.

I don't disagree - it's not ideal, but assuming that transitive deps should also be index-bound would be even less ideal. I guess a "better" alternative to adding them as direct dependencies, at least as far as this concern goes, would be to simply leave it as a manual editing task to add transitive deps to the appropriate tool.pdm.source[].packages list. Having a command to manage these would be fine too, but IMO it should not, in any case, be done automatically when adding an index-bound dependency that actually is direct (and personally, I don't feel a command is necessary, especially for an initial pass at support for this).

@baggiponte
Contributor

I guess a PEP proposal came to the rescue? https://peps.python.org/pep-0710/

@baggiponte
Contributor

This is extremely cool, thanks!

@frostming
Collaborator

This is extremely cool, thanks!

Oh, did you review the change? Any feedback is welcome.

@baggiponte
Contributor

baggiponte commented Oct 20, 2023

I had a look at the source code, but did not get a chance to test it yet, because I am no longer working at a company with a private index. When I have the time, I will try it out with a local PyPI server 🚀

EDIT: clever solution, by the way! I would add a couple of recommendations on whether such configs should live in pyproject.toml or .pdm.toml and be kept private.

@GabDug
Contributor

GabDug commented Oct 20, 2023

Nice work! We have this use case as well; I'll try it today or next week. I agree with @baggiponte, recommendations on where to put the config (global? pyproject.toml?) would be nice to have.

@AdamJel
Author

AdamJel commented Oct 23, 2023

Same here - I haven't had the time to test it out yet, but I am so happy about this. Hopefully I'll find the time this week and will leave feedback here afterwards. Thank you!

@sanmai-NL
Contributor

@frostming I have tested it. I observe that include_packages maps the packages specified there to an index, but that in itself doesn't map other packages to PyPI/a default index. How to solve that?

[[tool.pdm.source]]
include_packages = [
  "torch-cuda80",
  "torch-model-archiver",
  "torch-tb-profiler",
  "torch",
  "torchaudio",
  "torchcsprng",
  "torchdata",
  "torchdistx",
  "torchmetrics",
  "torchrec-cpu",
  "torchrec",
  "torchserve",
  "torchtext",
  "torchvision",
]
name = "PyTorch"
url = "https://download.pytorch.org/whl/cpu"
verify_ssl = true

pdm.termui:   Adding requirement mysql-connector-python>=8.2.0
unearth.collector: Failed to collect links from https://download.pytorch.org/whl/cpu/mysql-connector-python/: Client Error(403): Forbidden
pdm.termui:   Adding requirement opencv-python>=4.8.1.78
unearth.collector: Failed to collect links from https://download.pytorch.org/whl/cpu/opencv-python/: Client Error(403): Forbidden
pdm.termui:   Adding requirement python-dotenv>=1.0.0
unearth.collector: Failed to collect links from https://download.pytorch.org/whl/cpu/python-dotenv/: Client Error(403): Forbidden
pdm.termui:   Adding requirement supervision>=0.16.0
unearth.collector: Failed to collect links from https://download.pytorch.org/whl/cpu/supervision/: Client Error(403): Forbidden
pdm.termui:   Adding requirement ultralytics>=8.0.201

@lmmx

lmmx commented Nov 11, 2023

@sanmai-NL I would open a new issue to help Frost triage, as this feature request ticket is closed as complete.

I just installed torch with the following config and it worked:

[[tool.pdm.source]]
name = "PyTorch"
url = "https://download.pytorch.org/whl/cu118"
include_packages = ["torch"]

This achieves the same result as the official PyTorch: Start Locally help page suggests via --index-url:

pip install torch --index-url https://download.pytorch.org/whl/cu118

I also note that when I publish this as a package and other "2nd degree" packages include my package as a dependency, those 2nd-degree packages don't "inherit" the index URL config (i.e. they pull torch from the regular PyPI index URL). That is reasonable to expect, and it can likewise be overridden in those downstream projects' TOML configs.

Also, for reference this was discussed on the official Python discussion forum: https://discuss.python.org/t/how-to-specify-extra-index-in-a-pyproject-toml-for-pip-and-pip-tools/23592/6

@polarathene

I observe that include_packages maps the packages specified there to an index, but that in itself doesn't map other packages to PyPI/a default index.
How to solve that?

https://pdm-project.org/en/latest/usage/config/#respect-the-order-of-the-sources

[tool.pdm.resolution]
respect-source-order = true

include_packages will then be used for that source, while packages not listed in include_packages will default to PyPI first, so no errors are logged like the ones you experienced.

  • Note that torch would depend on nvidia packages (for, say, whl/cu121 instead of whl/cpu), but instead of resolving those from the whl/cu121 index, you'd get the PyPI packages instead, since these are implicit/transitive deps.
  • That can be an issue due to version mismatch (e.g. CUDA 12.5 packages instead of the CUDA 12.1 ones intended for that torch index and package), so when this matters, be sure to add those into include_packages with a pattern like nvidia-* (see the sketch below).
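A rough pyproject.toml sketch combining the two suggestions above; the source name, index URL, and package patterns are illustrative, so adjust them to your setup:

[tool.pdm.resolution]
respect-source-order = true

[[tool.pdm.source]]
name = "pytorch-cu121"
url = "https://download.pytorch.org/whl/cu121"
include_packages = ["torch", "torchvision", "nvidia-*"]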

This will also affect some other behaviours in PDM, such as the latest version reported by pdm outdated: #2962 (comment)

@nightblure

(Quoting @lmmx's PyTorch source config above.)

Do you include the torch package in the project.dependencies list, or is it only in the source section?

@erklem

erklem commented Sep 5, 2024

(Quoting @lmmx's PyTorch source config above.)

Thanks, @lmmx! This is exactly what I wanted to find. Search results only pointed me to #1032 and #2982, but this is really the answer I needed. Works great! More details are in the documentation.
