Global configuration of custom sources #5958

Open
2 tasks done
andreas-vester opened this issue Jul 6, 2022 · 20 comments
Labels
area/sources Related to package sources/indexes/repositories kind/feature Feature requests/implementations

Comments

@andreas-vester

andreas-vester commented Jul 6, 2022

  • I have searched the issues of this repo and believe that this is not a duplicate.
  • I have searched the documentation and believe that my question is not covered.

Feature Request

I am using private repositories to download dependencies as I am working behind a corporate firewall. According to the docs (https://python-poetry.org/docs/repositories/#install-dependencies-from-a-private-repository), I am supposed to put a private repo into pyproject.toml.

# pyproject.toml
[[tool.poetry.source]]
name = "foo"
url = "https://foo.bar/simple/"
default = true

If I do that, I am basically not able to commit pyproject.toml to GitHub as I don't want to reveal internal server names. Committing this information would be useless for others anyway, as they are not able to access our internal repos.

I am wondering if it would make sense to define private repos in poetry's own configuration on my local machine? Or maybe I can set some environment variables?

@andreas-vester andreas-vester added kind/feature Feature requests/implementations status/triage This issue needs to be triaged labels Jul 6, 2022
@andreas-vester
Author

andreas-vester commented Aug 18, 2022

I am also wondering why we need to put the private repo into pyproject.toml at all?

We have to configure it in the actual poetry configuration, too (https://python-poetry.org/docs/repositories/#adding-a-repository)? Why can't we use this information?

Can't we just have a configuration for a repo to publish to and for repo(s) to download packages from without exposing any information in pyproject.toml?

@andreas-vester
Author

Any kind of response would be appreciated.

@andreas-vester
Author

If I can't get rid of the [[tool.poetry.source]] section in pyproject.toml, I am not able to commit my code to GitHub at all for the simple reason that corporate rules rightly forbid publishing internal server names on GitHub.

As a result, I can't use poetry.

I think this is an important issue/feature request as other developers working in corporations will probably face comparable restrictions.

@finswimmer
Member

Hey,

to be honest I haven't used private repos until now, so please forgive me if my question is naive 😃

If the source of the package is not defined in the pyproject.toml, how do you avoid a supply chain attack? I mean, you have to be sure that the source is configured everywhere one tries to install the package; otherwise the deps are fetched from PyPI, where there can be a package with the same name that is not under your control. Sounds like a good way to shoot yourself in the foot 🤔

fin swimmer
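
The risk described above is often called dependency confusion. A minimal sketch of it follows, using a deliberately naive toy resolver. This is not Poetry's or pip's actual resolution logic, and the package name `acme-utils`, the index names, and the versions are all hypothetical:

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    name: str
    version: tuple[int, ...]
    index: str  # label of the index that served this candidate


def pick_candidate(name: str, indexes: dict[str, list[Candidate]]) -> Candidate:
    """Toy resolver: take the highest version found on ANY configured index."""
    found = [c for cands in indexes.values() for c in cands if c.name == name]
    if not found:
        raise LookupError(f"no candidate for {name!r}")
    return max(found, key=lambda c: c.version)


# Hypothetical data: "acme-utils" is meant to be internal-only, but someone
# has published the same name with a higher version on the public index.
internal = [Candidate("acme-utils", (1, 2, 0), "internal")]
public = [Candidate("acme-utils", (99, 0, 0), "public")]

with_both = pick_candidate("acme-utils", {"internal": internal, "public": public})
internal_only = pick_candidate("acme-utils", {"internal": internal})
```

With both indexes visible, the toy resolver selects the public candidate; with only the internal index configured, it selects the internal one. Whether a real installer behaves this way depends on how it prioritises sources.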

@PidgeyBE

PidgeyBE commented Sep 6, 2022

I'm facing the same issues...
Would be great to have something similar to pip's --index-url or --extra-index-url ...

In this way @andreas-vester could install locally via poetry install --index-url https://company-pypi-mirror.

@PidgeyBE

PidgeyBE commented Sep 6, 2022

@andreas-vester
I see poetry==1.2 passes env vars to pip: https://github.com/python-poetry/poetry/blob/1.2/src/poetry/utils/env.py#L1443
So probably you can set the env var PIP_INDEX_URL to guide pip/poetry to the right index?

@neersighted
Member

It is possible we could allow configuring the implicit PyPI URL, but authentication would get messy. I'm still not 100% sure whether this is a useful/desirable feature, as it decreases the repeatability of Poetry installations and may have security/correctness implications.

As far as configuring secondary (--extra-index-url) sources through the environment, that seems a bit more fraught from a design standpoint, but it would also require much less refactoring, and existing test coverage would be easier to expand to cover it.

In short, what is desired here is something we could work towards, but I think we would want to think long and hard about Poetry fetching different code from different sources due to environmental factors (up to this point, everything Poetry fetches and installs is strictly defined in the lock file and pyproject.toml).

@andreas-vester
Author

@finswimmer To be honest, I haven't thought about a supply chain attack because the packages on our corporate private repo are exactly the same as on pypi. In fact, we are maintaining a pypi mirror for the sole reason that it is forbidden to download software from the internet.

In this case, if I install, let's say, the pandas library from our private repo, it's the same package that you would install from pypi.

You're right in demanding that the dependencies must be reachable for everybody. That's probably the task of the package maintainer.

In my particular case, all the packages are freely available from pypi for those outside our corporate network. If this weren't the case, it probably wouldn't make sense to take the project public in the first place. For example, you wouldn't publish a project that has dependencies from, let's say, a private GitHub repo, would you?

@andreas-vester
Author

@neersighted I am not saying that there is an easy solution to this problem. All I can say is that putting a private repo into pyproject.toml doesn't make any sense as soon as you want to publish your project.

Let's assume I were allowed to place our private repo/server names in pyproject.toml and set it as the default source. As soon as you fork the repo from GitHub, you get a configuration that is not working on your end because you are technically not able to reach our internal server. You need pypi.

At this point in this discussion it is firstly important to me that we are on the same page with respect to acknowledging the fact that some corporate circumstances are restricting the use of poetry when it comes to public open source projects.

The next step would then be to find a suitable solution. At that point I can only make some suggestions and the poetry maintainers have to make a decision.

Some suggestions that come to my mind could involve the following:

  • Manage the sources for up- and downloading dependencies equally using poetry config.
  • Using env vars such as pip does (extra-index etc.)
  • Come up with a workaround, something like a Git hook, that removes the private source from the pyproject.toml before every commit and re-inserts it afterwards.
    -...
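
The Git-hook idea in the list above could be sketched roughly as follows. This is only an illustration under simplifying assumptions: it works line by line rather than with a TOML parser, expects the `[[tool.poetry.source]]` header alone on its line, and would be confused by a multi-line value whose continuation lines start with `[`. Re-inserting the block after the commit is not shown.

```python
from __future__ import annotations


def strip_poetry_sources(pyproject_text: str) -> str:
    """Return the text without any [[tool.poetry.source]] tables."""
    kept: list[str] = []
    skipping = False
    for line in pyproject_text.splitlines(keepends=True):
        stripped = line.strip()
        if stripped.startswith("["):
            # Every table header either opens a source block or ends one.
            skipping = stripped == "[[tool.poetry.source]]"
        if not skipping:
            kept.append(line)
    return "".join(kept)


# Example input reusing the source block from the first comment.
SAMPLE = """\
[tool.poetry]
name = "demo"

[[tool.poetry.source]]
name = "foo"
url = "https://foo.bar/simple/"
default = true

[tool.poetry.dependencies]
python = "^3.9"
"""

cleaned = strip_poetry_sources(SAMPLE)
```

A pre-commit hook could run such a function on the staged copy of pyproject.toml, so the internal URL never reaches the repository while the working copy keeps it.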

@andreas-vester
Author

@PidgeyBE So probably you can set the env var PIP_INDEX_URL to guide pip/poetry to the right index?

I did try the following without success:

set PIP_INDEX_URL=https://<my_proxy_username>:<my_proxy_password>@foo.bar/simple/

When trying to add a package, poetry tries to reach pypi.

@brandon-leapyear

brandon-leapyear commented Sep 9, 2022

This is an old work account. Please reference @brandonchinn178 for all future communication


We have to configure it in the actual poetry configuration, too

This isn't technically accurate. poetry source add ... just adds it to pyproject.toml, I believe. It only needs to be added to global configuration when publishing: https://python-poetry.org/docs/repositories/#publishable-repositories

If the source of the package is not defined in the pyproject.toml, how do you avoid a supply chain attack?

I'm not sure this needs to be handled by Poetry, though. All Poetry should care about is "Here are dependency requirements, here are the locked versions of all transitive dependencies, install those". If the user cares about supply chain attacks, they should set the index url to a trusted url, but the point is that that trusted url should be swappable. Does it matter if user A's trusted url is different from user B's trusted url?

To me, it seems like the issue with not hardcoding a source comes down to two scenarios:

  1. If user A's index URL has a patched version of package A v1.0 and user B's index URL has the unmodified version of package A v1.0.
    • IMO, users should not do this; they should instead serve the patched v1.0 as a separate v1.0-patch version, if their package does indeed require the patch
  2. If package A is an internal package only served in user A's index URL, and not user B's index URL.
    • IMO, it would be a perfectly reasonable user experience for Poetry to say "cannot find package at " and for the user to realize "oh the package is internal and i need to use another index url". User A in this case should put in the README "this package depends on an internal package, so use ".

While "repeatability of Poetry installations" is definitely a good thing, IMO that should only be vis-a-vis locking transitive deps. It shouldn't matter if package A v1.1 came from PyPI or a company mirror; if package A v1.1 is the same in both PyPI and the mirror, it's just as repeatable if you change which source you install from.
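
One way to make "the same in both" checkable is to compare a downloaded artifact against a recorded hash, independent of where it was downloaded from. A minimal sketch, assuming hash entries of the form `sha256:<hex>` (the style used for file hashes in poetry.lock); the function name and the sample bytes are hypothetical:

```python
import hashlib


def matches_locked_hash(artifact: bytes, locked_hash: str) -> bool:
    """Check artifact bytes against an '<algorithm>:<hexdigest>' entry."""
    algorithm, _, expected = locked_hash.partition(":")
    digest = hashlib.new(algorithm, artifact).hexdigest()
    return digest == expected


# Stand-in for the bytes of a wheel; a real check would read the file.
wheel_bytes = b"pretend this is the package-a 1.1 wheel"
locked = "sha256:" + hashlib.sha256(wheel_bytes).hexdigest()

from_pypi = matches_locked_hash(wheel_bytes, locked)
from_mirror = matches_locked_hash(wheel_bytes, locked)
tampered = matches_locked_hash(wheel_bytes + b"!", locked)
```

If the mirror serves byte-identical files, the check passes for either source; a modified artifact fails it regardless of which index delivered it.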


My company's use case is that we have an Artifactory mirror that's externally facing, and an Artifactory mirror that is only exposed via our company VPN. The VPN mirror is useful for running installs within the AWS VPC (so requests can stay within the AWS region), but the external mirror is useful for devs, so we don't force devs to be on the VPN to install things.

We also have a lot of Python projects in our monorepo, and it's a bit annoying to have to copy-paste the same tool.poetry.source block in each of them. Ideally, we'd just set an env var or something that our devs can just set once and have it be used in any of the projects in the monorepo.

IMO the ideal workflow here would be:

  1. No tool.poetry.source in pyproject.toml or poetry.lock
  2. Look for sources in the following locations, in order:
    • --index-url flag, that can be given to any poetry command (repeatable)
    • POETRY_INDEX_URL env var (semicolon separated?)
    • $project/poetry.toml (i.e. poetry config --local): TOML list
    • $config_home/config.toml (i.e. poetry config): TOML list
  3. If no sources specified, use pypi.org

Authentication can be put directly into the url (e.g. https://myuser:[email protected]) or in $config_home/config.toml mapping domain to credentials
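
The lookup order proposed above can be sketched as a small function. This only illustrates the proposal: the --index-url flag and POETRY_INDEX_URL are suggestions from this comment, not existing Poetry features, and the `index-urls` config key is a hypothetical name chosen for the sketch. Authentication is left out.

```python
from __future__ import annotations

PYPI_URL = "https://pypi.org/simple/"


def resolve_index_urls(
    cli_urls: list[str],
    environ: dict[str, str],
    project_config: dict,
    global_config: dict,
) -> list[str]:
    """Return index URLs from the first location that defines any."""
    # 1. Repeatable --index-url flag (proposed).
    if cli_urls:
        return list(cli_urls)
    # 2. POETRY_INDEX_URL, semicolon separated (proposed).
    env_value = environ.get("POETRY_INDEX_URL", "")
    env_urls = [u.strip() for u in env_value.split(";") if u.strip()]
    if env_urls:
        return env_urls
    # 3. Project-local poetry.toml, then 4. the global config.toml.
    # "index-urls" is a hypothetical key holding a TOML list.
    for config in (project_config, global_config):
        urls = config.get("index-urls", [])
        if urls:
            return list(urls)
    # 5. Nothing configured anywhere: fall back to PyPI.
    return [PYPI_URL]


fallback = resolve_index_urls([], {}, {}, {})
from_env = resolve_index_urls(
    [],
    {"POETRY_INDEX_URL": "https://a.example/simple/; https://b.example/simple/"},
    {"index-urls": ["https://c.example/simple/"]},
    {},
)
```

Here the first location that yields any URL wins outright; merging the locations instead would be a different design with different security trade-offs.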

@shoreadmin

It would be great if we could also set up this private repository for all users, rather than having to do it per-user. pip supports this with the index-url and cert fields in %PROGRAMDATA%\pip\pip.ini (on Windows).

@neersighted
Member

I think there is certainly room for flexibility here, but any design is going to have to come after we first make the existing functionality more consistent/robust (e.g. as described in #5984 (comment), or an alternative).

To add new sources of config here or new behaviors before first making the existing functionality better would be an own goal. It will be much easier to reason about a safe/consistent design after refactoring of what we already have.

@brandon-leapyear

brandon-leapyear commented Sep 23, 2022

This is an old work account. Please reference @brandonchinn178 for all future communication


For people looking for a workaround while this issue is still open, this is what I'm doing at my company (only guaranteed to work for Poetry 1.2.1):

  1. Download the diff below into patch-poetry-default-repo.diff
  2. cd $POETRY_HOME/venv/lib/python*/site-packages/poetry
  3. patch -p0 < patch-poetry-default-repo.diff
  4. Set POETRY_DEFAULT_REPO_URL to what was previously in [[tool.poetry.source]]
  5. (optional) Set POETRY_DEFAULT_REPO_NAME to the name of the source
    Useful if you have authentication:
    export POETRY_DEFAULT_REPO_NAME=foo
    export POETRY_HTTP_BASIC_FOO_USERNAME=...
    export POETRY_HTTP_BASIC_FOO_PASSWORD=...
  6. poetry lock --no-update

With these steps, you can now swap out the repo used for the source repo (replacing PyPI) without changing pyproject.toml, and poetry.lock will also be unaffected.

This is all very rough; you might have to tweak the diff to get it working for your specific use-case. But it should suffice until Poetry adds this functionality out of the box.

patch-poetry-default-repo.diff
--- factory.py
+++ factory.py
@@ -2,6 +2,7 @@ from __future__ import annotations
 
 import contextlib
 import logging
+import os
 import re
 import warnings
 
@@ -162,11 +163,28 @@ class Factory(BaseFactory):
                 io.write_line("Deactivating the PyPI repository")
         else:
             from poetry.repositories.pypi_repository import PyPiRepository
+            from poetry.repositories.legacy_repository import LegacyRepository
 
             default = not poetry.pool.has_primary_repositories()
+
+            # >>>>> PATCH https://github.com/python-poetry/poetry/issues/5958
+            default_repo_name = os.environ.get("POETRY_DEFAULT_REPO_NAME", "default")
+            default_repo_url = os.environ.get("POETRY_DEFAULT_REPO_URL")
+            if default_repo_url:
+                repo = LegacyRepository(
+                    default_repo_name,
+                    default_repo_url,
+                    config=config,
+                    disable_cache=disable_cache,
+                )
+            else:
+                repo = PyPiRepository(disable_cache=disable_cache)
+
             poetry.pool.add_repository(
-                PyPiRepository(disable_cache=disable_cache), default, not default
+                # PyPiRepository(disable_cache=disable_cache), default, not default
+                repo, default, not default
             )
+            # <<<<< ENDPATCH
 
     @classmethod
     def create_package_source(
--- packages/locker.py
+++ packages/locker.py
@@ -423,35 +423,37 @@ class Locker:
 
             data["extras"] = extras
 
-        if package.source_url:
-            url = package.source_url
-            if package.source_type in ["file", "directory"]:
-                # The lock file should only store paths relative to the root project
-                url = Path(
-                    os.path.relpath(
-                        Path(url).resolve(),
-                        Path(self._lock.path.parent).resolve(),
-                    )
-                ).as_posix()
+        # >>>>> PATCH https://github.com/python-poetry/poetry/issues/5958
+        # if package.source_url:
+        #     url = package.source_url
+        #     if package.source_type in ["file", "directory"]:
+        #         # The lock file should only store paths relative to the root project
+        #         url = Path(
+        #             os.path.relpath(
+        #                 Path(url).resolve(),
+        #                 Path(self._lock.path.parent).resolve(),
+        #             )
+        #         ).as_posix()
 
-            data["source"] = {}
+        #     data["source"] = {}
 
-            if package.source_type:
-                data["source"]["type"] = package.source_type
+        #     if package.source_type:
+        #         data["source"]["type"] = package.source_type
 
-            data["source"]["url"] = url
+        #     data["source"]["url"] = url
 
-            if package.source_reference:
-                data["source"]["reference"] = package.source_reference
+        #     if package.source_reference:
+        #         data["source"]["reference"] = package.source_reference
 
-            if package.source_resolved_reference:
-                data["source"]["resolved_reference"] = package.source_resolved_reference
+        #     if package.source_resolved_reference:
+        #         data["source"]["resolved_reference"] = package.source_resolved_reference
 
-            if package.source_subdirectory:
-                data["source"]["subdirectory"] = package.source_subdirectory
+        #     if package.source_subdirectory:
+        #         data["source"]["subdirectory"] = package.source_subdirectory
 
-            if package.source_type in ["directory", "git"]:
-                data["develop"] = package.develop
+        #     if package.source_type in ["directory", "git"]:
+        #         data["develop"] = package.develop
+        # <<<<< ENDPATCH
 
         return data
 

@hrbonz

hrbonz commented Nov 10, 2022

To add to the use cases here: currently in China, PyPI timeouts are extremely common, so I use devpi as a proxy to avoid them. There is no point in publishing this in a pyproject.toml, but there is currently no mechanism for me to configure my proxy outside the repo.

@hrbonz

hrbonz commented Nov 10, 2022

Incidentally, I had the exact same problem with pipenv when I tested it. I added a bit more context on the why in that issue.

@heimalne

heimalne commented Mar 22, 2023

This feature seems to be necessary to use poetry plugins when the company only allows private repositories. Poetry plugins are only loaded when installed globally, e.g. poetry self add poetry-multiproject-plugin, see #7657 from discussion monim67/poetry-bumpversion#6.

edit:
I managed to download plugin packages globally from a private repository within Docker with the following workaround:
Poetry apparently uses a global "project" pyproject.toml (/root/.config/pypoetry/pyproject.toml within a Docker image), which we can override to change the default repository url.

# file: global-pyproject.toml
[tool.poetry]
name = "poetry-instance"
version = "1.4.1"
description = ""
authors = []
license = ""

[tool.poetry.dependencies]
python = "3.9.16"
poetry = "1.4.1"

# Force poetry to download global packages like Poetry plugins from our pypi mirror.
# CI firewall blocks the normal pypi connection, which leads to connection errors.
[[tool.poetry.source]]
name = "<company>-pypi-mirror"
url = "<private repository url>/simple/"
default = true
secondary = false

and

# file: Dockerfile
FROM <private repository url>/python:3.9.16-slim-bullseye

RUN pip install poetry twine --index-url <private repository url>/simple

# Hack: Override poetry repository url for global package installation, which
# is needed for Poetry plugins (project-locally installed plugins aren't loaded)
COPY global-pyproject.toml /root/.config/pypoetry/pyproject.toml
RUN poetry self add poetry-multiproject-plugin

@celsiusnarhwal

celsiusnarhwal commented Apr 29, 2023

I wrote a plugin that aims to work around this issue by allowing package sources to be configured via the same POETRY_REPOSITORIES_LOREN_IPSUM_URL environment variable already used for publishable repositories.

To use @andreas-vester's example:

[[tool.poetry.source]]
name = "foo"
url = "https://foo.bar/simple/"
default = true

would become:

export POETRY_REPOSITORIES_FOO_URL=https://foo.bar/simple
export POETRY_REPOSITORIES_FOO_DEFAULT=true

Poetry 1.2 or later is required as 1.2 is the earliest version that supports plugins.

See it for yourself: https://github.com/celsiusnarhwal/poetry-source-env
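
To illustrate the general idea of deriving sources from such environment variables, here is a minimal sketch. It is not the plugin's actual implementation; the function name, the regular expression, and the choice to simply lowercase the repository name are assumptions made for this example:

```python
from __future__ import annotations

import re

# Matches POETRY_REPOSITORIES_<NAME>_URL and POETRY_REPOSITORIES_<NAME>_DEFAULT.
_PATTERN = re.compile(
    r"^POETRY_REPOSITORIES_(?P<name>[A-Z0-9_]+?)_(?P<field>URL|DEFAULT)$"
)


def sources_from_env(environ: dict[str, str]) -> list[dict]:
    """Collect source definitions from POETRY_REPOSITORIES_* variables."""
    sources: dict[str, dict] = {}
    for key, value in environ.items():
        match = _PATTERN.match(key)
        if not match:
            continue
        # Assumption: the source name is the lowercased variable segment.
        name = match["name"].lower()
        source = sources.setdefault(name, {"name": name})
        if match["field"] == "URL":
            source["url"] = value
        else:
            source["default"] = value.strip().lower() == "true"
    # Only entries that carry a URL describe a usable source.
    return [s for s in sources.values() if "url" in s]


example_env = {
    "POETRY_REPOSITORIES_FOO_URL": "https://foo.bar/simple",
    "POETRY_REPOSITORIES_FOO_DEFAULT": "true",
    "HOME": "/root",
}
parsed = sources_from_env(example_env)
```

In practice `os.environ` would be passed instead of `example_env`; the resulting dictionaries carry the same fields as the `[[tool.poetry.source]]` block they replace.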

@abn
Member

abn commented Mar 6, 2024

The PyPI mirror plugin might be of interest for users on this issue.

https://github.com/arcesium/poetry-plugin-pypi-mirror/

@fpottbaecker

fpottbaecker commented Sep 20, 2024

I also ran into this issue (specifically in the context of two environments using different repos). While the aforementioned pypi_mirror plugin offers a valid solution, I feel like this is a common use case (in enterprise settings) and should be part of poetry itself.

Would it be possible (as a first measure) to offer a setting akin to index-name (and an accompanying POETRY_INDEX_NAME) that refers to the name of another configured repository (repositories.<name>) to use as the PyPI URL (ideally as a PEP 503 repo without source URLs, to enable portable lockfiles)?

Though I am not entirely sure whether it makes sense to reuse the repositories setting (I am not very knowledgeable about the design philosophy around separating the publish repos), it might be useful in enabling the reuse of other settings, like http-basic.*, pypi-token.*, and certificates.*.
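
The proposed setting could resolve along these lines. This is a sketch of the suggestion only: `index-name` and POETRY_INDEX_NAME do not exist in Poetry, the nested dictionary stands in for the parsed config, and letting the environment variable take precedence is an assumption.

```python
from __future__ import annotations

PYPI_URL = "https://pypi.org/simple/"


def resolve_default_index(config: dict, environ: dict[str, str]) -> str:
    """Map the proposed index-name setting to a configured repository URL."""
    # Assumption: the environment variable overrides the config file.
    name = environ.get("POETRY_INDEX_NAME") or config.get("index-name")
    if not name:
        return PYPI_URL
    repositories = config.get("repositories", {})
    if name not in repositories or "url" not in repositories[name]:
        raise LookupError(f"no repositories.{name}.url configured")
    return repositories[name]["url"]


# Hypothetical config with one repository named "mirror".
demo_config = {"repositories": {"mirror": {"url": "https://mirror.example/simple/"}}}

unset = resolve_default_index(demo_config, {})
via_env = resolve_default_index(demo_config, {"POETRY_INDEX_NAME": "mirror"})
```

Because the name points at an existing repositories entry, credentials and certificates configured under the same name could be looked up with it, which is the reuse the comment hopes for.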
