Being able to explicitly specify package repository namespaces #8695

Open
Kleidukos opened this issue Jan 23, 2023 · 21 comments

Comments

@Kleidukos
Member

Kleidukos commented Jan 23, 2023

Note: This is an effort to finish the work that was started with the Provenance-Qualified Package Imports GHC proposal

Scenario: I am a user in a corporate environment and I wish to rely on packages across several package repositories: Hackage Central, a privately-hosted package repository, and a public third-party repository. As it happens, the package names are not unique across the repositories, because internal forks have happened.

Repository marker

I would like to be able to specify the package repository a dependency belongs to, so that its origin is clear from reading the cabal file alone. Example:

, @hackage/regex-tdfa
, @hackage/resource-pool
, @hackage/sandi
, @hackage/scientific
, @mycompany/commons
, @mycompany/prelude
, @mycompany/servant
, @cardano/goblins
, servant-auth
, servant-blaze 

Here we have four elements:

  1. @hackage is explicitly referring to the Hackage Central server
  2. @mycompany is explicitly referring to my company's privately hosted package repo
  3. @cardano is a publicly available package repository based on Cardano's Foliage setup.
  4. unqualified packages are implicitly referring to Hackage Central.

The '@' syntax is taken from package repositories like NPM (@purescript/node-fs).
Today, the Flora meta-index already supports the namespacing of Hackage packages like @hackage/base-orphans.

Isolation

Hackage Central normally operates under a "closed world" assumption (with the notable exception of packages later removed because they were fraudulent), and it is desirable to maintain this constraint. As the GHC Proposal linked above suggests, Hackage Central should unequivocally reject packages that make use of this feature.

Regarding third-party repositories, I am of the initial opinion that they could allow packages that refer to other third-party repositories, and the packages themselves would document the configuration procedure to add new package repositories.

Configuration

There are two aspects to consider:

  1. Local & Global cabal configuration of these package repositories: A stanza in $XDG_CONFIG_HOME/cabal/config and/or cabal.project that would declare the configured package repositories:
source-origin-aliases:
  hackage: https://hackage.haskell.org
  cardano: https://input-output-hk.github.io/cardano-haskell-packages/
  mycompany: https://internal.mycompany.corp/packages
  2. Declaring in the .cabal file that third-party repositories are used, in order to give the option to the hosting repository to reject a package whose dependencies would not be entirely self-contained (or depending on that exact third-party repository + Hackage Central)

I am certainly missing some crucial elements here, and I'd be happy to consolidate this ticket based on your inputs. 🙂

cc @gbaz @andreabedini

@michaelpj
Collaborator

I'm not sure that this is the behaviour you actually want. For example, the reason there are forks of Hackage packages in the Cardano package repository is because we want to use them instead of the ones from Hackage. That means we're relying on the "late binding" of package names in .cabal files: they don't resolve to a particular unique package (including its source) until we make the plan with the full context of what repositories we want to use.

With this proposal, package metadata would more precisely bind the package dependency, which would prevent this kind of overriding. That's not to say it's not desirable, just that it would make it harder to do what we do today.


I'd like to propose something slightly different. I think what we have here is at least analogous to a petname system.

  • a package name in package metadata is "global" and "memorable" but not "securely unique". It's not even clear what package version you will get, let alone where that will come from!
  • a package name plus repository is approximately "global" and "securely unique" but not "memorable". You need the repository information to fully interpret it.
  • we don't have a "memorable" and "securely unique" kind of name, but perhaps we don't need one

The crux here is that we have somewhat-underspecified names that are helpful for generally finding packages, but don't pin things down entirely, and then in a particular context we pin them down to something specific.

So perhaps we could instead have something like an "address book" in cabal.project

address-book:
   regex-tdfa = @hackage/regex-tdfa
   goblins = @cardano/goblins

(deliberately terrible syntax and naming to avoid starting that discussion off yet)

That tells cabal to pin down certain non-specific names to specific locations, giving the user finer control over this part of the process.

This might also give us a cleaner alternative to the current active-repositories modifiers. At the moment if you have the same name in multiple repositories you have to carefully arrange things so that you get just the overriding you want, and no more. It would be nicer to just add an explicit statement saying "I want this package (or package version!) to come from this repository".


This problem pops up in another place: the wild namespace of C libraries. At the moment .cabal files can specify dependencies on C libraries, but this relies on the total wild west of C library naming. In particular, there is no guarantee that different distros will package the same C library under the same name. It's amazing that this mostly works, but occasionally it's quite a pain.

The same system would be very useful there! In a cabal.project (or cabal.project.local), imagine being able to write down precisely which library the loose global name should refer to:

c-address-book: 
   libsecp256k1 = /some/path/on/my/system

That would again let users patch up the loose global names as they see fit, in exactly the same way.
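
For comparison, a rough sketch of what can be written today in cabal.project.local. It only tells cabal where to search for a C library and does not rebind the library's name, so it is not the address book described above. The package name secp256k1-haskell and the lib/include subdirectories are assumptions made for illustration.

```cabal
-- cabal.project.local (sketch; package name and subdirectories are assumed)
package secp256k1-haskell
  -- Extra directories searched for the C library and its headers
  extra-lib-dirs: /some/path/on/my/system/lib
  extra-include-dirs: /some/path/on/my/system/include
```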

@phadej
Collaborator

phadej commented Jan 24, 2023

Please don't touch .cabal file specification.

cabal-install already maps package names in .cabal files to actual packages in various repositories.
It's currently possible to specify that later repositories are merged, or completely shadow packages in previous repositories. And there is no reason some other combining methods couldn't be specified.
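
A minimal sketch of that existing mechanism, assuming a repository named mycompany (the name is illustrative; the URL is the one used in the issue description). Security settings such as secure and root-keys are left out for brevity.

```cabal
-- cabal.project (sketch; repository name is a placeholder)
repository mycompany
  url: https://internal.mycompany.corp/packages

-- Packages found in mycompany completely shadow same-named packages from
-- Hackage. Without the :override suffix, the two indices are merged instead.
active-repositories: hackage.haskell.org, mycompany:override
```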

(It would be technically possible to rename packages as repository indices are made)

.cabal specification doesn't know anything about Hackage, GitHub or tarballs on your local or network filesystem. And it doesn't need to. The names are there to abstract.

Also cabal-install-3.8 allows you to split and import parts of cabal.project (also from HTTP sources!), so sharing that additional configuration across multiple codebases (if you don't use monorepo in your company) is completely viable without copy&pasting everything.
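
As a rough illustration of such sharing, assuming a hypothetical shared file served over HTTP and a hypothetical local file next to the project:

```cabal
-- cabal.project (sketch; the URL and file names are placeholders)
-- Pull in configuration maintained in one place for all codebases
import: https://internal.mycompany.corp/cabal/shared.project
-- Imports can also point at files in the same checkout
import: ./local-overrides.project

packages: .
```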

@Bodigrim
Collaborator

Scenario: I am a user in a corporate environment and I wish to rely on packages across several package repositories: Hackage Central, a privately-hosted package repository, and a public third-party repository. As it happens, the package names are not unique across the repositories, because internal forks have happened.

Are there many companies in the wild (besides Cardano) which run private Hackage servers? Most of the time people just use source-repository-package, pinning desired versions. The only downside is that you cannot benefit from constraint solving, but this is rarely an issue in a corporate environment.
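
A sketch of that approach in cabal.project, assuming a hypothetical internal Git host; the location is a placeholder and the tag must be filled in with the commit to pin.

```cabal
-- cabal.project (sketch; location and tag are placeholders)
source-repository-package
  type: git
  location: https://git.mycompany.corp/forks/servant.git
  -- Pin an exact commit so every build uses the same source
  tag: <commit hash of the internal fork>
```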

@gbaz
Collaborator

gbaz commented Jan 24, 2023

While I wrote the original proposal this is based on, I'm sympathetic to the idea that we should avoid changing cabal file syntax if possible, and that the new tools and machinery we have built since that proposal was drafted could provide a roadmap to doing so.

One issue that the discussion here has not addressed, but which the original proposal did, is what happens when emergent overlap occurs.

Assume I have a private repo, which provides bar. I have a package foo which depends on bar from the private repo, and baz from hackage. Now, an entirely unrelated bar is uploaded to hackage, and furthermore, baz acquires a dependency on it.

We have no way, at the moment, of building foo against the latest bar (since we would have two entirely different packages named baz as dependencies). I think the proposal from @michaelpj comes closest, but we would want some way to scope the "address book" to specific packages.

As a hackage maintainer, I've certainly become aware of some companies that run private hackages -- the competing internal solution, used at e.g. my current company, is nix, which I find also works well. I couldn't estimate how many such companies there are, but I do think there are enough current and also potential users that this problem is worth spending some time to solve well and thoroughly (though not urgently!)

@mouse07410
Collaborator

Are there many companies in the wild (besides Cardano) which run private Hackage servers?

No idea about "many". But we do. Probably, among others.

How "many" do you need to justify fixing this?

@Bodigrim
Collaborator

How "many" do you need to justify fixing this?

(I'm not making any decisions here.) My original point was to seek feedback on the proposal from potential users, but I knew none, thus asked.

@chris-martin

chris-martin commented Jan 25, 2023

There's a chicken-and-egg problem here if you want to see users of a system before building the system. I've considered running a package server but never pursued it because there would be no way to namespace the packages.

@phadej
Collaborator

phadej commented Jan 25, 2023

We have no way, at the moment, of building foo against the latest bar (since we would have two entirely different packages named baz as dependencies).

I hope that renaming baz in the private repositories is not impossible. It's a good idea to prefix your "private" packages with some unique enough prefix to make such clashes less likely.

The story would be quite different if there were competing central repositories, e.g. Hackage and EvilTwin, and packages from both of these were used by companies. Let's not build such a future.

@phadej
Collaborator

phadej commented Jan 25, 2023

Forgot to say

building foo against the latest bar (since we would have two entirely different packages named baz as dependencies

This scenario can happen with just a monorepository, and there repository namespacing won't help.

@chris-martin

What I'd love to be able to see is a system in which it is possible to self-publish packages that are just as accessible as packages on the central Hackage. If there are two package servers A and B, a library hosted on server A should be able to declare a dependency on a library hosted on server B, and vice versa. Without placing URLs in the cabal file itself, I am not sure this proposal accomplishes enough. It seems to me that it pushes the problem out a little, but still requires a central registry to map repository names to URLs?

@phadej
Collaborator

phadej commented Jan 25, 2023

What I'd love to be able to see is a system in which it is possible to self-publish packages that are just as accessible as packages on the central Hackage.

https://cabal.readthedocs.io/en/stable/config.html?#local-no-index-repositories on a network drive.
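
A minimal sketch of such a repository declaration, assuming a hypothetical mount point for the network drive. The directory is expected to hold source tarballs named like pkg-name-version.tar.gz, from which cabal-install builds the index itself.

```cabal
-- cabal config or cabal.project (sketch; repository name and path are placeholders)
repository mycompany-local
  url: file+noindex:///mnt/shared/haskell-packages
```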

@chris-martin

chris-martin commented Jan 25, 2023

What I'd love to be able to see is a system in which it is possible to self-publish packages that are just as accessible as packages on the central Hackage.

https://cabal.readthedocs.io/en/stable/config.html?#local-no-index-repositories on a network drive.

Currently, if a package is on Hackage, somebody can publish another package and declare a dependency on it. Suppose I self-publish a package named text; how does somebody else publish a package that declares a dependency on my text package?

@chris-martin

chris-martin commented Jan 25, 2023

if there were competing central repositories, e.g. Hackage and EvilTwin

I really don't believe this would be a bad thing. Multiple service providers aren't necessarily fighting with one another. Just one use case I had in mind: I'd love to be able to start a server that welcomes useless beginner package contributions, so that people could follow an intro tutorial that involves realistically publishing a package (a package that would violate Hackage rules because the publisher does not intend to maintain it or for it to be useful to others). This has reason to exist as a separate server and it would be a "competing" server but not in an antagonistic sense.

@phadej
Collaborator

phadej commented Jan 25, 2023

Suppose I self-publish a package named text; how does somebody else publish a package that declares a dependency on my text package?

Why would you want to make your and your users' lives hard?

What would happen when your user publishes a package with a dependency on @chrismartin/text to Hackage? Should a hackage-server (the software, not central Hackage) have a list of allowed external repositories, but default unspecified ones to their own? How will tooling (cabal-install, Nix, I'm not even mentioning stack) keep track of that?

How would PackageImports work? (i.e. GHC)

@chris-martin

Suppose I self-publish a package named text; how does somebody else publish a package that declares a dependency on my text package?

Why would you want to make your and your users' lives hard?

Okay, then suppose I first give my package a unique name, and then somebody else publishes another useful package of the same name on Hackage without any knowledge that people are using mine.

@phadej
Collaborator

phadej commented Jan 25, 2023

Okay, then suppose I first give my package a unique name, and then somebody else publishes another useful package of the same name on Hackage without any knowledge that people are using mine.

The first step is that you can define shadowing. That will help until the identically named package on Hackage becomes popular enough to find its way into your dependencies, as @gbaz explained. Hopefully by that time you'll have figured it out and already renamed your package.

EDIT: and if renaming your package is somehow extremely hard, then I'd say that the address book approach could still rename it as the index of your repository is made.

@Kleidukos
Member Author

@phadej Would you say that you disagree with the following?

This thread has tended to convince me that the assumption of a single global package namespace is not a good one. In particular -- we're haskellers, when do we make assumptions about a single global anything? I have an idea kicking around about how we might fix this at the root, in a way that is pretty painless to end-users.

haskell/ecosystem-proposals#4 (comment)

What would happen when your user publishes a package with a dependency on @chrismartin/text to Hackage? Should a hackage-server (the software, not central Hackage) have a list of allowed external repositories, but default unspecified ones to their own?

My proposal says:

unqualified packages are implicitly referring to Hackage Central.

@phadej
Collaborator

phadej commented Jan 25, 2023

My proposal says:

unqualified packages are implicitly referring to Hackage Central.

Good. So Chris would always need to write @chris-martin/text to depend on his text? That's fine with me.

How would head.hackage (or GHC.X.Hackage) work, given that it relies on implicit shadowing of packages on Hackage?

@hasufell
Member

hasufell commented Jan 26, 2023

I feel we've conflated two things into one proposal:

  1. namespaces on hackage with no particular meaning: better and unified syntax for e.g. forks. Hackage and other tools could more intelligently group them. For cabal, nothing changes (unless we need to make the syntax legal).
  2. Integrating namespaces with hackage repositories: I think this is much more complex and I'm not even sure that's the most interesting use case. I also believe these things can be done more explicitly in cabal.project files.

@michaelpj
Collaborator

I think the proposal from @michaelpj comes closest, but we would want some way to scope the "address book" to specific packages.

Yes, I agree, much like we have package-scoped allow-newer and friends, it makes sense to bind the global names to different local names in different contexts.
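
For reference, the existing package-scoped form looks roughly like this in cabal.project (foo and bar are placeholder package names):

```cabal
-- cabal.project (sketch; foo and bar are placeholders)
-- Relax only foo's upper bound on bar; other packages' bounds on bar still apply
allow-newer: foo:bar
```

A scoped address book could presumably follow the same "package:dependency" shape, though no such syntax has been proposed in this thread.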

@fgaz
Member

fgaz commented Jan 28, 2023

I think we have to keep the "what" separate from the "where", and only the first belongs to .cabal files. Securely unique names can be achieved at a higher level (cabal-install, distro packages...).

Go mixes the two, and while it appears simple, I think it can lead to a lot of problems. They even had to build a massive caching infrastructure to deal with it.

Just an example: suppose repository @a gets abandoned, or the maintainer of package @a/x wants to migrate it to @b/x. Suddenly all versions of all reverse dependencies need their metadata changed! "What" is immutable, "where" isn't.

9 participants