Taking package names less seriously #33047

Open
oschulz opened this issue Aug 23, 2019 · 33 comments
Labels
packages Package management and loading


@oschulz
Contributor

oschulz commented Aug 23, 2019

I think it would open up some powerful possibilities if packages in a "Project.toml" were identified only by their UUID. For example, in

JSON = "682c06a0-de6a-54ab-a142-c8b1cf79cde6"

"JSON" would just become an arbitrary alias, i.e. the local name used for "682c06a0-de6a-54ab-a142-c8b1cf79cde6" within the package or project that owns that "Project.toml".

This would enable us to rename packages at will without breaking code referring to those packages under their old names.

For example, let's assume (this is purely hypothetical; I'm not involved in either package) that the maintainers of "JSON" and "JSON3" agreed that "JSON3" (I guess it's faster?) should become the new default JSON package. Then they could rename "JSON" to "JSONOld" and "JSON3" to "JSON", but keep the UUIDs. Any other package that still refers to "JSONOld" as

JSON = "682c06a0-de6a-54ab-a142-c8b1cf79cde6"

would just continue to work, while packages that want to use the newer JSON package (formerly "JSON3") could use

JSON = "0f8b85d8-7281-11e9-16c2-39a750bddbf1"
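As a rough illustration of the idea (a Python sketch, not Julia's actual Pkg code), a project's [deps] table can be modeled as a map from local aliases to UUIDs, with everything else keyed by UUID. The `registry` dictionary and the renamed names in it are the hypothetical scenario described above, not real registry data.

```python
from uuid import UUID

JSON_UUID = UUID("682c06a0-de6a-54ab-a142-c8b1cf79cde6")
JSON3_UUID = UUID("0f8b85d8-7281-11e9-16c2-39a750bddbf1")

# Each project's [deps] table: local name -> UUID. The key is just an alias.
legacy_project = {"JSON": JSON_UUID}
modern_project = {"JSON": JSON3_UUID}

# Hypothetical registry after the rename, keyed by UUID rather than by name.
registry = {
    JSON_UUID: "JSONOld",   # formerly registered as "JSON"
    JSON3_UUID: "JSON",     # formerly registered as "JSON3"
}


def resolve(project: dict, local_name: str) -> tuple:
    """Return (uuid, current registered name) for a project-local alias."""
    pkg_uuid = project[local_name]
    return pkg_uuid, registry[pkg_uuid]


print(resolve(legacy_project, "JSON"))
print(resolve(modern_project, "JSON"))
```

Both projects use the same local name, yet each resolves to a different package, and neither is affected by what the registry currently calls it.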

This would also enable us to tidy up the package namespace a bit (as long as the package maintainers agree). I guess we have quite a few packages that, for historical reasons, have names that are either too general or no longer fit the package very well (e.g. after it has evolved and broadened or narrowed its scope).

This would also resolve name-clashes between packages from different registries (e.g. between general and a private registry).

In principle, fully UUID-based package resolution would also allow multiple packages to be registered with the same name within the same registry. IMHO this should definitely be avoided though, at least in general, as it would create lots of confusion. A name should be "freed" before another package can claim it.

CC @StefanKarpinski (we discussed this in Baltimore, but I forgot to write it up as an issue.)

I'm not able to judge whether this would require major changes or if it could be implemented fairly easily.

@KristofferC
Member

KristofferC commented Aug 23, 2019

Everything of this already happens, so trust me that we are already taking uuids very seriously. :) The only reason it is not possible to rename a package right now is that code loading does not look up the UUID of the renamed package in the project file of packages that use the old name. But this could be added.

@oschulz changed the title from "Taking package UUIDs seriously" to "Taking package names less seriously" on Aug 23, 2019
@oschulz
Contributor Author

oschulz commented Aug 23, 2019

Everything of this already happens, so trust me that we are already taking uuids very seriously. :)

I wasn't really implying we didn't - I just couldn't resist the allusion to a certain other issue. :-) But you're right, of course - I changed the title of this issue accordingly. ;-)

What I meant is that we currently can't use

[deps]
Foo = "0f8b85d8-7281-11e9-16c2-39a750bddbf1"

instead of

[deps]
JSON3 = "0f8b85d8-7281-11e9-16c2-39a750bddbf1"

so currently both the name and the UUID matter (though I fear that's not the technically correct way of putting it). I guess that's what you mean by code loading not looking up the UUID of the renamed package in the project files of packages that use the old name?

But this could be added.

If this could be added without too much trouble, I think it would be worth it (and yes, I know, a PR might be welcome - but I wouldn't quite know where to begin ;-) ).

@StefanKarpinski
Member

As you might imagine I have thought more than a bit about this. I'm currently working through what needs to change to support renames. However, I don't think code loading needs to change so much as we need to use the internal name of a package at a given commit (i.e. what the Project.toml file at that commit says the name is) as the name when installing it. Then as long as the code using a package agrees with the internal name of the package at the commit that's being used, everything will work fine. Supporting that requires a bunch of changes to Pkg, so that we install packages at a location based on the internal name of the package at a version instead of the registered name, but it's not too bad.

If you want to be able to use a package by an arbitrary name, that's a very different situation. It would require completely changing the path scheme for installed packages. Currently the installed path of a package is something like $depot/packages/$name/$slug where name is the name of the package (currently the registered name, but in what I propose above it would be the internal name of that version, i.e. matching the name in $depot/packages/$name/$slug/Project.toml). If the name when you're using a package is completely arbitrary, then how do you find the package? You can't determine the name part. What you're saying could be done if we did pure content addressing of packages and stored installed packages at a path like $depot/packages/$hash where hash is the full git-tree-sha1 of the package source. But then the installed package directory becomes very unfriendly to users: it gives no indication of the package's name and all versions of all packages are installed in a single directory.
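To make the trade-off concrete, here is a Python sketch of the two path schemes described above. The depot location, slug, and tree hash are made-up placeholders. The point is only that the first scheme needs the name to locate a package, while the second needs only the hash but produces an opaque path.

```python
from pathlib import PurePosixPath


def named_path(depot: str, name: str, slug: str) -> PurePosixPath:
    # Current scheme: $depot/packages/$name/$slug.
    # The name must be known up front to locate the package.
    return PurePosixPath(depot, "packages", name, slug)


def content_addressed_path(depot: str, tree_sha1: str) -> PurePosixPath:
    # Pure content addressing: $depot/packages/$hash.
    # Locatable from the tree hash alone, but the path says nothing about the package.
    return PurePosixPath(depot, "packages", tree_sha1)


# Hypothetical placeholder values, for illustration only.
depot = "/home/user/.julia"
tree_sha1 = "0123456789abcdef0123456789abcdef01234567"
print(named_path(depot, "JSON", "Ab3xZ"))
print(content_addressed_path(depot, tree_sha1))
```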

@oschulz
Contributor Author

oschulz commented Aug 23, 2019

This would also resolve name-clashes between packages from different registries (e.g. between general and a private registry).

I think this part was nonsense, though: two packages with the same name but separate UUIDs can already be handled, right?

@StefanKarpinski
Member

two packages with the same name but separate UUIDs can already be handled, right?

Yes, that works fine already.

@StefanKarpinski
Member

We could potentially keep a file mapping (uuid, git-tree-sha1) pairs to names and then look up package locations that way, but it complicates (and changes) the process of loading code, which I'm generally very reluctant to do. Code loading would ignore the arbitrary name used for a package inside the project and manifest files and look up the (uuid, git-tree-sha1) pair to get the actual name of a version of a package and then find it based on that. Not the worst complication ever, but still.

@oschulz
Contributor Author

oschulz commented Aug 23, 2019

As you might imagine I have thought more than a bit about this.

Oh, sure - sorry, I didn't mean to badger, I just thought maybe I should write this "package names as aliases" idea down.

If you want to be able to use a package by an arbitrary name, that's a very different situation.

Yes, that was my (maybe somewhat naive) hope. Currently, when I make a new package, I worry a lot about the name, because it's a bit hard to change later. And sometimes packages just develop in a direction not foreseen in the beginning. But of course I didn't think about

It would require completely changing the path scheme for installed packages.

the above. Darn. :-)

@oschulz
Contributor Author

oschulz commented Aug 23, 2019

I guess $depot/packages/$uuid/... or $depot/packages/$name-$uuid/... wouldn't solve it either, right?

@StefanKarpinski
Member

Anything with the name in it has the same problem. Using just the UUID would work but $depot/packages/$uuid/$hash creates a very long path name which can be a problem for some systems/tools. That's why we use the five character slug instead of something that long.

@oschulz
Contributor Author

oschulz commented Aug 23, 2019

However, I don't think code loading needs to change so much as we need to use the internal name of a package at a given commit (i.e. what the Project.toml file at that commit says the name is) as the name when installing it. Then as long as the code using a package agrees with the internal name of the package at the commit that's being used, everything will work fine.

This would mean that a package using the renamed package would be stuck with an old version until it switches to the new name, right? Hm, if we had a way to notify users of the package that it has been renamed (do we?), that would solve it, I guess.

@oschulz
Contributor Author

oschulz commented Aug 23, 2019

Using just the UUID would work but $depot/packages/$uuid/$hash creates a very long path name which can be a problem for some systems/tools. That's why we use the five character slug instead of something that long.

Again, probably naive - we can't turn the UUID into a slug instead of the name, right? It certainly would be a bit less, ah, unique?

@StefanKarpinski
Member

The slug is based on the uuid and tree hash. The name in the path is important though, because sqrt(62^5) ≈ 30267, so once there are on the order of 30k slugs in the same place, the chance of a collision approaches 50% (the birthday bound). When the slugs are in a directory of versions of a single package, that's a pretty good situation; if they're all in the same top-level packages directory, that's not so good.
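The arithmetic can be checked with a short Python sketch. The `slug` function here is only illustrative: it hashes the UUID bytes plus the tree hash with SHA-256 and encodes the result in 62 characters, which is an assumption for demonstration and not Julia's exact slug algorithm. The collision estimate uses the standard birthday approximation.

```python
import hashlib
import math
import uuid

# 62 slug characters: A-Z, a-z, 0-9
SLUG_CHARS = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789"


def slug(pkg_uuid: uuid.UUID, tree_sha1: str, length: int = 5) -> str:
    # Illustrative only: hash (uuid, tree hash), then take `length` base-62 digits.
    digest = hashlib.sha256(pkg_uuid.bytes + bytes.fromhex(tree_sha1)).digest()
    n = int.from_bytes(digest[:8], "big")
    chars = []
    for _ in range(length):
        n, r = divmod(n, 62)
        chars.append(SLUG_CHARS[r])
    return "".join(chars)


def collision_probability(n: int, space: int = 62 ** 5) -> float:
    # Birthday approximation: P(at least one collision among n slugs)
    # ≈ 1 - exp(-n(n-1) / (2 * space))
    return 1.0 - math.exp(-n * (n - 1) / (2 * space))


print(math.isqrt(62 ** 5))
print(round(collision_probability(30_000), 2))
print(round(collision_probability(35_640), 2))
```

With these definitions, `math.isqrt(62 ** 5)` is 30267; the approximation gives roughly a 39% collision probability at 30,000 slugs and crosses 50% at about 35,600, so 30k is the right order of magnitude for the threshold.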

@StefanKarpinski
Member

I think the most viable approach is this:

  • Change Pkg to install packages at a path based on the internal name of the package version (what's in the project file), regardless of what's in the registry.
  • Record all the internal names for all installed versions of each package.
  • Modify code loading to look up potential internal names in this file.

This file would have a format something like this:

682c06a0-de6a-54ab-a142-c8b1cf79cde6 = ["JSON"]
0f8b85d8-7281-11e9-16c2-39a750bddbf1 = ["JSON", "JSON3"]

This last line is assuming that JSON3 has been renamed to JSON at some point so that there are earlier versions of the same package with the internal name JSON3 and later versions with the internal name JSON. This file would live at $depot/packages/Names.toml for each depot.

The process of code loading would go like this:

  • You need to load (uuid, hash)
  • For each depot, look in $depot/packages/Names.toml for uuid
  • For each name found, look for $depot/packages/$name/$slug
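The three lookup steps above can be sketched in Python as follows. The Names.toml contents are shown as an in-memory dictionary (using the example entries above) rather than parsed from disk, and the slug is taken as an input that would be computed from (uuid, hash). `find_package` is a hypothetical helper, not an existing Pkg or Base function.

```python
from pathlib import Path
from typing import Dict, List, Optional

# In-memory stand-in for the proposed $depot/packages/Names.toml: uuid -> internal names.
NamesTable = Dict[str, List[str]]


def find_package(depots: List[str], names_by_depot: Dict[str, NamesTable],
                 pkg_uuid: str, slug: str) -> Optional[Path]:
    """Return the first existing $depot/packages/$name/$slug for this UUID, or None."""
    for depot in depots:
        # Look up every internal name recorded for this UUID in this depot.
        names = names_by_depot.get(depot, {}).get(pkg_uuid, [])
        for name in names:
            # Probe each candidate directory in turn.
            candidate = Path(depot) / "packages" / name / slug
            if candidate.is_dir():
                return candidate
    return None


EXAMPLE_NAMES: NamesTable = {
    "682c06a0-de6a-54ab-a142-c8b1cf79cde6": ["JSON"],
    "0f8b85d8-7281-11e9-16c2-39a750bddbf1": ["JSON", "JSON3"],
}
```

For the renamed package, an older version installed under `packages/JSON3/...` is still found even though the first candidate name tried is "JSON".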

All that said, I have to wonder if it isn't better to just require the name by which one uses a version of a package to match its internal name.

@StefanKarpinski
Member

In any case, making sure that package versions are installed at a path based on their internal name is something we should do anyway, so I'm going to keep working on that.

@00vareladavid
Contributor

Can't we do what git does and just split the UUID/hash into parts? This would avoid having to look up the name at all and avoid any problems with overly long directory names.
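A sketch of what such a git-style layout might look like, in Python. Splitting each identifier after its first two hex characters mirrors how git fans out loose objects under .git/objects; applying that to both the UUID and the tree hash is an assumption for illustration. Note that nothing in the resulting path identifies the package by name.

```python
import uuid
from pathlib import PurePosixPath


def fanout_path(depot: str, pkg_uuid: uuid.UUID, tree_sha1: str,
                prefix: int = 2) -> PurePosixPath:
    # Split each identifier into a short prefix directory plus the remainder,
    # as git does; this limits entries per directory, though the total path stays long.
    u = pkg_uuid.hex  # 32 hex digits, no dashes
    return PurePosixPath(depot, "packages",
                         u[:prefix], u[prefix:],
                         tree_sha1[:prefix], tree_sha1[prefix:])


p = fanout_path("/depot", uuid.UUID("0f8b85d8-7281-11e9-16c2-39a750bddbf1"),
                "0123456789abcdef0123456789abcdef01234567")
print(p)
```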

@StefanKarpinski
Member

We could but I still think that $uuid_slug/$hash_slug is a bit of a user-unfriendly path scheme. Since a package always has an internal name we might as well use it to make paths friendlier.

@00vareladavid
Contributor

I would prefer a simpler mapping over a friendlier path. (Why does the path have to be user-friendly in the first place?)

@StefanKarpinski
Member

Because source paths show up in stack traces all the time and not being able to tell what package the code is in is kind of a big usability issue.

@oschulz
Contributor Author

oschulz commented Aug 24, 2019

Argh, yes, didn't consider that.

@oschulz
Contributor Author

oschulz commented Aug 24, 2019

Looks like this is indeed quite a bit trickier than I had hoped. I thought the difficulty might be closer to the compiler level or so, but from what I understand, that's actually not really a problem at all. Instead it's "mundane" path names :-) But Stefan's arguments are very hard to fault, of course. Darn, I had hoped it would be an easy change; I guess that was too naive, because in that case it probably would have been done already.

@oschulz
Contributor Author

oschulz commented Aug 24, 2019

In addition to problems with stack traces, etc., changing package paths would also kinda break package-management compatibility between Julia 1.x versions, I guess. That would be really inconvenient: it's really nice to be able to quickly switch versions with a common package repo.

@tkf
Member

tkf commented Aug 24, 2019

Because source paths show up in stack traces all the time and not being able to tell what package the code is in is kind of a big usability issue.

Would it be crazy to just insert package name when printing the stack traces, instead of a full path, something like this?

Stacktrace:
 [1] f(...) at $PACKAGE_NAME src/file.jl:51
 ...

The files in Base are already not printed using the full path.
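The suggestion could look roughly like this Python sketch: given a map from package root directories to package names (which the loader would know; the map and the example paths here are hypothetical), a frame's file path is rendered as the package name plus a path relative to the package root, falling back to the full path otherwise.

```python
from pathlib import PurePosixPath
from typing import Dict


def short_location(file: str, line: int, package_roots: Dict[str, str]) -> str:
    """Render 'Name relative/path.jl:line' if `file` lies under a known package root."""
    path = PurePosixPath(file)
    for root, name in package_roots.items():
        try:
            rel = path.relative_to(root)
        except ValueError:
            continue  # not inside this package root
        return f"{name} {rel}:{line}"
    return f"{file}:{line}"  # unknown location: fall back to the full path


# Hypothetical opaque install path, as in a UUID/hash-based layout.
roots = {"/depot/packages/3f/9a1c": "JSON3"}
print(short_location("/depot/packages/3f/9a1c/src/file.jl", 51, roots))
```

With a mapping like this, the install path can be opaque without making stack traces harder to read.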

The direction of this issue sounds great, as it would allow (but not force) package authors to use a Go-like migration path, i.e., use a different namespace when bumping the major version.

@StefanKarpinski
Member

Yes, that's a great idea, but I still think having the package name in the path is a good idea.

@oschulz
Contributor Author

oschulz commented Aug 26, 2019

I agree. I do find myself looking into the packages directory manually from time to time, so human-readable path names are very nice to have. Also, breaking the current scheme would probably not be possible within the Julia v1.x track, right? And even if it could be considered, breaking the current ability of different Julia versions to share packages would be kinda inconvenient.

So I guess Stefan's suggestion of keeping a separate map between package names and UUIDs would currently be the only practical way to have package names that are truly local to the Project.toml that uses them? I certainly do understand the reluctance to complicate the package loading process with such an additional mapping, though. And even with such a map, I guess we would have cases where two different packages are installed in the same directory, distinguished only by different (not human-readable) slugs? Not very transparent, I have to admit.

Darn, this would be so nice to have, but I can't think of a really clean and elegant solution either (though I'm hardly an expert here).

@StefanKarpinski
Member

StefanKarpinski commented Aug 27, 2019

Also, breaking the current scheme would probably not be possible within the Julia v1.x track, right?

It depends. Is it acceptable to make everyone reinstantiate their manifests? If so then it's fine.

I'm actually pretty ok with going ahead with my suggestion here. It's a pretty simple scheme and doesn't complicate code loading all that much. I'm going to give it a try. You can automatically generate the Names.toml file from an existing set of installed packages, which is a nice property.
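As a rough illustration of that property, here is a Python sketch that scans $depot/packages/*/*/Project.toml and collects the internal names seen for each UUID. It uses a deliberately minimal line-based reader for the top-level name and uuid keys (an assumption; real code would use a proper TOML parser), and it falls back to the $name directory when a Project.toml has no name entry.

```python
import re
from collections import defaultdict
from pathlib import Path
from typing import Dict, List, Optional, Set, Tuple

# Matches top-level lines like: name = "JSON"  or  uuid = "682c06a0-..."
KEY_RE = re.compile(r'^(name|uuid)\s*=\s*"([^"]*)"\s*$')


def read_name_uuid(project_file: Path) -> Tuple[Optional[str], Optional[str]]:
    found: Dict[str, str] = {}
    for line in project_file.read_text().splitlines():
        line = line.strip()
        if line.startswith("["):
            break  # name and uuid are top-level keys, so stop at the first table
        m = KEY_RE.match(line)
        if m:
            found[m.group(1)] = m.group(2)
    return found.get("name"), found.get("uuid")


def build_names(depot: Path) -> Dict[str, List[str]]:
    names: Dict[str, Set[str]] = defaultdict(set)
    for project_file in (depot / "packages").glob("*/*/Project.toml"):
        name, pkg_uuid = read_name_uuid(project_file)
        if pkg_uuid is None:
            continue
        # Fall back to the $name directory if the project file has no name entry.
        names[pkg_uuid].add(name or project_file.parent.parent.name)
    return {u: sorted(ns) for u, ns in sorted(names.items())}


def render_names_toml(names: Dict[str, List[str]]) -> str:
    lines = []
    for pkg_uuid, ns in names.items():
        quoted = ", ".join(f'"{n}"' for n in ns)
        lines.append(f"{pkg_uuid} = [{quoted}]")
    return "\n".join(lines) + "\n"
```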

@oschulz
Contributor Author

oschulz commented Aug 27, 2019

Yay, thanks!

@waldyrious
Contributor

Just throwing this out there, since it wasn't mentioned on this thread: would symlinks be an acceptable solution for a fully UUID-based package resolution scheme, while preserving human-readable package paths? So there could be, e.g.

$depot/packages/$name/ -> $depot/packages-uuid/$uuid/

or something like that. This is similar to the strategy used by GoboLinux, and to some extent by Homebrew.
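The symlink idea can be sketched in a few lines of Python. The packages-uuid directory layout is the hypothetical one from the comment above. On Windows, `os.symlink` generally requires administrator rights or Developer Mode, which is the kind of portability problem this approach runs into.

```python
import os
from pathlib import Path


def link_by_name(depot: Path, name: str, pkg_uuid: str) -> Path:
    """Create $depot/packages/$name -> $depot/packages-uuid/$uuid and return the link path."""
    target = depot / "packages-uuid" / pkg_uuid
    target.mkdir(parents=True, exist_ok=True)  # the real content lives under the UUID
    link = depot / "packages" / name
    link.parent.mkdir(parents=True, exist_ok=True)
    # target_is_directory only matters on Windows, where creating symlinks
    # typically needs elevated privileges or Developer Mode.
    os.symlink(target, link, target_is_directory=True)
    return link
```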

@StefanKarpinski
Member

That wouldn't work on Windows, unfortunately.

@StefanKarpinski
Member

I wanted to record a possible approach that came up in a discussion just now with @staticfloat: we could use the "local name" of packages in the project file but the "canonical name" in the manifest file. That means that the manifest stanza for a package would always use its canonical name. To find top-level X, you would look up X in the project file, find its UUID, then scan through the manifest file for that UUID, which gives the canonical name, and then look up the package at $depot/packages/$canonical_name/$slug. If you're looking up a non-top-level X then you start in the manifest, find the stanza for the current package, look in its deps, which would be keyed by local names and would only be permitted to use the name list form if (a) all deps use their canonical names and (b) all those canonical names are unique in the manifest. Thus we can look up the local name in the deps entry and be assured that either the name is canonical and unique in the manifest, or the deps entry is a map to UUIDs and we can then proceed with looking up the dependency by UUID, which gives us its canonical name again, allowing us to find the code. This approach doesn't require keeping an alias map, which makes it considerably cleaner than what I'd proposed before.
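A Python sketch of the top-level lookup described above, with the project and manifest modeled as plain dictionaries (the manifest maps each canonical name to a list of stanzas, loosely following the [[Name]] array-of-tables layout). The scenario reuses the hypothetical rename from earlier in the thread, and the slug is taken as an input rather than computed.

```python
from pathlib import PurePosixPath
from typing import Any, Dict, List, Tuple

Manifest = Dict[str, List[Dict[str, Any]]]


def find_by_uuid(manifest: Manifest, pkg_uuid: str) -> Tuple[str, Dict[str, Any]]:
    # Scan every stanza; the stanza header is the package's canonical name.
    for canonical_name, entries in manifest.items():
        for entry in entries:
            if entry["uuid"] == pkg_uuid:
                return canonical_name, entry
    raise KeyError(pkg_uuid)


def locate_top_level(local_name: str, project: Dict[str, Any], manifest: Manifest,
                     depot: str, slug: str) -> PurePosixPath:
    pkg_uuid = project["deps"][local_name]                 # local name -> UUID (project file)
    canonical_name, _ = find_by_uuid(manifest, pkg_uuid)   # UUID -> canonical name (manifest)
    return PurePosixPath(depot, "packages", canonical_name, slug)


# Hypothetical: the project still says "JSON", but the package is now canonically "JSONOld".
project = {"deps": {"JSON": "682c06a0-de6a-54ab-a142-c8b1cf79cde6"}}
manifest: Manifest = {"JSONOld": [{"uuid": "682c06a0-de6a-54ab-a142-c8b1cf79cde6"}]}
print(locate_top_level("JSON", project, manifest, "/depot", "Ab3xZ"))
```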

@KristofferC
Member

KristofferC commented Oct 21, 2020

Thus we can look up the local name in the deps entry and be assured that either the name is canonical

I don't see how this is possible if the non-top-level dependency uses a non-canonical name for what it itself loads. Unless you go to that package's project file, it seems impossible to know what UUID that resolves to. I think that is what I mentioned here #33047 (comment).

@StefanKarpinski
Member

Yes, we'd have to look at the project file for each bit of code we're loading, which we don't currently have to do. But I think that's ok to do. Can't think of a reason not to—we're loading code from that location anyway.

@KristofferC
Member

But then you can no longer figure out the full dependency graph from just the Project / Manifest without downloading all the packages in it. It's a bit similar to why the deps and version entries are stored in the Manifest, no? We could theoretically just read them from the package's Project file when loading.

@StefanKarpinski
Member

That's true. Ok, here's an alternative: the deps keys in the manifest are the local names while the stanza headers are the canonical names. If the local name and canonical name are different, you have to use the table form of deps with UUIDs so you can find the right stanza based on the UUID.
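Under this alternative, resolving a dependency of an already-located manifest entry might look like the following Python sketch. The list form of deps is accepted only when the local name is a stanza header with exactly one entry; otherwise the table form (local name to UUID) is needed. The package names and the repeated-digit UUIDs in the usage example are hypothetical placeholders.

```python
from typing import Any, Dict, List, Tuple

Manifest = Dict[str, List[Dict[str, Any]]]


def find_by_uuid(manifest: Manifest, pkg_uuid: str) -> Tuple[str, Dict[str, Any]]:
    for canonical_name, entries in manifest.items():
        for entry in entries:
            if entry["uuid"] == pkg_uuid:
                return canonical_name, entry
    raise KeyError(pkg_uuid)


def resolve_dep(manifest: Manifest, parent: Dict[str, Any],
                local_name: str) -> Tuple[str, Dict[str, Any]]:
    """Resolve a dependency of `parent` by the local name that parent's code uses."""
    deps = parent.get("deps", [])
    if isinstance(deps, dict):
        # Table form: local name -> UUID; required whenever local != canonical name.
        return find_by_uuid(manifest, deps[local_name])
    # List form: the local name must itself be a canonical stanza header...
    if local_name not in deps:
        raise KeyError(local_name)
    entries = manifest.get(local_name, [])
    # ...and unique in the manifest, otherwise the name would be ambiguous.
    if len(entries) != 1:
        raise ValueError(f"{local_name!r} is not unique; the table form of deps is required")
    return local_name, entries[0]


U_JSON = "682c06a0-de6a-54ab-a142-c8b1cf79cde6"
U_APP = "11111111-1111-1111-1111-111111111111"
U_PARSERS = "22222222-2222-2222-2222-222222222222"
U_TOOL = "33333333-3333-3333-3333-333333333333"
manifest: Manifest = {
    "App": [{"uuid": U_APP, "deps": {"JSON": U_JSON, "Parsers": U_PARSERS}}],
    "JSONOld": [{"uuid": U_JSON}],
    "Parsers": [{"uuid": U_PARSERS}],
    "Tool": [{"uuid": U_TOOL, "deps": ["Parsers"]}],
}
print(resolve_dep(manifest, manifest["App"][0], "JSON")[0])
```

Here "App" still loads the package as JSON, so its deps must use the table form; "Tool" refers to Parsers by its canonical, unique name and can keep the list form.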

@ViralBShah added the packages (Package management and loading) label on Mar 14, 2022

7 participants