mechanism for default packages #18795

Closed
StefanKarpinski opened this issue Oct 5, 2016 · 34 comments
Labels
design Design of APIs or of the language itself excision Removal of code from Base or the repository modules packages Package management and loading stdlib Julia's standard library

@StefanKarpinski
Sponsor Member

Discussion related to #5155. There are a few steps needed to break up Base:

  1. Move functionality into modules; some of this work is already done.
  2. Move modules out of Base, into a location where they can be loaded via LOAD_PATH.
  3. Have a mechanism for installing, finding, and updating these "pseudo-packages".

It is this last point, the pseudo-package mechanism, which, I suspect, blocks the whole process. Considerations:

  • Base should be independent of the pseudo-packages: it should be possible to build, load, run and test all of base Julia without any pseudo-packages loaded or even installed.
  • Pseudo-packages should be largely independent of each other: it should be possible to load and test each without all the others. Should we allow any amount of interdependency between pseudo-packages?
  • It should be possible to update pseudo-packages independently of Julia, but it is acceptable for pseudo-packages to be considerably more coupled with Julia than normal packages.
  • There needs to be a mechanism for installing and updating pseudo-packages that is not normal package installation, because we want the package manager itself to be a pseudo-package.
@StefanKarpinski StefanKarpinski added packages Package management and loading design Design of APIs or of the language itself modules labels Oct 5, 2016
@StefanKarpinski StefanKarpinski added this to the 0.6.0 milestone Oct 5, 2016
@JeffBezanson
Sponsor Member

It seems clear to me that these packages should be as much like normal packages as possible. Here's a strawman proposal:

  • Add a /stdlib directory to the julia repo, with a Makefile in it.
  • Do make -C stdlib after building julia.
  • That makefile first manually clones the package manager repo, then uses the package manager to get the rest of the default packages.
  • Part I'm not clear on: either the package manager handles multiple directories, or we forget the stdlib directory and put all these packages in the normal place.
  • make install puts the default packages in the site-wide package dir.
  • Another tricky bit: are they compiled into the system image? If not, and you put usings in your juliarc, it will adversely affect startup time. If so, we won't be able to use Pkg to get them, since we won't yet have a sysimg to use to run Pkg. Or we could first build a small sysimg then a larger one, but that will take longer of course.

@stevengj
Member

stevengj commented Oct 5, 2016

One of my biggest concerns with breaking up Base is testing and implementing changes to the Base language. Ideally, I would want:

  • CI testing of Julia PRs to see whether they break the "default" modules.
  • A way to submit coordinated PRs that patch both Julia and the default modules.

The former seems straightforward, but the latter seems hard within the context of GitHub. Or am I missing some GitHub feature that would allow this?

@JeffBezanson
Sponsor Member

Arguably, anything that typically needs to be patched along with Julia doesn't benefit much from being a separate package, and its code could remain in this repo (maybe still moved to /stdlib though).

@stevengj
Member

stevengj commented Oct 5, 2016

It would be interesting to sort the files in Base according to how many commits have touched them. The files touched by the fewest commits would be the best candidates to split.

I still think there should be a way to run CI on the default packages. Maybe optionally, e.g. via @julialib runtests() or whatever, analogous to @nanosoldier.
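The commit-count ranking suggested above could be produced with a short script. This is a sketch in Python, used purely for illustration. It assumes a local clone of the julia repo with Base's sources under base/. The function names are hypothetical. It relies on `git log --format= --name-only`, which prints only the names of the files each commit changed.

```python
import subprocess
from collections import Counter


def count_commits_per_file(log_text: str) -> Counter:
    """Count how many commits touched each file.

    Expects output of `git log --format= --name-only`: each commit lists
    the files it changed, one per line, with blank lines between commits.
    """
    return Counter(line.strip() for line in log_text.splitlines() if line.strip())


def rank_split_candidates(path: str = "base", repo: str = ".") -> list[tuple[str, int]]:
    """Return (file, commit_count) pairs, least-touched files first (hypothetical helper)."""
    log_text = subprocess.run(
        ["git", "log", "--format=", "--name-only", "--", path],
        cwd=repo, capture_output=True, text=True, check=True,
    ).stdout
    counts = count_commits_per_file(log_text)
    # Fewest commits first; ties broken by file name for stable output.
    return sorted(counts.items(), key=lambda item: (item[1], item[0]))
```

Run from the repo root, `rank_split_candidates()[:20]` would list the 20 least-churned files under base/. Files that were later deleted or renamed still show up under their old names, so the list needs a manual pass.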

@StefanKarpinski
Sponsor Member Author

Another tricky bit: are they compiled into the system image? If not, and you put usings in your juliarc, it will adversely affect startup time. If so, we won't be able to use Pkg to get them, since we won't yet have a sysimg to use to run Pkg. Or we could first build a small sysimg then a larger one, but that will take longer of course.

Idea: precompile .juliarc.jl the same way we precompile modules? Can we do that?

@JeffBezanson
Sponsor Member

Of course we can see if the normal package precompile mechanism is sufficient to get good startup time. We also might end up with starting the REPL being slower, but ./julia script.jl being faster due to not loading the repl or other modules script.jl doesn't need.

@tkelman
Contributor

tkelman commented Oct 5, 2016

One option that doesn't require adding too many new features to Pkg would be to make Pkg.init perform a copy prepopulation. The copies we distribute under site (which doesn't have to be system-wide) in the default LOAD_PATH aren't managed by Pkg, but once Pkg.init is called with a prepopulated set, it manages them from then on.
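Here is a minimal sketch of the copy-prepopulation idea. It is written in Python for illustration only and is not how Pkg.init actually works. It assumes a hypothetical bundled directory, standing in for the site directory shipped with Julia, and a hypothetical user package directory. It copies only the packages the user does not already have, so nothing the user has already updated gets overwritten.

```python
import shutil
from pathlib import Path


def prepopulate(bundled_dir: Path, user_dir: Path) -> list[str]:
    """Copy bundled packages into the user package directory (hypothetical helper).

    Packages already present in user_dir are left untouched, so a user's
    own updates are never overwritten. Returns the names that were copied.
    """
    user_dir.mkdir(parents=True, exist_ok=True)
    copied = []
    for pkg in sorted(p for p in bundled_dir.iterdir() if p.is_dir()):
        target = user_dir / pkg.name
        if not target.exists():
            shutil.copytree(pkg, target)
            copied.append(pkg.name)
    return copied
```

After this one-time copy, the package manager would treat the user copies like any other installed package. The bundled originals stay pristine, which also fits the read-only install concern raised later in the thread.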

@StefanKarpinski
Sponsor Member Author

I'm also not sure we really want these to be as much like normal packages as possible. It could be a good idea, but I'm not entirely convinced. As you ask, does it live in the same place? Should we do version resolution in the same way? This seems like a case where updating all standard packages together, monolithically, may be preferable – otherwise you end up in a situation where each version of each standard package needs to work not only with a range of versions of Julia itself, but also with a range of versions of the other standard packages.

@musm
Contributor

musm commented Oct 5, 2016

To get this done ASAP, it might be easiest and fastest to first split logically related files in Base into separate packages while still leaving them in the repo, i.e. a src/stdlib/ containing things like LibM, LinAlg, etc., and then have a separate src/runtime for the runtime. We could get a feel for how things work on master for a while, and later think about a more sophisticated, modular, flexible approach. (Tests should also be moved to the corresponding packages in stdlib, and similarly for the runtime.)

@JeffBezanson
Sponsor Member

Yes, I think that's a good idea, but we'd still need to decide how loading works, and how to install the stdlib, and whether the stdlib modules can be precompiled. If they aren't precompiled the delays could be quite frustrating.

@tkelman
Contributor

tkelman commented Oct 5, 2016

I think we pin the versions of packages that are included in the stdlib, and update the pinned versions frequently with a sufficient amount of testing. Same as any C dependency, we don't pull from master, and it's good to get as deterministic a build as possible so 0.6.0 built when it comes out behaves the same as 0.6.0 downloaded and built from source months later. If Pkg.init uses the bundled versions as a starting point, it can be allowed to update the copies.

@aviks
Member

aviks commented Oct 6, 2016

I feel there should not be any difference between "pseudo" packages and "real" packages, for two reasons:

  1. Packages moved out of Base should be able to be updated outside of Base's release schedule. Doing a Pkg.update() should bring me the latest bug fixes for (e.g.) FFTW, and FFTW development should be decoupled from Base's cadence. Otherwise, what is the point of moving these things out of Base? And implementing a separate update mechanism for "pseudo" packages would add unnecessary complexity.
  2. Part of the rationale for having a stdlib (#5155, Shrinking Base and Introducing a Standard Library) is not only to remove things from Base, but to add things to the distribution. So it's not just about removing FFT from Base, it's about adding (e.g.) GLM to the standard library. And doing that should not change the cadence of GLM's development.

@StefanKarpinski
Sponsor Member Author

Ok, I'm convinced. These should be normal packages :)

@tkelman
Contributor

tkelman commented Oct 12, 2016

We do need to account for the possibility that the Julia distribution gets installed into a read-only location, in which case default packages wouldn't be updateable in-place.

@martinholters
Member

How about this: There is a user and a system package dir, where the former is searched first when a package is to be loaded. The packages included in the Julia distribution go into the system package dir (potentially read-only for the user), updates go into the user package dir (writable for the user). Options to the Pkg commands could make them write to the system package dir instead so that an administrator can update/add/remove packages for all users on the system.
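The lookup order in this proposal amounts to searching an ordered list of directories and taking the first match, much as LOAD_PATH already does. The Python sketch below illustrates only that ordering. The directory layout and the find_package name are assumptions, not Julia's actual loader.

```python
from pathlib import Path
from typing import Optional, Sequence


def find_package(name: str, search_dirs: Sequence[Path]) -> Optional[Path]:
    """Return the first directory containing package `name` (hypothetical helper).

    search_dirs is ordered by priority: the user package dir comes first, so
    an updated copy there shadows the version shipped with the distribution.
    Missing search directories are simply skipped.
    """
    for d in search_dirs:
        candidate = d / name
        if candidate.is_dir():
            return candidate
    return None
```

In this sketch, `find_package("Foo", [user_dir, system_dir])` returns the user's updated copy when one exists and otherwise falls back to the shipped version. The system directory is only ever read, so it can stay read-only for ordinary users.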

@tkelman
Contributor

tkelman commented Oct 13, 2016

That already works with appropriate setting of LOAD_PATH, just anything outside of Pkg.dir doesn't currently participate in anything Pkg does. We don't want to require root to install a Julia-with-packages distribution, so I don't think system wide should be the default. Bundled packages should be specific to the Julia version they come bundled with, but with options to populate the user (or system wide if you have root) package directory from them.

@martinholters
Member

Sorry, with "system package dir" I didn't mean a necessarily system-wide installation location. For a per-user installation of Julia, that would just be somewhere in his Julia install path. Still, the benefit would be that a user could just delete his .julia and still have the default packages (at the originally shipped version) available.

@StefanKarpinski
Sponsor Member Author

StefanKarpinski commented Oct 27, 2016

For 0.6 we should just install standard packages in the LOAD_PATH site directory, which we're currently not using but look in by default. The first step here is to move some of the ready-for-packaging modules in Base into the site directory. What's necessary for that and what modules?

@tkelman
Contributor

tkelman commented Oct 27, 2016

There are multiple pieces here - the build infrastructure for putting specific versions of packages in place, and the restructuring of a few modules. Restructuring modules is rearranging files and imports if they aren't a module yet, then filtering the git history and putting them into a separate package repo.

Series of steps for any module to go through:

  1. Group code into a module within Base (and adjust imports anywhere it's used)
  2. Move the module to not be inside the Base namespace, a separate toplevel module
  3. Rearrange the source code and build system within this repo so the install mechanism is separate
  4. Move the code to a separate repo so it's versioned as a real package

Most of the listed modules can go through any subset of these steps, and each one is (mostly) a prerequisite for the next. We don't have the mechanism for step 3 yet, but once it's there, 2->3 and 3->4 would individually be smaller steps than going straight from 2->4.

@amitmurthy
Contributor

A few observations:

  • packages defined under stdlib should be loaded and their exports brought into scope automatically by default. julia stdlib=no will not load any of the stdlib packages, i.e., default is julia stdlib=yes
  • I am OK with the other approach too, i.e. a default of julia stdlib=no. I would want a simple way to get everything and the kitchen sink without having to add a bunch of using statements to every small program, especially during initial development/prototyping/exploration. The final version can load only specific modules when the code is ready for deployment. So, during development I can always start with julia stdlib=yes.
  • We need a means of building base Julia and a specific set of packages (initially the stdlib, but later also non-stdlib packages and their dependencies) into a single precompiled binary. The current model of loading from node 1 does not scale to larger numbers of workers. Folks deploying on clusters could then deploy the bundled image either on all the nodes or on a shared filesystem.

@tkelman
Contributor

tkelman commented Nov 4, 2016

#18928 reminds me that quadgk would be a good first case here - small, self-contained, no ccalls, not used by anything else in base. Step 1's already even done for it.

@ChrisRackauckas
Member

ChrisRackauckas commented Dec 16, 2016

Is this still slated for v0.6? Seemed like the push to finish it died down.

@stevengj
Member

stevengj commented Dec 16, 2016

I don't see how it can make it into 0.6, because we don't have step 3.

@stevengj
Member

If we move a lot of key numerical functionality out of Base without building additional infrastructure first, I'm concerned that it will have a serious impact on testing changes to the core language. It will be much harder to assess the impact on important packages, and much harder to detect performance regressions, if the numerical packages aren't updated in sync. e.g. a lot of the BaseBenchmarks functionality won't work if a PR breaks something like LinAlg and there is no way to update LinAlg in the same PR because it has been moved to a separate package.

I agree with @amitmurthy that we also need to improve the infrastructure for building "batteries included" system images.

@quinnj
Member

quinnj commented Dec 16, 2016

I agree with @amitmurthy that we also need to improve the infrastructure for building "batteries included" system images.

What's wrong with the userimg.jl approach?

@stevengj
Member

userimg.jl is fine as far as it goes, but it requires too much user intervention. It needs to be coupled to our distribution mechanism so that julialang.org can post "batteries included" downloads. Moreover, there is the question of how updates are handled.

@quinnj
Member

quinnj commented Dec 16, 2016

Oh, that's fine; I thought there were concerns in the mechanics of how userimg.jl works that didn't allow for making a "batteries included" distro.

@ChrisRackauckas
Member

userimg.jl is fine as far as it goes, but it requires too much user intervention. It needs to be coupled to our distribution mechanism so that julialang.org can post "batteries included" downloads. Moreover, there is the question of how updates are handled.

Is that more of a distribution issue, then: putting the right GUI on the installer to make things easier, rather than requiring new infrastructure? (Or a GUI for changing the system image?)

This made me think of a proposal which I opened up here to make it easier.

https://discourse.julialang.org/t/improved-installations-from-executables/1001/1

@stevengj
Member

stevengj commented Dec 16, 2016

It is distribution (and Pkg/updating) infrastructure, but that's still new infrastructure.

@ViralBShah
Member

ViralBShah commented Sep 20, 2017

It appears to me that there are two separate issues here. The first, which is not contentious, is that we need new infrastructure, which is already coming along with Pkg3/Bindeps2.

The second issue is what the list of default packages should be, whether they should continue to be in the JuliaLang distribution, and whether those packages are made available in the namespace by default. I don't think the second question can easily be settled on the 1.0 release schedule. The Pkg3 manifest file makes it easy to recreate a Julia environment, and seems like an important part of the way forward.

I am in favour of closing this issue in favour of the existing Pkg3/Bindeps2 issues, which will be addressed in 1.0, and of treating the larger question of a list of default packages as something we address after 1.0.

@ViralBShah ViralBShah added the triage This should be discussed on a triage call label Sep 20, 2017
@stevengj
Member

@ViralBShah, there is also the issue of startup time if you want to load a bunch of external packages by default. Since that is a performance/packaging issue it doesn't need to be settled for 1.0, but it should be tracked somewhere.

@StefanKarpinski
Sponsor Member Author

We need this one way or another, so whether we close this issue or not doesn't change the actual work that needs to be done. I have some thoughts on this which I can write up.

@StefanKarpinski StefanKarpinski added excision Removal of code from Base or the repository and removed triage This should be discussed on a triage call labels Sep 21, 2017
@JeffBezanson JeffBezanson self-assigned this Sep 29, 2017
@JeffBezanson JeffBezanson added the stdlib Julia's standard library label Sep 29, 2017
@StefanKarpinski
Sponsor Member Author

Is this done now? What else is there to do?

@JeffBezanson
Sponsor Member

It's probably done. A remaining piece could be to clone certain packages during build, so that packages in external repos can also be in stdlib, but that can probably just be done case-by-case.
