
Establish guideline for packages that can upload to the scientific-python nightly channel #30

Open
matthewfeickert opened this issue Aug 5, 2023 · 33 comments


@matthewfeickert
Member

We probably want to add some more guidance about what we can include, but we can sort things out as we go.

I would really prefer that the guidelines be figured out first rather than starting to add random packages (i.e. non-core packages and core dependencies) and then having to say no to others later when we run into technical limitations, e.g. not enough space.
That guideline could be very generic, e.g. that specific things from the domain stack could use the same space, but I definitely see the value of figuring it out first rather than when a handful of packages are already there.

Originally posted by @bsipocz (referencing @jarrodmillman) in #29 (comment)

@matthewfeickert
Member Author

matthewfeickert commented Aug 5, 2023

That guideline could be very generic, e.g. that specific things from the domain stack could use the same space, but I definitely see the value of figuring it out first rather than when a handful of packages are already there.

What level of formality should the guideline have? Is this at the level of a SPEC (either a new one or additional information added to SPEC 4)? Or is it something less formal that is just an amendment to the

# Access
To request access to the repository, please open an issue on [this action's
repository](https://github.com/scientific-python/upload-nightly-action). You can
then generate a token at `https://anaconda.org/scientific-python-nightly-wheels/settings/access`
with permissions to _Allow write access to the API site_ and _Allow uploads to Standard Python repositories_,
and add the token as a secret to your GitHub repository.

that we have in the README now?
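For readers following along, here is a rough sketch of what that token ultimately enables: uploading wheels to the organization with anaconda-client. In CI this is normally handled by the upload-nightly-action itself, so the commands below are illustrative only, and the environment variable name is an assumption.

```bash
# Hypothetical manual upload using a token stored as a repository secret and
# exposed to the job as ANACONDA_ORG_UPLOAD_TOKEN (name is illustrative).
# The upload-nightly-action normally performs the equivalent of this step.
anaconda --token "${ANACONDA_ORG_UPLOAD_TOKEN}" upload \
    --user scientific-python-nightly-wheels \
    --skip-existing \
    dist/*.whl
```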

Establishing criteria that aren't very exclusive will probably be difficult unless there is some (admittedly not great) metric on usage numbers (if people have great ideas here that prove me wrong, I would be very happy!). I think "core" is already quite a fuzzy term with regard to the libraries that are already on the channel (and I mean that with no disrespect to any of the projects there, just that even with a small group of projects the idea of "core" becomes highly subjective based on your use cases and experience).

@bsipocz
Member

bsipocz commented Aug 9, 2023

What counts as a core project for SPEC purposes has been discussed quite a lot, both by the SPEC committee and at the summits (both the dev and the domain one), and as far as I recall the domain stacks were also discussed. So it's not really fuzzy or arbitrary what ended up in the channel.
The channel also has technical limits (e.g. size), which were also discussed when we started the migration, so those limits should be explored before the channel is opened up to many more packages that currently see domain-stack usage rather than usage across the ecosystem.

@matthewfeickert
Member Author

matthewfeickert commented Aug 9, 2023

What counts as a core project for SPEC purposes has been discussed quite a lot, both by the SPEC committee and at the summits (both the dev and the domain one), and as far as I recall the domain stacks were also discussed. So it's not really fuzzy or arbitrary what ended up in the channel.

Is this written down anywhere publicly? If not, it should be, in a way that is easy to find. If it has been discussed at length, then it will hopefully be straightforward for the people who had those discussions to summarize what was decided. Also, there is a mismatch between what was on the old channel and this one,

[screenshot: package list on the old channel]

so was it intentional that some of those projects aren't on this channel?

so those limits should be explored, etc before it's being opened up to a lot more packages

At the moment we're using 6.6 GB of 20.0 GB

[screenshot: channel storage usage]

That amount shouldn't significantly change for the packages shown, and can be brought down by simply changing

env:
N_LATEST_UPLOADS: 5

(though some of the packages upload only one wheel, which simply overwrites the previous one).
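As an aside on how retention works, the cleanup that N_LATEST_UPLOADS drives can also be done by hand with anaconda-client. A rough sketch (the package name, version, and file name are placeholders; the workflow normally handles this automatically):

```bash
# List what a package currently has on the channel.
anaconda show scientific-python-nightly-wheels/numpy

# Manually remove a specific old upload if ever needed
# (replace <version> and <filename> with real values from the listing above).
anaconda remove --force scientific-python-nightly-wheels/numpy/<version>/<filename>
```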

If you round that total up to, say, 10 GB reserved for core (we can also ask Anaconda Cloud for more storage), then the remaining half can be distributed to other packages.

I think it is reasonable to say that additional packages that want to have nightlies distributed now could do so if they are able to show that their wheels are under 1 GB. I also think it is reasonable to ask Anaconda Cloud for more storage (though whoever set up the Anaconda Cloud org would need to do that).

@matthewfeickert
Member Author

Is this written down anywhere publicly?

Yes, yes it is very public if I could read: https://scientific-python.org/specs/core-projects/

@bsipocz
Member

bsipocz commented Aug 9, 2023

so was it intentional that some of those projects aren't on this channel?

Yes. dipy is very much a domain package, and I can't exactly recall the reasoning for h5py, but it's also fairly specific. The only one that got migrated but shouldn't have is statsmodels, and as I recall it got moved before the summit and before the rest of the packages.

@matthewfeickert
Member Author

Following PR #33, the storage used by the core packages has dropped by 1.6 GB (so good call on suggesting that, @bsipocz):

[screenshot: core package storage use after PR #33]

@mattip

mattip commented Aug 15, 2023

At the moment we're using 6.6 GB of 20.0 GB

The previous site https://anaconda.org/multibuild-wheels-staging has a limit of 50 GB. Is there a hard limit of 20 GB these days? I guess part of the onboarding should also be to discuss the storage limits and how projects will allocate them among themselves.

@matthewfeickert
Member Author

Is there a hard limit of 20 GB these days?

I think not, but a request would need to be made to Anaconda Cloud for more storage (this is based on my thoughts in #30 (comment)). @jarrodmillman, as I think you(?) created the Anaconda Cloud organization https://anaconda.org/scientific-python-nightly-wheels/, can you make a request to Anaconda for more storage?

@matthewfeickert
Member Author

@jarrodmillman A much-delayed ping (sorry, late summer got too busy) on the Anaconda Cloud organization storage limits check.

@matthewfeickert
Member Author

@jarrodmillman as I'm coming back to this Issue given Issue #45, were you able to check on the Anaconda Cloud organization storage limits?

@jarrodmillman
Member

I asked someone to ask, but never heard back. I am not sure who to contact at Anaconda, but will ask around.

@larsoner

Separate from which packages like dipy should be included, there is the question of the extent to which optional dependencies of Scientific Python core packages should themselves be included. Continuing from #51 (comment):

One immediate line we could draw is that this library is needed by a core library for full testing.

If that is going to be the policy, then we'll need to bring in a lot of libraries indeed (looking at the extras dependencies of a few libraries, e.g. xarray, mpl, etc., the list to include in this channel would be very long). This is not to say that h5py should not be here (I'm ±0 on it), just that having this as a policy may not be as easy or clear-cut as it seems.

FWIW I don't think SPNW would need to supply all libraries that are optional deps -- just (ideally) all of those optional deps that prevent the SPNW-scoped modules (NumPy, SciPy, pandas, matplotlib, ...) from being fully tested and used by downstream libraries.

h5py falls in that camp because you must compile it against NumPy 2.0.0.dev0 for it to import at all. pytables is also used by pandas[hdf5] and compiles against NumPy; I think it would also be good to include here, as it falls into the "needs to be added here for full pandas functionality" category, too. However, many other optional dependencies (e.g., most pure-Python libraries) won't be like that. To me, if a primary goal for SPNW is to allow bleeding-edge code of SPNW-accepted modules to be tested, the more of that code you actually allow to be tested, the better. And to maximize that, some of these libraries need to be supplied somehow, at least until they all release NumPy 2.0-compatible wheels. Then maybe this consideration becomes a bit moot.
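As a concrete illustration of the downstream testing flow being described here, a downstream project's CI can pull nightly wheels from the channel roughly like this (a sketch; the package list is just an example):

```bash
# Prefer development wheels from the nightly channel over PyPI releases.
# --pre allows pip to select the .dev versions published there.
python -m pip install --upgrade --pre \
    --extra-index-url https://pypi.anaconda.org/scientific-python-nightly-wheels/simple \
    numpy scipy pandas matplotlib h5py
```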

@stefanv
Member

stefanv commented Jan 23, 2024

I concur; inclusion of wheels should be pragmatic.

@tupui
Member

tupui commented Jan 23, 2024

I agree with the pure-Python argument. Some rules could be:

  1. Needed to fully test core packages and/or are widely used in the community for testing purposes
  2. Need to be compiled
  3. Are difficult to build: need more than a simple apt install and pip install
  4. Take a long time to build: >5 min?

And if it's a yes, we could add limits such as:

  1. x GB at most
  2. x platforms at most
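To make a size limit like that actionable, one quick way to gauge a candidate package's wheel footprint before requesting access might be the following (illustrative only; the thresholds above are still open questions):

```bash
# Download only the candidate's own binary wheel (no dependencies) for the
# current platform and check how large it is.
python -m pip download --no-deps --only-binary=:all: h5py -d /tmp/wheel-size-check
du -sh /tmp/wheel-size-check
```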

@bsipocz
Member

bsipocz commented Jan 23, 2024

h5py falls in that camp because you must compile it against NumPy 2.0.0.dev0 for it to import at all. pytables is also used by pandas[hdf5] and compiles against NumPy

The need to compile against NumPy is a very strong argument and a good rule of thumb here, thanks. It indeed addresses my initial fear in that comment that we would pull in a lot of upstream dependencies that are either pure Python or largely outside of Scientific Python community involvement.

@matthewfeickert
Member Author

Poking here, as it seems there is strong support for including h5py. In terms of policy revision material, there's currently this discussion and the heuristics that @tupui laid out in #30 (comment). Can the @scientific-python/spec-steering-committee advise on the next step forward here? Would this be a discussion the Steering Committee needs to have in a meeting? A PR to update the SPEC Core Projects page? Something else?

@matthewfeickert
Member Author

I'm not sure when this happened, but the available storage that we have on https://anaconda.org/scientific-python-nightly-wheels/ got doubled from 20 GB to 40 GB:

[screenshot: organization storage showing a 40 GB quota]

So we're only using about a quarter of our total storage at the moment. 👍

@stefanv
Member

stefanv commented Feb 22, 2024

I'm not sure when this happened, but the available storage that we have on https://anaconda.org/scientific-python-nightly-wheels/ got doubled from 20 GB to 40 GB:

I was corresponding with Anaconda about our project's needs, and they generously doubled our storage while we conclude that conversation.

@jarrodmillman
Member

jarrodmillman commented Feb 22, 2024

I would love to see more projects included, but I know there is some concern about adding more before we get more space. Given that they increased our storage to 40 GB, can we safely accommodate more projects? @matthewfeickert Could we add h5py, pytables, awkward, awkward-cpp, uproot, and shapely for now? (There may be others; this is just the list that immediately came to mind while quickly scanning this discussion. Feel free to suggest other packages that I may have inadvertently overlooked.)

It may be easier to justify increasing our space allocation if we are using more of the 40 GB, to demonstrate that there is an actual need for more space. If we don't get more space, we can always explain to new projects that the space constraints are the limiting factor for us going forward. But I am hopeful Anaconda will agree to vastly increase our storage quota.

Regardless, the steering committee will discuss this during our March 5th meeting.

@tupui
Member

tupui commented Feb 22, 2024

+1 to move forward with this list now.

@matthewfeickert
Member Author

Could we add h5py, pytables, awkward, awkward-cpp, uproot, and shapely for now? (There may be others; this is just the list that immediately came to mind while quickly scanning this discussion. Feel free to suggest other packages that I may have inadvertently overlooked.)

Yes! 🚀 I think that I have all the admin privileges necessary to be able to do this (?), but if not I'll ping you, @jarrodmillman. I think I also understand the workflow needed, as I set up https://anaconda.org/scikit-hep-nightly-wheels and got awkward-cpp and awkward working up there with @jpivarski.

I'll try to get to all of these issues before tomorrow.

@matthewfeickert
Member Author

Could we add ... pytables

@jarrodmillman pytables doesn't have a request Issue open at the moment. If they would like to upload, can you have them open up an Issue so that we can track the setup process?

Feel free to suggest other packages that I may have inadvertently overlooked.

Should we also add:

@larsoner

h5py is up 👍

@jarrodmillman
Member

@larsoner It sounds like it would make sense to include pytables. Do you want to work with them? I am also happy to open an issue / PR if you prefer. Any other pandas dependencies we should consider adding (e.g., pyarrow)?

@jarrodmillman
Member

@matthewfeickert Let's invite dipy and sunpy since they asked. So far we are still looking good for storage and we should take advantage of the extra space (especially since we have requested more).

@larsoner

Yeah, pyarrow was the other one that came to mind for me. Feel free to open an issue for tables and ping me; I haven't looked at their infrastructure much.

@jarrodmillman
Member

How about https://github.com/contourpy/contourpy? We ran into issues with it when testing scikit-image with NumPy 2 nightly wheels: scikit-image/scikit-image#7288

Any other matplotlib dependencies that we should be considering at this point?

Even with the recent additions of awkward, awkward-cpp, shapely, h5py, and dipy, we are currently still in good shape (using 10.2 GB of our 40.0 GB quota).

@jarrodmillman
Member

jarrodmillman commented Feb 23, 2024

See PyTables/PyTables#1115

@jarrodmillman
Member

See apache/arrow#40216

@jarrodmillman
Member

See contourpy/contourpy#362

@bsipocz
Member

bsipocz commented Oct 23, 2024

@matthewfeickert - can we lift your table into a super short README or other file in the repo and close this issue? Right now the policy, I feel, can be summarized as: community Python libraries and their key dependencies in the Scientific Python ecosystem.

@matthewfeickert
Member Author

SGTM. I'm at a conference atm but happy to have this closed and can follow up with anything else later.
