proposal: meta package #4158

Closed

Member

@oliver-sanders oliver-sanders commented Apr 1, 2021

sibling: cylc/cylc-doc#234

Easier to code than to explain...

The Cylc "meta" package is intended to bundle a set of Cylc components at specified versions, to ensure inter-compatibility, simplify installation and keep site installations consistent.

There are two approaches:

  1. Create an empty package with the relevant Cylc packages as its dependencies.
    • We could create a PyPI package which provides an empty cylc.meta module.
    • We could then create a conda-forge package of that.
    • This may come under "package vendoring", which Forge doesn't like.
  2. Make the other Cylc components sub-packages of the "root" Cylc package.

In discussions over the past year we have concluded:

  1. Cylc Flow will be present for all installations.
  2. The "meta" package version should always be the same as the Flow version.

We have been heading towards (1), however, considering the above I'm not convinced this is a good idea.
The only real advantage of (1) is that the user is more free to mix and match Cylc components (within the bounds of version pinning), which is more con than pro.

This PR implements (2), simply making the other Cylc components optional dependencies of cylc-flow. See how it would work in the docs PR - cylc/cylc-doc#234.
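
A minimal sketch of what option (2) could look like from the installer's side, assuming the extras names and pins quoted in this PR's setup.py diff (uiserver, rose) and the proposed distribution name cylc. The helpers install_spec and resolved_pins are hypothetical and exist only for illustration; they are not part of cylc-flow.

```python
# Extras mapping, mirroring the pins quoted in this PR's review diff.
EXTRAS_REQUIRE = {
    'uiserver': ['cylc-uiserver==0.3.0'],
    'rose': ['cylc-rose==0.1.1'],
}


def install_spec(package, extras=()):
    """Build a pip requirement string such as 'cylc[rose,uiserver]'."""
    unknown = sorted(set(extras) - set(EXTRAS_REQUIRE))
    if unknown:
        raise ValueError(f'unknown extras: {unknown}')
    if not extras:
        # No extras requested: the base package (the scheduler) only.
        return package
    return f"{package}[{','.join(sorted(extras))}]"


def resolved_pins(extras=()):
    """List the pinned components that the chosen extras would pull in."""
    return sorted(pin for name in extras for pin in EXTRAS_REQUIRE[name])


if __name__ == '__main__':
    print('pip install', install_spec('cylc', ['uiserver', 'rose']))
    print(resolved_pins(['uiserver', 'rose']))
```

The point of the sketch is that the base install needs no tag at all, while each extra maps to exactly one pinned component.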

In Forge-land optional dependencies become package outputs (see Bruno's working example).

This is nice and simple, easy to maintain and keeps the PyPI/Forge versions nicely coupled. One problem: the "cylc-flow" package is now the "cylc" meta-package.

Proposal:

  • Go for option (2).
  • Deploy this repo as cylc on PyPI and Forge rather than cylc-flow:
    • If Cylc Flow is present in all installations then we don't need a special install tag for it.
    • The name "Flow" is now confused with the many other uses of "Flow".
    • The main component of Cylc "Flow" is called the "Scheduler".
    • We already have deployments in these namespaces.
  • Wipe the old cylc-flow namespaces as best we can.
  • Rename this repo back to cylc 🤦.

TODO:

  • Convert cylc-flow references to cylc

Requirements check-list

  • I have read CONTRIBUTING.md and added my name as a Code Contributor.
  • Contains logically grouped changes (else tidy your branch by rebase).
  • Does not contain off-topic changes (use other PRs for other changes).
  • Does not need tests (why?).
  • No change log entry required (why? e.g. invisible to users).
  • (master branch) I have opened a documentation PR at cylc/cylc-doc#234.
  • Created an issue at cylc-flow conda-forge repository with version changes (if you changed dependencies in setup.py, see recipe/meta.yaml).

@oliver-sanders oliver-sanders self-assigned this Apr 1, 2021
@oliver-sanders oliver-sanders added this to the cylc-8.0b1 milestone Apr 1, 2021
@oliver-sanders oliver-sanders added the question Flag this as a question for the next Cylc project meeting. label Apr 1, 2021
-name = cylc-flow
+name = cylc
Member Author

Renames the PyPI distribution cylc-flow -> cylc.

Member

@kinow kinow left a comment

I don't think I am the best person to review this PR, but I'm leaving my 2 cents here anyway 🙂

There are definitely pros and cons to having the option to mix and match the components of a Cylc installation IMO. While it may be convenient to have a single package that, when installed, will bring all the packages required for the Cylc system, I normally prefer being able to choose what I will install where.

That way I can customize an Ansible/K8s/Docker/Terraform deployment specifying which modules/packages are installed where. But that's just my preference, I think the best people to judge how Cylc should be installed are yourself, Davids, Hilary, and others supporting production environments.

JupyterHub is a good example of this too, I think. It lets you install the jupyterhub Python package, and configurable-http-proxy (or an alternative) using the npm package manager. You can also install Postgres with the operating system package manager.

However, the equivalent of their metapackage is their Kubernetes Helm chart, which can be customized for things like which proxy to use (configurable-http-proxy or traefik).

Should we move in the future to allow users to install Cylc in containers, cloud services, or use any database provider, I think the pip metapackage would be obsolete. The conda package could perhaps still be used.

This metapackage has cons as well. If a CVE is ever reported against jupyterhub version a.b.c, and that is the version used by Cylc UI Server x.y.1, then we could create a CVE for Cylc UI Server x.y.1, quickly release x.y.2, and ask users to update to that version. Sysadmins wouldn't have to worry about hosts that only have cylc-flow. With the metapackage, in this scenario, they would have to make sure that the installed cylc package did not bring in cylc-uiserver as an optional dependency (i.e. that only the scheduler is installed). I think the right thing would then be to report the CVE against both cylc-uiserver and the cylc metapackage, assuming we adopt the security workflow used in some other projects.

But having said all that, I'm not maintaining Cylc in operations, and we will probably start thinking about containers and cloud in a later version of Cylc 8 or Cylc 9, so we can try this different approach if others prefer.

In Conda Forge I think we would ignore the cylc-flow-feedstock recipe and artefact, and instead update conda-forge/cylc-feedstock to install the Python code from this repository and handle the multiple outputs in Conda Forge (I have a WIP PR for that, which can be used if helpful).

Bruno

    # these are other cylc components within the cylc meta-package
    'uiserver': [
        'cylc-uiserver==0.3.0'
    ],
Member

Doesn't this mean cylc now has a downstream dependency on cylc-uiserver, while cylc-uiserver has an upstream dependency on cylc, creating a circular dependency?

Not sure whether it will always work: pip, poetry, pipenv, tox, setuptools and importlib could each fail to handle a circular dependency, I think. It looks like pip has had some issues before - https://github.com/pypa/pip/issues?q=circular+dependency
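
To make the concern concrete, here is a small sketch that looks for a cycle in a declared-dependency graph. The graph below is an assumption based on this thread (cylc offering cylc-uiserver and cylc-rose as extras, and both depending back on cylc); find_cycle is a hypothetical helper and says nothing about how pip or conda actually resolve such a loop.

```python
def find_cycle(graph):
    """Return one dependency cycle as a list of names, or None if acyclic.

    `graph` maps a package name to the names it depends on.
    """
    visiting = []   # current depth-first path
    done = set()    # packages already fully explored

    def visit(node):
        if node in done:
            return None
        if node in visiting:
            # We came back to a package on the current path: a cycle.
            return visiting[visiting.index(node):] + [node]
        visiting.append(node)
        for dep in graph.get(node, ()):
            cycle = visit(dep)
            if cycle:
                return cycle
        visiting.pop()
        done.add(node)
        return None

    for name in graph:
        cycle = visit(name)
        if cycle:
            return cycle
    return None


# Assumed graph for option (2): extras point "down", code deps point "up".
OPTION_2 = {
    'cylc': ['cylc-uiserver', 'cylc-rose'],
    'cylc-uiserver': ['cylc'],
    'cylc-rose': ['cylc'],
}

# Assumed graph for option (1): an empty meta package over plain packages.
OPTION_1 = {
    'cylc-meta': ['cylc-flow', 'cylc-uiserver', 'cylc-rose'],
    'cylc-uiserver': ['cylc-flow'],
    'cylc-rose': ['cylc-flow'],
    'cylc-flow': [],
}

if __name__ == '__main__':
    print(find_cycle(OPTION_2))
    print(find_cycle(OPTION_1))
```

Under these assumptions the integrated layout contains a loop and the separate meta package does not; whether the loop is actually a problem for a given installer is a separate question.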

Member

Good point, I was thinking (not very deeply!) that we'd have to remove the cylc dependency from the UIS, but that doesn't make sense since it is a real code dependency.

Member

I think we are re-using the protobuf schema and some GraphQL code too (also some ID parsing maybe?).

Member Author

The UI Server imports the Protobuf and GraphQL schema from cylc-flow directly.

Member Author

Not sure whether circular imports are problematic, will have to do some reading.

        'cylc-uiserver==0.3.0'
    ],
    'rose': [
        'cylc-rose==0.1.1'
Member

Ditto the above here, for circular dependency.

@oliver-sanders
Member Author

oliver-sanders commented Apr 8, 2021

Thanks for that, I hadn't thought about circular imports.

> I normally prefer being able to choose what I will install where.

I think whichever way around we do it you will still be able to mix and match components if you so choose:

# 1 mix and match (separate meta package)
conda install cylc-flow cylc-uiserver

# 2 mix and match (integrated meta package)
conda install cylc cylc-uiserver

And either way around you get into the same problem if you try installing from both the meta and plain package:

$ conda install cylc cylc.uiserver
$ conda install cylc-uiserver==<version>
version conflict between the cylc-uiserver version pinned by cylc.uiserver
and the one specified on the cli
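
The conflict described above can be sketched as a comparison of exact pins from two sources. This is only an illustration: find_conflicts is a hypothetical helper that understands nothing but `==` pins, the 0.2.0 version is made up for the example, and real resolvers (pip, conda) handle far more than this.

```python
def parse_pin(requirement):
    """Split 'name==version' into (name, version); version is None if unpinned."""
    name, _, version = requirement.partition('==')
    return name.strip(), (version.strip() or None)


def find_conflicts(*requirement_sets):
    """Return {name: [versions...]} for packages pinned to different versions."""
    pins = {}
    conflicts = {}
    for requirements in requirement_sets:
        for requirement in requirements:
            name, version = parse_pin(requirement)
            if version is None:
                continue  # unpinned requirements cannot conflict here
            previous = pins.setdefault(name, version)
            if previous != version:
                conflicts.setdefault(name, {previous}).add(version)
    return {name: sorted(versions) for name, versions in conflicts.items()}


if __name__ == '__main__':
    from_meta = ['cylc-uiserver==0.3.0']   # pin carried by the meta package
    from_cli = ['cylc-uiserver==0.2.0']    # hypothetical pin given on the CLI
    print(find_conflicts(from_meta, from_cli))
```

The same check gives the same answer whether the first pin comes from a separate meta package or from an extra on the main package, which is the point being made here.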

I.e. I don't think it actually makes a difference whether we use a separate metapackage or put optional deps into the main one. Because cylc-flow would be the default base installation, both give exactly the same end result.

However, version pinning in the ui-server may mean that there are a very small number of possible combinations that would work since the ui-server will need to import the GraphQL and Protobuf schema from newer Cylc versions in order to be able to work with newer fields.

> JupyterHub is a good example for this too

Not sure about the pip package, from the conda side Jupyterhub is a single feedstock with three outputs:

# install jupyterhub & node & configurable-http-proxy
conda install jupyterhub

# install jupyterhub & notebook
conda install jupyterhub-singleuser

# install jupyterhub only
conda install jupyterhub-base

I'm not sure what the difference is between one feedstock with multiple outputs and multiple feedstocks.

I think it's just less overhead having one feedstock and you avoid package vendoring?

@oliver-sanders
Member Author

I find the idea of a blank meta-package which serves only to bring in other repos a little awkward since it's really just a strange way of sharing a recipe.

However, I'm not sure how useful this actually is, since:

  • This recipe won't contain everything a site requires.
  • The recipe may contain things a site doesn't want.
  • Site requirements are all going to be different.

For example one site might want an environment that looks like this:

name: cylc-8.0b0
channels:
  - conda-forge
dependencies:
  - python=3.7.10
  - cylc-flow=8.0b0
  - cylc-uiserver=0.3.0
  - cylc-rose=0.1.1
  - metomi-rose=2.0b1
  - some-jupyterhub-auth-plugin
  - some-jupyterhub-spawner-plugin

Sites might want to do things like:

  • Pin the version (or build) of Python they use.
  • Install Jupyterhub plugins (potentially internally hosted).
  • Install Cylc plugins (potentially internally hosted).
  • Swap out the configurable-http-proxy (and the node dep it drags in) for something else.
  • Install bash to standardise the version used across different platforms.

E.g. some other site might want:

name: cylc-8.0b0
channels:
  - conda-forge
  - artifactory-internal
dependencies:
  - bash=5
  - coreutils
  - python=3.9.2
  - cylc-flow=8.0b0
  - cylc-uiserver=0.3.0
  - internal-jupyterhub-auth-plugin
  - internal-cylc-plugin

Another site might want to bring in dependencies from a commercially supported channel:

name: cylc-8.0b0
channels:
  - anaconda
dependencies:
  - python=3.7.10

@hjoliver hjoliver modified the milestones: cylc-8.0b1, cylc-8.0b2 Apr 15, 2021
@hjoliver
Member

hjoliver commented Aug 4, 2021

(meeting: @oliver-sanders to supersede this with a proposal for a simpler no-metapackage solution + document how to install the separate packages as required)

@hjoliver
Member

@oliver-sanders - can we close this now?
