-
-
Notifications
You must be signed in to change notification settings - Fork 231
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Beta support for configurable dependency resolution & Biocontainers. #214
Conversation
This is awesome! I will review at the soonest opportunity (likely early next week). "CWL-specific Dockerfiles" I think you mean Docker-specific CWL files? |
I definitely want this feature, but I'd like to see some more work before merging. Are you able to continue working on this branch in response to comments? Otherwise can see about working off of this branch. The intended design of SoftwareRequirement is to avoid picking favorites with the packaging system. The
Resolution should work something like this:
|
Reviewing #212, I'm not sure why a separate PYTHON_RUN_SCRIPT is necessary. Everything that it does can be accomplished generating shell code? |
@tetron I don't think the specs should strictly be needed - I can look at seqtk and find it Conda, Brew, and Debian. Same is going to hold for most packages in our experience. Back in #93 I made the same argument passionately - that time about bwa.
That comment was +1'd by @pvanheus who has a lot more experience adapting software inventories these platforms - Galaxy and cwl. The tool author shouldn't need to know every last package manager that this may be deployed against - forcing them to know this a priori is a mistake and is a recipe for non-future facing, brittle tools. If instead you want to use the spec field as an optional hint when the answer isn't the intuitive, natural one, then I think this idea is acceptable and I'm more than happy to add it to the galaxy-lib stuff myself. The resulting improvement would benefit Galaxy at the margins also. But it is the margins - I assure you. If you ignore the spec thing however - |
cfc9da0
to
1625611
Compare
@tetron Rebased with noted spelling fix - thanks for the catch! |
Ok, this seems reasonable. In that case, the behavior should be something like, check Let me see if I'm following: the strategy is to generate shell script which does the package installation prior to running the command. This may run on a separate node (or VM or container environment) from the workflow engine. How does this work with multiple package managers or package versions? Does it generate an if-then-else chain? |
@tetron So a limitation of the current implementation is that it does the install or generates the shell code to execute runs on the node running cwltool. (This works on fine on clusters that have a certain directory mounted across them and in Galaxy land our remote job execution tool Pulsar knows how to resolve dependencies so for jobs without a shared file system this would happen closer to the job.) I guess cwltool is somewhat explicitly single node so this shouldn't be a problem right? Explicit support for something like Toil would be great - I'd be happy to field PRs to galaxy-lib to enable that or do it myself if I sense enough eagerness for it. If multiple "package managers" are used (the term is a misnomer since something like Galaxy packages or environment modules are not package mangers) - each may resolve packages individually and produce its own string for injecting the environment into the shell. |
This adapts CWL-style ``SoftwareRequirement`` ``specs`` to solve galaxyproject#1927. Here I'm trying to implement the CWL specification in a way that helps enable the feasibility of Conda packaging in Galaxy. It is a delicate balancing act aimed to upset as many interested parties as I can. To understand the problem - consider the ``blast+`` requirement found in the Galaxy wrappers. It looks something like this: ``` <requirement type="package" version="2.2.31" name="blast+"> ``` Great, that works for Galaxy and Tool Shed packages. It doesn't work for bioconda at all. I think this problem is fairly uncommon - most packages have simple names shared across Debian, Brew, Conda, etc... - but it does happen in some cases that there are inconsistencies. Some people have taken to duplicating the requirement - this is bad and should not be done since they are mutually exclusive and Galaxy will attempt to resolve both. This introduces the following syntax for tools with profile >= 16.10: ``` <requirement type="package" version="2.2.31" name="blast+"> <specification uri="https://anaconda.org/bioconda/blast" /> <specification uri="https://packages.debian.org/sid/ncbi-blast+" version="2.2.31-3" /> </requirement> ``` This allows finer grain resolution of the requirement without sacrificing the abstract name at the top. It allows the name and the version to be adapted by resolvers as needed (hopefully rarely so). This syntax is the future facing one, but obviously this tool would not work on older Galaxy versions. To remedy this - an alternative syntax can be used for tools targetting Galaxy verions pre-16.10: ``` <requirement type="package" version="2.2" specification_uris="https://anaconda.org/bioconda/[email protected],https://packages.debian.org/jessie/[email protected]">blast+</requirement> ``` This syntax sucks - but it does give newer Galaxies the ability to resolve the specifications without breaking the more simple functionality for older Galaxies. For more information on the CWL side of this - checkout the discussion on common-workflow-language/cwltool#214. The CWL specification information is defined at http://www.commonwl.org/v1.0/CommandLineTool.html#SoftwarePackage.
This adapts CWL-style ``SoftwareRequirement`` ``specs`` to solve galaxyproject#1927. Here I'm trying to implement the CWL specification in a way that helps enable the feasibility of Conda packaging in Galaxy. It is a delicate balancing act aimed to upset as many interested parties as I can. To understand the problem - consider the ``blast+`` requirement found in the Galaxy wrappers. It looks something like this: ``` <requirement type="package" version="2.2.31" name="blast+"> ``` Great, that works for Galaxy and Tool Shed packages. It doesn't work for bioconda at all. I think this problem is fairly uncommon - most packages have simple names shared across Debian, Brew, Conda, etc... - but it does happen in some cases that there are inconsistencies. Some people have taken to duplicating the requirement - this is bad and should not be done since they are mutually exclusive and Galaxy will attempt to resolve both. This introduces the following syntax for tools with profile >= 16.10: ``` <requirement type="package" version="2.2.31" name="blast+"> <specification uri="https://anaconda.org/bioconda/blast" /> <specification uri="https://packages.debian.org/sid/ncbi-blast+" version="2.2.31-3" /> </requirement> ``` This allows finer grain resolution of the requirement without sacrificing the abstract name at the top. It allows the name and the version to be adapted by resolvers as needed (hopefully rarely so). This syntax is the future facing one, but obviously this tool would not work on older Galaxy versions. To remedy this - an alternative syntax can be used for tools targetting Galaxy verions pre-16.10: ``` <requirement type="package" version="2.2" specification_uris="https://anaconda.org/bioconda/[email protected],https://packages.debian.org/jessie/[email protected]">blast+</requirement> ``` This syntax sucks - but it does give newer Galaxies the ability to resolve the specifications without breaking the more simple functionality for older Galaxies. For more information on the CWL side of this - checkout the discussion on common-workflow-language/cwltool#214. The CWL specification information is defined at http://www.commonwl.org/v1.0/CommandLineTool.html#SoftwarePackage.
Can one of the admins verify this patch? |
This adapts CWL-style ``SoftwareRequirement`` ``specs`` to solve galaxyproject#1927. Here I'm trying to implement the CWL specification in a way that helps enable the feasibility of Conda packaging in Galaxy. It is a delicate balancing act aimed to upset as many interested parties as I can. To understand the problem - consider the ``blast+`` requirement found in the Galaxy wrappers. It looks something like this: ``` <requirement type="package" version="2.2.31" name="blast+"> ``` Great, that works for Galaxy and Tool Shed packages. It doesn't work for bioconda at all. I think this problem is fairly uncommon - most packages have simple names shared across Debian, Brew, Conda, etc... - but it does happen in some cases that there are inconsistencies. Some people have taken to duplicating the requirement - this is bad and should not be done since they are mutually exclusive and Galaxy will attempt to resolve both. This introduces the following syntax for tools with profile >= 16.10: ``` <requirement type="package" version="2.2.31" name="blast+"> <specification uri="https://anaconda.org/bioconda/blast" /> <specification uri="https://packages.debian.org/sid/ncbi-blast+" version="2.2.31-3" /> </requirement> ``` This allows finer grain resolution of the requirement without sacrificing the abstract name at the top. It allows the name and the version to be adapted by resolvers as needed (hopefully rarely so). This syntax is the future facing one, but obviously this tool would not work on older Galaxy versions. To remedy this - an alternative syntax can be used for tools targetting Galaxy verions pre-16.10: ``` <requirement type="package" version="2.2" specification_uris="https://anaconda.org/bioconda/[email protected],https://packages.debian.org/jessie/[email protected]">blast+</requirement> ``` This syntax sucks - but it does give newer Galaxies the ability to resolve the specifications without breaking the more simple functionality for older Galaxies. For more information on the CWL side of this - checkout the discussion on common-workflow-language/cwltool#214. The CWL specification information is defined at http://www.commonwl.org/v1.0/CommandLineTool.html#SoftwarePackage. Conflicts: lib/galaxy/tools/deps/__init__.py test/functional/tools/samples_tool_conf.xml
This adapts CWL-style ``SoftwareRequirement`` ``specs`` to solve galaxyproject#1927. Here I'm trying to implement the CWL specification in a way that helps enable the feasibility of Conda packaging in Galaxy. It is a delicate balancing act aimed to upset as many interested parties as I can. To understand the problem - consider the ``blast+`` requirement found in the Galaxy wrappers. It looks something like this: ``` <requirement type="package" version="2.2.31" name="blast+"> ``` Great, that works for Galaxy and Tool Shed packages. It doesn't work for bioconda at all. I think this problem is fairly uncommon - most packages have simple names shared across Debian, Brew, Conda, etc... - but it does happen in some cases that there are inconsistencies. Some people have taken to duplicating the requirement - this is bad and should not be done since they are mutually exclusive and Galaxy will attempt to resolve both. This introduces the following syntax for tools with profile >= 16.10: ``` <requirement type="package" version="2.2.31" name="blast+"> <specification uri="https://anaconda.org/bioconda/blast" /> <specification uri="https://packages.debian.org/sid/ncbi-blast+" version="2.2.31-3" /> </requirement> ``` This allows finer grain resolution of the requirement without sacrificing the abstract name at the top. It allows the name and the version to be adapted by resolvers as needed (hopefully rarely so). This syntax is the future facing one, but obviously this tool would not work on older Galaxy versions. To remedy this - an alternative syntax can be used for tools targetting Galaxy verions pre-16.10: ``` <requirement type="package" version="2.2" specification_uris="https://anaconda.org/bioconda/[email protected],https://packages.debian.org/jessie/[email protected]">blast+</requirement> ``` This syntax sucks - but it does give newer Galaxies the ability to resolve the specifications without breaking the more simple functionality for older Galaxies. For more information on the CWL side of this - checkout the discussion on common-workflow-language/cwltool#214. The CWL specification information is defined at http://www.commonwl.org/v1.0/CommandLineTool.html#SoftwarePackage. Conflicts: lib/galaxy/tools/deps/__init__.py test/functional/tools/samples_tool_conf.xml
This adapts CWL-style ``SoftwareRequirement`` ``specs`` to solve galaxyproject#1927. Here I'm trying to implement the CWL specification in a way that helps enable the feasibility of Conda packaging in Galaxy. It is a delicate balancing act aimed to upset as many interested parties as I can. To understand the problem - consider the ``blast+`` requirement found in the Galaxy wrappers. It looks something like this: ``` <requirement type="package" version="2.2.31" name="blast+"> ``` Great, that works for Galaxy and Tool Shed packages. It doesn't work for bioconda at all. I think this problem is fairly uncommon - most packages have simple names shared across Debian, Brew, Conda, etc... - but it does happen in some cases that there are inconsistencies. Some people have taken to duplicating the requirement - this is bad and should not be done since they are mutually exclusive and Galaxy will attempt to resolve both. This introduces the following syntax for tools with profile >= 16.10: ``` <requirement type="package" version="2.2.31" name="blast+"> <specification uri="https://anaconda.org/bioconda/blast" /> <specification uri="https://packages.debian.org/sid/ncbi-blast+" version="2.2.31-3" /> </requirement> ``` This allows finer grain resolution of the requirement without sacrificing the abstract name at the top. It allows the name and the version to be adapted by resolvers as needed (hopefully rarely so). This syntax is the future facing one, but obviously this tool would not work on older Galaxy versions. To remedy this - an alternative syntax can be used for tools targetting Galaxy verions pre-16.10: ``` <requirement type="package" version="2.2" specification_uris="https://anaconda.org/bioconda/[email protected],https://packages.debian.org/jessie/[email protected]">blast+</requirement> ``` This syntax sucks - but it does give newer Galaxies the ability to resolve the specifications without breaking the more simple functionality for older Galaxies. For more information on the CWL side of this - checkout the discussion on common-workflow-language/cwltool#214. The CWL specification information is defined at http://www.commonwl.org/v1.0/CommandLineTool.html#SoftwarePackage. Conflicts: lib/galaxy/tools/deps/__init__.py test/functional/tools/samples_tool_conf.xml
This adapts CWL-style ``SoftwareRequirement`` ``specs`` to solve galaxyproject#1927. Here I'm trying to implement the CWL specification in a way that helps enable the feasibility of Conda packaging in Galaxy. It is a delicate balancing act aimed to upset as many interested parties as I can. To understand the problem - consider the ``blast+`` requirement found in the Galaxy wrappers. It looks something like this: ``` <requirement type="package" version="2.2.31" name="blast+"> ``` Great, that works for Galaxy and Tool Shed packages. It doesn't work for bioconda at all. I think this problem is fairly uncommon - most packages have simple names shared across Debian, Brew, Conda, etc... - but it does happen in some cases that there are inconsistencies. Some people have taken to duplicating the requirement - this is bad and should not be done since they are mutually exclusive and Galaxy will attempt to resolve both. This introduces the following syntax for tools with profile >= 16.10: ``` <requirement type="package" version="2.2.31" name="blast+"> <specification uri="https://anaconda.org/bioconda/blast" /> <specification uri="https://packages.debian.org/sid/ncbi-blast+" version="2.2.31-3" /> </requirement> ``` This allows finer grain resolution of the requirement without sacrificing the abstract name at the top. It allows the name and the version to be adapted by resolvers as needed (hopefully rarely so). This syntax is the future facing one, but obviously this tool would not work on older Galaxy versions. To remedy this - an alternative syntax can be used for tools targetting Galaxy verions pre-16.10: ``` <requirement type="package" version="2.2" specification_uris="https://anaconda.org/bioconda/[email protected],https://packages.debian.org/jessie/[email protected]">blast+</requirement> ``` This syntax sucks - but it does give newer Galaxies the ability to resolve the specifications without breaking the more simple functionality for older Galaxies. For more information on the CWL side of this - checkout the discussion on common-workflow-language/cwltool#214. The CWL specification information is defined at http://www.commonwl.org/v1.0/CommandLineTool.html#SoftwarePackage. Conflicts: lib/galaxy/tools/deps/__init__.py test/functional/tools/samples_tool_conf.xml
And we are up and running on this branch again at the GCC 2017 Hackathon. Just summarize what I have done and some conversations...
So my TODO list for the rest of the hackathon is:
These last two won't actually happen - but they sound fun. |
Tested using |
log = ... # type: Any | ||
CONFIG_VAL_NOT_FOUND = ... # type: Any | ||
|
||
def build_dependency_manager(config): ... |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This needs a full type signature. It returns a DependencyManager
, but I am unsure of what type config
is.
def __init__(self, default_base_path, conf_file: Optional[Any] = ..., app_config: Any = ...) -> None: ... | ||
def get_resolver_option(self, resolver, key, explicit_resolver_options: Any = ...): ... | ||
def get_app_option(self, key, default: Optional[Any] = ...): ... | ||
def dependency_shell_commands(self, requirements, **kwds): ... |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This needs a full type signature.
def to_dict(self): ... | ||
def copy(self): ... | ||
@staticmethod | ||
def from_dict(dict): ... |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This needs a full type description
tool_requirements = ... # type: Any | ||
def __init__(self, tool_requirements: Optional[Any] = ...) -> None: ... | ||
@staticmethod | ||
def from_list(requirements): ... |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This needs a full type description
cwltool/draft2tool.py
Outdated
@@ -174,9 +174,19 @@ class CommandLineTool(Process): | |||
def __init__(self, toolpath_object, **kwargs): | |||
# type: (Dict[Text, Any], **Any) -> None | |||
super(CommandLineTool, self).__init__(toolpath_object, **kwargs) | |||
self.find_default_container = kwargs["find_default_container"] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This line is causing multiple tests to fail.
cwltool/main.py
Outdated
@@ -838,5 +863,103 @@ def locToPath(p): | |||
_logger.addHandler(defaultStreamHandler) | |||
|
|||
|
|||
COMMAND_WITH_DEPENDENCIES_TEMPLATE = string.Template("""#!/bin/bash |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can the rest of this be moved to an external file?
450f646
to
d25285c
Compare
|
Consider the included tool ``seqtk_seq.cwl``. It includes the following SoftwareRequirement hint: ``` hints: SoftwareRequirement: packages: - package: seqtk version: - 1.2 ``` I'm not happy that ``version`` is a list - but I can live with it for now I guess. If cwltool is executed with the hidden ``--beta-conda-dependencies`` flag, this requirement will be processed by galaxy-lib, Conda will be installed, and seqtk will be installed, and a Conda environment including seqtk will be setup for the job. ``` virtualenv .venv . .venv/bin/activate python setup.py install pip install galaxy-lib cwltool --beta-conda-dependencies tests/seqtk_seq.cwl tests/seqtk_seq_job.json ``` Additional flags are available to configure dependency resolution in a more fine grained way - using Conda however has a number of advantages that make it particularily well suited to CWL. Conda packages are distributed as binaries that work across Mac and Linux and work on relatively old version of Linux (great for HPC). Conda also doesn't require root and supports installation of multiple different versions of a package - again these factors make it great for HPC and non-Docker targets. The Biocontainers project (previously Biodocker) dovetails nicely with this. Every version of every Bioconda package has a corresponding best-practice (very lightweight, very small) Docker container on quay.io (assembled by @bgruening and colleagues). There are over 1800 such containers currently. Continuing with the example above, the new ``--beta-use-biocontainers`` flag instructs cwltool to fetch the corresponding Biocontainers container from quay.io automatically or build one to use locally (required for instance for tools with multiple software requirements - fat tools). ``` cwltool --beta-use-biocontainers tests/seqtk_seq.cwl tests/seqtk_seq_job.json ``` These containers contain the same binaries that the package would use locally (outside of Docker). Therefore this technique allows cross platform reproducibility/remixability across CWL, Galaxy, and CLI - both inside and outside of Docker. My sincerest hope is that we move away from CWL-specific Dockerfiles. For less effort, a community bioconda package can be made and the result can be used in many more contexts. The Docker image will then be maintained by the community Biocontainer project. Rebased with correct spelling of DependenciesConfiguration thanks to @tetron.
@mr-c This branch's README now contains lots of documentation about SoftwareRequirements including how to resolve them with environment modules, custom package scripts, and Conda. It covers mapping to local infrastructure (for admins) as well mapping to multiple package managers from the tools themselves (for tool developers). It has many example calls that should work without external dependencies covering most of the test data I added and that can be run to try these things out today. It also contains some external links and information about installing the optional dependencies required to run these. Here are the docs - https://github.com/jmchilton/cwltool/tree/galaxy_deps#leveraging-softwarerequirements-beta. |
@mr-c Consider this comment documentation that I agree to allow uploading of galaxy-lib stubs to typeshed. Hope this is good enough? |
|
||
$ pip install 'cwltool[deps]' | ||
|
||
Installing cwltool in this fashion enables several new command line options. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm.. should we hide them when the deps
extra isn't modified? What happens if they are called and galaxy-lib is missing?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have clarified this behavior with #459 and improved it a bit.
Merging as this is tremendous; we can add additional polish later. Thank you @jmchilton ! |
Wuhu, after such a long time. Great to see it finally happen. |
…ainers. This work depends on galaxy-lib (available on PyPI) and currently unmerged cwltool enhancements from common-workflow-language/cwltool#456. This commit enables all the same options in cwltoil as added to cwltool recently in common-workflow-language/cwltool#214 including `` --beta-conda-dependencies``, ``--beta-dependency-resolvers-configuration``, and ``--beta-use-biocontainers``. The first two of these are documented in depth in cwltool's README now (https://github.com/common-workflow-language/cwltool/#leveraging-softwarerequirements-beta). Here I will quickly review a couple of the available options against test examples available in cwltool's ``tests`` directory. I will assume cwltoil has been setup as configure in this branch and galaxy-lib installed in the same environment - for instance with ``pip install galaxy-lib``. Now lets grab the examples from cwltool... ``` git clone https://github.com/common-workflow-language/cwltool.git cd cwltool ``` From here we can quickly demonstrate installation and resolution of SoftwareRequirements using Conda using the tests/seqtk_seq.cwl tool. This tool doesn't define a DockerRequirement but does define the following SoftwareRequirements as ``hints`` as follows: ``` hints: SoftwareRequirement: packages: - package: seqtk version: - r93 ``` We can try this tool out with cwltoil and see that by default we probably don't have the binary seqtk on our ``PATH`` and so the tool fails using the following command: ``` cwltoil tests/seqtk_seq.cwl tests/seqtk_seq_job.json ``` This should result in a tool execution failure. We can then instruct cwltoil to install the required package from Bioconda into an isolated environment and use it as needed by passing it the ``--beta-conda-dependencies`` flag as follows: ``` cwltoil --beta-conda-dependencies tests/seqtk_seq.cwl tests/seqtk_seq_job.json ``` The tool should now be successful. The Conda support can be endless tweaked but the defaults are defaults that target the best practice Conda channels that work well for the Galaxy project. Additional SoftwareRequirement resolution options are available including targetting Software Modules, Homebrew, simple scripts called "Galaxy packages". All of these options can be specified and configured with a YAML file passed to cwltoil using the ``--beta-dependency-resolvers-configuration`` option instead of the simple shortcut ``--beta-conda-dependencies``. The cwltool documentation walks through a few examples of adapting infrastructure to tools and tools to package managers. In addition to options that allow configuring tool execution environments, containers themselves can be discovered and/or built from these software requirements. The [Biocontainers](https://github.com/BioContainers) project (previously Biodocker) contains a registry we use for this purpose. Every version of every Bioconda package has a corresponding best-practice (very lightweight, very small) Docker container on quay.io. There are over 2300 such containers currently. Continuing with the example above, the new `--beta-use-biocontainers` flag instructs cwltoil to fetch the corresponding Biocontainers container from quay.io automatically or build one to use locally (required for instance for tools with multiple software requirements - fat tools). ``` cwltool --beta-use-biocontainers tests/seqtk_seq.cwl tests/seqtk_seq_job.json ``` These containers contain the same binaries that the package would use locally (outside of Docker). Therefore this technique allows cross platform reproducibility/remixability across cwltool, cwltoil, Galaxy, and CLI - both inside and outside of containers.
…ainers. This work depends on galaxy-lib (available on PyPI) and currently unmerged cwltool enhancements from common-workflow-language/cwltool#456. This commit enables all the same options in cwltoil as added to cwltool recently in common-workflow-language/cwltool#214 including `` --beta-conda-dependencies``, ``--beta-dependency-resolvers-configuration``, and ``--beta-use-biocontainers``. The first two of these are documented in depth in cwltool's README now (https://github.com/common-workflow-language/cwltool/#leveraging-softwarerequirements-beta). Here I will quickly review a couple of the available options against test examples available in cwltool's ``tests`` directory. I will assume cwltoil has been setup as configure in this branch and galaxy-lib installed in the same environment - for instance with ``pip install galaxy-lib``. Now lets grab the examples from cwltool... ``` git clone https://github.com/common-workflow-language/cwltool.git cd cwltool ``` From here we can quickly demonstrate installation and resolution of SoftwareRequirements using Conda using the tests/seqtk_seq.cwl tool. This tool doesn't define a DockerRequirement but does define the following SoftwareRequirements as ``hints`` as follows: ``` hints: SoftwareRequirement: packages: - package: seqtk version: - r93 ``` We can try this tool out with cwltoil and see that by default we probably don't have the binary seqtk on our ``PATH`` and so the tool fails using the following command: ``` cwltoil tests/seqtk_seq.cwl tests/seqtk_seq_job.json ``` This should result in a tool execution failure. We can then instruct cwltoil to install the required package from Bioconda into an isolated environment and use it as needed by passing it the ``--beta-conda-dependencies`` flag as follows: ``` cwltoil --beta-conda-dependencies tests/seqtk_seq.cwl tests/seqtk_seq_job.json ``` The tool should now be successful. The Conda support can be endless tweaked but the defaults are defaults that target the best practice Conda channels that work well for the Galaxy project. Additional SoftwareRequirement resolution options are available including targetting Software Modules, Homebrew, simple scripts called "Galaxy packages". All of these options can be specified and configured with a YAML file passed to cwltoil using the ``--beta-dependency-resolvers-configuration`` option instead of the simple shortcut ``--beta-conda-dependencies``. The cwltool documentation walks through a few examples of adapting infrastructure to tools and tools to package managers. In addition to options that allow configuring tool execution environments, containers themselves can be discovered and/or built from these software requirements. The [Biocontainers](https://github.com/BioContainers) project (previously Biodocker) contains a registry we use for this purpose. Every version of every Bioconda package has a corresponding best-practice (very lightweight, very small) Docker container on quay.io. There are over 2300 such containers currently. Continuing with the example above, the new `--beta-use-biocontainers` flag instructs cwltoil to fetch the corresponding Biocontainers container from quay.io automatically or build one to use locally (required for instance for tools with multiple software requirements - fat tools). ``` cwltoil --beta-use-biocontainers tests/seqtk_seq.cwl tests/seqtk_seq_job.json ``` These containers contain the same binaries that the package would use locally (outside of Docker). Therefore this technique allows cross platform reproducibility/remixability across cwltool, cwltoil, Galaxy, and CLI - both inside and outside of containers.
…ainers. This work depends on galaxy-lib (available on PyPI) and currently unmerged cwltool enhancements from common-workflow-language/cwltool#456. This commit enables all the same options in cwltoil as added to cwltool recently in common-workflow-language/cwltool#214 including `` --beta-conda-dependencies``, ``--beta-dependency-resolvers-configuration``, and ``--beta-use-biocontainers``. The first two of these are documented in depth in cwltool's README now (https://github.com/common-workflow-language/cwltool/#leveraging-softwarerequirements-beta). Here I will quickly review a couple of the available options against test examples available in cwltool's ``tests`` directory. I will assume cwltoil has been setup as configure in this branch and galaxy-lib installed in the same environment - for instance with ``pip install galaxy-lib``. Now lets grab the examples from cwltool... ``` git clone https://github.com/common-workflow-language/cwltool.git cd cwltool ``` From here we can quickly demonstrate installation and resolution of SoftwareRequirements using Conda using the tests/seqtk_seq.cwl tool. This tool doesn't define a DockerRequirement but does define the following SoftwareRequirements as ``hints`` as follows: ``` hints: SoftwareRequirement: packages: - package: seqtk version: - r93 ``` We can try this tool out with cwltoil and see that by default we probably don't have the binary seqtk on our ``PATH`` and so the tool fails using the following command: ``` cwltoil tests/seqtk_seq.cwl tests/seqtk_seq_job.json ``` This should result in a tool execution failure. We can then instruct cwltoil to install the required package from Bioconda into an isolated environment and use it as needed by passing it the ``--beta-conda-dependencies`` flag as follows: ``` cwltoil --beta-conda-dependencies tests/seqtk_seq.cwl tests/seqtk_seq_job.json ``` The tool should now be successful. The Conda support can be endless tweaked but the defaults are defaults that target the best practice Conda channels that work well for the Galaxy project. Additional SoftwareRequirement resolution options are available including targetting Software Modules, Homebrew, simple scripts called "Galaxy packages". All of these options can be specified and configured with a YAML file passed to cwltoil using the ``--beta-dependency-resolvers-configuration`` option instead of the simple shortcut ``--beta-conda-dependencies``. The cwltool documentation walks through a few examples of adapting infrastructure to tools and tools to package managers. In addition to options that allow configuring tool execution environments, containers themselves can be discovered and/or built from these software requirements. The [Biocontainers](https://github.com/BioContainers) project (previously Biodocker) contains a registry we use for this purpose. Every version of every Bioconda package has a corresponding best-practice (very lightweight, very small) Docker container on quay.io. There are over 2300 such containers currently. Continuing with the example above, the new `--beta-use-biocontainers` flag instructs cwltoil to fetch the corresponding Biocontainers container from quay.io automatically or build one to use locally (required for instance for tools with multiple software requirements - fat tools). ``` cwltoil --beta-use-biocontainers tests/seqtk_seq.cwl tests/seqtk_seq_job.json ``` These containers contain the same binaries that the package would use locally (outside of Docker). Therefore this technique allows cross platform reproducibility/remixability across cwltool, cwltoil, Galaxy, and CLI - both inside and outside of containers.
…ners. This work depends on galaxy-lib (available on PyPI) and currently unmerged cwltool enhancements from common-workflow-language/cwltool#456. This commit enables all the same options in cwltoil as added to cwltool recently in common-workflow-language/cwltool#214 including `` --beta-conda-dependencies``, ``--beta-dependency-resolvers-configuration``, and ``--beta-use-biocontainers``. The first two of these are documented in depth in cwltool's README now (https://github.com/common-workflow-language/cwltool/#leveraging-softwarerequirements-beta). Here I will quickly review a couple of the available options against test examples available in cwltool's ``tests`` directory. I will assume cwltoil has been setup as configure in this branch and galaxy-lib installed in the same environment - for instance with ``pip install galaxy-lib``. Now lets grab the examples from cwltool... ``` git clone https://github.com/common-workflow-language/cwltool.git cd cwltool ``` From here we can quickly demonstrate installation and resolution of SoftwareRequirements using Conda using the tests/seqtk_seq.cwl tool. This tool doesn't define a DockerRequirement but does define the following SoftwareRequirements as ``hints`` as follows: ``` hints: SoftwareRequirement: packages: - package: seqtk version: - r93 ``` We can try this tool out with cwltoil and see that by default we probably don't have the binary seqtk on our ``PATH`` and so the tool fails using the following command: ``` cwltoil tests/seqtk_seq.cwl tests/seqtk_seq_job.json ``` This should result in a tool execution failure. We can then instruct cwltoil to install the required package from Bioconda into an isolated environment and use it as needed by passing it the ``--beta-conda-dependencies`` flag as follows: ``` cwltoil --beta-conda-dependencies tests/seqtk_seq.cwl tests/seqtk_seq_job.json ``` The tool should now be successful. The Conda support can be endless tweaked but the defaults are defaults that target the best practice Conda channels that work well for the Galaxy project. Additional SoftwareRequirement resolution options are available including targetting Software Modules, Homebrew, simple scripts called "Galaxy packages". All of these options can be specified and configured with a YAML file passed to cwltoil using the ``--beta-dependency-resolvers-configuration`` option instead of the simple shortcut ``--beta-conda-dependencies``. The cwltool documentation walks through a few examples of adapting infrastructure to tools and tools to package managers. In addition to options that allow configuring tool execution environments, containers themselves can be discovered and/or built from these software requirements. The [Biocontainers](https://github.com/BioContainers) project (previously Biodocker) contains a registry we use for this purpose. Every version of every Bioconda package has a corresponding best-practice (very lightweight, very small) Docker container on quay.io. There are over 2300 such containers currently. Continuing with the example above, the new `--beta-use-biocontainers` flag instructs cwltoil to fetch the corresponding Biocontainers container from quay.io automatically or build one to use locally (required for instance for tools with multiple software requirements - fat tools). ``` cwltoil --beta-use-biocontainers tests/seqtk_seq.cwl tests/seqtk_seq_job.json ``` These containers contain the same binaries that the package would use locally (outside of Docker). Therefore this technique allows cross platform reproducibility/remixability across cwltool, cwltoil, Galaxy, and CLI - both inside and outside of containers.
This commit enables all the same options in cwltoil as added to cwltool recently in common-workflow-language/cwltool#214 including `` --beta-conda-dependencies``, ``--beta-dependency-resolvers-configuration``, and ``--beta-use-biocontainers``. The first two of these are documented in depth in cwltool's README now (https://github.com/common-workflow-language/cwltool/#leveraging-softwarerequirements-beta). Here I will quickly review a couple of the available options against test examples available in cwltool's ``tests`` directory. I will assume cwltoil has been setup as configure in this branch and galaxy-lib installed in the same environment - for instance with ``pip install galaxy-lib``. Now lets grab the examples from cwltool... ``` git clone https://github.com/common-workflow-language/cwltool.git cd cwltool ``` From here we can quickly demonstrate installation and resolution of SoftwareRequirements using Conda using the tests/seqtk_seq.cwl tool. This tool doesn't define a DockerRequirement but does define the following SoftwareRequirements as ``hints`` as follows: ``` hints: SoftwareRequirement: packages: - package: seqtk version: - r93 ``` We can try this tool out with cwltoil and see that by default we probably don't have the binary seqtk on our ``PATH`` and so the tool fails using the following command: ``` cwltoil tests/seqtk_seq.cwl tests/seqtk_seq_job.json ``` This should result in a tool execution failure. We can then instruct cwltoil to install the required package from Bioconda into an isolated environment and use it as needed by passing it the ``--beta-conda-dependencies`` flag as follows: ``` cwltoil --beta-conda-dependencies tests/seqtk_seq.cwl tests/seqtk_seq_job.json ``` The tool should now be successful. The Conda support can be endless tweaked but the defaults are defaults that target the best practice Conda channels that work well for the Galaxy project. Additional SoftwareRequirement resolution options are available including targetting Software Modules, Homebrew, simple scripts called "Galaxy packages". All of these options can be specified and configured with a YAML file passed to cwltoil using the ``--beta-dependency-resolvers-configuration`` option instead of the simple shortcut ``--beta-conda-dependencies``. The cwltool documentation walks through a few examples of adapting infrastructure to tools and tools to package managers. In addition to options that allow configuring tool execution environments, containers themselves can be discovered and/or built from these software requirements. The [Biocontainers](https://github.com/BioContainers) project (previously Biodocker) contains a registry we use for this purpose. Every version of every Bioconda package has a corresponding best-practice (very lightweight, very small) Docker container on quay.io. There are over 2300 such containers currently. Continuing with the example above, the new `--beta-use-biocontainers` flag instructs cwltoil to fetch the corresponding Biocontainers container from quay.io automatically or build one to use locally (required for instance for tools with multiple software requirements - fat tools). ``` cwltoil --beta-use-biocontainers tests/seqtk_seq.cwl tests/seqtk_seq_job.json ``` These containers contain the same binaries that the package would use locally (outside of Docker). Therefore this technique allows cross platform reproducibility/remixability across cwltool, cwltoil, Galaxy, and CLI - both inside and outside of containers.
@jmchilton are the galaxy |
The code should be Python 3 compatible. I must confess I'm unsure what it means for the typesheds to be Python 3 compatible. |
I just had a looks at the galaxy stub files. They should work with both python2 and python3. :) I just wanted to confirm if the galaxy codebase is py3 compatible and how the stubs for |
@manu-chroma Sounds good! If anything I've added makes the Python 3 stuff more difficult let me know and I can try to update it. |
This commit enables all the same options in cwltoil as added to cwltool recently in common-workflow-language/cwltool#214 including `` --beta-conda-dependencies``, ``--beta-dependency-resolvers-configuration``, and ``--beta-use-biocontainers``. The first two of these are documented in depth in cwltool's README now (https://github.com/common-workflow-language/cwltool/#leveraging-softwarerequirements-beta). Here I will quickly review a couple of the available options against test examples available in cwltool's ``tests`` directory. I will assume cwltoil has been setup as configure in this branch and galaxy-lib installed in the same environment - for instance with ``pip install galaxy-lib``. Now lets grab the examples from cwltool... ``` git clone https://github.com/common-workflow-language/cwltool.git cd cwltool ``` From here we can quickly demonstrate installation and resolution of SoftwareRequirements using Conda using the tests/seqtk_seq.cwl tool. This tool doesn't define a DockerRequirement but does define the following SoftwareRequirements as ``hints`` as follows: ``` hints: SoftwareRequirement: packages: - package: seqtk version: - r93 ``` We can try this tool out with cwltoil and see that by default we probably don't have the binary seqtk on our ``PATH`` and so the tool fails using the following command: ``` cwltoil tests/seqtk_seq.cwl tests/seqtk_seq_job.json ``` This should result in a tool execution failure. We can then instruct cwltoil to install the required package from Bioconda into an isolated environment and use it as needed by passing it the ``--beta-conda-dependencies`` flag as follows: ``` cwltoil --beta-conda-dependencies tests/seqtk_seq.cwl tests/seqtk_seq_job.json ``` The tool should now be successful. The Conda support can be endless tweaked but the defaults are defaults that target the best practice Conda channels that work well for the Galaxy project. Additional SoftwareRequirement resolution options are available including targetting Software Modules, Homebrew, simple scripts called "Galaxy packages". All of these options can be specified and configured with a YAML file passed to cwltoil using the ``--beta-dependency-resolvers-configuration`` option instead of the simple shortcut ``--beta-conda-dependencies``. The cwltool documentation walks through a few examples of adapting infrastructure to tools and tools to package managers. In addition to options that allow configuring tool execution environments, containers themselves can be discovered and/or built from these software requirements. The [Biocontainers](https://github.com/BioContainers) project (previously Biodocker) contains a registry we use for this purpose. Every version of every Bioconda package has a corresponding best-practice (very lightweight, very small) Docker container on quay.io. There are over 2300 such containers currently. Continuing with the example above, the new `--beta-use-biocontainers` flag instructs cwltoil to fetch the corresponding Biocontainers container from quay.io automatically or build one to use locally (required for instance for tools with multiple software requirements - fat tools). ``` cwltoil --beta-use-biocontainers tests/seqtk_seq.cwl tests/seqtk_seq_job.json ``` These containers contain the same binaries that the package would use locally (outside of Docker). Therefore this technique allows cross platform reproducibility/remixability across cwltool, cwltoil, Galaxy, and CLI - both inside and outside of containers.
This commit enables all the same options in cwltoil as added to cwltool recently in common-workflow-language/cwltool#214 including `` --beta-conda-dependencies``, ``--beta-dependency-resolvers-configuration``, and ``--beta-use-biocontainers``. The first two of these are documented in depth in cwltool's README now (https://github.com/common-workflow-language/cwltool/#leveraging-softwarerequirements-beta). Here I will quickly review a couple of the available options against test examples available in cwltool's ``tests`` directory. I will assume cwltoil has been setup as configure in this branch and galaxy-lib installed in the same environment - for instance with ``pip install galaxy-lib``. Now lets grab the examples from cwltool... ``` git clone https://github.com/common-workflow-language/cwltool.git cd cwltool ``` From here we can quickly demonstrate installation and resolution of SoftwareRequirements using Conda using the tests/seqtk_seq.cwl tool. This tool doesn't define a DockerRequirement but does define the following SoftwareRequirements as ``hints`` as follows: ``` hints: SoftwareRequirement: packages: - package: seqtk version: - r93 ``` We can try this tool out with cwltoil and see that by default we probably don't have the binary seqtk on our ``PATH`` and so the tool fails using the following command: ``` cwltoil tests/seqtk_seq.cwl tests/seqtk_seq_job.json ``` This should result in a tool execution failure. We can then instruct cwltoil to install the required package from Bioconda into an isolated environment and use it as needed by passing it the ``--beta-conda-dependencies`` flag as follows: ``` cwltoil --beta-conda-dependencies tests/seqtk_seq.cwl tests/seqtk_seq_job.json ``` The tool should now be successful. The Conda support can be endless tweaked but the defaults are defaults that target the best practice Conda channels that work well for the Galaxy project. Additional SoftwareRequirement resolution options are available including targetting Software Modules, Homebrew, simple scripts called "Galaxy packages". All of these options can be specified and configured with a YAML file passed to cwltoil using the ``--beta-dependency-resolvers-configuration`` option instead of the simple shortcut ``--beta-conda-dependencies``. The cwltool documentation walks through a few examples of adapting infrastructure to tools and tools to package managers. In addition to options that allow configuring tool execution environments, containers themselves can be discovered and/or built from these software requirements. The [Biocontainers](https://github.com/BioContainers) project (previously Biodocker) contains a registry we use for this purpose. Every version of every Bioconda package has a corresponding best-practice (very lightweight, very small) Docker container on quay.io. There are over 2300 such containers currently. Continuing with the example above, the new `--beta-use-biocontainers` flag instructs cwltoil to fetch the corresponding Biocontainers container from quay.io automatically or build one to use locally (required for instance for tools with multiple software requirements - fat tools). ``` cwltoil --beta-use-biocontainers tests/seqtk_seq.cwl tests/seqtk_seq_job.json ``` These containers contain the same binaries that the package would use locally (outside of Docker). Therefore this technique allows cross platform reproducibility/remixability across cwltool, cwltoil, Galaxy, and CLI - both inside and outside of containers.
This enables the reproducibilty stack described in [this preprint](https://www.biorxiv.org/content/early/2017/10/11/200683) and [presented at BOSC 2017](http://jmchilton.github.io/writing/bosc2017slides/biocontainers.html) under Toil. Concretely this enables all the same options in cwltoil as added to cwltool in common-workflow-language/cwltool#214 including `` --beta-conda-dependencies``, ``--beta-dependency-resolvers-configuration``, and ``--beta-use-biocontainers``. The first two of these are documented in depth in cwltool's README (https://github.com/common-workflow-language/cwltool/#leveraging-softwarerequirements-beta). Here I will quickly review a couple of the available options against test examples available in cwltool's ``tests`` directory using this branch of Toil. ``` git clone https://github.com/common-workflow-language/cwltool.git cd cwltool ``` From here we can quickly demonstrate installation and resolution of CWL ``SoftwareRequirement`` hints using Conda using the tests/seqtk_seq.cwl tool. This tool doesn't define an explicit ``DockerRequirement`` but does define the following ``SoftwareRequirement`` in its ``hints`` as follows: ``` hints: SoftwareRequirement: packages: - package: seqtk version: - r93 ``` We can try this tool out with ``cwltoil`` and see that by default we probably don't have the binary seqtk on our ``PATH`` and so the tool fails using the following command: ``` cwltoil tests/seqtk_seq.cwl tests/seqtk_seq_job.json ``` This should result in a tool execution failure. We can then instruct ``cwltoil`` to install the required package from Bioconda into an isolated environment and use it as needed by passing it the ``--beta-conda-dependencies`` flag as follows: ``` cwltoil --beta-conda-dependencies tests/seqtk_seq.cwl tests/seqtk_seq_job.json ``` The tool should now be successful. The Conda support can be endless tweaked but the defaults are defaults that target the best practice Conda channels that work well for the Galaxy project. Additional ``SoftwareRequirement`` resolution options are available including targetting Software Modules, lmod, Homebrew, simple scripts called "Galaxy packages". All of these options can be specified and configured with a YAML file passed to cwltoil using the ``--beta-dependency-resolvers-configuration`` option instead of the simple shortcut ``--beta-conda-dependencies``. The cwltool documentation walks through a few examples of adapting infrastructure to tools and tools to package managers. Reference documentation is available in [galaxy-lib's documentation](http://galaxy-lib.readthedocs.io/en/latest/topics/dependency_resolution.html). In addition to options that allow configuring tool execution environments, containers themselves can be discovered and/or built from these software requirements. The [Biocontainers](https://github.com/BioContainers) project (previously Biodocker) contains a registry we use for this purpose. Every version of every Bioconda package has a corresponding best-practice (very lightweight, very small) Docker container on quay.io. There are over 3000 such containers currently. Continuing with the example above, the new `--beta-use-biocontainers` flag instructs ``cwltoil`` to fetch the corresponding Biocontainers container from quay.io automatically or build one to use locally (required for instance for tools with multiple software requirements - fat tools). ``` cwltoil --beta-use-biocontainers tests/seqtk_seq.cwl tests/seqtk_seq_job.json ``` These containers contain the same binaries that the package would use locally (outside of Docker). Therefore this technique allows cross platform reproducibility/remixability across cwltool, cwltoil, Galaxy, and CLI - both inside and outside of containers.
This enables the reproducibilty stack described in [this preprint](https://www.biorxiv.org/content/early/2017/10/11/200683) and [presented at BOSC 2017](http://jmchilton.github.io/writing/bosc2017slides/biocontainers.html) under Toil. Concretely this enables all the same options in cwltoil as added to cwltool in common-workflow-language/cwltool#214 including `` --beta-conda-dependencies``, ``--beta-dependency-resolvers-configuration``, and ``--beta-use-biocontainers``. The first two of these are documented in depth in cwltool's README (https://github.com/common-workflow-language/cwltool/#leveraging-softwarerequirements-beta). Here I will quickly review a couple of the available options against test examples available in cwltool's ``tests`` directory using this branch of Toil. ``` git clone https://github.com/common-workflow-language/cwltool.git cd cwltool ``` From here we can quickly demonstrate installation and resolution of CWL ``SoftwareRequirement`` hints using Conda using the tests/seqtk_seq.cwl tool. This tool doesn't define an explicit ``DockerRequirement`` but does define the following ``SoftwareRequirement`` in its ``hints`` as follows: ``` hints: SoftwareRequirement: packages: - package: seqtk version: - r93 ``` We can try this tool out with ``cwltoil`` and see that by default we probably don't have the binary seqtk on our ``PATH`` and so the tool fails using the following command: ``` cwltoil tests/seqtk_seq.cwl tests/seqtk_seq_job.json ``` This should result in a tool execution failure. We can then instruct ``cwltoil`` to install the required package from Bioconda into an isolated environment and use it as needed by passing it the ``--beta-conda-dependencies`` flag as follows: ``` cwltoil --beta-conda-dependencies tests/seqtk_seq.cwl tests/seqtk_seq_job.json ``` The tool should now be successful. The Conda support can be endless tweaked but the defaults are defaults that target the best practice Conda channels that work well for the Galaxy project. Additional ``SoftwareRequirement`` resolution options are available including targetting Software Modules, lmod, Homebrew, simple scripts called "Galaxy packages". All of these options can be specified and configured with a YAML file passed to cwltoil using the ``--beta-dependency-resolvers-configuration`` option instead of the simple shortcut ``--beta-conda-dependencies``. The cwltool documentation walks through a few examples of adapting infrastructure to tools and tools to package managers. Reference documentation is available in [galaxy-lib's documentation](http://galaxy-lib.readthedocs.io/en/latest/topics/dependency_resolution.html). In addition to options that allow configuring tool execution environments, containers themselves can be discovered and/or built from these software requirements. The [Biocontainers](https://github.com/BioContainers) project (previously Biodocker) contains a registry we use for this purpose. Every version of every Bioconda package has a corresponding best-practice (very lightweight, very small) Docker container on quay.io. There are over 3000 such containers currently. Continuing with the example above, the new `--beta-use-biocontainers` flag instructs ``cwltoil`` to fetch the corresponding Biocontainers container from quay.io automatically or build one to use locally (required for instance for tools with multiple software requirements - fat tools). ``` cwltoil --beta-use-biocontainers tests/seqtk_seq.cwl tests/seqtk_seq_job.json ``` These containers contain the same binaries that the package would use locally (outside of Docker). Therefore this technique allows cross platform reproducibility/remixability across cwltool, cwltoil, Galaxy, and CLI - both inside and outside of containers.
This enables the reproducibilty stack described in [this preprint](https://www.biorxiv.org/content/early/2017/10/11/200683) and [presented at BOSC 2017](http://jmchilton.github.io/writing/bosc2017slides/biocontainers.html) under Toil. Concretely this enables all the same options in cwltoil as added to cwltool in common-workflow-language/cwltool#214 including `` --beta-conda-dependencies``, ``--beta-dependency-resolvers-configuration``, and ``--beta-use-biocontainers``. The first two of these are documented in depth in cwltool's README (https://github.com/common-workflow-language/cwltool/#leveraging-softwarerequirements-beta). Here I will quickly review a couple of the available options against test examples available in cwltool's ``tests`` directory using this branch of Toil. ``` git clone https://github.com/common-workflow-language/cwltool.git cd cwltool ``` From here we can quickly demonstrate installation and resolution of CWL ``SoftwareRequirement`` hints using Conda using the tests/seqtk_seq.cwl tool. This tool doesn't define an explicit ``DockerRequirement`` but does define the following ``SoftwareRequirement`` in its ``hints`` as follows: ``` hints: SoftwareRequirement: packages: - package: seqtk version: - r93 ``` We can try this tool out with ``cwltoil`` and see that by default we probably don't have the binary seqtk on our ``PATH`` and so the tool fails using the following command: ``` cwltoil tests/seqtk_seq.cwl tests/seqtk_seq_job.json ``` This should result in a tool execution failure. We can then instruct ``cwltoil`` to install the required package from Bioconda into an isolated environment and use it as needed by passing it the ``--beta-conda-dependencies`` flag as follows: ``` cwltoil --beta-conda-dependencies tests/seqtk_seq.cwl tests/seqtk_seq_job.json ``` The tool should now be successful. The Conda support can be endless tweaked but the defaults are defaults that target the best practice Conda channels that work well for the Galaxy project. Additional ``SoftwareRequirement`` resolution options are available including targetting Software Modules, lmod, Homebrew, simple scripts called "Galaxy packages". All of these options can be specified and configured with a YAML file passed to cwltoil using the ``--beta-dependency-resolvers-configuration`` option instead of the simple shortcut ``--beta-conda-dependencies``. The cwltool documentation walks through a few examples of adapting infrastructure to tools and tools to package managers. Reference documentation is available in [galaxy-lib's documentation](http://galaxy-lib.readthedocs.io/en/latest/topics/dependency_resolution.html). In addition to options that allow configuring tool execution environments, containers themselves can be discovered and/or built from these software requirements. The [Biocontainers](https://github.com/BioContainers) project (previously Biodocker) contains a registry we use for this purpose. Every version of every Bioconda package has a corresponding best-practice (very lightweight, very small) Docker container on quay.io. There are over 3000 such containers currently. Continuing with the example above, the new `--beta-use-biocontainers` flag instructs ``cwltoil`` to fetch the corresponding Biocontainers container from quay.io automatically or build one to use locally (required for instance for tools with multiple software requirements - fat tools). ``` cwltoil --beta-use-biocontainers tests/seqtk_seq.cwl tests/seqtk_seq_job.json ``` These containers contain the same binaries that the package would use locally (outside of Docker). Therefore this technique allows cross platform reproducibility/remixability across cwltool, cwltoil, Galaxy, and CLI - both inside and outside of containers.
This enables the reproducibilty stack described in [this preprint](https://www.biorxiv.org/content/early/2017/10/11/200683) and [presented at BOSC 2017](http://jmchilton.github.io/writing/bosc2017slides/biocontainers.html) under Toil. Concretely this enables all the same options in cwltoil as added to cwltool in common-workflow-language/cwltool#214 including `` --beta-conda-dependencies``, ``--beta-dependency-resolvers-configuration``, and ``--beta-use-biocontainers``. The first two of these are documented in depth in cwltool's README (https://github.com/common-workflow-language/cwltool/#leveraging-softwarerequirements-beta). Here I will quickly review a couple of the available options against test examples available in cwltool's ``tests`` directory using this branch of Toil. ``` git clone https://github.com/common-workflow-language/cwltool.git cd cwltool ``` From here we can quickly demonstrate installation and resolution of CWL ``SoftwareRequirement`` hints using Conda using the tests/seqtk_seq.cwl tool. This tool doesn't define an explicit ``DockerRequirement`` but does define the following ``SoftwareRequirement`` in its ``hints`` as follows: ``` hints: SoftwareRequirement: packages: - package: seqtk version: - r93 ``` We can try this tool out with ``cwltoil`` and see that by default we probably don't have the binary seqtk on our ``PATH`` and so the tool fails using the following command: ``` cwltoil tests/seqtk_seq.cwl tests/seqtk_seq_job.json ``` This should result in a tool execution failure. We can then instruct ``cwltoil`` to install the required package from Bioconda into an isolated environment and use it as needed by passing it the ``--beta-conda-dependencies`` flag as follows: ``` cwltoil --beta-conda-dependencies tests/seqtk_seq.cwl tests/seqtk_seq_job.json ``` The tool should now be successful. The Conda support can be endless tweaked but the defaults are defaults that target the best practice Conda channels that work well for the Galaxy project. Additional ``SoftwareRequirement`` resolution options are available including targetting Software Modules, lmod, Homebrew, simple scripts called "Galaxy packages". All of these options can be specified and configured with a YAML file passed to cwltoil using the ``--beta-dependency-resolvers-configuration`` option instead of the simple shortcut ``--beta-conda-dependencies``. The cwltool documentation walks through a few examples of adapting infrastructure to tools and tools to package managers. Reference documentation is available in [galaxy-lib's documentation](http://galaxy-lib.readthedocs.io/en/latest/topics/dependency_resolution.html). In addition to options that allow configuring tool execution environments, containers themselves can be discovered and/or built from these software requirements. The [Biocontainers](https://github.com/BioContainers) project (previously Biodocker) contains a registry we use for this purpose. Every version of every Bioconda package has a corresponding best-practice (very lightweight, very small) Docker container on quay.io. There are over 3000 such containers currently. Continuing with the example above, the new `--beta-use-biocontainers` flag instructs ``cwltoil`` to fetch the corresponding Biocontainers container from quay.io automatically or build one to use locally (required for instance for tools with multiple software requirements - fat tools). ``` cwltoil --beta-use-biocontainers tests/seqtk_seq.cwl tests/seqtk_seq_job.json ``` These containers contain the same binaries that the package would use locally (outside of Docker). Therefore this technique allows cross platform reproducibility/remixability across cwltool, cwltoil, Galaxy, and CLI - both inside and outside of containers.
This enables the reproducibilty stack described in [this preprint](https://www.biorxiv.org/content/early/2017/10/11/200683) and [presented at BOSC 2017](http://jmchilton.github.io/writing/bosc2017slides/biocontainers.html) under Toil. Concretely this enables all the same options in cwltoil as added to cwltool in common-workflow-language/cwltool#214 including `` --beta-conda-dependencies``, ``--beta-dependency-resolvers-configuration``, and ``--beta-use-biocontainers``. The first two of these are documented in depth in cwltool's README (https://github.com/common-workflow-language/cwltool/#leveraging-softwarerequirements-beta). Here I will quickly review a couple of the available options against test examples available in cwltool's ``tests`` directory using this branch of Toil. ``` git clone https://github.com/common-workflow-language/cwltool.git cd cwltool ``` From here we can quickly demonstrate installation and resolution of CWL ``SoftwareRequirement`` hints using Conda using the tests/seqtk_seq.cwl tool. This tool doesn't define an explicit ``DockerRequirement`` but does define the following ``SoftwareRequirement`` in its ``hints`` as follows: ``` hints: SoftwareRequirement: packages: - package: seqtk version: - r93 ``` We can try this tool out with ``cwltoil`` and see that by default we probably don't have the binary seqtk on our ``PATH`` and so the tool fails using the following command: ``` cwltoil tests/seqtk_seq.cwl tests/seqtk_seq_job.json ``` This should result in a tool execution failure. We can then instruct ``cwltoil`` to install the required package from Bioconda into an isolated environment and use it as needed by passing it the ``--beta-conda-dependencies`` flag as follows: ``` cwltoil --beta-conda-dependencies tests/seqtk_seq.cwl tests/seqtk_seq_job.json ``` The tool should now be successful. The Conda support can be endless tweaked but the defaults are defaults that target the best practice Conda channels that work well for the Galaxy project. Additional ``SoftwareRequirement`` resolution options are available including targetting Software Modules, lmod, Homebrew, simple scripts called "Galaxy packages". All of these options can be specified and configured with a YAML file passed to cwltoil using the ``--beta-dependency-resolvers-configuration`` option instead of the simple shortcut ``--beta-conda-dependencies``. The cwltool documentation walks through a few examples of adapting infrastructure to tools and tools to package managers. Reference documentation is available in [galaxy-lib's documentation](http://galaxy-lib.readthedocs.io/en/latest/topics/dependency_resolution.html). In addition to options that allow configuring tool execution environments, containers themselves can be discovered and/or built from these software requirements. The [Biocontainers](https://github.com/BioContainers) project (previously Biodocker) contains a registry we use for this purpose. Every version of every Bioconda package has a corresponding best-practice (very lightweight, very small) Docker container on quay.io. There are over 3000 such containers currently. Continuing with the example above, the new `--beta-use-biocontainers` flag instructs ``cwltoil`` to fetch the corresponding Biocontainers container from quay.io automatically or build one to use locally (required for instance for tools with multiple software requirements - fat tools). ``` cwltoil --beta-use-biocontainers tests/seqtk_seq.cwl tests/seqtk_seq_job.json ``` These containers contain the same binaries that the package would use locally (outside of Docker). Therefore this technique allows cross platform reproducibility/remixability across cwltool, cwltoil, Galaxy, and CLI - both inside and outside of containers.
This enables the reproducibilty stack described in [this preprint](https://www.biorxiv.org/content/early/2017/10/11/200683) and [presented at BOSC 2017](http://jmchilton.github.io/writing/bosc2017slides/biocontainers.html) under Toil. Concretely this enables all the same options in cwltoil as added to cwltool in common-workflow-language/cwltool#214 including `` --beta-conda-dependencies``, ``--beta-dependency-resolvers-configuration``, and ``--beta-use-biocontainers``. The first two of these are documented in depth in cwltool's README (https://github.com/common-workflow-language/cwltool/#leveraging-softwarerequirements-beta). Here I will quickly review a couple of the available options against test examples available in cwltool's ``tests`` directory using this branch of Toil. ``` git clone https://github.com/common-workflow-language/cwltool.git cd cwltool ``` From here we can quickly demonstrate installation and resolution of CWL ``SoftwareRequirement`` hints using Conda using the tests/seqtk_seq.cwl tool. This tool doesn't define an explicit ``DockerRequirement`` but does define the following ``SoftwareRequirement`` in its ``hints`` as follows: ``` hints: SoftwareRequirement: packages: - package: seqtk version: - r93 ``` We can try this tool out with ``cwltoil`` and see that by default we probably don't have the binary seqtk on our ``PATH`` and so the tool fails using the following command: ``` cwltoil tests/seqtk_seq.cwl tests/seqtk_seq_job.json ``` This should result in a tool execution failure. We can then instruct ``cwltoil`` to install the required package from Bioconda into an isolated environment and use it as needed by passing it the ``--beta-conda-dependencies`` flag as follows: ``` cwltoil --beta-conda-dependencies tests/seqtk_seq.cwl tests/seqtk_seq_job.json ``` The tool should now be successful. The Conda support can be endless tweaked but the defaults are defaults that target the best practice Conda channels that work well for the Galaxy project. Additional ``SoftwareRequirement`` resolution options are available including targetting Software Modules, lmod, Homebrew, simple scripts called "Galaxy packages". All of these options can be specified and configured with a YAML file passed to cwltoil using the ``--beta-dependency-resolvers-configuration`` option instead of the simple shortcut ``--beta-conda-dependencies``. The cwltool documentation walks through a few examples of adapting infrastructure to tools and tools to package managers. Reference documentation is available in [galaxy-lib's documentation](http://galaxy-lib.readthedocs.io/en/latest/topics/dependency_resolution.html). In addition to options that allow configuring tool execution environments, containers themselves can be discovered and/or built from these software requirements. The [Biocontainers](https://github.com/BioContainers) project (previously Biodocker) contains a registry we use for this purpose. Every version of every Bioconda package has a corresponding best-practice (very lightweight, very small) Docker container on quay.io. There are over 3000 such containers currently. Continuing with the example above, the new `--beta-use-biocontainers` flag instructs ``cwltoil`` to fetch the corresponding Biocontainers container from quay.io automatically or build one to use locally (required for instance for tools with multiple software requirements - fat tools). ``` cwltoil --beta-use-biocontainers tests/seqtk_seq.cwl tests/seqtk_seq_job.json ``` These containers contain the same binaries that the package would use locally (outside of Docker). Therefore this technique allows cross platform reproducibility/remixability across cwltool, cwltoil, Galaxy, and CLI - both inside and outside of containers.
This enables the reproducibilty stack described in [this preprint](https://www.biorxiv.org/content/early/2017/10/11/200683) and [presented at BOSC 2017](http://jmchilton.github.io/writing/bosc2017slides/biocontainers.html) under Toil. Concretely this enables all the same options in cwltoil as added to cwltool in common-workflow-language/cwltool#214 including `` --beta-conda-dependencies``, ``--beta-dependency-resolvers-configuration``, and ``--beta-use-biocontainers``. The first two of these are documented in depth in cwltool's README (https://github.com/common-workflow-language/cwltool/#leveraging-softwarerequirements-beta). Here I will quickly review a couple of the available options against test examples available in cwltool's ``tests`` directory using this branch of Toil. ``` git clone https://github.com/common-workflow-language/cwltool.git cd cwltool ``` From here we can quickly demonstrate installation and resolution of CWL ``SoftwareRequirement`` hints using Conda using the tests/seqtk_seq.cwl tool. This tool doesn't define an explicit ``DockerRequirement`` but does define the following ``SoftwareRequirement`` in its ``hints`` as follows: ``` hints: SoftwareRequirement: packages: - package: seqtk version: - r93 ``` We can try this tool out with ``cwltoil`` and see that by default we probably don't have the binary seqtk on our ``PATH`` and so the tool fails using the following command: ``` cwltoil tests/seqtk_seq.cwl tests/seqtk_seq_job.json ``` This should result in a tool execution failure. We can then instruct ``cwltoil`` to install the required package from Bioconda into an isolated environment and use it as needed by passing it the ``--beta-conda-dependencies`` flag as follows: ``` cwltoil --beta-conda-dependencies tests/seqtk_seq.cwl tests/seqtk_seq_job.json ``` The tool should now be successful. The Conda support can be endless tweaked but the defaults are defaults that target the best practice Conda channels that work well for the Galaxy project. Additional ``SoftwareRequirement`` resolution options are available including targetting Software Modules, lmod, Homebrew, simple scripts called "Galaxy packages". All of these options can be specified and configured with a YAML file passed to cwltoil using the ``--beta-dependency-resolvers-configuration`` option instead of the simple shortcut ``--beta-conda-dependencies``. The cwltool documentation walks through a few examples of adapting infrastructure to tools and tools to package managers. Reference documentation is available in [galaxy-lib's documentation](http://galaxy-lib.readthedocs.io/en/latest/topics/dependency_resolution.html). In addition to options that allow configuring tool execution environments, containers themselves can be discovered and/or built from these software requirements. The [Biocontainers](https://github.com/BioContainers) project (previously Biodocker) contains a registry we use for this purpose. Every version of every Bioconda package has a corresponding best-practice (very lightweight, very small) Docker container on quay.io. There are over 3000 such containers currently. Continuing with the example above, the new `--beta-use-biocontainers` flag instructs ``cwltoil`` to fetch the corresponding Biocontainers container from quay.io automatically or build one to use locally (required for instance for tools with multiple software requirements - fat tools). ``` cwltoil --beta-use-biocontainers tests/seqtk_seq.cwl tests/seqtk_seq_job.json ``` These containers contain the same binaries that the package would use locally (outside of Docker). Therefore this technique allows cross platform reproducibility/remixability across cwltool, cwltoil, Galaxy, and CLI - both inside and outside of containers.
This enables the reproducibilty stack described in [this preprint](https://www.biorxiv.org/content/early/2017/10/11/200683) and [presented at BOSC 2017](http://jmchilton.github.io/writing/bosc2017slides/biocontainers.html) under Toil. Concretely this enables all the same options in cwltoil as added to cwltool in common-workflow-language/cwltool#214 including `` --beta-conda-dependencies``, ``--beta-dependency-resolvers-configuration``, and ``--beta-use-biocontainers``. The first two of these are documented in depth in cwltool's README (https://github.com/common-workflow-language/cwltool/#leveraging-softwarerequirements-beta). Here I will quickly review a couple of the available options against test examples available in cwltool's ``tests`` directory using this branch of Toil. ``` git clone https://github.com/common-workflow-language/cwltool.git cd cwltool ``` From here we can quickly demonstrate installation and resolution of CWL ``SoftwareRequirement`` hints using Conda using the tests/seqtk_seq.cwl tool. This tool doesn't define an explicit ``DockerRequirement`` but does define the following ``SoftwareRequirement`` in its ``hints`` as follows: ``` hints: SoftwareRequirement: packages: - package: seqtk version: - r93 ``` We can try this tool out with ``cwltoil`` and see that by default we probably don't have the binary seqtk on our ``PATH`` and so the tool fails using the following command: ``` cwltoil tests/seqtk_seq.cwl tests/seqtk_seq_job.json ``` This should result in a tool execution failure. We can then instruct ``cwltoil`` to install the required package from Bioconda into an isolated environment and use it as needed by passing it the ``--beta-conda-dependencies`` flag as follows: ``` cwltoil --beta-conda-dependencies tests/seqtk_seq.cwl tests/seqtk_seq_job.json ``` The tool should now be successful. The Conda support can be endless tweaked but the defaults are defaults that target the best practice Conda channels that work well for the Galaxy project. Additional ``SoftwareRequirement`` resolution options are available including targetting Software Modules, lmod, Homebrew, simple scripts called "Galaxy packages". All of these options can be specified and configured with a YAML file passed to cwltoil using the ``--beta-dependency-resolvers-configuration`` option instead of the simple shortcut ``--beta-conda-dependencies``. The cwltool documentation walks through a few examples of adapting infrastructure to tools and tools to package managers. Reference documentation is available in [galaxy-lib's documentation](http://galaxy-lib.readthedocs.io/en/latest/topics/dependency_resolution.html). In addition to options that allow configuring tool execution environments, containers themselves can be discovered and/or built from these software requirements. The [Biocontainers](https://github.com/BioContainers) project (previously Biodocker) contains a registry we use for this purpose. Every version of every Bioconda package has a corresponding best-practice (very lightweight, very small) Docker container on quay.io. There are over 3000 such containers currently. Continuing with the example above, the new `--beta-use-biocontainers` flag instructs ``cwltoil`` to fetch the corresponding Biocontainers container from quay.io automatically or build one to use locally (required for instance for tools with multiple software requirements - fat tools). ``` cwltoil --beta-use-biocontainers tests/seqtk_seq.cwl tests/seqtk_seq_job.json ``` These containers contain the same binaries that the package would use locally (outside of Docker). Therefore this technique allows cross platform reproducibility/remixability across cwltool, cwltoil, Galaxy, and CLI - both inside and outside of containers.
Consider the included tool
seqtk_seq.cwl
. It includes the following SoftwareRequirement hint:I'm not happy that
version
is a list - but I can live with it for now I guess.If cwltool is executed with the hidden
--beta-conda-dependencies
flag, this requirement will be processed by galaxy-lib, Conda will be installed, and seqtk will be installed, and a Conda environment including seqtk will be setup for the job.Additional flags are available to configure dependency resolution in a more fine grained way - using Conda however has a number of advantages that make it particularily well suited to CWL. Conda packages are distributed as binaries that work across Mac and Linux and work on relatively old version of Linux (great for HPC). Conda also doesn't require root and supports installation of multiple different versions of a package - again these factors make it great for HPC and non-Docker targets.
The Biocontainers project (previously Biodocker) dovetails nicely with this. Every version of every Bioconda package has a corresponding best-practice (very lightweight, very small) Docker container on quay.io (assembled by @bgruening and colleagues). There are over 1800 such containers currently.
Continuing with the example above, the new
--beta-use-biocontainers
flag instructs cwltool to fetch the corresponding Biocontainers container from quay.io automatically or build one to use locally (required for instance for tools with multiple software requirements - fat tools).These containers contain the same binaries that the package would use locally (outside of Docker). Therefore this technique allows cross platform reproducibility/remixability across CWL, Galaxy, and CLI - both inside and outside of Docker.
My sincerest hope is that we move away from CWL-specific Dockerfiles. For less effort, a community bioconda package can be made and the result can be used in many more contexts. The Docker image will then be maintained by the community Biocontainer project.