Enhance cwltoil to support SoftwareRequirements & BioContainers. #1757
Can one of the admins verify this patch? |
Jenkins, okay to test |
Jenkins, test please |
Jenkins, test this please |
Looks like a single transient failure |
@jessebrennan, #1742 is still occurring |
@ejacox yes, but this branch isn't up to date with master and hasn't merged the latest commit that fixes it. From what I can tell, no branch has yet. |
I'll rebase. |
This commit enables in cwltoil all the options recently added to cwltool in common-workflow-language/cwltool#214, including ``--beta-conda-dependencies``, ``--beta-dependency-resolvers-configuration``, and ``--beta-use-biocontainers``. The first two are documented in depth in cwltool's README (https://github.com/common-workflow-language/cwltool/#leveraging-softwarerequirements-beta).

Here I will quickly review a couple of the available options using test examples from cwltool's ``tests`` directory. I will assume cwltoil has been set up as configured in this branch and that galaxy-lib is installed in the same environment, for instance with ``pip install galaxy-lib``.

Now let's grab the examples from cwltool:

```
git clone https://github.com/common-workflow-language/cwltool.git
cd cwltool
```

From here we can quickly demonstrate installation and resolution of SoftwareRequirements with Conda using the tests/seqtk_seq.cwl tool. This tool doesn't define a DockerRequirement but does define the following SoftwareRequirement under ``hints``:

```
hints:
  SoftwareRequirement:
    packages:
    - package: seqtk
      version:
      - r93
```

If we run this tool with cwltoil as is, it will probably fail, because by default the seqtk binary is not on our ``PATH``:

```
cwltoil tests/seqtk_seq.cwl tests/seqtk_seq_job.json
```

This should result in a tool execution failure. We can then instruct cwltoil to install the required package from Bioconda into an isolated environment and use it as needed by passing the ``--beta-conda-dependencies`` flag:

```
cwltoil --beta-conda-dependencies tests/seqtk_seq.cwl tests/seqtk_seq_job.json
```

The tool should now succeed.

The Conda support can be endlessly tweaked, but the defaults target the best-practice Conda channels that work well for the Galaxy project.
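As a rough illustration (not code from this commit), the information a resolver works from can be read straight out of the hint quoted above. The sketch below assumes the tool document has already been loaded into a plain dict in the map-style layout shown; `software_requirements` is a hypothetical helper name, not a cwltool or cwltoil function.

```python
# Hypothetical illustration: the dict mirrors the `hints` section of
# tests/seqtk_seq.cwl as quoted in this description.
tool = {
    "hints": {
        "SoftwareRequirement": {
            "packages": [
                {"package": "seqtk", "version": ["r93"]},
            ]
        }
    }
}


def software_requirements(tool_doc):
    """Return (package, version) pairs from a map-style SoftwareRequirement hint.

    A package listed without any version yields a single (package, None) pair.
    """
    requirement = tool_doc.get("hints", {}).get("SoftwareRequirement", {})
    pairs = []
    for entry in requirement.get("packages", []):
        versions = entry.get("version") or [None]
        for version in versions:
            pairs.append((entry["package"], version))
    return pairs


print(software_requirements(tool))  # [('seqtk', 'r93')]
```

Each pair is what a resolver such as Conda would need to locate: a package name plus, optionally, a pinned version.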
Additional SoftwareRequirement resolution options are available, including targeting Software Modules, Homebrew, and simple scripts called "Galaxy packages". All of these options can be specified and configured with a YAML file passed to cwltoil using the ``--beta-dependency-resolvers-configuration`` option instead of the simple shortcut ``--beta-conda-dependencies``. The cwltool documentation walks through a few examples of adapting infrastructure to tools and tools to package managers.

In addition to options that configure tool execution environments, containers themselves can be discovered and/or built from these software requirements. The [Biocontainers](https://github.com/BioContainers) project (previously Biodocker) contains a registry we use for this purpose. Every version of every Bioconda package has a corresponding best-practice (very lightweight, very small) Docker container on quay.io. There are over 2300 such containers currently.

Continuing with the example above, the new `--beta-use-biocontainers` flag instructs cwltoil to fetch the corresponding Biocontainers container from quay.io automatically, or to build one for local use (required, for instance, for tools with multiple software requirements, i.e. fat tools).

```
cwltoil --beta-use-biocontainers tests/seqtk_seq.cwl tests/seqtk_seq_job.json
```

These containers contain the same binaries that the package would use locally (outside of Docker). This technique therefore allows cross-platform reproducibility/remixability across cwltool, cwltoil, Galaxy, and the CLI, both inside and outside of containers.
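For the ``--beta-dependency-resolvers-configuration`` route mentioned above, a minimal sketch of preparing a resolvers file and composing the matching command might look like the following. The resolver `type` names (`modules`, `conda`) are assumptions based on the cwltool README and should be checked against the installed galaxy-lib; the file name `resolvers_conf.yml` and the helper `build_command` are hypothetical. The command is only composed, not executed.

```python
import os
import tempfile

# Assumed resolver entries, tried in order; verify the `type` names against
# the cwltool README and the installed galaxy-lib before relying on them.
RESOLVERS_YAML = """\
- type: modules
- type: conda
"""


def build_command(conf_path, tool, job):
    """Compose (but do not run) a cwltoil invocation that uses a resolvers file."""
    return [
        "cwltoil",
        "--beta-dependency-resolvers-configuration",
        conf_path,
        tool,
        job,
    ]


# Write the configuration to a scratch directory (hypothetical file name).
conf_dir = tempfile.mkdtemp()
conf_path = os.path.join(conf_dir, "resolvers_conf.yml")
with open(conf_path, "w") as handle:
    handle.write(RESOLVERS_YAML)

command = build_command(conf_path, "tests/seqtk_seq.cwl", "tests/seqtk_seq_job.json")
print(" ".join(command))
```

Keeping the resolver list in a file rather than behind a single flag is what lets a site swap in its own infrastructure without touching the tool description.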
Jenkins, test this please |
Jenkins, test this please. |
Hey @jmchilton, the conformance tests don't pass with this branch:

73 tests passed, 5 failures, 9 unsupported features

--- Running conformance test v1.0 on /mnt/ephemeral/workspace/toil-pull-requests/venv/bin/cwltoil ---
3.9.0a1
Test [1/87] Test [2/87] Test [3/87] Test [4/87] Test [5/87] Test [6/87] Test [7/87] Test [8/87] Test [9/87] Test [10/87] Test [11/87] Test [12/87] Test [13/87] Test [14/87] Test [15/87] Test [16/87] Test [17/87] Test [18/87] Test [19/87] Test [20/87] Test [21/87]
Test failed: /mnt/ephemeral/workspace/toil-pull-requests/venv/bin/cwltoil --outdir=/mnt/ephemeral/tmp/tmpsFWfaB --quiet v1.0/count-lines1-wf.cwl v1.0/wc-job.json
Test two step workflow with imported tools
Returned non-zero
Exception in thread Thread-14:
Traceback (most recent call last):
  File "/usr/lib/python2.7/threading.py", line 801, in __bootstrap_inner
    self.run()
  File "/usr/lib/python2.7/threading.py", line 754, in run
    self.__target(*self.__args, **self.__kwargs)
  File "/mnt/ephemeral/workspace/toil-pull-requests/src/toil/fileStore.py", line 1468, in asyncWrite
    raise RuntimeError("The termination flag is set, exiting")
RuntimeError: The termination flag is set, exiting
Exception in thread Thread-13:
Traceback (most recent call last):
  File "/usr/lib/python2.7/threading.py", line 801, in __bootstrap_inner
    self.run()
  File "/usr/lib/python2.7/threading.py", line 754, in run
    self.__target(*self.__args, **self.__kwargs)
  File "/mnt/ephemeral/workspace/toil-pull-requests/src/toil/fileStore.py", line 1468, in asyncWrite
    raise RuntimeError("The termination flag is set, exiting")
RuntimeError: The termination flag is set, exiting |
@mr-c Yes... interesting. My guess is that it is unrelated to my changes and is just the update to cwltool - but I don't really know. Do people mind if I create a little more noise and open a second PR that just updates cwltool and doesn't include the rest of my changes - that way we could know for sure if it is these changes or some other change in cwltool? |
@jmchilton go for it |
Jenkins add to whitelist |
Hey @tetron, how does this relate to your changes? It would be good to sync on this. |
@benedictpaten it looks like the changes to cwltoil made by this PR are relatively small because it is just enabling the underlying functionality in cwltool, so the merge conflict is likely easy to resolve. However, this isn't my thing (I haven't used this feature and wouldn't know how to test it), so I'd look to @jmchilton or @mr-c to handle it. |
Hey @jmchilton - any update on this? Happy to push this through if possible. |
I may have screwed up the merge -- manually running the examples at https://github.com/common-workflow-language/cwltool#leveraging-softwarerequirements-beta doesn't work for me using I think the missing key is 62bdd2e#diff-ee480085fcce06066f78f98d1188e663R766 which was refactored out by @tetron 's v1.0.1 PR. |
Tests are passing now. Is this ready to merge? |
I'm not sure - I'd want to test it some more. I'm going to close it for now and I'll reopen once I have some time to work through the details again after the merge of 1.0.1 support. Thanks for being open to merging it though - that is really great news! |
OK. Thank you for the update @jmchilton |