Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Tweak SoftwareRequirement handling for reuse in Toil. #456

Merged

Conversation

jmchilton
Copy link
Contributor

@jmchilton jmchilton commented Jul 7, 2017

With these changes I can get the requirement handling from #214 to work properly in simple toil tests (after a couple tweaks to toil itself). I'll mark this as WIP until #214 is merged - though comments on the last few commits are more than welcome.

There are three commits here:

  • aa1b7c9 - the main refactor which seems uniformly positive.
  • 5069580 - which fixes a bug introduced at the last second in Beta support for configurable dependency resolution & Biocontainers. #214.
  • 33d1e96 - contains a small tweak which might not be so great. It restores the builder parameter in single_job_executor on jobs before executing if it is supplied to the method. I'm guessing I'm somehow depending on builder later in the job lifecycle than the previous code paths did and I'm not sure if this is a bad thing?

jmchilton added a commit to jmchilton/toil that referenced this pull request Jul 7, 2017
@jmchilton
Copy link
Contributor Author

Does this look cleaner anyway? I suspect yes?
…s it.

This was required to get dependency management working with toil - but it looks really ugly so I suspect I am doing something suboptimially.
…22e88.

Not sure why it seemed like I needed that when testing the docs branch - clearly I hadn't originally and now it is broken after that change.
@jmchilton jmchilton changed the title [WIP] Tweak SoftwareRequirement handling for reuse in Toil. Tweak SoftwareRequirement handling for reuse in Toil. Jul 8, 2017
jmchilton added a commit to jmchilton/toil that referenced this pull request Jul 8, 2017
…ainers.

This work depends on galaxy-lib (available on PyPI) and currently unmerged cwltool enhancements from common-workflow-language/cwltool#456.

This commit enables all the same options in cwltoil as added to cwltool recently in common-workflow-language/cwltool#214 including `` --beta-conda-dependencies``, ``--beta-dependency-resolvers-configuration``, and ``--beta-use-biocontainers``. The first two of these are documented in depth in cwltool's README now (https://github.com/common-workflow-language/cwltool/#leveraging-softwarerequirements-beta).

Here I will quickly review a couple of the available options against test examples available in cwltool's ``tests`` directory. I will assume cwltoil has been setup as configure in this branch and galaxy-lib installed in the same environment - for instance with ``pip install galaxy-lib``. Now lets grab the examples from cwltool...

```
git clone https://github.com/common-workflow-language/cwltool.git
cd cwltool
```

From here we can quickly demonstrate installation and resolution of SoftwareRequirements using Conda using the tests/seqtk_seq.cwl tool. This tool doesn't define a DockerRequirement but does define the following SoftwareRequirements as ``hints`` as follows:

```
hints:
  SoftwareRequirement:
    packages:
    - package: seqtk
      version:
      - r93
```

We can try this tool out with cwltoil and see that by default we probably don't have the binary seqtk on our ``PATH`` and so the tool fails using the following command:

```
cwltoil tests/seqtk_seq.cwl tests/seqtk_seq_job.json
```

This should result in a tool execution failure. We can then instruct cwltoil to install the required package from Bioconda into an isolated environment and use it as needed by passing it the ``--beta-conda-dependencies`` flag as follows:

```
cwltoil --beta-conda-dependencies tests/seqtk_seq.cwl tests/seqtk_seq_job.json
```

The tool should now be successful.

The Conda support can be endless tweaked but the defaults are defaults that target the best practice Conda channels that work well for the Galaxy project. Additional SoftwareRequirement resolution options are available including targetting Software Modules, Homebrew, simple scripts called "Galaxy packages". All of these options can be specified and configured with a YAML file passed to cwltoil using the ``--beta-dependency-resolvers-configuration`` option instead of the simple shortcut ``--beta-conda-dependencies``. The cwltool documentation walks through a few examples of adapting infrastructure to tools and tools to package managers.

In addition to options that allow configuring tool execution environments, containers themselves can be discovered and/or built from these software requirements. The [Biocontainers](https://github.com/BioContainers) project (previously Biodocker) contains a registry we use for this purpose. Every version of every Bioconda package has a corresponding best-practice (very lightweight, very small) Docker container on quay.io. There are over 2300 such containers currently.

Continuing with the example above, the new `--beta-use-biocontainers` flag instructs cwltoil to fetch the corresponding Biocontainers container from quay.io automatically or build one to use locally (required for instance for tools with multiple software requirements - fat tools).

```
cwltool --beta-use-biocontainers tests/seqtk_seq.cwl tests/seqtk_seq_job.json
```

These containers contain the same binaries that the package would use locally (outside of Docker). Therefore this technique allows cross platform reproducibility/remixability across cwltool, cwltoil, Galaxy, and CLI - both inside and outside of containers.
jmchilton added a commit to jmchilton/toil that referenced this pull request Jul 8, 2017
…ainers.

This work depends on galaxy-lib (available on PyPI) and currently unmerged cwltool enhancements from common-workflow-language/cwltool#456.

This commit enables all the same options in cwltoil as added to cwltool recently in common-workflow-language/cwltool#214 including `` --beta-conda-dependencies``, ``--beta-dependency-resolvers-configuration``, and ``--beta-use-biocontainers``. The first two of these are documented in depth in cwltool's README now (https://github.com/common-workflow-language/cwltool/#leveraging-softwarerequirements-beta).

Here I will quickly review a couple of the available options against test examples available in cwltool's ``tests`` directory. I will assume cwltoil has been setup as configure in this branch and galaxy-lib installed in the same environment - for instance with ``pip install galaxy-lib``. Now lets grab the examples from cwltool...

```
git clone https://github.com/common-workflow-language/cwltool.git
cd cwltool
```

From here we can quickly demonstrate installation and resolution of SoftwareRequirements using Conda using the tests/seqtk_seq.cwl tool. This tool doesn't define a DockerRequirement but does define the following SoftwareRequirements as ``hints`` as follows:

```
hints:
  SoftwareRequirement:
    packages:
    - package: seqtk
      version:
      - r93
```

We can try this tool out with cwltoil and see that by default we probably don't have the binary seqtk on our ``PATH`` and so the tool fails using the following command:

```
cwltoil tests/seqtk_seq.cwl tests/seqtk_seq_job.json
```

This should result in a tool execution failure. We can then instruct cwltoil to install the required package from Bioconda into an isolated environment and use it as needed by passing it the ``--beta-conda-dependencies`` flag as follows:

```
cwltoil --beta-conda-dependencies tests/seqtk_seq.cwl tests/seqtk_seq_job.json
```

The tool should now be successful.

The Conda support can be endless tweaked but the defaults are defaults that target the best practice Conda channels that work well for the Galaxy project. Additional SoftwareRequirement resolution options are available including targetting Software Modules, Homebrew, simple scripts called "Galaxy packages". All of these options can be specified and configured with a YAML file passed to cwltoil using the ``--beta-dependency-resolvers-configuration`` option instead of the simple shortcut ``--beta-conda-dependencies``. The cwltool documentation walks through a few examples of adapting infrastructure to tools and tools to package managers.

In addition to options that allow configuring tool execution environments, containers themselves can be discovered and/or built from these software requirements. The [Biocontainers](https://github.com/BioContainers) project (previously Biodocker) contains a registry we use for this purpose. Every version of every Bioconda package has a corresponding best-practice (very lightweight, very small) Docker container on quay.io. There are over 2300 such containers currently.

Continuing with the example above, the new `--beta-use-biocontainers` flag instructs cwltoil to fetch the corresponding Biocontainers container from quay.io automatically or build one to use locally (required for instance for tools with multiple software requirements - fat tools).

```
cwltoil --beta-use-biocontainers tests/seqtk_seq.cwl tests/seqtk_seq_job.json
```

These containers contain the same binaries that the package would use locally (outside of Docker). Therefore this technique allows cross platform reproducibility/remixability across cwltool, cwltoil, Galaxy, and CLI - both inside and outside of containers.
@jmchilton
Copy link
Contributor Author

Did some more hacking on this and I have updated the description and taken this out of WIP.

I did some more testing on the Toil branch as well and updated the commit (jmchilton/toil@4788dc7) - it looks like now all the options including BioContainer support that work in cwltool also work in cwltoil. Descriptions and examples included in that commit message.

@mr-c mr-c merged commit e9c6a38 into common-workflow-language:master Jul 8, 2017
@mr-c
Copy link
Member

mr-c commented Jul 8, 2017

Thanks!

jmchilton added a commit to jmchilton/toil that referenced this pull request Jul 8, 2017
…ainers.

This work depends on galaxy-lib (available on PyPI) and currently unmerged cwltool enhancements from common-workflow-language/cwltool#456.

This commit enables all the same options in cwltoil as added to cwltool recently in common-workflow-language/cwltool#214 including `` --beta-conda-dependencies``, ``--beta-dependency-resolvers-configuration``, and ``--beta-use-biocontainers``. The first two of these are documented in depth in cwltool's README now (https://github.com/common-workflow-language/cwltool/#leveraging-softwarerequirements-beta).

Here I will quickly review a couple of the available options against test examples available in cwltool's ``tests`` directory. I will assume cwltoil has been setup as configure in this branch and galaxy-lib installed in the same environment - for instance with ``pip install galaxy-lib``. Now lets grab the examples from cwltool...

```
git clone https://github.com/common-workflow-language/cwltool.git
cd cwltool
```

From here we can quickly demonstrate installation and resolution of SoftwareRequirements using Conda using the tests/seqtk_seq.cwl tool. This tool doesn't define a DockerRequirement but does define the following SoftwareRequirements as ``hints`` as follows:

```
hints:
  SoftwareRequirement:
    packages:
    - package: seqtk
      version:
      - r93
```

We can try this tool out with cwltoil and see that by default we probably don't have the binary seqtk on our ``PATH`` and so the tool fails using the following command:

```
cwltoil tests/seqtk_seq.cwl tests/seqtk_seq_job.json
```

This should result in a tool execution failure. We can then instruct cwltoil to install the required package from Bioconda into an isolated environment and use it as needed by passing it the ``--beta-conda-dependencies`` flag as follows:

```
cwltoil --beta-conda-dependencies tests/seqtk_seq.cwl tests/seqtk_seq_job.json
```

The tool should now be successful.

The Conda support can be endless tweaked but the defaults are defaults that target the best practice Conda channels that work well for the Galaxy project. Additional SoftwareRequirement resolution options are available including targetting Software Modules, Homebrew, simple scripts called "Galaxy packages". All of these options can be specified and configured with a YAML file passed to cwltoil using the ``--beta-dependency-resolvers-configuration`` option instead of the simple shortcut ``--beta-conda-dependencies``. The cwltool documentation walks through a few examples of adapting infrastructure to tools and tools to package managers.

In addition to options that allow configuring tool execution environments, containers themselves can be discovered and/or built from these software requirements. The [Biocontainers](https://github.com/BioContainers) project (previously Biodocker) contains a registry we use for this purpose. Every version of every Bioconda package has a corresponding best-practice (very lightweight, very small) Docker container on quay.io. There are over 2300 such containers currently.

Continuing with the example above, the new `--beta-use-biocontainers` flag instructs cwltoil to fetch the corresponding Biocontainers container from quay.io automatically or build one to use locally (required for instance for tools with multiple software requirements - fat tools).

```
cwltoil --beta-use-biocontainers tests/seqtk_seq.cwl tests/seqtk_seq_job.json
```

These containers contain the same binaries that the package would use locally (outside of Docker). Therefore this technique allows cross platform reproducibility/remixability across cwltool, cwltoil, Galaxy, and CLI - both inside and outside of containers.
jmchilton added a commit to jmchilton/toil that referenced this pull request Jul 8, 2017
…ners.

This work depends on galaxy-lib (available on PyPI) and currently unmerged cwltool enhancements from common-workflow-language/cwltool#456.

This commit enables all the same options in cwltoil as added to cwltool recently in common-workflow-language/cwltool#214 including `` --beta-conda-dependencies``, ``--beta-dependency-resolvers-configuration``, and ``--beta-use-biocontainers``. The first two of these are documented in depth in cwltool's README now (https://github.com/common-workflow-language/cwltool/#leveraging-softwarerequirements-beta).

Here I will quickly review a couple of the available options against test examples available in cwltool's ``tests`` directory. I will assume cwltoil has been setup as configure in this branch and galaxy-lib installed in the same environment - for instance with ``pip install galaxy-lib``. Now lets grab the examples from cwltool...

```
git clone https://github.com/common-workflow-language/cwltool.git
cd cwltool
```

From here we can quickly demonstrate installation and resolution of SoftwareRequirements using Conda using the tests/seqtk_seq.cwl tool. This tool doesn't define a DockerRequirement but does define the following SoftwareRequirements as ``hints`` as follows:

```
hints:
  SoftwareRequirement:
    packages:
    - package: seqtk
      version:
      - r93
```

We can try this tool out with cwltoil and see that by default we probably don't have the binary seqtk on our ``PATH`` and so the tool fails using the following command:

```
cwltoil tests/seqtk_seq.cwl tests/seqtk_seq_job.json
```

This should result in a tool execution failure. We can then instruct cwltoil to install the required package from Bioconda into an isolated environment and use it as needed by passing it the ``--beta-conda-dependencies`` flag as follows:

```
cwltoil --beta-conda-dependencies tests/seqtk_seq.cwl tests/seqtk_seq_job.json
```

The tool should now be successful.

The Conda support can be endless tweaked but the defaults are defaults that target the best practice Conda channels that work well for the Galaxy project. Additional SoftwareRequirement resolution options are available including targetting Software Modules, Homebrew, simple scripts called "Galaxy packages". All of these options can be specified and configured with a YAML file passed to cwltoil using the ``--beta-dependency-resolvers-configuration`` option instead of the simple shortcut ``--beta-conda-dependencies``. The cwltool documentation walks through a few examples of adapting infrastructure to tools and tools to package managers.

In addition to options that allow configuring tool execution environments, containers themselves can be discovered and/or built from these software requirements. The [Biocontainers](https://github.com/BioContainers) project (previously Biodocker) contains a registry we use for this purpose. Every version of every Bioconda package has a corresponding best-practice (very lightweight, very small) Docker container on quay.io. There are over 2300 such containers currently.

Continuing with the example above, the new `--beta-use-biocontainers` flag instructs cwltoil to fetch the corresponding Biocontainers container from quay.io automatically or build one to use locally (required for instance for tools with multiple software requirements - fat tools).

```
cwltoil --beta-use-biocontainers tests/seqtk_seq.cwl tests/seqtk_seq_job.json
```

These containers contain the same binaries that the package would use locally (outside of Docker). Therefore this technique allows cross platform reproducibility/remixability across cwltool, cwltoil, Galaxy, and CLI - both inside and outside of containers.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants