A repository of common Informatics Matters Pipeline utilities shared between a number of Pipelines repositories. As well as containing a number of modules previously resident in the Pipelines repository this repository contains a Groovy-based test framework that interprets a relatively simple text-file format used to both describe and execute built-in tests.
You will need: -
- Java
- Conda
- Groovy (v2.4)
- Python
It is vital that you install and set up Java (in particular, by also setting JAVA_HOME correctly) before you install Groovy. This has been observed on Windows 7: if you do not, Groovy assumes a 32-bit environment and then cannot call into a 64-bit Java.
Although the project is based on Gradle, which is Groovy-based, you will need to install Groovy. We've tested this framework using Groovy v2.4.11.
The module utilities should support both Python 2 and 3, and we recommend that any modules/pipelines you write support both flavours.
The test utility sets up the PIN environment variable, which is set to the data directory within your pipelines project (<project-root>/src/data). All input data files that you want to use for your testing must reside in this directory.
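For instance, a pipeline test's input file would be referenced relative to this directory. A minimal sketch in Python, where the file name input.sdf and the fallback path are hypothetical and only for illustration:

```python
import os

# PIN is set by the test utility to <project-root>/src/data.
# 'input.sdf' is a hypothetical input file name used for illustration;
# the fallback default below is also just for illustration.
pin = os.environ.get('PIN', os.path.join('src', 'data'))
input_path = os.path.join(pin, 'input.sdf')
print(input_path)
```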
Normally pipeline output files are written to a tmp directory inside the working copy of the pipelines-utils repository you're running in. Alternatively you can write test output to a directory of your own choice (e.g. /tmp/blob) using the POUT environment variable: -

$ export POUT=/tmp/blob/
Output files are removed when the test suite starts and when it passes.
Some tests rely on the definition of the PROOT environment variable. If PROOT is used it will have been defined in the Dockerfile that accompanies the pipelines project. For shell (outside Docker) execution the test utility defines this for you. PROOT is set to the execution directory, typically <project-root>/src/python. If you have pipeline-specific files (pickle files and the like) they should be located relative to this directory.
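A pipeline that relies on such a file might locate it as follows. This is only a sketch: model.pkl is a hypothetical file name and the fallback path is for illustration.

```python
import os
import pickle

# PROOT points at the execution directory, typically <project-root>/src/python.
proot = os.environ.get('PROOT', os.path.join('src', 'python'))

# 'model.pkl' is a hypothetical pipeline-specific pickle file,
# located relative to PROOT as the text above describes.
pickle_path = os.path.join(proot, 'model.pkl')
model = None
if os.path.isfile(pickle_path):
    with open(pickle_path, 'rb') as f:
        model = pickle.load(f)
```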
If you have working copies of all your pipeline repositories checked-out
in the same directory as this repository you can execute all the tests
across all the repositories by running the tester from here. Simply
run the following Gradle command from the root of the pipelines-utils
project: -
$ ./gradlew runPipelineTester
The tester summarises each test it has found while also executing it. When tests fail it logs as much as it can and continues. When all the tests have finished it prints a summary, including a list of failed tests along with the number of test files and individual tests that were executed: -
+------------------+
| Summary          |
+------------------+
| Test directory   | pipelines-utils
| Test directory   | pipelines
+------------------+
| Test files       | 29
| Tests found      | 39
| Tests passed     | 20
| Tests failed     | -
| Tests skipped    | 19
| Tests ignored    | 3
| Tests excluded   | -
+------------------+
| Result           | SUCCESS
+------------------+
Fields in the summary should be self-explanatory but a couple might benefit from further explanation: -
- Tests skipped are tests that were found but not executed. Normally these are tests found during a container test run that can't be run (because the test has no corresponding container image).
- Tests ignored are tests found that are not run because they have been marked for non-execution; the test name begins with ignore_.
- Tests excluded are tests found that are not run because the operating system matches a regular expression in the test's exclude declaration.
You can limit the tests that are executed by defining the parent directories that you want the tester to explore via the command-line.
For simple single-argument execution you can run from within Gradle. For fine control and multiple arguments you're better off navigating to the PipelineTester implementation (src/groovy) and running it manually: groovy PipelineTester.groovy -s -o pipelines-sygnature.
From Gradle you can add the test directories as a comma-separated list with the -o option. To only run the tests in the pipelines project your command-line would look like this: -
$ ./gradlew runPipelineTester -Pptargs=-opipelines
Additionally, if you want to limit test execution further by naming specific tests within a project you can append their names to the directory using a . separator. For example, to only run the test test_x_100 in the pipelines project: -
$ ./gradlew runPipelineTester -Pptargs=-opipelines.test_x_100
You can add more tests if you want by adding further comma-separated values. To add test_v in the pipelines-demo project, for example: -
$ ./gradlew runPipelineTester -Pptargs=-opipelines.test_x_100,pipelines-demo.test_v
To run all the tests in the pipelines directory and the test_probe test in the pipelines-2 project: -
$ ./gradlew runPipelineTester -Pptargs=-opipelines,pipelines-2.test_probe
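The selection syntax used above is a comma-separated list of project[.test] items. The following sketch shows how such a specification decomposes; parse_selection is a hypothetical helper for illustration, not the tester's actual parser:

```python
def parse_selection(spec):
    """Split a -o selection string into (project, test) pairs.

    A test of None means 'run all tests in the project'.
    This is an illustrative sketch, not the PipelineTester's own code.
    """
    selections = []
    for item in spec.split(','):
        if '.' in item:
            project, test = item.split('.', 1)
        else:
            project, test = item, None
        selections.append((project, test))
    return selections

print(parse_selection('pipelines,pipelines-2.test_probe'))
```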
You can run the pipeline tests in Docker using their expected container image (defined in the service descriptor). Doing this gives you added confidence that your pipeline will work when deployed.
To execute the tests within their containers run the docker-specific Gradle task: -
$ ./gradlew runDockerPipelineTester
When you run in docker only the tests that can run in Docker (those with a defined image name) will be executed. Tests that cannot be executed in Docker will be skipped.
Ideally your tests will pass. When they don't the test framework prints the collected log to the screen as it happens but also keeps all the files generated (by all the tests) in case they are of use for diagnostics.
Files generated by the executed pipelines are temporarily written to the src/groovy/tmp/PipelineTester directory. You will find a directory for every file and test combination. For example, if the test file pbf_ev.test contains the test test_pbf_ev_raw any files it generates will be found in the directory pbf_ev-test_pbf_ev_raw.
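The directory name is the test file's base name and the test name joined with a hyphen, as in the example above. A small sketch reproducing that naming convention (output_dir_name is a hypothetical helper, not the tester's own code):

```python
import os

def output_dir_name(test_file, test_name):
    """Reproduce the '<file-base>-<test-name>' output directory name."""
    base = os.path.splitext(os.path.basename(test_file))[0]
    return '%s-%s' % (base, test_name)

print(output_dir_name('pbf_ev.test', 'test_pbf_ev_raw'))
# → pbf_ev-test_pbf_ev_raw
```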
You can re-direct the test output to an existing directory of your choice by defining the environment variable POUT: -
$ export POUT=/tmp/my-output-dir/
Some important notes: -
- Files generated by the pipelines are removed when the tester is re-executed and when all the tests pass
- The files generated by the tests are not deleted until the test framework has finished. If your tests generate large output files you may need to make sure your disk can accommodate all the output from all the tests
- When you re-run the pipeline tester it removes any remaining collected output files
The PipelineTester looks for files that have the extension .test that can be found in your pipelines project. Tests need to be placed in the directory that contains the pipeline they're testing.
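To see which test files would be picked up you can list them yourself. A minimal sketch (the tester's own discovery logic may differ in detail):

```python
import fnmatch
import os

def find_test_files(root='.'):
    """Collect all '*.test' files beneath root."""
    matches = []
    for dirpath, dirnames, filenames in os.walk(root):
        for name in fnmatch.filter(filenames, '*.test'):
            matches.append(os.path.join(dirpath, name))
    return sorted(matches)

print(find_test_files('.'))
```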
As well as copying existing test files you can use the pipeline.test.template file, in this project's root, as a documented template in order to create a set of tests for a new pipeline.
At the moment the tester only permits one test file per pipeline so all tests for a given pipeline need to be composed in one file.
Windows is not currently supported but the following may help you get started.
The tests and test framework are designed to operate from within a unix-like environment but if you are forced to execute pipeline tests from Windows the following approach is recommended: -
You will need:
- Java 8
- Conda
- Groovy (v2.4)
- Git-Bash
- Install Java 8 (ideally the JDK) and define JAVA_HOME. This MUST be done before installing Groovy.
- Install Git for Windows. This will give you a unix bash-like execution environment.
- From within the Git-Bash shell navigate to your pipelines project.
- Ensure that you can execute both Python and Groovy from within the Git-Bash shell (i.e. python --version and groovy --version work).
- From the pipelines project root enter your Conda environment with something like source activate my-conda-env. To run the pipelines tests your environment must contain the rdkit package. It can be installed with this command from within Conda: -

$ conda install -c rdkit rdkit

- Install additional modules required by pipelines-utils by using its requirements file (which can be found in the pipelines-utils sub-project): -

$ pip install -r requirements.txt
With the above steps complete you should be able to execute the pipelines tester by navigating to the sub-module in your pipelines project: -
$ cd pipelines-utils
$ ./gradlew runPipelineTester
The pipeline utilities consist of a number of Python-based modules that provide some common processing and are also distributed in the pipeline container images. The distributed modules can be tested using setup.py.
To test these modules run the following from the src/python directory: -
$ pip install -r requirements.txt
$ python setup.py test
The utilities are automatically published to PyPI for easy installation.
We currently employ Travis to automate our builds and also to publish the
Python module to PyPI. The Travis CI/CD service (controlled by the .travis.yml
file) automatically publishes to PyPI when the repository is tagged (normally
on master).
In order to publish a new release of the module you need to: -
- Review the src/python/README.rst file (which is the source of the PyPI documentation). This rarely needs changing but it's worth understanding where the PyPI documentation comes from.
- Tag the repository's master branch. It's the tagging that instructs Travis to publish the Python module to PyPI - it does not publish every commit.
Before tagging you must ensure that the tag you're using has not been used before. If you try to re-publish a package version to PyPI it will respond with an error. Once you have released version 1.0.0 you cannot release it again. Ever.
For acceptable version number formatting you should follow PEP-440.
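As a sketch, here is a simple check that a tag looks like a plain PEP-440 release version. This deliberately covers only the simple X.Y[.Z] form; the full specification permits many more forms (pre-, post- and dev-releases, epochs, and so on).

```python
import re

# Matches plain release segments like '1.0.0' or '2017.10'.
# This is a deliberately simplified subset of PEP-440.
RELEASE_RE = re.compile(r'^\d+(\.\d+)*$')

def is_simple_release(version):
    return RELEASE_RE.match(version) is not None

print(is_simple_release('1.0.0'))     # True
print(is_simple_release('1.0.0rc1'))  # False - not a plain release segment
```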