v0.4.0 (#150)
* Act and Edit are no-ops on inactive maps (#155)

* resolves #145

* Improvements to the map.stderr/stdout API (#149)

* Add htmap-exec Docker image and change default image to it (#153)

* move test infrastructure into tests dir

* add htmap-exec image

* updates docs

* Transferring Arbitrary Output Files (#151)
JoshKarpel committed May 25, 2019
1 parent 1af6a5d commit ac280bd
Showing 34 changed files with 1,001 additions and 582 deletions.
4 changes: 2 additions & 2 deletions .travis.yml
@@ -15,7 +15,7 @@ matrix:
fast_finish: true

install:
- travis_retry docker build -t htmap-test --file docker/Dockerfile --build-arg HTCONDOR_VERSION --build-arg PYTHON_VERSION=$TRAVIS_PYTHON_VERSION .
- docker build -t htmap-test --file tests/_inf/Dockerfile --build-arg HTCONDOR_VERSION --build-arg PYTHON_VERSION=$TRAVIS_PYTHON_VERSION .

script:
- travis_retry docker run htmap-test tests/travis.sh
- docker run htmap-test tests/_inf/travis.sh
3 changes: 3 additions & 0 deletions binder/.htmaprc
@@ -1 +1,4 @@
DELIVERY_METHOD = "assume"

[MAP_OPTIONS]
REQUEST_DISK = "100MB"
15 changes: 15 additions & 0 deletions docs/source/api.rst
@@ -78,6 +78,14 @@ See :ref:`error_handling` for more details on error handling.
.. autoclass:: htmap.ComponentStatus
:members:

.. autoclass:: htmap.MapStdOut
:members: get

.. autoclass:: htmap.MapStdErr
:members: get

.. autoclass:: htmap.MapOutputFiles
:members: get

.. _error_handling:

@@ -146,6 +154,13 @@ Input File Transfer

.. autoclass:: htmap.TransferPath


Output File Transfer
--------------------

.. autofunction:: htmap.transfer_output_files


Checkpointing
-------------

2 changes: 1 addition & 1 deletion docs/source/conf.py
@@ -26,7 +26,7 @@
author = 'HTCondor Team'

# The short X.Y version
version = htmap.__version__[:5]
version = htmap.__version__
# The full version, including alpha/beta/rc tags
release = htmap.__version__
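
The change above drops the ``[:5]`` slice from ``version``. One plausible reason, inferred here and not stated in the commit, is that a fixed-width slice depends on the length of each version component. It keeps all of ``0.4.0``, but it cuts ``0.10.0`` down to ``0.10.``. The sketch below compares slicing with splitting on dots. ``short_version`` is a hypothetical helper, and the version strings are illustrative.

```python
def short_version(release: str) -> str:
    """Return the 'X.Y' part of a dotted release string such as '0.4.0'."""
    return ".".join(release.split(".")[:2])


for release in ["0.4.0", "0.10.0", "1.2.3rc1"]:
    # Naive fixed-width slicing versus splitting on dots.
    print(release, repr(release[:5]), short_version(release))
```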

15 changes: 8 additions & 7 deletions docs/source/dependencies.rst
@@ -19,7 +19,7 @@ it's all of the execute nodes in the pool that your might map components might b
Submit-side dependency management can be handled using standard Python package management tools.
We recommend using ``miniconda`` as your package manager (https://docs.conda.io/en/latest/miniconda.html).

HTMap itself requires that execute-side can run a Python script using a Python install that has the module ``cloudpickle`` installed.
HTMap itself requires that the execute side can run a Python script using a Python installation that also has ``htmap`` installed.
That Python installation also needs whatever other packages your code needs to run.
For example, if you ``import numpy`` in your code, you need to have ``numpy`` installed execute-side.
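
You can run a small script like the one below inside an execute environment, for example a Docker image you built, to check that ``htmap`` and your other dependencies can be imported before you submit a map. This is a sketch and not an HTMap tool. ``missing_modules`` is a hypothetical helper, and the module list is an assumption that you should adjust to match your own code.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Adjust this list to match what your mapped function imports.
required = ["htmap", "numpy"]
report = missing_modules(required)
print("missing:", ", ".join(report) if report else "none")
```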

@@ -33,7 +33,8 @@ The built-in delivery methods are

More details on each of these methods can be found below.

The default delivery method is ``docker``, with image ``continuumio/anaconda3:latest``.
The default delivery method is ``docker``, with the default image ``htcondor/htmap-exec:<version>``,
where ``<version>`` matches the version of HTMap you are using submit-side.
If your pool can run Docker jobs and your Python code does not depend on any custom packages
(i.e., you never import any modules that you wrote yourself),
this default behavior will likely work for you without requiring any changes.
@@ -73,8 +74,8 @@ At runtime:
htmap.settings['DOCKER.IMAGE'] = "<repository>/<image>:<tag>"
In this mode, HTMap will run inside a Docker image that you provide.
Remember that this Docker image needs to have the ``cloudpickle`` module installed.
The default Docker image is `continuumio/anaconda3:latest <https://hub.docker.com/r/continuumio/anaconda3/>`_,
Remember that this Docker image needs to have the ``htmap`` module installed.
The default Docker image is `htcondor/htmap-exec <https://hub.docker.com/r/htcondor/htmap-exec/>`_,
which is based on Python 3 and has many useful packages pre-installed.

If you want to use your own Docker image, just change the ``'DOCKER.IMAGE'`` setting.
@@ -83,11 +84,11 @@ For example, a very simple Dockerfile that can be used with HTMap is

.. code-block:: docker
FROM python:latest
FROM python:3
RUN pip install --no-cache-dir cloudpickle
RUN pip install --no-cache-dir htmap
This would create a Docker image with the latest version of Python and ``cloudpickle`` installed.
This would create a Docker image with the latest versions of Python 3 and ``htmap`` installed.
From here you could install more Python dependencies, or add more layers to account for other dependencies.

.. attention::
6 changes: 3 additions & 3 deletions docs/source/index.rst
@@ -34,15 +34,15 @@ Happy mapping!
:doc:`dependencies`
Information about how to manage what your code depends on (e.g., other Python packages).

:doc:`recipes`
Deeper dives on specific tasks.

:doc:`api`
Public API documentation.

:doc:`settings`
Documentation for the various settings.

:doc:`recipes`
Deeper dives on specific, common tasks.

:doc:`tips-and-tricks`
Useful code snippets, tips, and tricks.

6 changes: 5 additions & 1 deletion docs/source/recipes.rst
@@ -5,7 +5,10 @@ Recipes
:doc:`recipes/docker-image-cookbook`
How to build HTMap-compatible Docker images.
Yes, this recipe is an entire cookbook!
Yes, this single recipe is an entire cookbook!

:doc:`recipes/output-files`
How to move arbitrary files back to the submit node.

:doc:`recipes/wrapping-external-programs`
How to send input and output to an external (i.e., non-Python) program from inside a mapped function.
@@ -19,5 +22,6 @@ Recipes
:hidden:

recipes/docker-image-cookbook
recipes/output-files
recipes/wrapping-external-programs
recipes/checkpointing-maps
4 changes: 0 additions & 4 deletions docs/source/recipes/checkpointing-maps.rst
@@ -3,10 +3,6 @@
Checkpointing Maps
------------------

.. attention::

To use this feature, HTMap itself must be installed in your execute environment (not just ``cloudpickle``).

When running on opportunistic resources, HTCondor might "evict" your map components from the execute locations.
Evicted components return to the queue and, without your intervention, restart from scratch.
However, HTMap can preserve files across an eviction and make them available in the next run.
85 changes: 66 additions & 19 deletions docs/source/recipes/docker-image-cookbook.rst
@@ -14,15 +14,20 @@ are installed on the computers your code actually runs on.

To use Docker, you write a **Dockerfile** which tells Docker how to generate an **image**,
which is a blueprint to construct a **container**.
The Dockerfile is a list of instructions, such as shell commands or instructions for Docker to copy files from the build environment into the image.
The Dockerfile is a list of instructions, such as shell commands or instructions
for Docker to copy files from the build environment into the image.
You then tell Docker to "build" the image from the Dockerfile.

For use with HTMap, you then upload this image to `Docker Hub <https://hub.docker.com>`_, where it can then be downloaded to execute nodes in an HTCondor pool.
When your HTMap component lands on an execute node, HTCondor will download your image from Docker Hub and run your code inside it using HTMap.
For use with HTMap, you then upload this image to `Docker Hub <https://hub.docker.com>`_,
where it can then be downloaded to execute nodes in an HTCondor pool.
When your HTMap component lands on an execute node, HTCondor will download your
image from Docker Hub and run your code inside it using HTMap.

The following sections describe, roughly in order of increasing complexity, different ways to build Docker images for use with HTMap.
The following sections describe, roughly in order of increasing complexity,
different ways to build Docker images for use with HTMap.
Each level of complexity is introduced to solve a more advanced dependency management problem.
We recommend reading them in order until reach one that works for your dependencies (each section assumes knowledge of the previous sections).
We recommend reading them in order until you reach one that works for your dependencies
(each section assumes knowledge of the previous sections).

More detailed information on how Dockerfiles work can be found
`in the Docker documentation itself <https://docs.docker.com/engine/reference/builder/>`_
@@ -37,10 +42,11 @@ This page only covers the bare minimum to get started with HTMap and Docker.
Can I use HTMap's default image?
--------------------------------

HTMap's default Docker image is `continuumio/anaconda3:latest <https://hub.docker.com/r/continuumio/anaconda3/>`_.
HTMap's default Docker image is `htcondor/htmap-exec <https://hub.docker.com/r/htcondor/htmap-exec/>`_,
which is itself based on `continuumio/anaconda3 <https://hub.docker.com/r/continuumio/anaconda3/>`_.
It is based on Python 3 and has many useful packages pre-installed, such as ``numpy``, ``scipy``, and ``pandas``.
If your software only depends on packages included in the `Anaconda distribution <https://docs.anaconda.com/anaconda/packages/pkg-docs/>`_ by default,
you can use HTMap's default and won't need to create your own image.
If your software only depends on packages included in the `Anaconda distribution <https://docs.anaconda.com/anaconda/packages/pkg-docs/>`_,
you can use HTMap's default image and won't need to create your own.


I depend on Python packages that aren't in the Anaconda distribution
@@ -52,13 +58,14 @@ I depend on Python packages that aren't in the Anaconda distribution
and `make an account on Docker Hub <https://hub.docker.com/>`_.


Let's pretend that there's a package called ``foobar`` that your Python code depends on, but isn't part of the Anaconda distribution.
Let's pretend that there's a package called ``foobar`` that your Python function depends on,
but isn't part of the Anaconda distribution.
You will need to write your own Dockerfile to include this package in your Docker image.

Docker images are built in **layers**.
You always start a Dockerfile by stating which existing Docker image you'd like to use as your base layer.
A good choice is the same Anaconda image that HTMap uses as the default,
which comes with both the ``conda`` package manager and the standard ``pip``.
which comes with both the ``conda`` package manager and the standard ``pip``.
Create a file named ``Dockerfile`` and write this into it:

.. code-block:: docker
@@ -67,18 +74,41 @@ Create a file named ``Dockerfile`` and write this into it:
FROM continuumio/anaconda3:latest
Lines that begin with a ``#`` are comments in a Dockerfile.
RUN pip install --no-cache-dir htmap
ARG USER=htmap
RUN groupadd ${USER} \
&& useradd -m -g ${USER} ${USER}
USER ${USER}
Each line in the Dockerfile starts with a short, capitalized word which tells Docker what kind of build instruction it is.
``FROM`` means "start with this base image".
Now we need to tell Docker to run a shell command during the build to install ``foobar``.

* ``FROM`` means "start with this base image".
* ``RUN`` means "execute these shell commands in the container".
* ``ARG`` means "define a build argument", which acts like an environment variable that is only set during the image build.

Lines that begin with a ``#`` are comments in a Dockerfile.
The above lines say that we want to inherit from the image ``continuumio/anaconda3:latest`` and build on top of it.
To be compatible with HTMap, we install ``htmap`` via ``pip``.
We also set up a non-root user to do the execution, which is important for security.
Naming that user ``htmap`` is arbitrary and has nothing to do with the ``htmap`` package itself.

Now we need to tell Docker to run a shell command during the build to install ``foobar``
by adding one more line to the bottom of the Dockerfile.

.. code-block:: docker

    # Dockerfile
    FROM continuumio/anaconda3:latest
    RUN pip install --no-cache-dir htmap
    ARG USER=htmap
    RUN groupadd ${USER} \
     && useradd -m -g ${USER} ${USER}
    USER ${USER}
    # if foobar can be installed via conda, use these lines
    RUN conda install -y foobar \
     && conda clean -y --all
@@ -101,6 +131,13 @@ If you need install many packages, we recommend writing a ``requirements.txt`` f
FROM continuumio/anaconda3:latest
RUN pip install --no-cache-dir htmap
ARG USER=htmap
RUN groupadd ${USER} \
&& useradd -m -g ${USER} ${USER}
USER ${USER}
COPY requirements.txt requirements.txt
RUN pip install --no-cache-dir -r requirements.txt
@@ -153,10 +190,13 @@ Instead of using the full Anaconda distribution, use a base Docker image that on
FROM continuumio/miniconda3:latest
RUN conda install -y cloudpickle \
&& conda clean -y -all
RUN pip install --no-cache-dir htmap
ARG USER=htmap
RUN groupadd ${USER} \
&& useradd -m -g ${USER} ${USER}
USER ${USER}
Note that we need to install ``cloudpickle``, which HTMap depends on execute-side, ourselves.
From here, install your particular dependencies as above.

If you prefer to not use ``conda``, an even-barer-bones image could be produced from
Expand All @@ -167,8 +207,14 @@ If you prefer to not use ``conda``, an even-barer-bones image could be produced
FROM python:latest
RUN pip install --no-cache-dir cloudpickle
RUN pip install --no-cache-dir htmap
ARG USER=htmap
RUN groupadd ${USER} \
&& useradd -m -g ${USER} ${USER}
USER ${USER}
We use ``python:latest`` as our base image, so we don't have ``conda`` anymore.

I want to use a Python package that's not on PyPI or Anaconda
-------------------------------------------------------------
@@ -225,8 +271,9 @@ We recommend adding ``miniconda`` to the image by adding these lines to your Doc
&& conda install python=${PYTHON_VERSION} \
&& conda clean -y --all
After this, you can install any other Python packages you need as in the preceeding sections.
After this, you can install HTMap and any other Python packages you need as in the preceding sections.

Note that in this example we based the image on Ubuntu's base image and installed ``wget``,
which we used to download the ``miniconda`` installer.
Depending on your base image, you may need to use a different package manager (for example, ``yum``) or different command-line file download tool (for example, ``curl``).
Depending on your base image, you may need to use a different package manager
(for example, ``yum``) or different command-line file download tool (for example, ``curl``).
90 changes: 90 additions & 0 deletions docs/source/recipes/output-files.rst
@@ -0,0 +1,90 @@
.. py:currentmodule:: htmap

Output Files
------------

If the "output" of your map function is a file, HTMap's
basic functionality will not be sufficient for you.
As a toy example, consider a function which takes a string and a number, and
writes out a file containing that string repeated that number of times, with
a space between each repetition.
The file itself will be the output artifact of our function.

.. code-block:: python

    import htmap
    import itertools
    from pathlib import Path

    @htmap.mapped
    def repeat(string, number):
        output_path = Path('repeated.txt')
        with output_path.open(mode='w') as f:
            f.write(' '.join(itertools.repeat(string, number)))

This would work great locally, producing a file named ``repeated.txt`` in
the directory we ran the code from.
If this same code runs execute-side, the file will still be produced, but
HTMap won't know that we care about the file.
In fact, the map will appear to be spectacularly useless:

.. code-block:: python

    with repeat.build_map() as mb:
        mb('foo', 5)
        mb('wiz', 3)
        mb('bam', 2)

    repeated = mb.map

    print(list(repeated))
    # [None, None, None]

A function with no ``return`` statement implicitly returns ``None``.
There's no sign of our output file.
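
To see both halves of the problem without a pool, you can call the undecorated function locally. It writes the file but still returns ``None``. This sketch drops ``@htmap.mapped`` so that it runs without HTMap, and it works in a throwaway temporary directory so that ``repeated.txt`` does not land in your project.

```python
import itertools
import os
import tempfile
from pathlib import Path


def repeat(string, number):
    # Same body as the mapped function above, minus the @htmap.mapped decorator.
    output_path = Path('repeated.txt')
    with output_path.open(mode='w') as f:
        f.write(' '.join(itertools.repeat(string, number)))


original_cwd = os.getcwd()
with tempfile.TemporaryDirectory() as tmp:
    os.chdir(tmp)  # write repeated.txt somewhere disposable
    try:
        result = repeat('foo', 5)
        contents = Path('repeated.txt').read_text()
    finally:
        os.chdir(original_cwd)  # leave the temp dir before it is deleted

print(result)    # None: the function has no return statement
print(contents)  # foo foo foo foo foo
```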

We need to tell HTMap that we are producing an output file.
We can do this by adding a call to an HTMap hook function in our mapped function:

.. code-block:: python

    import htmap
    import itertools
    from pathlib import Path

    @htmap.mapped
    def repeat(string, number):
        output_path = Path('repeated.txt')
        with output_path.open(mode='w') as f:
            f.write(' '.join(itertools.repeat(string, number)))

        htmap.transfer_output_files(output_path)  # identical, except for this line

The :func:`htmap.transfer_output_files` function tells HTMap to move the files
at the given paths back for us.
We can then access those files using the :attr:`Map.output_files` attribute,
which behaves like a sequence indexed by component numbers.
The elements of the sequence are :class:`pathlib.Path` objects pointing to the
directories containing the output files from each component, like so:

.. code-block:: python

    with repeat.build_map() as mb:
        mb('foo', 5)
        mb('wiz', 3)
        mb('bam', 2)

    repeated = mb.map

    for component, base in enumerate(repeated.output_files):
        path = base / 'repeated.txt'
        print(component, path.read_text())

    # 0 foo foo foo foo foo
    # 1 wiz wiz wiz
    # 2 bam bam
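
You can try the access pattern above locally without an HTCondor pool. The sketch below does not use HTMap. It fakes one output directory per component with ``tempfile``, standing in for ``Map.output_files``. The per-component directory names ``0``, ``1`` and ``2`` are hypothetical, because HTMap chooses the real locations. The sketch then reads the files back with the same ``enumerate`` and ``base / 'repeated.txt'`` loop.

```python
import itertools
import tempfile
from pathlib import Path


def repeat(string, number, out_dir):
    """Undecorated stand-in for the mapped function: writes repeated.txt into out_dir."""
    output_path = Path(out_dir) / 'repeated.txt'
    output_path.write_text(' '.join(itertools.repeat(string, number)))


inputs = [('foo', 5), ('wiz', 3), ('bam', 2)]

with tempfile.TemporaryDirectory() as root:
    # One directory per component, standing in for Map.output_files.
    output_files = []
    for component, (string, number) in enumerate(inputs):
        base = Path(root) / str(component)
        base.mkdir()
        repeat(string, number, base)
        output_files.append(base)

    # Same loop as in the HTMap example above.
    results = {}
    for component, base in enumerate(output_files):
        path = base / 'repeated.txt'
        results[component] = path.read_text()
        print(component, results[component])
```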