Add auxiliary data download API #1513

Merged: 16 commits, merged Feb 12, 2021
Changes from 4 commits
15 changes: 15 additions & 0 deletions doc/source/config.rst
@@ -130,6 +130,21 @@ defaults to a different path depending on your operating system following the
`appdirs <https://github.com/ActiveState/appdirs#some-example-output>`_
"user data dir".

.. _download_aux_setting:

Download Auxiliary Data
^^^^^^^^^^^^^^^^^^^^^^^

* **Environment variable**: ``SATPY_DOWNLOAD_AUX``
* **YAML/Config Key**: ``download_aux``
* **Default**: True

Whether to allow downloading of auxiliary files for certain Satpy operations.
See :doc:`dev_guide/aux_data` for more information. If ``True``, Satpy
downloads and caches any necessary data files in :ref:`data_dir_setting`
when they are needed. If ``False``, pre-downloaded files are used as-is, and
no other files are downloaded or checked for validity.

.. _component_configuration:

Component Configuration
122 changes: 122 additions & 0 deletions doc/source/dev_guide/aux_data.rst
@@ -0,0 +1,122 @@
Auxiliary Data Download
=======================

Sometimes Satpy components need extra data files to do their work properly.
These include files like Look Up Tables (LUTs), coefficients, or Earth model
data (ex. elevations); in general, any file that would be too large to include
in the Satpy Python package, meaning anything bigger than a small text file.
To help with this, Satpy includes utilities for downloading and caching these
files only when your component is used. This saves users from wasting time and
disk space downloading files they may never use.
This functionality is made possible thanks to the
`Pooch library <https://www.fatiando.org/pooch/latest/>`_.

Downloaded files are stored in the directory configured by
:ref:`data_dir_setting`.

Adding download functionality
-----------------------------

The utility functions for data downloading follow a two-step process:

1. **Registering**: Tell Satpy what files might need to be downloaded and used
later.
2. **Retrieving**: Ask Satpy to download and store the files locally.

Registering
^^^^^^^^^^^

Registering a file for downloading tells Satpy the remote URL for the file
and, optionally, a hash used to verify a successful download. Registering can
also include a ``filename`` to tell Satpy what to name the file when it is
downloaded; if it is not provided, the name is determined from the URL.
Once registered, Satpy can be told to retrieve the file (see below) by using a
"cache key". Cache keys follow the general scheme of
``<component_type>/<filename>`` (ex. ``readers/README.rst``).
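To make the role of the hash concrete, here is a minimal sketch of checking a downloaded file against a registered hash. It assumes hashes are written as ``"<algorithm>:<hexdigest>"`` (like the ``sha256:...`` value in the YAML example later in this document), that a bare hex string means SHA256 (Pooch's documented default, as far as we know), and that ``None`` skips the check. ``file_hash`` and ``verify_known_hash`` are hypothetical helpers, not Satpy functions.

```python
import hashlib
from typing import Optional


def file_hash(path, algorithm="sha256", chunk_size=65536):
    """Hash a file in chunks so large LUTs never need to fit in memory."""
    hasher = hashlib.new(algorithm)
    with open(path, "rb") as fobj:
        for chunk in iter(lambda: fobj.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest()


def verify_known_hash(path, known_hash: Optional[str]) -> bool:
    """Return True if the file matches ``known_hash``; None skips the check."""
    if known_hash is None:
        return True
    # "sha256:abc" -> ("sha256", ":", "abc"); "abc" -> ("", "", "abc")
    algorithm, _, expected = known_hash.rpartition(":")
    algorithm = algorithm or "sha256"  # assumed default when no prefix is given
    return file_hash(path, algorithm) == expected.lower()
```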

Satpy includes a low-level function and a high-level mixin class for
registering files. The higher-level class is recommended for Satpy
components like readers, writers, and compositors. The lower-level
:func:`~satpy.data_download.register_file` function can be used for any other
use case.

The :class:`~satpy.data_download.DataDownloadMixin` class is automatically
included in the :class:`~satpy.readers.yaml_reader.FileYAMLReader` and
:class:`~satpy.writers.Writer` base classes. For any other component (like
a compositor) you should include it as another parent class:

.. code-block:: python

    from satpy.data_download import DataDownloadMixin
    from satpy.composites import GenericCompositor

    class MyCompositor(GenericCompositor, DataDownloadMixin):
        """Compositor that uses downloaded files."""

        def __init__(self, name, url=None, known_hash=None, **kwargs):
            super().__init__(name, **kwargs)
            data_files = [{'url': url, 'known_hash': known_hash}]
            self.register_data_files(data_files)

However your code registers files, it must do so during initialization so
that :func:`~satpy.data_download.find_registerable_files` can find them.
If your component isn't a reader, writer, or compositor then this function
will need to be updated to find and load your registered files. See
:ref:`offline_aux_downloads` below for more information.

As mentioned, the mixin class is included in the base reader and writer
classes. To register files in these cases, include a ``data_files`` section in
your YAML configuration file. For readers this goes under the ``reader``
section and for writers under the ``writer`` section. This parameter is a list
of dictionaries including a ``url``, ``known_hash``, and optional
``filename``. For example::

    reader:
        name: abi_l1b
        short_name: ABI L1b
        long_name: GOES-R ABI Level 1b
        ... other metadata ...
        data_files:
          - url: "https://example.com/my_data_file.dat"
          - url: "https://raw.githubusercontent.com/pytroll/satpy/master/README.rst"
            known_hash: "sha256:5891286b63e7745de08c4b0ac204ad44cfdb9ab770309debaba90308305fa759"
          - url: "https://raw.githubusercontent.com/pytroll/satpy/master/RELEASING.md"
            filename: "satpy_releasing.md"
            known_hash: null

See :class:`~satpy.data_download.DataDownloadMixin` for more information.
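Once parsed by a YAML loader, a ``data_files`` section like the one above is just a list of dictionaries. The sketch below shows how each entry could map onto the ``<component_type>/<filename>`` cache-key scheme, assuming the component type is ``readers`` and that a missing ``filename`` falls back to the last part of the URL path. ``Registration`` and ``data_files_to_registrations`` are hypothetical names, not part of Satpy.

```python
import posixpath
from typing import List, NamedTuple, Optional
from urllib.parse import urlparse


class Registration(NamedTuple):
    cache_key: str
    url: str
    known_hash: Optional[str]


def data_files_to_registrations(component_type: str,
                                data_files: List[dict]) -> List[Registration]:
    """Map ``data_files`` entries to ``<component_type>/<filename>`` cache keys."""
    registrations = []
    for entry in data_files:
        url = entry["url"]
        # Fall back to the file name at the end of the URL path.
        filename = entry.get("filename") or posixpath.basename(urlparse(url).path)
        registrations.append(
            Registration(f"{component_type}/{filename}", url, entry.get("known_hash"))
        )
    return registrations


# The entries from the YAML example above, as a YAML parser would return them.
DATA_FILES = [
    {"url": "https://example.com/my_data_file.dat"},
    {"url": "https://raw.githubusercontent.com/pytroll/satpy/master/README.rst",
     "known_hash": "sha256:5891286b63e7745de08c4b0ac204ad44cfdb9ab770309debaba90308305fa759"},
    {"url": "https://raw.githubusercontent.com/pytroll/satpy/master/RELEASING.md",
     "filename": "satpy_releasing.md", "known_hash": None},
]
```

With these entries, the third file is cached as ``readers/satpy_releasing.md`` instead of ``readers/RELEASING.md`` because its ``filename`` overrides the URL-derived name.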

Retrieving
^^^^^^^^^^

Files that have been registered (see above) can be retrieved by calling the
:func:`~satpy.data_download.retrieve` function. This function expects a single
argument: the cache key. Cache keys are returned by registering functions, but
can also be pre-determined by following the scheme
``<component_type>/<filename>`` (ex. ``readers/README.rst``).
Retrieving a file will download it to local disk if needed and then return
the local pathname. Data is stored locally in the :ref:`data_dir_setting`.
It is up to the caller to then open the file.
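The "download if needed, then return the local path" behaviour can be sketched with the standard library alone. This is a minimal illustration, assuming a plain ``registry`` dict (cache key to URL) as a stand-in for Satpy's registered files and a cache layout of ``<data_dir>/<cache_key>``; hash checking is left out to keep it short, and the function is not Satpy's ``retrieve``.

```python
import shutil
import tempfile
import urllib.request
from pathlib import Path
from typing import Dict


def retrieve(cache_key: str, registry: Dict[str, str], data_dir) -> str:
    """Return the local path for ``cache_key``, downloading only when missing.

    The caller is responsible for opening the returned file.
    """
    url = registry[cache_key]  # KeyError means the file was never registered
    local_path = Path(data_dir) / cache_key
    if local_path.exists():
        return str(local_path)  # already cached: no network access
    local_path.parent.mkdir(parents=True, exist_ok=True)
    # Download to a temporary file first so an interrupted transfer never
    # leaves a partial file under the cached name.
    with urllib.request.urlopen(url) as response, tempfile.NamedTemporaryFile(
        dir=local_path.parent, delete=False
    ) as tmp:
        shutil.copyfileobj(response, tmp)
    Path(tmp.name).replace(local_path)
    return str(local_path)
```

Because the cached copy is returned untouched on later calls, changes to the remote file are not picked up automatically; that is the trade-off of caching by cache key.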

.. _offline_aux_downloads:

Offline Downloads
-----------------

To assist with operational environments, Satpy includes a
:func:`~satpy.data_download.retrieve_all` function that tries to find all
files that Satpy components may need to download in the future and downloads
them to the directory currently configured by :ref:`data_dir_setting`.
This function allows you to specify a list of ``readers``, ``writers``, or
``composite_sensors`` to limit which components are checked for files to
download.
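A pre-download step like this boils down to walking every registered file and fetching what is missing. The sketch below is a simplification under stated assumptions: it uses a plain ``registry`` dict (cache key to URL), stores files as ``<data_dir>/<cache_key>``, and filters by component type (the part of the cache key before ``/``) instead of by individual reader, writer, or sensor names as Satpy's version does. ``retrieve_all`` here is a hypothetical stand-in.

```python
import urllib.request
from pathlib import Path
from typing import Dict, Iterable, List, Optional


def retrieve_all(registry: Dict[str, str], data_dir,
                 component_types: Optional[Iterable[str]] = None) -> List[str]:
    """Download every registered file not yet cached; return the new cache keys."""
    wanted = None if component_types is None else set(component_types)
    fetched = []
    for cache_key, url in sorted(registry.items()):
        component_type = cache_key.split("/", 1)[0]
        if wanted is not None and component_type not in wanted:
            continue  # outside the requested subset
        local_path = Path(data_dir) / cache_key
        if not local_path.exists():
            local_path.parent.mkdir(parents=True, exist_ok=True)
            urllib.request.urlretrieve(url, local_path)
            fetched.append(cache_key)
    return fetched
```

Running it twice is harmless: the second run skips everything already on disk, which suits a one-off preparation step before taking a system offline.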

The ``retrieve_all`` function is also available through a command line script
called ``satpy_retrieve_all``. Run the following for usage information.

.. code-block:: bash

    satpy_retrieve_all --help

To make sure that no additional files are downloaded when running Satpy,
see :ref:`download_aux_setting`.
10 changes: 9 additions & 1 deletion doc/source/dev_guide/custom_reader.rst
@@ -571,4 +571,12 @@ One way of implementing a file handler is shown below:
# left as an exercise to the reader :)

If you have any questions, please contact the
:ref:`Satpy developers <dev_help>`.

Auxiliary File Download
-----------------------

If your reader needs additional data files to do calibrations, corrections,
or anything else, see the :doc:`aux_data` document for more information on
how to download and cache these files without including them in the Satpy
Python package.
1 change: 1 addition & 0 deletions doc/source/dev_guide/index.rst
@@ -16,6 +16,7 @@ at the pages listed below.
custom_reader
plugins
satpy_internals
aux_data

Coding guidelines
=================
1 change: 1 addition & 0 deletions satpy/_config.py
@@ -38,6 +38,7 @@
     'cache_dir': _satpy_dirs.user_cache_dir,
     'data_dir': _satpy_dirs.user_data_dir,
     'config_path': [],
+    'download_aux': True,
 }

 # Satpy main configuration object
24 changes: 11 additions & 13 deletions satpy/composites/__init__.py
@@ -27,6 +27,7 @@

 from satpy.dataset import DataID, combine_metadata
 from satpy.dataset.dataid import minimal_default_keys_config
+from satpy.data_download import DataDownloadMixin
 from satpy.writers import get_enhanced_image


@@ -970,14 +971,11 @@ def __call__(self, projectables, *args, **kwargs):
                                 *args, **kwargs)


-class StaticImageCompositor(GenericCompositor):
+class StaticImageCompositor(GenericCompositor, DataDownloadMixin):
     """A compositor that loads a static image from disk.

-    If the filename passed to this compositor is not valid then
-    the SATPY_ANCPATH environment variable will be checked to see
-    if the image is located there
+    Environment variables in the filename are automatically expanded.

-    Environment variables in the filename are automatically expanded
     """

     def __init__(self, name, filename=None, url=None, known_hash=None, area=None,
@@ -1015,7 +1013,8 @@ def __init__(self, name, filename=None, url=None, known_hash=None, area=None,
             self.area = get_area_def(area)

         super(StaticImageCompositor, self).__init__(name, **kwargs)
-        self._cache_key = self.register_data_files()[0]
+        cache_keys = self.register_data_files([])
+        self._cache_key = cache_keys[0]

     @staticmethod
     def _get_cache_filename_and_url(filename, url):
@@ -1030,16 +1029,15 @@ def _get_cache_filename_and_url(filename, url):
                              "or absolute path to 'filename'.")
         return filename, url

-    def register_data_files(self):
+    def register_data_files(self, data_files):
         """Tell Satpy about files we may want to download."""
         if os.path.isabs(self._cache_filename):
             return [None]
-        from satpy.data_download import register_file
-        cache_key = register_file(self._url, self._cache_filename,
-                                  component_type='composites',
-                                  component_name=self.__class__.__name__,
-                                  known_hash=self._known_hash)
-        return [cache_key]
+        return super().register_data_files([{
+            'url': self._url,
+            'known_hash': self._known_hash,
+            'filename': self._cache_filename,
+        }])

     def _retrieve_data_file(self):
         from satpy.data_download import retrieve