Add auxiliary data download API #1513

Merged
merged 16 commits into from
Feb 12, 2021
Changes from 13 commits
1 change: 1 addition & 0 deletions continuous_integration/environment.yaml
@@ -39,6 +39,7 @@ dependencies:
- fsspec
- pylibtiff
- python-geotiepoints
- pooch
- pip
- pip:
- trollsift
1 change: 1 addition & 0 deletions doc/rtd_environment.yml
@@ -10,6 +10,7 @@ dependencies:
- graphviz
- numpy
- pillow
- pooch
- pyresample
- setuptools
- setuptools_scm
1 change: 1 addition & 0 deletions doc/source/conf.py
@@ -268,4 +268,5 @@ def __getattr__(cls, name):
'xarray': ('https://xarray.pydata.org/en/stable', None),
'rasterio': ('https://rasterio.readthedocs.io/en/latest', None),
'donfig': ('https://donfig.readthedocs.io/en/latest', None),
'pooch': ('https://www.fatiando.org/pooch/latest/', None),
}
17 changes: 17 additions & 0 deletions doc/source/config.rst
@@ -115,6 +115,8 @@ configuration files, they are merged in reverse order. This means "base"
configuration paths should be at the end of the list and custom/user paths
should be at the beginning of the list.

.. _data_dir_setting:

Data Directory
^^^^^^^^^^^^^^

@@ -128,6 +130,21 @@ defaults to a different path depending on your operating system following the
`appdirs <https://github.com/ActiveState/appdirs#some-example-output>`_
"user data dir".

.. _download_aux_setting:

Download Auxiliary Data
^^^^^^^^^^^^^^^^^^^^^^^

* **Environment variable**: ``SATPY_DOWNLOAD_AUX``
* **YAML/Config Key**: ``download_aux``
* **Default**: True

Whether to allow downloading of auxiliary files for certain Satpy operations.
See :doc:`dev_guide/aux_data` for more information. If ``True`` then Satpy
will download and cache any necessary data files to :ref:`data_dir_setting`
when needed. If ``False`` then pre-downloaded files will be used, but any
other files will not be downloaded or checked for validity.
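
As a rough sketch of how this setting might be turned off, the snippet below sets the environment variable and then uses a temporary override. It assumes that Satpy's configuration object reads ``SATPY_*`` variables when ``satpy`` is first imported and that ``satpy.config.set`` can be used as a context manager; the ``try``/``except`` guard is only there so the sketch still runs where Satpy is not installed.

```python
import os

# Assumption: this must happen before Satpy is imported, because the
# configuration object reads SATPY_* environment variables when it is created.
os.environ["SATPY_DOWNLOAD_AUX"] = "False"

try:
    import satpy
except ImportError:
    # Satpy is not installed here; only the environment variable is set.
    satpy = None

if satpy is not None:
    # Temporarily override the setting for one block of code.
    with satpy.config.set(download_aux=False):
        print(satpy.config.get("download_aux"))
```

The environment variable suits operational deployments where the setting should apply to a whole process, while the context manager limits the override to a single block.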

.. _component_configuration:

Component Configuration
122 changes: 122 additions & 0 deletions doc/source/dev_guide/aux_data.rst
@@ -0,0 +1,122 @@
Auxiliary Data Download
=======================

Sometimes Satpy components need some extra data files to get their work
done properly. These include files like Look Up Tables (LUTs), coefficients,
or Earth model data (ex. elevations). This includes any file too large to
ship in the Satpy Python package, which in practice means anything bigger
than a small text file. To help with this, Satpy includes utilities for downloading and
caching these files only when your component is used. This saves the user from
wasting time and disk space downloading files they may never use.
This functionality is made possible thanks to the
`Pooch library <https://www.fatiando.org/pooch/latest/>`_.

Downloaded files are stored in the directory configured by
:ref:`data_dir_setting`.

Adding download functionality
-----------------------------

Downloading data with the utility functions is a two-step process:

1. **Registering**: Tell Satpy what files might need to be downloaded and used
later.
2. **Retrieving**: Ask Satpy to download and store the files locally.

Registering
^^^^^^^^^^^

Registering a file for downloading tells Satpy the remote URL for the file,
and an optional hash. The hash is used to verify a successful download.
Registering can also include a ``filename`` to tell Satpy what to name the
file when it is downloaded. If not provided it will be determined from the URL.
Once registered, Satpy can be told to retrieve the file (see below) by using a
"cache key". Cache keys follow the general scheme of
``<component_type>/<filename>`` (ex. ``readers/README.rst``).
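
The cache key scheme can be illustrated with a small standalone sketch. The ``make_cache_key`` helper below is hypothetical and is not part of Satpy; it only mirrors the ``<component_type>/<filename>`` pattern described above, falling back to the last segment of the URL when no filename is given. Keys produced by Satpy itself may contain additional path components.

```python
import os
from urllib.parse import urlparse


def make_cache_key(component_type, url, filename=None):
    """Build a ``<component_type>/<filename>`` key (hypothetical helper)."""
    if filename is None:
        # Fall back to the last path segment of the URL.
        filename = os.path.basename(urlparse(url).path)
    return f"{component_type}/{filename}"


key = make_cache_key(
    "readers",
    "https://raw.githubusercontent.com/pytroll/satpy/master/README.rst",
)
print(key)  # readers/README.rst
```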

Satpy includes a low-level function and a high-level Mixin class for
registering files. The higher level class is recommended for any Satpy
component like readers, writers, and compositors. The lower-level
:func:`~satpy.data_download.register_file` function can be used for any other
use case.

The :class:`~satpy.data_download.DataDownloadMixin` class is automatically included
in the :class:`~satpy.readers.yaml_reader.FileYAMLReader` and
:class:`~satpy.writers.Writer` base classes. For any other component (like
a compositor) you should include it as another parent class:

.. code-block:: python

    from satpy.data_download import DataDownloadMixin
    from satpy.composites import GenericCompositor

    class MyCompositor(GenericCompositor, DataDownloadMixin):
        """Compositor that uses downloaded files."""

        def __init__(self, name, url=None, known_hash=None, **kwargs):
            super().__init__(name, **kwargs)
            data_files = [{'url': url, 'known_hash': known_hash}]
            self.register_data_files(data_files)

However your code registers files, to be consistent it must do so during
initialization so that the :func:`~satpy.data_download.find_registerable_files`
function can find them.
If your component isn't a reader, writer, or compositor then this function
will need to be updated to find and load your registered files. See
:ref:`offline_aux_downloads` below for more information.

As mentioned, the mixin class is included in the base reader and writer classes.
To register files in these cases, include a ``data_files`` section in your
YAML configuration file. For readers this would go under the ``reader``
section and for writers the ``writer`` section. This parameter is a list
of dictionaries including a ``url``, ``known_hash``, and optional
``filename``. For example::

    reader:
      name: abi_l1b
      short_name: ABI L1b
      long_name: GOES-R ABI Level 1b
      ... other metadata ...
      data_files:
        - url: "https://example.com/my_data_file.dat"
        - url: "https://raw.githubusercontent.com/pytroll/satpy/master/README.rst"
          known_hash: "sha256:5891286b63e7745de08c4b0ac204ad44cfdb9ab770309debaba90308305fa759"
        - url: "https://raw.githubusercontent.com/pytroll/satpy/master/RELEASING.md"
          filename: "satpy_releasing.md"
          known_hash: null

See the :class:`~satpy.data_download.DataDownloadMixin` for more information.
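
When adding a new entry it can help to compute the ``known_hash`` value from a copy of the file you already trust. The sketch below uses only the standard library; ``sha256_known_hash`` is a hypothetical helper name, and the temporary sample file stands in for your real data file. It produces a string in the same ``sha256:<hexdigest>`` form as the example above.

```python
import hashlib
import os
import tempfile


def sha256_known_hash(path, chunk_size=65536):
    """Return a ``sha256:<hexdigest>`` string for the file at ``path``."""
    digest = hashlib.sha256()
    with open(path, "rb") as file_obj:
        # Read in chunks so large auxiliary files are not loaded at once.
        for chunk in iter(lambda: file_obj.read(chunk_size), b""):
            digest.update(chunk)
    return "sha256:" + digest.hexdigest()


# Demonstration on a small temporary file standing in for a real data file.
with tempfile.TemporaryDirectory() as tmp_dir:
    sample = os.path.join(tmp_dir, "sample.dat")
    with open(sample, "wb") as file_obj:
        file_obj.write(b"example contents")
    known_hash = sha256_known_hash(sample)

print(known_hash)
```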

Retrieving
^^^^^^^^^^

Files that have been registered (see above) can be retrieved by calling the
:func:`~satpy.data_download.retrieve` function. This function expects a single
argument: the cache key. Cache keys are returned by registering functions, but
can also be pre-determined by following the scheme
``<component_type>/<filename>`` (ex. ``readers/README.rst``).
Retrieving a file will download it to local disk if needed and then return
the local pathname. Data is stored locally in the :ref:`data_dir_setting`.
It is up to the caller to then open the file.
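
The register-then-retrieve flow can be pictured with a simplified stand-in. This is not Satpy's implementation (which relies on Pooch); ``register_sketch``, ``retrieve_sketch``, and the ``_REGISTRY`` dictionary are hypothetical names, hash verification is left out, and a ``file://`` URL replaces a remote server so the sketch runs offline.

```python
import tempfile
import urllib.request
from pathlib import Path

_REGISTRY = {}  # cache key -> URL (hypothetical stand-in for a registry)


def register_sketch(cache_key, url):
    """Step 1: remember where a file can be downloaded from."""
    _REGISTRY[cache_key] = url
    return cache_key


def retrieve_sketch(cache_key, data_dir):
    """Step 2: download the file if it is not cached; return its local path."""
    local_path = Path(data_dir) / cache_key
    if not local_path.exists():
        local_path.parent.mkdir(parents=True, exist_ok=True)
        with urllib.request.urlopen(_REGISTRY[cache_key]) as response:
            local_path.write_bytes(response.read())
    return str(local_path)


with tempfile.TemporaryDirectory() as tmp:
    # Create a "remote" file and expose it through a file:// URL.
    source = Path(tmp) / "remote" / "lut.dat"
    source.parent.mkdir()
    source.write_bytes(b"0 1 2 3")

    key = register_sketch("readers/lut.dat", source.as_uri())
    local_file = retrieve_sketch(key, Path(tmp) / "data_dir")

    # Retrieval only returns a path; the caller opens the file.
    with open(local_file, "rb") as file_obj:
        contents = file_obj.read()
```

A second call to ``retrieve_sketch`` with the same key would find the cached copy and skip the download, which is the behaviour the caching described above is meant to provide.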

.. _offline_aux_downloads:

Offline Downloads
-----------------

To assist with operational environments, Satpy includes a
:func:`~satpy.data_download.retrieve_all` function that will try to find all
files that Satpy components may need to download in the future and download
them to the current directory specified by :ref:`data_dir_setting`.
This function allows you to specify a list of ``readers``, ``writers``, or
``composite_sensors`` to limit what components are checked for files to
download.

The ``retrieve_all`` function is also available through a command line script
called ``satpy_retrieve_all``. Run the following for usage information.

.. code-block:: bash

    satpy_retrieve_all --help

To make sure that no additional files are downloaded when running Satpy see
:ref:`download_aux_setting`.
10 changes: 9 additions & 1 deletion doc/source/dev_guide/custom_reader.rst
@@ -571,4 +571,12 @@ One way of implementing a file handler is shown below:
# left as an exercise to the reader :)

If you have any questions, please contact the
:ref:`Satpy developers <dev_help>`.
:ref:`Satpy developers <dev_help>`.

Auxiliary File Download
-----------------------

If your reader needs additional data files to do calibrations, corrections,
or anything else, see the :doc:`aux_data` document for more information on
how to download and cache these files without including them in the Satpy
python package.
1 change: 1 addition & 0 deletions doc/source/dev_guide/index.rst
@@ -16,6 +16,7 @@ at the pages listed below.
custom_reader
plugins
satpy_internals
aux_data

Coding guidelines
=================
6 changes: 4 additions & 2 deletions satpy/_config.py
@@ -38,6 +38,7 @@
'cache_dir': _satpy_dirs.user_cache_dir,
'data_dir': _satpy_dirs.user_data_dir,
'config_path': [],
'download_aux': True,
}

# Satpy main configuration object
@@ -116,13 +117,14 @@ def config_search_paths(filename, search_dirs=None, **kwargs):
return paths[::-1]


def glob_config(pattern):
def glob_config(pattern, search_dirs=None):
"""Return glob results for all possible configuration locations.

Note: This method does not check the configuration "base" directory if the pattern includes a subdirectory.
This is done for performance since this is usually used to find *all* configs for a certain component.
"""
patterns = config_search_paths(pattern, check_exists=False)
patterns = config_search_paths(pattern, search_dirs=search_dirs,
check_exists=False)
for pattern_fn in patterns:
for path in glob.iglob(pattern_fn):
yield path
78 changes: 59 additions & 19 deletions satpy/composites/__init__.py
@@ -25,9 +25,9 @@
import numpy as np
import xarray as xr

import satpy
from satpy.dataset import DataID, combine_metadata
from satpy.dataset.dataid import minimal_default_keys_config
from satpy.data_download import DataDownloadMixin
from satpy.writers import get_enhanced_image


@@ -971,45 +971,85 @@ def __call__(self, projectables, *args, **kwargs):
*args, **kwargs)


class StaticImageCompositor(GenericCompositor):
class StaticImageCompositor(GenericCompositor, DataDownloadMixin):
"""A compositor that loads a static image from disk.

If the filename passed to this compositor is not valid then
the SATPY_ANCPATH environment variable will be checked to see
if the image is located there
Environment variables in the filename are automatically expanded.

Environment variables in the filename are automatically expanded
"""

def __init__(self, name, filename=None, area=None, **kwargs):
def __init__(self, name, filename=None, url=None, known_hash=None, area=None,
**kwargs):
"""Collect custom configuration values.

Args:
filename (str): Filename of the image to load, environment
variables are expanded
filename (str): Name to use when storing and referring to the file
in the ``data_dir`` cache. If ``url`` is provided (preferred),
then this is used as the filename in the cache and will be
appended to ``<data_dir>/composites/<class_name>/``. If
``url`` is provided and ``filename`` is not then the
``filename`` will be guessed from the ``url``.
If ``url`` is not provided, then it is assumed ``filename``
refers to a local file with an absolute path.
Environment variables are expanded.
url (str): URL to remote file. When the composite is created the
file will be downloaded and cached in Satpy's ``data_dir``.
Environment variables are expanded.
known_hash (str or None): Hash of the remote file used to verify
a successful download. If not provided then the download will
not be verified. See :func:`satpy.data_download.register_file`
for more information.
area (str): Name of area definition for the image. Optional
for images with built-in area definitions (geotiff)
for images with built-in area definitions (geotiff).

"""
if filename is None:
raise ValueError("No image configured for static image compositor")
self.filename = os.path.expandvars(filename)
filename, url = self._get_cache_filename_and_url(filename, url)
self._cache_filename = filename
self._url = url
self._known_hash = known_hash
self.area = None
if area is not None:
from satpy.resample import get_area_def
self.area = get_area_def(area)

super(StaticImageCompositor, self).__init__(name, **kwargs)
cache_keys = self.register_data_files([])
self._cache_key = cache_keys[0]

@staticmethod
def _get_cache_filename_and_url(filename, url):
if filename is not None:
filename = os.path.expanduser(os.path.expandvars(filename))
if url is not None:
url = os.path.expandvars(url)
if filename is None:
filename = os.path.basename(url)
if url is None and (filename is None or not os.path.isabs(filename)):
raise ValueError("StaticImageCompositor needs a remote 'url' "
"or absolute path to 'filename'.")
return filename, url

def register_data_files(self, data_files):
"""Tell Satpy about files we may want to download."""
if os.path.isabs(self._cache_filename):
return [None]
return super().register_data_files([{
'url': self._url,
'known_hash': self._known_hash,
'filename': self._cache_filename,
}])

def _retrieve_data_file(self):
from satpy.data_download import retrieve
if os.path.isabs(self._cache_filename):
return self._cache_filename
return retrieve(self._cache_key)

def __call__(self, *args, **kwargs):
"""Call the compositor."""
from satpy import Scene
# Check if filename exists, if not then try from SATPY_ANCPATH
if not os.path.isfile(self.filename):
tmp_filename = os.path.join(satpy.config.get('data_dir'), self.filename)
if os.path.isfile(tmp_filename):
self.filename = tmp_filename
scn = Scene(reader='generic_image', filenames=[self.filename])
local_file = self._retrieve_data_file()
scn = Scene(reader='generic_image', filenames=[local_file])
scn.load(['image'])
img = scn['image']
# use compositor parameters as extra metadata
16 changes: 15 additions & 1 deletion satpy/composites/config_loader.py
@@ -24,7 +24,8 @@
from yaml import UnsafeLoader

from satpy import DatasetDict, DataQuery, DataID
from satpy._config import get_entry_points_config_dirs, config_search_paths
from satpy._config import (get_entry_points_config_dirs, config_search_paths,
glob_config)
from satpy.utils import recursive_dict_update
from satpy.dataset.dataid import minimal_default_keys_config

@@ -175,6 +176,19 @@ def __init__(self):
# sensor -> { dict of DataID key information }
self._sensor_dataid_keys = {}

@classmethod
def all_composite_sensors(cls):
"""Get all sensor names from available composite configs."""
paths = get_entry_points_config_dirs('satpy.composites')
composite_configs = glob_config(
os.path.join("composites", "*.yaml"),
search_dirs=paths)
yaml_names = set([os.path.splitext(os.path.basename(fn))[0]
for fn in composite_configs])
non_sensor_yamls = ('visir',)
sensor_names = [x for x in yaml_names if x not in non_sensor_yamls]
return sensor_names

def load_sensor_composites(self, sensor_name):
"""Load all compositor configs for the provided sensor."""
config_filename = sensor_name + ".yaml"