Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Lazy regridding with Linear, Nearest, and AreaWeighted #3701

Merged
merged 9 commits into from
Sep 9, 2020
Merged
Show file tree
Hide file tree
Changes from 7 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
46 changes: 46 additions & 0 deletions docs/iris/src/userguide/interpolation_and_regridding.rst
Original file line number Diff line number Diff line change
Expand Up @@ -28,6 +28,10 @@ The following are the regridding schemes that are currently available in Iris:
* nearest-neighbour regridding (:class:`iris.analysis.Nearest`), and
* area-weighted regridding (:class:`iris.analysis.AreaWeighted`, first-order conservative).

These regridding schemes support lazy regridding, i.e. if the source cube has
lazy data, the resulting cube will also have lazy data.
See :doc:`real_and_lazy_data` for an introduction to lazy data.


.. _interpolation:

Expand Down Expand Up @@ -409,3 +413,45 @@ regridded to the target grid. For example::
In each case ``result`` will be the input cube regridded to the grid defined by
the target grid cube (in this case ``rotated_psl``) that we used to define the
cached regridder.

Regridding lazy data
^^^^^^^^^^^^^^^^^^^^

If you are working with large cubes, especially when you are regridding to a
high resolution target grid, you may run out of memory when trying to
regrid a cube. When this happens, make sure the input cube has lazy data

>>> air_temp = iris.load_cube(iris.sample_data_path('A1B_north_america.nc'))
>>> air_temp
<iris 'Cube' of air_temperature / (K) (time: 240; latitude: 37; longitude: 49)>
>>> air_temp.has_lazy_data()
True

and the regridding scheme supports lazy data. All regridding schemes described
here support lazy data. If you still run out of memory even while using lazy
data, inspect the
`chunks <https://docs.dask.org/en/latest/array-chunks.html>`__
:

>>> air_temp.lazy_data().chunks
((240,), (37,), (49,))

The cube above consist of a single chunk, because it is fairly small. For
larger cubes, iris will automatically create chunks of an optimal size when
loading the data. However, because regridding to a high resolution grid
may dramatically increase the size of the data, the automatically chosen
chunks might be too large.

As an example of how to solve this, we could manually re-chunk the time
dimension, to regrid it in 8 chunks of 30 timesteps at a time:

>>> air_temp.data = air_temp.lazy_data().rechunk([30, None, None])
>>> air_temp.lazy_data().chunks
((30, 30, 30, 30, 30, 30, 30, 30), (37,), (49,))

Assuming that Dask is configured such that it processes only a few chunks of
the data array at a time, this will further reduce memory use.

Note that chunking in the horizontal dimensions is not supported by the
regridding schemes. Chunks in these dimensions will automatically be combined
before regridding.
4 changes: 3 additions & 1 deletion docs/iris/src/userguide/real_and_lazy_data.rst
Original file line number Diff line number Diff line change
Expand Up @@ -37,7 +37,9 @@ In Iris, lazy data is provided as a
`dask array <https://docs.dask.org/en/latest/array.html>`_.
A dask array also has a shape and data type
but the dask array's data points remain on disk and only loaded into memory in
small chunks when absolutely necessary. This has key performance benefits for
small
`chunks <https://docs.dask.org/en/latest/array-chunks.html>`__
when absolutely necessary. This has key performance benefits for
handling large amounts of data, where both calculation time and storage
requirements can be significantly reduced.

Expand Down
Original file line number Diff line number Diff line change
@@ -0,0 +1 @@
* Lazy regridding with the :class:`~iris.analysis.Linear`, :class:`~iris.analysis.Nearest`, and :class:`~iris.analysis.AreaWeighted` regridding schemes.
36 changes: 36 additions & 0 deletions lib/iris/_lazy_data.py
Original file line number Diff line number Diff line change
Expand Up @@ -349,3 +349,39 @@ def lazy_elementwise(lazy_array, elementwise_op):
dtype = elementwise_op(np.zeros(1, lazy_array.dtype)).dtype

return da.map_blocks(elementwise_op, lazy_array, dtype=dtype)


def map_complete_blocks(src, func, dims, out_sizes):
"""Apply a function to complete blocks.

Complete means that the data is not chunked along the chosen dimensions.

Args:

* src (:class:`~iris.cube.Cube`):
Source cube that function is applied to.
* func:
Function to apply.
* dims (tuple of int):
Dimensions that cannot be chunked.
* out_sizes (tuple of int):
Output size of dimensions that cannot be chunked.

"""
if not src.has_lazy_data():
return func(src.data)

data = src.lazy_data()

# Ensure dims are not chunked
in_chunks = list(data.chunks)
for dim in dims:
in_chunks[dim] = src.shape[dim]
data = data.rechunk(in_chunks)

# Determine output chunks
out_chunks = list(data.chunks)
for dim, size in zip(dims, out_sizes):
out_chunks[dim] = size

return data.map_blocks(func, chunks=out_chunks, dtype=src.dtype)
16 changes: 16 additions & 0 deletions lib/iris/analysis/__init__.py
Original file line number Diff line number Diff line change
Expand Up @@ -2440,6 +2440,10 @@ def regridder(self, src_grid, target_grid):
constructing your own regridder is preferable. These are detailed in
the :ref:`user guide <caching_a_regridder>`.

Supports lazy regridding. Any
`chunks <https://docs.dask.org/en/latest/array-chunks.html>`__
in horizontal dimensions will be combined before regridding.

Args:

* src_grid:
Expand Down Expand Up @@ -2514,6 +2518,10 @@ def regridder(self, src_grid_cube, target_grid_cube):
constructing your own regridder is preferable. These are detailed in
the :ref:`user guide <caching_a_regridder>`.

Supports lazy regridding. Any
`chunks <https://docs.dask.org/en/latest/array-chunks.html>`__
in horizontal dimensions will be combined before regridding.

Args:

* src_grid_cube:
Expand Down Expand Up @@ -2630,6 +2638,10 @@ def regridder(self, src_grid, target_grid):
constructing your own regridder is preferable. These are detailed in
the :ref:`user guide <caching_a_regridder>`.

Supports lazy regridding. Any
`chunks <https://docs.dask.org/en/latest/array-chunks.html>`__
in horizontal dimensions will be combined before regridding.

Args:

* src_grid:
Expand Down Expand Up @@ -2716,6 +2728,8 @@ def regridder(self, src_cube, target_grid):
constructing your own regridder is preferable. These are detailed in
the :ref:`user guide <caching_a_regridder>`.

Does not support lazy regridding.

Args:

* src_cube:
Expand Down Expand Up @@ -2791,6 +2805,8 @@ def regridder(self, src_grid, target_grid):
constructing your own regridder is preferable. These are detailed in
the :ref:`user guide <caching_a_regridder>`.

Does not support lazy regridding.

Args:

* src_grid:
Expand Down
9 changes: 9 additions & 0 deletions lib/iris/analysis/_area_weighted.py
Original file line number Diff line number Diff line change
Expand Up @@ -78,6 +78,9 @@ def __call__(self, cube):
The given cube must be defined with the same grid as the source
grid used to create this :class:`AreaWeightedRegridder`.

If the source cube has lazy data, the returned cube will also
have lazy data.

Args:

* cube:
Expand All @@ -89,6 +92,12 @@ def __call__(self, cube):
this cube will be converted to values on the new grid using
area-weighted regridding.

.. note::

If the source cube has lazy data,
`chunks <https://docs.dask.org/en/latest/array-chunks.html>`__
in the horizontal dimensions will be combined before regridding.

"""
src_x, src_y = get_xy_dim_coords(cube)
if (src_x, src_y) != self._src_grid:
Expand Down
39 changes: 29 additions & 10 deletions lib/iris/analysis/_regrid.py
Original file line number Diff line number Diff line change
Expand Up @@ -11,6 +11,7 @@
import numpy as np
import numpy.ma as ma

from iris._lazy_data import map_complete_blocks
from iris.analysis._interpolation import (
EXTRAPOLATION_MODES,
extend_circular_coord_and_data,
Expand Down Expand Up @@ -438,6 +439,9 @@ def __call__(self, src):
The given cube must be defined with the same grid as the source
grid used to create this :class:`_CurvilinearRegridder`.

If the source cube has lazy data, it will be realized before
regridding and the returned cube will also have realized data.

Args:

* src:
Expand Down Expand Up @@ -984,6 +988,9 @@ def __call__(self, src):
The given cube must be defined with the same grid as the source
grid used to create this :class:`RectilinearRegridder`.

If the source cube has lazy data, the returned cube will also
have lazy data.

Args:

* src:
Expand All @@ -995,6 +1002,12 @@ def __call__(self, src):
this cube will be converted to values on the new grid using
either nearest-neighbour or linear interpolation.

.. note::

If the source cube has lazy data,
`chunks <https://docs.dask.org/en/latest/array-chunks.html>`__
in the horizontal dimensions will be combined before regridding.

"""
# Validity checks.
if not isinstance(src, iris.cube.Cube):
Expand Down Expand Up @@ -1040,16 +1053,22 @@ def __call__(self, src):
# Compute the interpolated data values.
x_dim = src.coord_dims(src_x_coord)[0]
y_dim = src.coord_dims(src_y_coord)[0]
data = self._regrid(
src.data,
x_dim,
y_dim,
src_x_coord,
src_y_coord,
sample_grid_x,
sample_grid_y,
self._method,
self._extrapolation_mode,

# Define regrid function
regrid = functools.partial(
self._regrid,
x_dim=x_dim,
y_dim=y_dim,
src_x_coord=src_x_coord,
src_y_coord=src_y_coord,
sample_grid_x=sample_grid_x,
sample_grid_y=sample_grid_y,
method=self._method,
extrapolation_mode=self._extrapolation_mode,
)

data = map_complete_blocks(
src, regrid, (y_dim, x_dim), sample_grid_x.shape
)

# Wrap up the data as a Cube.
Expand Down
17 changes: 13 additions & 4 deletions lib/iris/cube.py
Original file line number Diff line number Diff line change
Expand Up @@ -4448,7 +4448,7 @@ def interpolate(self, sample_points, scheme, collapse_scalar=True):
return interp(points, collapse_scalar=collapse_scalar)

def regrid(self, grid, scheme):
"""
r"""
Regrid this :class:`~iris.cube.Cube` on to the given target `grid`
using the given regridding `scheme`.

Expand All @@ -4461,16 +4461,25 @@ def regrid(self, grid, scheme):
target grid. The regridding schemes currently available
in Iris are:

* :class:`iris.analysis.Linear`,
* :class:`iris.analysis.Nearest`, and
* :class:`iris.analysis.AreaWeighted`.
* :class:`iris.analysis.Linear`\*,
* :class:`iris.analysis.Nearest`\*,
* :class:`iris.analysis.AreaWeighted`\*,
* :class:`iris.analysis.UnstructuredNearest`,
* :class:`iris.analysis.PointInCell`,
* :class:`iris.experimental.regrid.ProjectedUnstructuredLinear`,
* :class:`iris.experimental.regrid.ProjectedUnstructuredNearest`.
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I found several more regridding schemes and added them here. Does that look right, or were they missing for a reason?

Copy link
Member

@pp-mo pp-mo Sep 7, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is out of date.
But also, we don't want to commit to the experimental ones.
So maybe it should omit those, and amend to say "The regridding schemes in Iris currently include:" ...

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also.. if we extend this then the next section that you added "These regridding schemes support lazy regridding..." needs to change, as not all these schemes support it.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I removed the experimental regridders and changed the section in the user guide so it explicitly mentions which regridding schemes are lazy.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did not add a section or mention of the UnstructeredNearest or PointInCell schemes to the user guide, because I would not feel qualified to write such a section and I also feel that it would be beyond the scope of this pull request.


\* Supports lazy regridding.

Returns:
A cube defined with the horizontal dimensions of the target grid
and the other dimensions from this cube. The data values of
this cube will be converted to values on the new grid
according to the given regridding scheme.

The returned cube will have lazy data if the original cube has
lazy data and the regridding scheme supports lazy regridding.

.. note::

Both the source and target cubes must have a CoordSystem, otherwise
Expand Down
17 changes: 15 additions & 2 deletions lib/iris/experimental/regrid.py
Original file line number Diff line number Diff line change
Expand Up @@ -18,6 +18,7 @@
import numpy.ma as ma
import scipy.interpolate

from iris._lazy_data import map_complete_blocks
import iris.analysis.cartography
from iris.analysis._interpolation import (
get_xy_dim_coords,
Expand Down Expand Up @@ -931,8 +932,16 @@ def _regrid_area_weighted_rectilinear_src_and_grid__perform(
) = regrid_info

# Calculate new data array for regridded cube.
new_data = _regrid_area_weighted_array(
src_cube.data, src_x_dim, src_y_dim, weights_info, mdtol,
regrid = functools.partial(
_regrid_area_weighted_array,
x_dim=src_x_dim,
y_dim=src_y_dim,
weights_info=weights_info,
mdtol=mdtol,
)

new_data = map_complete_blocks(
src_cube, regrid, (src_y_dim, src_x_dim), meshgrid_x.shape
)

# Wrap up the data as a Cube.
Expand Down Expand Up @@ -1470,6 +1479,8 @@ def regridder(self, src_cube, target_grid):
constructing your own regridder is preferable. These are detailed in
the :ref:`user guide <caching_a_regridder>`.

Does not support lazy regridding.

Args:

* src_cube:
Expand Down Expand Up @@ -1535,6 +1546,8 @@ def regridder(self, src_cube, target_grid):
constructing your own regridder is preferable. These are detailed in
the :ref:`user guide <caching_a_regridder>`.

Does not support lazy regridding.

Args:

* src_cube:
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -15,6 +15,7 @@
import copy
import random

import dask.array as da
import numpy as np
import numpy.ma as ma

Expand Down Expand Up @@ -456,7 +457,14 @@ def test_ten_by_ten_subset(self):
indices = tuple([slice(i, i + 10), slice(j, j + 10)])
dest = src[indices]
res = regrid_area_weighted(src, dest)
self.assertTrue(res, src[indices])
self.assertEqual(res, src[indices])

def test_lazy_nop(self):
src = self.realistic_cube[:2, :3, :10, :10]
src.data = da.asarray(src.data, chunks=((1, 1), (2, 1), (10,), (10,)))
res = regrid_area_weighted(src, src)
self.assertTrue(res.has_lazy_data())
self.assertEqual(res, src)

def test_cross_section(self):
# Slice to get a cross section.
Expand Down
Loading