Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Preprocessor multimodel_statistics fails when data have no time dimension #890

Closed
stefsmeets opened this issue Nov 30, 2020 · 16 comments · Fixed by #1808
Closed

Preprocessor multimodel_statistics fails when data have no time dimension #890

stefsmeets opened this issue Nov 30, 2020 · 16 comments · Fixed by #1808
Labels
bug Something isn't working preprocessor Related to the preprocessor

Comments

@stefsmeets
Copy link
Contributor

Describe the bug
Hi all, I'm developing some tests for the multimodel statistics using real data (#856), and I'm coming accross this bug:

Running multimodel statistics with a list of cubes with no time dimension, will result in
ValueError: Cannot guess bounds for a coordinate of length 1.

The bug can be reproduced by:

cubes = [cube[0] for cube in cubes]
multi_model_statistics(cubes, span='full', statistics=['mean'])
See the full stack trace below.
timeseries_cubes_month = [<iris 'Cube' of air_temperature / (K) (time: 14; air_pressure: 2; latitude: 3; longitude: 2)>, <iris 'Cube' of air_te... 3; longitude: 2)>, <iris 'Cube' of air_temperature / (K) (time: 14; air_pressure: 2; latitude: 3; longitude: 2)>, ...]

    @pytest.mark.functional
    # @pytest.mark.xfail('ValueError')
    def test_multimodel_no_time_dimension(timeseries_cubes_month):
        """Test statistic without time dimension using monthly data."""
        span = 'full'
        cubes = timeseries_cubes_month
        cubes = [cube[0] for cube in cubes]
        # ValueError: Cannot guess bounds for a coordinate of length 1.
>       multimodel_test(cubes, span=span, statistic='mean')

tests/functional/test_multimodel.py:227:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
tests/functional/test_multimodel.py:125: in multimodel_test
    output = multi_model_statistics(cubes, span=span, statistics=statistics)
esmvalcore/preprocessor/_multimodel.py:439: in multi_model_statistics
    statistic_cube = _assemble_full_data(cubes, statistic)
esmvalcore/preprocessor/_multimodel.py:323: in _assemble_full_data
    time_axis = [float(fl) for fl in _monthly_t(cubes)]
esmvalcore/preprocessor/_multimodel.py:277: in _monthly_t
    days = {day for cube in cubes for day in _datetime_to_int_days(cube)}
esmvalcore/preprocessor/_multimodel.py:277: in <setcomp>
    days = {day for cube in cubes for day in _datetime_to_int_days(cube)}
esmvalcore/preprocessor/_multimodel.py:206: in _datetime_to_int_days
    cube = _align_yearly_axes(cube)
esmvalcore/preprocessor/_multimodel.py:231: in _align_yearly_axes
    return regrid_time(cube, 'yr')
esmvalcore/preprocessor/_time.py:602: in regrid_time
    cube.coord('time').guess_bounds()
../../miniconda3/envs/esmvaltool/lib/python3.8/site-packages/iris/coords.py:1564: in guess_bounds
    self.bounds = self._guess_bounds(bound_position)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

self = DimCoord(array([49456.]), standard_name='time', units=Unit('days since 1850-1-1 00:00:00', calendar='365_day'), long_name='time', var_name='time', attributes={'_ChunkSizes': 1})
bound_position = 0.5

    def _guess_bounds(self, bound_position=0.5):
        """
        Return bounds for this coordinate based on its points.

        Kwargs:

        * bound_position:
            The desired position of the bounds relative to the position
            of the points.

        Returns:
            A numpy array of shape (len(self.points), 2).

        .. note::

            This method only works for coordinates with ``coord.ndim == 1``.

        .. note::

            If `iris.FUTURE.clip_latitudes` is True, then this method
            will clip the coordinate bounds to the range [-90, 90] when:

            - it is a `latitude` or `grid_latitude` coordinate,
            - the units are degrees,
            - all the points are in the range [-90, 90].

        .. deprecated:: 2.0.0

            The `iris.FUTURE.clip_latitudes` option is now deprecated
            and is set to True by default. Please remove code which
            relies on coordinate bounds being outside the range
            [-90, 90].

        """
        # XXX Consider moving into DimCoord
        # ensure we have monotonic points
        if not self.is_monotonic():
            raise ValueError("Need monotonic points to generate bounds for %s"
                             % self.name())

        if self.ndim != 1:
            raise iris.exceptions.CoordinateMultiDimError(self)

        if self.shape[0] < 2:
>           raise ValueError('Cannot guess bounds for a coordinate of length '
                             '1.')
E           ValueError: Cannot guess bounds for a coordinate of length 1.
@stefsmeets stefsmeets added bug Something isn't working preprocessor Related to the preprocessor labels Nov 30, 2020
@valeriupredoi
Copy link
Contributor

really, one should not run a multimodel preproc when there is not even a single time point - there is no meaningful statistical quantity that gets computed even with a single time point; a lot of our other preprocs would not work without a time point, I suggest you don't test with that kind of data, but we probably do need an initial check on the data before kickstarting the MM preproc - @bouweandela @jvegasbsc what you fellas think of this?

@stefsmeets
Copy link
Contributor Author

This works in nearly every statistical package:

from statistics import mean
mean([2])
>>> 2

Besides the fact that the mean of a single number is trivial, one might reasonably expect that it works the same for a cube with a single time point.

@valeriupredoi
Copy link
Contributor

I think there is an exception raised if the number of coincident time points for all models is lower than 3 - I don't know if a multimodel with no time variation is useful in any way, if it is, and the user is interested in a spatial-only statistic then sure, that'd be something to put in = do you know of such a case @stefsmeets ? 🍺

@bouweandela
Copy link
Member

bouweandela commented Dec 1, 2020

@valeriupredoi See #41 for a description of the problem by @ledm, there he tries to compute multimodel statistics after taking a time average.

@valeriupredoi
Copy link
Contributor

@bouweandela how did you remember of such an old issue? Good one, we should probably get that PR @Peter9192 has worked on in and then see about these issues the sooner the better 👍

@bouweandela
Copy link
Member

Yes, that's why @stefsmeets is working on #856, so we can test pull requests like that without manually running recipe_permetrics.yml with and without the changes and then comparing the output.

@schlunma
Copy link
Contributor

schlunma commented Dec 1, 2020

I fully agree with @stefsmeets and @bouweandela here; calculating multi-model statistics should be independent of any dimension. There is no need for a time dimension, since the statistic is calculated over the model dimension and not over the time dimension.

For example, suppose we have M datasets with shapes (A, B). Then multi_model_statistics should first merge the datasets to get one cube with shape (M, A, B) and then calculate the statistic over the first dimensions with length M to get a multi-model dataset with shape (A, B). A and B could be any data dimension.

There are plenty of use cases that either do not need a time (multi-model mean maps) or horizontal (multi-model mean time series) dimension.

@valeriupredoi
Copy link
Contributor

nice discussion guys! I can have a go at all this once #673 #685 and #767 have been merged - speaking of which - we should probably accelerate the review for those, they seem pretty mature 🧀

@bouweandela
Copy link
Member

A related problem, it also appears to fail when data have a scalar time coordinate.

@LisaBock
Copy link
Contributor

Hi all, there is again a problem when using mulitmodel statistics for one time point. Maybe it was again intorduced with pull request #1301. I don't know...

The error message says:

ValueError: Cannot guess bounds for a coordinate of length 1.

The preprocessor I used:

preprocessors:
  zonal_mmm:
    regrid:
      target_grid: 2x2
      scheme: linear
    multi_model_statistics:
      span: full
      statistics: [mean, p5, p95]
    climate_statistics:
      operator: mean
    zonal_statistics:
      operator: mean

@zklaus zklaus added this to the v2.4.0 milestone Sep 29, 2021
@bouweandela
Copy link
Member

I fixed this as part of #968, but it looks like that may not be ready in time before v2.4.

@zklaus
Copy link

zklaus commented Oct 5, 2021

Ok, we'll have to live with it for one more cycle then. Let's prioritize it in the next round.

@zklaus zklaus modified the milestones: v2.4.0, v2.5.0 Oct 5, 2021
@schlunma
Copy link
Contributor

schlunma commented Feb 4, 2022

Moving this to v2.6 since there is not open PR yet.

@schlunma schlunma modified the milestones: v2.5.0, v2.6.0 Feb 4, 2022
@sloosvel sloosvel removed this from the v2.6.0 milestone Jun 7, 2022
@schlunma
Copy link
Contributor

I've been working with the multi_model_statistics preprocessor a lot recently and had some really unpleasant experiences. This is issue was one of the main problems.

I'd like to work on this now because I think this can be fixed very easily. @bouweandela I saw that you are planning to work on #968. Does this somehow conflict with my plans?

@schlunma
Copy link
Contributor

schlunma commented Nov 18, 2022

Draft PR: #1808

(this also solved 4 other issues)

@bouweandela
Copy link
Member

I saw that you are planning to work on #968. Does this somehow conflict with my plans?

I did some work making things lazy a long time ago and also included other changes (such as fixing some issues with dimensions), expecting that it would be merged quite soon. Unfortunately that didn't happen. It would be nice if you could take the changes from there that fix the other problems, so I don't end up with a massive merge conflict, but don't worry about it if it's difficult. I do not really expect lazy multimodel statistics to make it into v2.8, v2.9 seems more realistic.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working preprocessor Related to the preprocessor
Projects
None yet
Development

Successfully merging a pull request may close this issue.

7 participants