Multi-model statistics shift time coordinate #665

Closed
LisaBock opened this issue Jun 4, 2020 · 46 comments · Fixed by #677

@LisaBock
Contributor

LisaBock commented Jun 4, 2020

I tried to calculate the multi-model mean of several models with different time axes, so I chose span: full. After that, the time axis of the MultiModelMean preproc file has different time points and no longer fits the overall time period of the models.
Did I do something wrong?
Could anybody help me?
Thanks!

recipe to test:

# ESMValTool
---
documentation:

  description: |
    xxx

preprocessors:

  clim_ref:
    regrid:
      target_grid: reference_dataset
      scheme: linear
    multi_model_statistics:
      span: full
      statistics: [mean]
      exclude: [reference_dataset]



datasets:
  - {dataset: bccr_bcm2_0,  institute: BCCR}
  - {dataset: cccma_cgcm3_1, institute: CCCMA}
  - {dataset: cccma_cgcm3_1_t63, institute: CCCMA}
  - {dataset: csiro_mk3_0, institute: CSIRO, start_year: 1871}
  - {dataset: gfdl_cm2_0, institute: GFDL, start_year: 1861}
  - {dataset: gfdl_cm2_1, institute: GFDL, start_year: 1861} 
  - {dataset: giss_aom, institute: NASA}
  - {dataset: giss_model_e_h, institute: NASA, start_year: 1880}
  - {dataset: giss_model_e_r, institute: NASA, start_year: 1880}
  - {dataset: iap_fgoals1_0_g, institute: LASG}
  - {dataset: ingv_echam4, institute: INGV, start_year: 1870}
  - {dataset: inmcm3_0, institute: INM, start_year: 1871}
  - {dataset: ipsl_cm4, institute: IPSL, start_year: 1860}
  #- {dataset: miroc3_2_hires, institute: NIES, start_year: 1900}
  - {dataset: miroc3_2_medres, institute: NIES}
  #- {dataset: miub_echo_g, institute: MIUB-KMA} #something wrong with numeration of years
  - {dataset: mpi_echam5, institute: MPIM, start_year: 1860}
  - {dataset: mri_cgcm2_3_2a, institute: MRI, start_year: 1851}
  - {dataset: ncar_ccsm3_0, institute: NCAR, start_year: 1870}
  - {dataset: ncar_pcm1, institute: NCAR, start_year: 1890}
  - {dataset: ukmo_hadcm3, institute: UKMO, start_year: 1860}
  - {dataset: ukmo_hadgem1, institute: UKMO, start_year: 1860}
  - {dataset: HadCRUT4, project: OBS, type: ground, version: 1, tier: 2,
     mip: Amon, end_year: 2017}


diagnostics:

  fig_1_cmip3: 
    description: CMIP3 timeseries of near-surface temperature anomalies
    variables:
      tas: 
        preprocessor: clim_ref   
        reference_dataset: HadCRUT4   
        project: CMIP3
        mip: A1
        modeling_realm: atm
        exp: 20c3m
        frequency: mo
        ensemble: run1
        start_year: 1850
        end_year: 1999
@SarahAlidoost
Contributor

SarahAlidoost commented Jun 5, 2020

I got almost the same problem while using the multi_model_statistics preprocessor.
Here are some time point values from the raw and fix_metadata files for two models, ACCESS1-0 and CCSM4. It seems that fixing the metadata changes the time values for ACCESS1-0 but not for CCSM4.

  • Raw files are:
tas_Amon_ACCESS1-0_historical_r1i1p1_185001-200512.nc
1850-01-16 12:00:00
1850-02-15 00:00:00
1850-03-16 12:00:00
1850-04-16 00:00:00
1850-05-16 12:00:00
1850-06-16 00:00:00
1850-07-16 12:00:00
1850-08-16 12:00:00
1850-09-16 00:00:00
1850-10-16 12:00:00

tas_Amon_CCSM4_historical_r1i1p1_185001-200512.nc
1850-01-16 12:00:00
1850-02-15 00:00:00
1850-03-16 12:00:00
1850-04-16 00:00:00
1850-05-16 12:00:00
1850-06-16 00:00:00
1850-07-16 12:00:00
1850-08-16 12:00:00
1850-09-16 00:00:00
1850-10-16 12:00:00

  • Fix_metadata files are:
CMIP5_ACCESS1-0_Amon_historical_r1i1p1_tas_1850-2005/00_fix_metadata.nc
1850-01-14 12:00:00
1850-02-13 00:00:00
1850-03-14 12:00:00
1850-04-14 00:00:00
1850-05-14 12:00:00
1850-06-14 00:00:00
1850-07-14 12:00:00
1850-08-14 12:00:00
1850-09-14 00:00:00
1850-10-14 12:00:00

CMIP5_CCSM4_Amon_historical_r1i1p1_tas_1850-2005/00_fix_metadata.nc
1850-01-16 12:00:00
1850-02-15 00:00:00
1850-03-16 12:00:00
1850-04-16 00:00:00
1850-05-16 12:00:00
1850-06-16 00:00:00
1850-07-16 12:00:00
1850-08-16 12:00:00
1850-09-16 00:00:00
1850-10-16 12:00:00

CC: @Peter9192

@Peter9192
Contributor

Peter9192 commented Jun 8, 2020

@LisaBock I don't think you can perform multimodel statistics over two timeseries with (completely) different time points, as the statistics are computed for each time point that the datasets have in common. So far as I understand, keyword full allows you to calculate multimodel stats even if not all models are present for the entire period, rather than only where you have data for all of them. Perhaps you could add the regrid_time preprocessor to align your data first?

@SarahAlidoost as we discussed, this issue seems to come from the dataset fix that changes the calendar of the dataset. I'll look into the reason for why this seems to break, rather than fix our time coordinate.

@valeriupredoi
Contributor

valeriupredoi commented Jun 8, 2020

About full: the multimodel statistic is computed across the full length of the timeseries data for each cube. Say we have three timeseries T1, T2 and T3: T1 has time points [1800...2000], T2 has time points [1900...2000] and T3 has time points [1950...2100]. Because the statistic can only be computed if at least two members are available, the multimodel cube will start at 1900 and end at 2000. As @Peter9192 says, the MM is computed per time point. Also, there is no need to time-regrid the points in advance, since the MM module is doing its own internal time regridding for monthly data. The full option is a bit dodgy in that it doesn't apply weights to the stats: if you have only two models with data at, say, 1900, that statistic will be as strong as a statistic computed at, say, 1950, where you have 30 models 🍺
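
The T1/T2/T3 example can be sketched in a few lines of plain Python. This is only an illustration of the behaviour described in this comment, not the ESMValCore implementation: the function name `full_span_mean`, the `min_members` parameter and the constant data values are assumptions made for the example.

```python
from statistics import mean


def full_span_mean(series, min_members=2):
    """Per-time-point mean over the union of all time points.

    `series` is a list of {year: value} dicts (hypothetical stand-ins
    for cubes). Time points with fewer than `min_members` contributing
    datasets get None, standing in for a masked value.
    """
    all_years = sorted(set().union(*(s.keys() for s in series)))
    result = {}
    for year in all_years:
        values = [s[year] for s in series if year in s]
        result[year] = mean(values) if len(values) >= min_members else None
    return result


# Constant dummy values, one per timeseries, just to make the mean visible.
t1 = {year: 1.0 for year in range(1800, 2001)}
t2 = {year: 2.0 for year in range(1900, 2001)}
t3 = {year: 3.0 for year in range(1950, 2101)}

mm = full_span_mean([t1, t2, t3])
valid_years = [year for year, value in mm.items() if value is not None]
print(valid_years[0], valid_years[-1])
print(mm[1900], mm[1950])
```

Note that the mean at 1900 (two members) and the mean at 1950 (three members) are returned with equal standing, which is the unweighted behaviour mentioned above.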

@Peter9192
Contributor

@SarahAlidoost I made a new issue about the access data: #669

@Peter9192
Contributor

the MM module is doing its own internal time regridding for monthly data

Are you sure about this? As far as I see, this just creates a new time array containing all unique time points in the input data. But if they are offset with respect to one another, you'll get multiple indices. Let's say dataset 1 contains [01-15, 02-15, 03-15, ...], and dataset 2 contains [01-16, 02-16, 03-16, ...], then the new time array will contain [01-15, 01-16, 02-15, 02-16, 03-15, 03-16, ...].

But perhaps your point is that it should never be the case that these monthly time points can differ if the data are cmor standard?
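
To make the concern concrete, here is a minimal sketch of what a plain union of unique time points does to two monthly series that are offset by one day. The year 2000 and the variable names are arbitrary choices for the example; this is not code from `_multimodel.py`.

```python
from datetime import date

# Two monthly series, offset by one day (year chosen arbitrarily).
dataset1 = [date(2000, month, 15) for month in (1, 2, 3)]
dataset2 = [date(2000, month, 16) for month in (1, 2, 3)]

# A plain union of unique time points keeps both variants of each month.
merged = sorted(set(dataset1) | set(dataset2))

print(len(dataset1), len(dataset2), len(merged))
print(merged)
```

The merged axis has twice as many points as either input, and each dataset only has data at every other point.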

@valeriupredoi
Contributor

yes, _monthly_t() using _datetime_to_int_days() sets all the days to 1 - as I said, this is the rough regridder but works only for monthly data as it says in the code comment; I have not written _multimodel.py for daily data, that will be a monster in terms of memory and will need to be rewritten with a lot of dask in it 😁
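
A rough sketch of the idea, assuming the only thing that matters is that the day of the month is reset to 1 before the time points are merged. The helper `to_month_start` is hypothetical and is not the actual `_monthly_t()` or `_datetime_to_int_days()` code; the sample dates are the raw CCSM4 and fixed ACCESS1-0 time points quoted earlier in the thread.

```python
from datetime import datetime


def to_month_start(dt):
    """Hypothetical helper: snap a datetime to midnight on the 1st of its month."""
    return dt.replace(day=1, hour=0, minute=0, second=0, microsecond=0)


ccsm4 = [datetime(1850, 1, 16, 12), datetime(1850, 2, 15)]
access = [datetime(1850, 1, 14, 12), datetime(1850, 2, 13)]

# After snapping, the offset monthly points collapse onto one point per month.
merged = sorted({to_month_start(t) for t in ccsm4 + access})
print(merged)
```

This only works while each dataset has exactly one point per calendar month, which is why it is limited to monthly data.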

@Peter9192
Contributor

Ah okay, I see. But then I don't understand how both Lisa and us are getting the unexpected behaviour where the length of the time coordinate gets doubled. Let me dive into it a bit deeper.

@valeriupredoi
Contributor

Ah okay, I see. But then I don't understand how both Lisa and us are getting the unexpected behaviour where the length of the time coordinate gets doubled. Let me dive into it a bit deeper.

can you pls post the actual times you mean by double? 🍺

@Peter9192
Contributor

Peter9192 commented Jun 8, 2020

  ...
  File "esmvalcore/preprocessor/_multimodel.py", line 297, in _assemble_full_data
    new_datas_array = _full_time_slice(cubes, empty_arr, indices_list,
  File "esmvalcore/preprocessor/_multimodel.py", line 237, in _full_time_slice
    ndat[indices[idx_cube]] = cube.data
  File "lib/python3.8/site-packages/numpy/ma/core.py", line 3343, in __setitem__
    _data[indx] = dval
ValueError: shape mismatch: value array of shape (1668,192,288) could not be broadcast to indexing result of shape (1668,145,192)

@Peter9192
Contributor

But I see it's not actually the time coordinate that's causing the problem in our case.

@Peter9192
Contributor

Okay never mind.

@Peter9192
Contributor

I managed to reproduce our problem. Here's a MWE:

---
documentation:
  description: mwe
  authors:
    - kalverla_peter
    - alidoost_sarah

datasets:
  - {dataset: ACCESS1-0, project: CMIP5, mip: Amon, exp: [historical, rcp85], ensemble: r1i1p1, start_year: 1961, end_year: 2099}
  - {dataset: CCSM4, project: CMIP5, mip: Amon, exp: [historical, rcp85], ensemble: r1i1p1, start_year: 1961, end_year: 2099}


preprocessors:
  preprocessor1:
    custom_order: True
    area_statistics:
      operator: mean
    anomalies:
      period: full  # 'full' requires https://github.com/ESMValGroup/ESMValCore/pull/652, use # 'month' for now.
      reference: &reference
        start_year: 1980
        start_month: 1
        start_day: 1
        end_year: 2009
        end_month: 12
        end_day: 31
      standardize: false
    annual_statistics:
      operator: mean
    multi_model_statistics:
      span: full
      statistics: [mean, median]   # might want to add percentiles here, but not supported

diagnostics:
  mwe:
    description: minimal working example

    variables:
      tas:
        preprocessor: preprocessor1

    scripts: null

The preprocessed output files of this recipe contain 139 time points for both the access1-0 and the CCSM4 dataset, but 278 time points for the multimodelmean. Perhaps this is caused by different calendars, still looking into it.

@Peter9192
Contributor

I guess the problem is that we're passing yearly data to the multimodel stats, and due to the unfortunate time difference between our datasets, the annually averaged time points end up at the beginning of July for the one dataset, and the end of June for the other.

@Peter9192
Contributor

Final diagnosis: CCSM4 data has a no-leap calendar, and ACCESS1-0 a gregorian calendar (that seems to need an extra fix: #669). The multimodel preprocessor can deal with this as long as it's monthly data, but it fails if we first compute annual means, because the two calendars' time points average out into different months.

So my questions are:

  • Should it be mentioned somewhere (docs) that multimodel is really designed with monthly data in mind?
  • Should the no-leap calendar be fixed for CCSM4? I do get trouble later on, e.g. with plotting. Matplotlib cannot plot both calendars on 1 axis.
  • I have the impression that regrid_time also doesn't account for differences in calendars, resulting in messed up time coordinates. This preprocessor order [..., annual_stats, regrid_time, multimodel] produces perfect individual model files with time points at every 1 July, but a multimodel file with time points at 28 or 29 July.

@valeriupredoi
Contributor

valeriupredoi commented Jun 9, 2020

@Peter9192 a few points:

  • ValueError: shape mismatch: value array of shape (1668,192,288) could not be broadcast... - that's because the datasets were not regridded on a common grid, mate
  • the MM preprocessor cannot handle yearly or daily data or, better said, it doesn't crash, but the results are at risk and will most of the time be incorrect if corner cases are present: yearly data should be fine if the time point per year is not in January or December; daily data will not work with MM, period
  • regrid_time() has no business with calendars - it merely resets the time points to be at a convenient origin: July 1st for yearly data, 15th of the month for monthly data, midnight for daily data etc
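
As a rough sketch of the last bullet, the reset can be pictured like this. The function `reset_time_point` is hypothetical and only mirrors the description above (July 1st for yearly data, the 15th for monthly data, midnight for daily data); it is not the actual `regrid_time()` code. The frequency strings `yr` and `mon` are the ones used in the recipes in this thread, while `day` is an assumption.

```python
from datetime import datetime


def reset_time_point(dt, frequency):
    """Hypothetical helper: move a time point to a conventional origin."""
    if frequency == "yr":
        return datetime(dt.year, 7, 1)
    if frequency == "mon":
        return datetime(dt.year, dt.month, 15)
    if frequency == "day":  # assumed frequency name
        return datetime(dt.year, dt.month, dt.day)
    raise ValueError(f"unsupported frequency: {frequency}")


# Annual means that landed on 30 June and 2 July end up on the same date.
print(reset_time_point(datetime(1961, 6, 30, 12), "yr"))
print(reset_time_point(datetime(1961, 7, 2, 12), "yr"))
```

Only the date fields are touched here; the calendar attached to the coordinate is left alone, which matches the point that no calendar handling is involved.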

This preprocessor order [..., annual_stats, regrid_time, multimodel] produces perfect individual model files with time points at every 1 July, but a multimodel file with time points at 28 or 29 July.

For MM: overlap, the MM will use the time axis of the first cube in the cube list as the MM cube time axis (after all cubes have been sliced on the common overlap period, so dates may differ from cube to cube, but as long as they are monthly dates the MM doesn't care); for MM: full, the MM will reset the monthly dates to the first of the month; no calendar manipulation is done 🍺

@Peter9192
Contributor

@valeriupredoi Yep, thanks for the explanation. I was trying to reproduce our earlier workflow, but overlooked the custom_order setting. I'm still a bit confused about this, though:

This preprocessor order [..., annual_stats, regrid_time, multimodel] produces perfect individual model files with time points at every 1 July, but a multimodel file with time points at 28 or 29 July.

I understand that it shouldn't be possible, but still something weird happens there.

And in general: should all calendars be 'gregorian', or is it okay for datasets to have different calendars?

@valeriupredoi
Contributor

@Peter9192 what type of data is it (monthly, yearly means?), also can you pls post sample minimal recipe with which I could attempt to replicate the behaviour? 🍺

@Peter9192
Contributor

Hey @valeriupredoi thanks for looking into this. It's monthly data that we first compute yearly means on, and then multimodel means.

sample minimal recipe

See my earlier comment. We need the custom order and the anomalies. However, the behaviour can also be replicated with this:

---
documentation:
  description: mwe
  authors:
    - kalverla_peter
    - alidoost_sarah

datasets:
  - {dataset: ACCESS1-0, project: CMIP5, mip: Amon, exp: [historical], ensemble: r1i1p1, start_year: 1961, end_year: 1965}
  - {dataset: CCSM4, project: CMIP5, mip: Amon, exp: [historical], ensemble: r1i1p1, start_year: 1961, end_year: 1965}


preprocessors:
  preprocessor1:
    custom_order: True
    area_statistics:
      operator: mean
    annual_statistics:
      operator: mean
    # regrid_time:
    #   frequency: yr
    multi_model_statistics:
      span: full
      statistics: [mean]

diagnostics:
  mwe:
    description: minimal working example

    variables:
      tas:
        preprocessor: preprocessor1

    scripts: null

Without the regrid_time preprocessor, the ACCESS data is at 30 June, while the CCSM4 data is at 2 July. The MM Mean is then at every 28 June AND 28 July.

In [2]: time = iris.load_cube('CMIP5_ACCESS1-0_Amon_historical_r1i1p1_tas_1961-1965.nc').coord('time')                                                                                                      

In [3]: print(time)                                                                                                                                                                                         
DimCoord([1961-06-30 12:00:00, 1962-06-30 12:00:00, 1963-06-30 12:00:00,
       1964-06-30 00:00:00, 1965-06-30 12:00:00], bounds=[[1960-12-30 00:00:00, 1961-12-30 00:00:00],
       [1961-12-30 00:00:00, 1962-12-30 00:00:00],
       [1962-12-30 00:00:00, 1963-12-30 00:00:00],
       [1963-12-30 00:00:00, 1964-12-30 00:00:00],
       [1964-12-30 00:00:00, 1965-12-30 00:00:00]], standard_name='time', calendar='gregorian', long_name='time', var_name='time')

In [4]: time = iris.load_cube('CMIP5_CCSM4_Amon_historical_r1i1p1_tas_1961-1965.nc').coord('time')                                                                                                          

In [5]: print(time)                                                                                                                                                                                         
DimCoord([1961-07-02 12:00:00, 1962-07-02 12:00:00, 1963-07-02 12:00:00,
       1964-07-02 12:00:00, 1965-07-02 12:00:00], bounds=[[1961-01-01 00:00:00, 1962-01-01 00:00:00],
       [1962-01-01 00:00:00, 1963-01-01 00:00:00],
       [1963-01-01 00:00:00, 1964-01-01 00:00:00],
       [1964-01-01 00:00:00, 1965-01-01 00:00:00],
       [1965-01-01 00:00:00, 1966-01-01 00:00:00]], standard_name='time', calendar='365_day', long_name='time', var_name='time')

In [6]: time = iris.load_cube('MultiModelMean_Amon_tas_1961-1965.nc').coord('time')                                                                                                                         

In [7]: print(time)                                                                                                                                                                                         
DimCoord([1961-06-28 00:00:00, 1961-07-28 00:00:00, 1962-06-28 00:00:00,
       1962-07-28 00:00:00, 1963-06-28 00:00:00, 1963-07-28 00:00:00,
       1964-06-29 00:00:00, 1964-07-29 00:00:00, 1965-06-29 00:00:00,
       1965-07-29 00:00:00], standard_name='time', calendar='365_day', var_name='time')
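
The doubling in the multimodel output above is what one would expect if time points are matched per calendar month. A minimal sketch, assuming (as described earlier in the thread) that only the year and month of each time point are used as the key; `month_key` is a hypothetical helper, and the dates are taken from the two individual-model outputs above.

```python
from datetime import datetime

# Annual-mean time points from the two individual model files above.
access = [datetime(year, 6, 30, 12) for year in range(1961, 1966)]
ccsm4 = [datetime(year, 7, 2, 12) for year in range(1961, 1966)]


def month_key(dt):
    """Hypothetical matching key: one slot per calendar month."""
    return (dt.year, dt.month)


# June and July keys never coincide, so every year contributes two slots.
merged = sorted({month_key(t) for t in access + ccsm4})
print(len(access), len(ccsm4), len(merged))
print(merged[:2])
```

Five annual points per model turn into ten slots, one in June and one in July of each year, matching the ten time points in the multimodel file.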

Note that the source data, from ESGF, are all monthly data with exactly matching datetimes, but different calendars:

In [8]: time = iris.load_cube('ACCESS1-0/r1i1p1/tas_Amon_ACCESS1-0_historical_r1i1p1_185001-200512.nc').coord('time')[:5]                            

In [9]: print(time)                                                                                                                                                                                         
DimCoord([1850-01-16 12:00:00, 1850-02-15 00:00:00, 1850-03-16 12:00:00,
       1850-04-16 00:00:00, 1850-05-16 12:00:00], bounds=[[1850-01-01 00:00:00, 1850-02-01 00:00:00],
       [1850-02-01 00:00:00, 1850-03-01 00:00:00],
       [1850-03-01 00:00:00, 1850-04-01 00:00:00],
       [1850-04-01 00:00:00, 1850-05-01 00:00:00],
       [1850-05-01 00:00:00, 1850-06-01 00:00:00]], standard_name='time', calendar='proleptic_gregorian', long_name='time', var_name='time')

In [10]: time = iris.load_cube('tas_Amon_CCSM4_historical_r1i1p1_185001-200512.nc').coord('time')[:5]                                   

In [11]: print(time)                                                                                                                                                                                        
DimCoord([1850-01-16 12:00:00, 1850-02-15 00:00:00, 1850-03-16 12:00:00,
       1850-04-16 00:00:00, 1850-05-16 12:00:00], bounds=[[1850-01-01 00:00:00, 1850-02-01 00:00:00],
       [1850-02-01 00:00:00, 1850-03-01 00:00:00],
       [1850-03-01 00:00:00, 1850-04-01 00:00:00],
       [1850-04-01 00:00:00, 1850-05-01 00:00:00],
       [1850-05-01 00:00:00, 1850-06-01 00:00:00]], standard_name='time', calendar='365_day', long_name='time', var_name='time')

Adding the regrid_time preprocessor fixes the recipe.

@valeriupredoi
Contributor

valeriupredoi commented Jun 16, 2020

I see. Cheers for the exact pinpointing! It's actually a combination of two things: a missing feature on the multimodel preprocessor side and the garden variety of input data time axes. The MM cannot actually handle this case correctly since it computes a set on the days and not on the months too. Let me try to fix it and write a test for it too 🍺

@valeriupredoi
Contributor

@Peter9192 could you test your failed recipe with #677 please mate? That should fix the yearly data behaviour 🍺

@LisaBock
Contributor Author

LisaBock commented Jun 17, 2020

@valeriupredoi I tested again with #677 and sorry, but it fails again.
The problem is that the timestamp of the MultiModelMean shifts one day further every month, and when the time period is long enough, it shifts into the next month. See below the result of a shorter test recipe:

recipe:

preprocessors:

  clim_ref:
    regrid:
      target_grid: reference_dataset
      scheme: linear
    multi_model_statistics:
      span: full
      statistics: [mean]
      exclude: [reference_dataset]


CMIP5_tas: &cmip5_tas
  - {dataset: FGOALS-g2}
  - {dataset: GFDL-CM3, start_year: 1860}
  - {dataset: GFDL-ESM2G, start_year: 1861}
  - {dataset: GFDL-ESM2M, start_year: 1861}
  - {dataset: GISS-E2-H-CC}
  - {dataset: HadCRUT4, project: OBS, type: ground, version: 1, tier: 2,
     end_year: 1910}



diagnostics:

  fig_1_cmip5:
    description: CMIP5 timeseries of near-surface temperature anomalies
    variables:
      tas:
        preprocessor: clim_ref
        reference_dataset: HadCRUT4
        mip: Amon
        exp: historical
        project: CMIP5
        ensemble: r1i1p1
        start_year: 1850
        end_year: 1910
    additional_datasets: *cmip5_tas

And the timestamps of my preproc file MultiModelMean_Amon_tas_1850-1910.nc then look like this:

1850-01-01 1850-02-01 1850-03-01 1850-04-01 1850-05-01 1850-06-01 1850-07-01 1850-08-01 1850-09-01 1850-10-01 1850-11-01 1850-12-01 1851-01-01 1851-02-01 1851-03-01 1851-04-01 1851-05-01 1851-06-01 1851-07-01 1851-08-01 1851-09-01 1851-10-01 1851-11-01 1851-12-01 1852-01-01 1852-02-01 1852-03-02 1852-04-02 1852-05-02 1852-06-02 1852-07-02 1852-08-02 1852-09-02 1852-10-02 1852-11-02 1852-12-02 1853-01-02 1853-02-02 1853-03-02 1853-04-02 1853-05-02 1853-06-02 1853-07-02 1853-08-02 1853-09-02 1853-10-02 1853-11-02 1853-12-02 1854-01-02 1854-02-02 1854-03-02 1854-04-02 1854-05-02 1854-06-02 1854-07-02 1854-08-02 1854-09-02 1854-10-02 1854-11-02 1854-12-02 1855-01-02 1855-02-02 1855-03-02 1855-04-02 1855-05-02 1855-06-02 1855-07-02 1855-08-02 1855-09-02 1855-10-02 1855-11-02 1855-12-02 1856-01-02 1856-02-02 1856-03-03 1856-04-03 1856-05-03 1856-06-03 1856-07-03 1856-08-03 1856-09-03 1856-10-03 1856-11-03 1856-12-03 1857-01-03 1857-02-03 1857-03-03 1857-04-03 1857-05-03 1857-06-03 1857-07-03 1857-08-03 1857-09-03 1857-10-03 1857-11-03 1857-12-03 1858-01-03 1858-02-03 1858-03-03 1858-04-03 1858-05-03 1858-06-03 1858-07-03 1858-08-03 1858-09-03 1858-10-03 1858-11-03 1858-12-03 1859-01-03 1859-02-03 1859-03-03 1859-04-03 1859-05-03 1859-06-03 1859-07-03 1859-08-03 1859-09-03 1859-10-03 1859-11-03 1859-12-03 1860-01-03 1860-02-03 1860-03-04 1860-04-04 1860-05-04 1860-06-04 1860-07-04 1860-08-04 1860-09-04 1860-10-04 1860-11-04 1860-12-04 1861-01-04 1861-02-04 1861-03-04 1861-04-04 1861-05-04 1861-06-04 1861-07-04 1861-08-04 1861-09-04 1861-10-04 1861-11-04 1861-12-04 1862-01-04 1862-02-04 1862-03-04 1862-04-04 1862-05-04 1862-06-04 1862-07-04 1862-08-04 1862-09-04 1862-10-04 1862-11-04 1862-12-04 1863-01-04 1863-02-04 1863-03-04 1863-04-04 1863-05-04 1863-06-04 1863-07-04 1863-08-04 1863-09-04 1863-10-04 1863-11-04 1863-12-04 1864-01-04 1864-02-04 1864-03-05 1864-04-05 1864-05-05 1864-06-05 1864-07-05 1864-08-05 1864-09-05 1864-10-05 1864-11-05 1864-12-05 1865-01-05 
1865-02-05 1865-03-05 1865-04-05 1865-05-05 1865-06-05 1865-07-05 1865-08-05 1865-09-05 1865-10-05 1865-11-05 1865-12-05 1866-01-05 1866-02-05 1866-03-05 1866-04-05 1866-05-05 1866-06-05 1866-07-05 1866-08-05 1866-09-05 1866-10-05 1866-11-05 1866-12-05 1867-01-05 1867-02-05 1867-03-05 1867-04-05 1867-05-05 1867-06-05 1867-07-05 1867-08-05 1867-09-05 1867-10-05 1867-11-05 1867-12-05 1868-01-05 1868-02-05 1868-03-06 1868-04-06 1868-05-06 1868-06-06 1868-07-06 1868-08-06 1868-09-06 1868-10-06 1868-11-06 1868-12-06 1869-01-06 1869-02-06 1869-03-06 1869-04-06 1869-05-06 1869-06-06 1869-07-06 1869-08-06 1869-09-06 1869-10-06 1869-11-06 1869-12-06 1870-01-06 1870-02-06 1870-03-06 1870-04-06 1870-05-06 1870-06-06 1870-07-06 1870-08-06 1870-09-06 1870-10-06 1870-11-06 1870-12-06 1871-01-06 1871-02-06 1871-03-06 1871-04-06 1871-05-06 1871-06-06 1871-07-06 1871-08-06 1871-09-06 1871-10-06 1871-11-06 1871-12-06 1872-01-06 1872-02-06 1872-03-07 1872-04-07 1872-05-07 1872-06-07 1872-07-07 1872-08-07 1872-09-07 1872-10-07 1872-11-07 1872-12-07 1873-01-07 1873-02-07 1873-03-07 1873-04-07 1873-05-07 1873-06-07 1873-07-07 1873-08-07 1873-09-07 1873-10-07 1873-11-07 1873-12-07 1874-01-07 1874-02-07 1874-03-07 1874-04-07 1874-05-07 1874-06-07 1874-07-07 1874-08-07 1874-09-07 1874-10-07 1874-11-07 1874-12-07 1875-01-07 1875-02-07 1875-03-07 1875-04-07 1875-05-07 1875-06-07 1875-07-07 1875-08-07 1875-09-07 1875-10-07 1875-11-07 1875-12-07 1876-01-07 1876-02-07 1876-03-08 1876-04-08 1876-05-08 1876-06-08 1876-07-08 1876-08-08 1876-09-08 1876-10-08 1876-11-08 1876-12-08 1877-01-08 1877-02-08 1877-03-08 1877-04-08 1877-05-08 1877-06-08 1877-07-08 1877-08-08 1877-09-08 1877-10-08 1877-11-08 1877-12-08 1878-01-08 1878-02-08 1878-03-08 1878-04-08 1878-05-08 1878-06-08 1878-07-08 1878-08-08 1878-09-08 1878-10-08 1878-11-08 1878-12-08 1879-01-08 1879-02-08 1879-03-08 1879-04-08 1879-05-08 1879-06-08 1879-07-08 1879-08-08 1879-09-08 1879-10-08 1879-11-08 1879-12-08 1880-01-08 1880-02-08 
1880-03-09 1880-04-09 1880-05-09 1880-06-09 1880-07-09 1880-08-09 1880-09-09 1880-10-09 1880-11-09 1880-12-09 1881-01-09 1881-02-09 1881-03-09 1881-04-09 1881-05-09 1881-06-09 1881-07-09 1881-08-09 1881-09-09 1881-10-09 1881-11-09 1881-12-09 1882-01-09 1882-02-09 1882-03-09 1882-04-09 1882-05-09 1882-06-09 1882-07-09 1882-08-09 1882-09-09 1882-10-09 1882-11-09 1882-12-09 1883-01-09 1883-02-09 1883-03-09 1883-04-09 1883-05-09 1883-06-09 1883-07-09 1883-08-09 1883-09-09 1883-10-09 1883-11-09 1883-12-09 1884-01-09 1884-02-09 1884-03-10 1884-04-10 1884-05-10 1884-06-10 1884-07-10 1884-08-10 1884-09-10 1884-10-10 1884-11-10 1884-12-10 1885-01-10 1885-02-10 1885-03-10 1885-04-10 1885-05-10 1885-06-10 1885-07-10 1885-08-10 1885-09-10 1885-10-10 1885-11-10 1885-12-10 1886-01-10 1886-02-10 1886-03-10 1886-04-10 1886-05-10 1886-06-10 1886-07-10 1886-08-10 1886-09-10 1886-10-10 1886-11-10 1886-12-10 1887-01-10 1887-02-10 1887-03-10 1887-04-10 1887-05-10 1887-06-10 1887-07-10 1887-08-10 1887-09-10 1887-10-10 1887-11-10 1887-12-10 1888-01-10 1888-02-10 1888-03-11 1888-04-11 1888-05-11 1888-06-11 1888-07-11 1888-08-11 1888-09-11 1888-10-11 1888-11-11 1888-12-11 1889-01-11 1889-02-11 1889-03-11 1889-04-11 1889-05-11 1889-06-11 1889-07-11 1889-08-11 1889-09-11 1889-10-11 1889-11-11 1889-12-11 1890-01-11 1890-02-11 1890-03-11 1890-04-11 1890-05-11 1890-06-11 1890-07-11 1890-08-11 1890-09-11 1890-10-11 1890-11-11 1890-12-11 1891-01-11 1891-02-11 1891-03-11 1891-04-11 1891-05-11 1891-06-11 1891-07-11 1891-08-11 1891-09-11 1891-10-11 1891-11-11 1891-12-11 1892-01-11 1892-02-11 1892-03-12 1892-04-12 1892-05-12 1892-06-12 1892-07-12 1892-08-12 1892-09-12 1892-10-12 1892-11-12 1892-12-12 1893-01-12 1893-02-12 1893-03-12 1893-04-12 1893-05-12 1893-06-12 1893-07-12 1893-08-12 1893-09-12 1893-10-12 1893-11-12 1893-12-12 1894-01-12 1894-02-12 1894-03-12 1894-04-12 1894-05-12 1894-06-12 1894-07-12 1894-08-12 1894-09-12 1894-10-12 1894-11-12 1894-12-12 1895-01-12 1895-02-12 1895-03-12 
1895-04-12 1895-05-12 1895-06-12 1895-07-12 1895-08-12 1895-09-12 1895-10-12 1895-11-12 1895-12-12 1896-01-12 1896-02-12 1896-03-13 1896-04-13 1896-05-13 1896-06-13 1896-07-13 1896-08-13 1896-09-13 1896-10-13 1896-11-13 1896-12-13 1897-01-13 1897-02-13 1897-03-13 1897-04-13 1897-05-13 1897-06-13 1897-07-13 1897-08-13 1897-09-13 1897-10-13 1897-11-13 1897-12-13 1898-01-13 1898-02-13 1898-03-13 1898-04-13 1898-05-13 1898-06-13 1898-07-13 1898-08-13 1898-09-13 1898-10-13 1898-11-13 1898-12-13 1899-01-13 1899-02-13 1899-03-13 1899-04-13 1899-05-13 1899-06-13 1899-07-13 1899-08-13 1899-09-13 1899-10-13 1899-11-13 1899-12-13 1900-01-13 1900-02-13 1900-03-13 1900-04-13 1900-05-13 1900-06-13 1900-07-13 1900-08-13 1900-09-13 1900-10-13 1900-11-13 1900-12-13 1901-01-13 1901-02-13 1901-03-13 1901-04-13 1901-05-13 1901-06-13 1901-07-13 1901-08-13 1901-09-13 1901-10-13 1901-11-13 1901-12-13 1902-01-13 1902-02-13 1902-03-13 1902-04-13 1902-05-13 1902-06-13 1902-07-13 1902-08-13 1902-09-13 1902-10-13 1902-11-13 1902-12-13 1903-01-13 1903-02-13 1903-03-13 1903-04-13 1903-05-13 1903-06-13 1903-07-13 1903-08-13 1903-09-13 1903-10-13 1903-11-13 1903-12-13 1904-01-13 1904-02-13 1904-03-14 1904-04-14 1904-05-14 1904-06-14 1904-07-14 1904-08-14 1904-09-14 1904-10-14 1904-11-14 1904-12-14 1905-01-14 1905-02-14 1905-03-14 1905-04-14 1905-05-14 1905-06-14 1905-07-14 1905-08-14 1905-09-14 1905-10-14 1905-11-14 1905-12-14 1906-01-14 1906-02-14 1906-03-14 1906-04-14 1906-05-14 1906-06-14 1906-07-14 1906-08-14 1906-09-14 1906-10-14 1906-11-14 1906-12-14 1907-01-14 1907-02-14 1907-03-14 1907-04-14 1907-05-14 1907-06-14 1907-07-14 1907-08-14 1907-09-14 1907-10-14 1907-11-14 1907-12-14 1908-01-14 1908-02-14 1908-03-15 1908-04-15 1908-05-15 1908-06-15 1908-07-15 1908-08-15 1908-09-15 1908-10-15 1908-11-15 1908-12-15 1909-01-15 1909-02-15 1909-03-15 1909-04-15 1909-05-15 1909-06-15 1909-07-15 1909-08-15 1909-09-15 1909-10-15 1909-11-15 1909-12-15 1910-01-15 1910-02-15 1910-03-15 1910-04-15 
1910-05-15 1910-06-15 1910-07-15 1910-08-15 1910-09-15 1910-10-15 1910-11-15 1910-12-15
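
The drift in the list above (one extra day after each leap-year February, with no step in 1900) is the pattern one would get if first-of-month dates were encoded as day offsets in a Gregorian calendar and then decoded in a 365-day calendar. The sketch below only illustrates that pattern; it is not a diagnosis confirmed in this thread. The reference date 1850-01-01 and both helper functions are assumptions made for the example.

```python
from datetime import date

# Days elapsed before the start of each month in a 365-day (no-leap) year.
DAYS_BEFORE_MONTH = [0, 31, 59, 90, 120, 151, 181, 212, 243, 273, 304, 334]

REFERENCE = date(1850, 1, 1)  # assumed reference date for the day offsets


def encode_gregorian(d):
    """Days since REFERENCE, counted in Python's proleptic Gregorian calendar."""
    return (d - REFERENCE).days


def decode_noleap(days):
    """Interpret a day offset from REFERENCE in a 365-day calendar."""
    years, day_of_year = divmod(days, 365)
    month = max(m for m in range(12) if DAYS_BEFORE_MONTH[m] <= day_of_year)
    day = day_of_year - DAYS_BEFORE_MONTH[month] + 1
    return (REFERENCE.year + years, month + 1, day)


for d in (date(1852, 2, 1), date(1852, 3, 1), date(1856, 3, 1), date(1900, 3, 1)):
    print(d, "->", decode_noleap(encode_gregorian(d)))
```

Under these assumptions, 1852-03-01 decodes to 1852-03-02, 1856-03-01 to 1856-03-03 and 1900-03-01 to 1900-03-13, the same values that appear in the list.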

@LisaBock
Contributor Author

I have now tried adding regrid_time to the preprocessor, as suggested by @Peter9192:

preprocessors:

  clim_ref:
    regrid:
      target_grid: reference_dataset
      scheme: linear
    regrid_time:
      frequency: mon
    multi_model_statistics:
      span: full
      statistics: [mean]
      exclude: [reference_dataset]

And now it works!

@valeriupredoi
Contributor

@LisaBock are you running with monthly or yearly means data? The fix from #677 is for yearly data only 🍺

@valeriupredoi
Contributor

also those timestamps look fine to me (cheers for posting them) - what is the issue? There is no risk of spillover since they reset back to 1 if they reach the end of the month - that is, they look good if the data is monthly means

@LisaBock
Contributor Author

@LisaBock are you running with monthly or yearly means data? The fix from #677 is for yearly data only 🍺

I am running monthly data.

@valeriupredoi
Contributor

OK, then #677 will not change anything in your data; the monthly data MM dates seem fine to me. Can you please state again what the problem is? Also bear in mind that if you see dates in coord("time") that exceed your boundaries, that's because this is the full option: points from time stretches that are covered by only a single model will be masked, so the actual data is smaller

@LisaBock
Contributor Author

also those timestamps look fine to me (cheers for posting them) - what is the issue? There is no risk of spillover since they reset back to 1 if they reach the end of the month - that is, they look good if the data is monthly means

The problem is that it does not simply reset back to 1 but shifts to 1 in the next month...

Here are two extracts of the dates:
1957-05-15 1957-06-15 1957-07-16 1957-08-17 1957-09-16 1957-10-17 1957-11-17 1957-12-18 1958-01-18 1958-02-19 1958-03-20 1958-04-20 1958-05-21 1958-06-21 1958-07-22 1958-08-23 1958-09-21 1958-10-22 1958-11-22 1958-12-23 1959-01-23 1959-02-24 1959-03-25 1959-04-25 1959-05-26 1959-06-26 1959-07-27 1959-08-28 1959-09-26 1959-10-27 1959-11-27 1959-12-28 1960-01-28 1960-02-29 1960-03-30 1960-04-30 1960-06-01 1960-07-01 1960-08-02 1960-09-03 1960-10-01 1960-11-02 1960-12-02 1961-01-03 1961-02-03 1961-03-04 1961-04-05 1961-05-05 1961-06-06 1961-07-06 1961-08-07 1961-09-08 1961-10-07 1961-11-08 1961-12-08 1962-01-09 1962-02-09 1962-03-10 1962-04-11 1962-05-11 1962-06-12 1962-07-12 1962-08-13 1962-09-14 1962-10-12 1962-11-13 1962-12-13 1963-01-14
and
1997-01-10 1997-02-11 1997-03-12 1997-04-10 1997-05-11 1997-06-11 1997-07-12 1997-08-12 1997-09-13 1997-10-14 1997-11-14 1997-12-15 1998-01-15 1998-02-16 1998-03-17 1998-04-16 1998-05-17 1998-06-17 1998-07-18 1998-08-18 1998-09-19 1998-10-20 1998-11-20 1998-12-21 1999-01-21 1999-02-22 1999-03-23 1999-04-21 1999-05-22 1999-06-22 1999-07-23 1999-08-23 1999-09-24 1999-10-25 1999-11-25 1999-12-26 2000-01-26 2000-02-27 2000-03-28 2000-04-26 2000-05-27 2000-06-27 2000-07-28 2000-08-28 2000-09-29 2000-10-30 2000-11-30 2001-01-01 2001-02-01 2001-03-02 2001-04-03 2001-05-01 2001-06-02 2001-07-02 2001-08-03 2001-09-03 2001-10-04 2001-11-05 2001-12-05 2002-01-06 2002-02-06

You see that there is always one month missing when the day changes from 30 to 1. And therefore it adds 14 months at the end. All models end in December 1999 but the MultiModelMean ends 14 months later...

But as I said, with regrid_time it is solved.

@Peter9192
Contributor

could you test your failed recipe with #677 please mate?

I can confirm that this fixes our recipe. Let me have a better look at the PR though, see you there. 🍺

@valeriupredoi
Contributor

@LisaBock yes but the actual data point that spills over will be masked so will not affect the actual computation. Can you pls confirm that? 🍺

@valeriupredoi
Contributor

also - beats me if I can see any missing month - can you post that bit of time points (+/- 1 month before and after), it might be just me being blind 😁

@LisaBock
Contributor Author

LisaBock commented Jun 17, 2020

@valeriupredoi
these are the critical points:
1960-02-29 1960-03-30 1960-04-30 1960-06-01 1960-07-01 1960-08-02 1960-09-03
and
2000-09-29 2000-10-30 2000-11-30 2001-01-01 2001-02-01 2001-03-02 2001-04-03

The problem is that for every missing month (here: 1960-05 and 2000-12) the time period is extended by one month. As I said, all models end in 1999-12 but the MultiModelMean is extended to 2001-04.
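A quick way to spot such gaps without reading the dates by eye is sketched below. It is only an illustration: the helper name `missing_and_duplicate_months` is made up for this example, and it assumes the time points are already sorted ISO date strings like the ones posted above.

```python
from datetime import date


def missing_and_duplicate_months(date_strings):
    """Return (missing, duplicated) calendar months for sorted ISO dates.

    Hypothetical helper for illustration; not part of ESMValTool or iris.
    """
    months = [(d.year, d.month) for d in map(date.fromisoformat, date_strings)]
    first, last = months[0], months[-1]

    # Build every (year, month) pair between the first and last time point.
    expected = []
    year, month = first
    while (year, month) <= last:
        expected.append((year, month))
        year, month = (year + 1, 1) if month == 12 else (year, month + 1)

    missing = [ym for ym in expected if ym not in months]
    duplicated = sorted({ym for ym in months if months.count(ym) > 1})
    return missing, duplicated


# The first set of critical points quoted above.
points = "1960-02-29 1960-03-30 1960-04-30 1960-06-01 1960-07-01 1960-08-02 1960-09-03"
print(missing_and_duplicate_months(points.split()))
# ([(1960, 5)], [])
```

For monthly means, both returned lists should be empty; any entry in `missing` corresponds to one extra month appended at the end of the series.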

@valeriupredoi
Contributor

aaah now I see - cheers for the clarification! Not good 😁 - let me extend #677 to monthly data too

@Peter9192
Contributor

Seems I was too quick to confirm. The duplicates are gone because of #671, but the time array is still showing these strange offsets, and for longer time ranges than in my previous example, they also start wandering, like Lisa described.

@valeriupredoi
Contributor

Seems I was too quick to confirm. The duplicates are gone because of #671, but the time array is still showing these strange offsets, and for longer time ranges than in my previous example, they also start wandering, like Lisa described.

this time round for yearly means data right?

@Peter9192
Contributor

yep. So the MWE now produces single time stamps per year (which is good, much better than before), but they're not at the first of the month, and for longer time arrays they start shifting.

@valeriupredoi
Contributor

can you pls post a snippet of the time points?

@Peter9192
Contributor

In [2]: import iris; time = iris.load_cube('~MultiModelMean_Amon_tas_1961-1999.nc').coord('time'); print(time)     
DimCoord([1961-07-28 00:00:00, 1962-07-28 00:00:00, 1963-07-28 00:00:00,
       1964-07-29 00:00:00, 1965-07-29 00:00:00, 1966-07-29 00:00:00,
       1967-07-29 00:00:00, 1968-07-30 00:00:00, 1969-07-30 00:00:00,
       1970-07-30 00:00:00, 1971-07-30 00:00:00, 1972-07-31 00:00:00,
       1973-07-31 00:00:00, 1974-07-31 00:00:00, 1975-07-31 00:00:00,
       1976-08-01 00:00:00, 1977-08-01 00:00:00, 1978-08-01 00:00:00,
       1979-08-01 00:00:00, 1980-08-02 00:00:00, 1981-08-02 00:00:00,
       1982-08-02 00:00:00, 1983-08-02 00:00:00, 1984-08-03 00:00:00,
       1985-08-03 00:00:00, 1986-08-03 00:00:00, 1987-08-03 00:00:00,
       1988-08-04 00:00:00, 1989-08-04 00:00:00, 1990-08-04 00:00:00,
       1991-08-04 00:00:00, 1992-08-05 00:00:00, 1993-08-05 00:00:00,
       1994-08-05 00:00:00, 1995-08-05 00:00:00, 1996-08-06 00:00:00,
       1997-08-06 00:00:00, 1998-08-06 00:00:00, 1999-08-06 00:00:00], standard_name='time', calendar='365_day', var_name='time')
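The one-day jumps above fall exactly on Gregorian leap years (1964, 1968, ...), while the coordinate reports calendar='365_day'. One mechanism that would produce this pattern is sketched below. It is a guess for illustration, not something confirmed in this thread: day offsets are computed with a leap-year-aware calendar and then decoded with a no-leap calendar. All function names are hypothetical and only the standard library is used.

```python
from datetime import date

# Month lengths in a 365_day (no-leap) calendar.
MONTH_LENGTHS = [31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31]


def noleap_add_days(year, month, day, n_days):
    """Add n_days to a date in a 365_day calendar (hypothetical helper)."""
    # Zero-based day of year of the start date, plus the offset.
    total = sum(MONTH_LENGTHS[:month - 1]) + (day - 1) + n_days
    year += total // 365
    day_of_year = total % 365
    month_index = 0
    while day_of_year >= MONTH_LENGTHS[month_index]:
        day_of_year -= MONTH_LENGTHS[month_index]
        month_index += 1
    return year, month_index + 1, day_of_year + 1


def gregorian_offsets_decoded_as_noleap(start_year, end_year, month, day):
    """Compute yearly offsets with Gregorian dates, decode them as no-leap."""
    origin = date(start_year, month, day)
    decoded = []
    for year in range(start_year, end_year + 1):
        offset = (date(year, month, day) - origin).days  # counts 29 February
        decoded.append(noleap_add_days(start_year, month, day, offset))
    return decoded


drifting = gregorian_offsets_decoded_as_noleap(1961, 1999, 7, 28)
print(drifting[0], drifting[3], drifting[-1])
# (1961, 7, 28) (1964, 7, 29) (1999, 8, 6)
```

Under this assumption, every 29 February that the offsets count pushes the decoded no-leap date forward by one day, which matches the drift from 1961-07-28 to 1999-08-06 in the output above.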

@valeriupredoi
Contributor

right, cheers, I know where the bugger is! 🍺

@valeriupredoi
Contributor

OK can you guys please give it one more test with #677 - I have shifted the points for full and moved them to the middle of the month so there is no corner case anymore. Hopefully that will fix it 🍺
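The idea of pinning monthly points to the middle of the month can be sketched as follows. This is a minimal illustration under stated assumptions, not the code in #677: the function name `mid_month_points` and the choice of day 15 are made up for the example.

```python
from datetime import datetime


def mid_month_points(start_year, end_year, day=15):
    """One time point per calendar month, placed mid-month (hypothetical).

    Building each point from (year, month, day) directly, instead of adding
    a fixed number of days to the previous point, means a point can never
    spill over into a neighbouring month.
    """
    return [
        datetime(year, month, day)
        for year in range(start_year, end_year + 1)
        for month in range(1, 13)
    ]


points = mid_month_points(1998, 1999)
print(len(points), points[0].date(), points[-1].date())
# 24 1998-01-15 1999-12-15
```

Because day 15 exists in every month of every calendar, this construction has no end-of-month corner case.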

@Peter9192
Contributor

Works for me!

@valeriupredoi
Contributor

brilliant! cheers for the quick test @Peter9192 🍺 @LisaBock please don't dash my hopes 😁

@LisaBock
Contributor Author

@valeriupredoi I am very sorry...
But for me the same error occurred.
Here is the critical part of the dates in the MultiModelMean preproc file:
1916-07-31 1916-08-31 1916-10-01 1916-10-31 1916-12-01 1916-12-31 1917-01-31 1917-03-03 1917-03-31 1917-05-01 1917-05-31
Some months occur twice (e.g. 1916-12) and others not even once (e.g. 1916-11). But the shift of days is now much slower, that's why I didn't see it in my test file yesterday...

@valeriupredoi
Contributor

looking at it right now. I am smelling a CMIP3-specific issue, just noticed your issue comes from using CMIP3 data. Just got back to Jasmin now so I can do proper debugging 🍺

@valeriupredoi
Contributor

@LisaBock a few points:

  • the recipe you posted has wonky data arguments for CMIP3; the data on BADC uses parameters like these:
      tas:
        preprocessor: clim_ref
        reference_dataset: HadCRUT4
        project: CMIP3
        mip: A1
        modeling_realm: atmos
        exp: 20c3m
        frequency: mon
        ensemble: r1
        start_year: 1850
        end_year: 1999
  • what value did you use (if at all) for --check-level? fix_metadata hangs for me for any value including default

@valeriupredoi
Contributor

@LisaBock could you give it one more test please: I managed to reproduce your issue (am back on Jasmin yay!) and @Peter9192 spotted where it actually stemmed from so we fixed it in #677 (hopefully!) 🍺

@LisaBock
Contributor Author

@valeriupredoi and @Peter9192 Thanks a lot! It works now for me! Thank you! Thank you! Thank you!

5 participants