nwdFracLut cmorized files broken #503

Open
zklaus opened this issue Jun 24, 2019 · 8 comments
@zklaus

zklaus commented Jun 24, 2019

The Problem

This might be related to #469.
We could cmorize nwdFracLut, but the result is broken.

Here is an excerpt from ncdump -h nwdFracLut_Emon_EC-Earth3-Veg_historical_r1i1p1f1_gr_185001-185012.nc

dimensions:
        time = UNLIMITED ; // (12 currently)
        landuse = 4 ;
        strlen = 26 ;
variables:
        char sector(landuse, strlen) ;
                sector:long_name = "Land use type" ;
                sector:standard_name = "area_type" ;
        char type(time) ;
                type:long_name = "Non-Woody Vegetation area type" ;
                type:standard_name = "area_type" ;
        float nwdFracLut(time, landuse, lat, lon) ;
                nwdFracLut:coordinates = "type" ;

from ncdump -v type nwdFracLut_Emon_EC-Earth3-Veg_historical_r1i1p1f1_gr_185001-185012.nc we get

data:

 type = "" ;

and from ncdump -v sector nwdFracLut_Emon_EC-Earth3-Veg_historical_r1i1p1f1_gr_185001-185012.nc

data:

 sector =
  "primary_and_secondary_land",
  "pastures",
  "crops",
  "urban" ;

This has at least two problems:

  • sector is not linked to nwdFracLut
  • type claims to depend on time, but offers no data at all.
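
The two problems can be checked mechanically. The following is a minimal sketch, assuming the ncdump header has been transcribed by hand into a plain dictionary; the `header` layout and the `check_aux_coords` helper are hypothetical and not part of ece2cmor3 or any netCDF library.

```python
def check_aux_coords(variables, data_var):
    """Return a list of problems with the auxiliary coordinates of data_var.

    `variables` maps a variable name to {"dims": tuple, "attrs": dict, "data": ...}.
    This is a hand-written stand-in for the ncdump header, not a netCDF API.
    """
    problems = []
    listed = variables[data_var]["attrs"].get("coordinates", "").split()
    data_dims = set(variables[data_var]["dims"])

    # An area_type variable that shares a dimension with the data variable
    # should be named in its coordinates attribute.
    for name, var in variables.items():
        if name == data_var:
            continue
        is_area_type = var["attrs"].get("standard_name") == "area_type"
        shares_dim = bool(data_dims & set(var["dims"]))
        if is_area_type and shares_dim and name not in listed:
            problems.append(f"{name} is not listed in {data_var}:coordinates")

    # Every listed coordinate must exist and must actually hold a value.
    for name in listed:
        if name not in variables:
            problems.append(f"{name} is listed in coordinates but not defined")
        elif not variables[name].get("data"):
            problems.append(f"{name} is listed in coordinates but holds no data")
    return problems


# Header of the EC-Earth file, transcribed from the ncdump output above.
header = {
    "sector": {
        "dims": ("landuse", "strlen"),
        "attrs": {"standard_name": "area_type"},
        "data": ["primary_and_secondary_land", "pastures", "crops", "urban"],
    },
    "type": {
        "dims": ("time",),
        "attrs": {"standard_name": "area_type"},
        "data": "",
    },
    "nwdFracLut": {
        "dims": ("time", "landuse", "lat", "lon"),
        "attrs": {"coordinates": "type"},
        "data": None,
    },
}

problems = check_aux_coords(header, "nwdFracLut")
for line in problems:
    print(line)
```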

Figuring out what this should look like is much more difficult and, judging from the data currently available on ESGF, isn't understood uniformly in the community. Perhaps clarification with an example from LUMIP would be nice.

Suggested Changes

Nevertheless, somewhat following IPSL's lead, I suggest changing this as follows:

  • turn type into a scalar variable by removing the time dependence, adding a dependence on strlen and giving it the value herbaceous_vegetation as given in CMIP6_coordinate.json
  • add sector to the coordinates attribute of nwdFracLut so that it reads "type sector"
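
As a rough illustration of the suggested layout, the sketch below pads the scalar value to the fixed string length and writes out the intended metadata as a plain dictionary. The `to_char_row` helper and the `target` dictionary are hypothetical; the value 26 for strlen is taken from the ncdump excerpt, and NUL padding is an assumption about how the character array is filled.

```python
STRLEN = 26  # length of the strlen dimension in the ncdump excerpt


def to_char_row(text, strlen=STRLEN):
    """Pad text with NUL characters to exactly strlen entries,
    mimicking how a char(strlen) variable stores a single string."""
    if len(text) > strlen:
        raise ValueError(f"{text!r} does not fit into {strlen} characters")
    return list(text.ljust(strlen, "\0"))


# Intended result: type becomes a scalar string coordinate (no time dimension),
# and nwdFracLut names both auxiliary coordinates.
target = {
    "type": {
        "dims": ("strlen",),
        "attrs": {
            "long_name": "Non-Woody Vegetation area type",
            "standard_name": "area_type",
        },
        "data": to_char_row("herbaceous_vegetation"),
    },
    "nwdFracLut": {
        "dims": ("time", "landuse", "lat", "lon"),
        "attrs": {"coordinates": "type sector"},
    },
}

# Recover the stored string by stripping the padding again.
stored = "".join(target["type"]["data"]).rstrip("\0")
print(stored)
```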

Similar Variables are OK

Similar corrections for other variables dealing with area_types do not seem to be necessary.
Upon closer inspection, the other type-related Emon variables seem OK, including a correct relationship with sector. The difference is that nwdFracLut appears to be the only one with an explicit type coordinate. As far as I can tell, that is because the data request determined that there is no "non-woody" area_type among the official CF area_types, and hence there was a perceived need to provide that information in this scalar coordinate.

What do the domain experts think? @etiennesky @nierad

@zklaus
Author

zklaus commented Sep 23, 2019

To clarify some questions that came up offline in e-mails:
This is not about the input (which is the lpjg output), but about the final cmorized data. So it may help to think about how you would understand the output of other models, as provided on ESGF, in your future analysis. In other words, if you want to compare nwdFracLut among models, can you read the netCDF files and understand what's going on? For EC-Earth this is not the case at the moment.

An example file from IPSL is available. An excerpt from its ncdump -v type output follows:

dimensions:
        axis_nbounds = 2 ;
        lon = 144 ;
        lat = 143 ;
        landuse = 4 ;
        str_len = 255 ;
        time = UNLIMITED ; // (1980 currently)

variables:
        char sector(landuse, str_len) ;
                sector:name = "sector" ;
                sector:standard_name = "area_type" ;
                sector:long_name = "Land use type" ;
                sector:units = "1" ;
        char type(str_len) ;
                type:name = "type" ;
                type:standard_name = "area_type" ;
                type:long_name = "Non-Woody Vegetation area type" ;
        float nwdFracLut(time, landuse, lat, lon) ;
                nwdFracLut:coordinates = "type sector" ;
data:

 type = "typenwd" ;

@etiennesky
Contributor

Hi @zklaus, which version of ece2cmor3 did you use to cmorize the variable? I reported in #469 that I get this on marenostrum4 with 1.2.0.

@zklaus
Author

zklaus commented Sep 26, 2019

According to the global attributes it was

processed by ece2cmor vv1.1.0, git rev. 032f6287076b212e5c49922af94a0ddecb191a16

@nierad
Collaborator

nierad commented Feb 4, 2020

Hi,
I found some time to pick up this issue again. It is a little beyond my Python knowledge, I'm afraid.
So far, there are two variables that have so-called singleton dimensions, i.e. dimensions with only one entry: pastureFrac (dim "typepasture"; Lmon) and nwdFracLut (dim "typenwd"; Emon). In addition, there are two character-type (categorical) dimensions with more than one entry: vegtype (e.g. Lmon) and landUse (e.g. Emon).
nwdFracLut is the only variable with two categorical dimensions (typenwd and landuse).
As long as there was only one, cmorization seems to have gone smoothly. But with a second one, things get messed up. This is probably why it crashes on some machines (as in #469).
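
To make the count of categorical dimensions concrete, here is a small sketch that classifies the entries of a table-style dimensions string. The nwdFracLut string is the one from CMIP6_Emon.json; the strings for laiLut and pastureFrac are assumptions reconstructed from the dimensions named in this comment, and the `categorical_dims` helper is hypothetical.

```python
# Dimensions treated as categorical (character-valued) in this discussion.
CATEGORICAL_NAMES = {"landUse", "vegtype"}


def categorical_dims(dimensions):
    """Return the categorical entries of a space-separated dimensions string.

    Singleton dimensions are recognised by the "type" prefix
    (typenwd, typepasture), the others by name.
    """
    return [
        dim
        for dim in dimensions.split()
        if dim in CATEGORICAL_NAMES or dim.startswith("type")
    ]


dimension_strings = {
    "laiLut": "longitude latitude landUse time",          # assumed
    "pastureFrac": "longitude latitude time typepasture",  # assumed
    "nwdFracLut": "longitude latitude landUse time typenwd",
}

found = {name: categorical_dims(dims) for name, dims in dimension_strings.items()}
for name, dims in found.items():
    print(name, dims)
```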

For comparison, I have uploaded 3 files to
http://stormbringer.nateko.lu.se/public/lars/cmorization/
laiLut(time,landuse,lat,lon)
pastureFrac(time,lat,lon,type)
nwdFracLut(time,landuse,lat,lon,type)

where landuse and type are the categorical dimensions, which are created in lpjg2cmor.py in execute_single_task(...).
Assuming the laiLut and pastureFrac files are fine, what seems to go wrong with nwdFracLut is that nwdFracLut:coordinates="sector" (still OK in laiLut) is overwritten by the later-defined "type" (as in pastureFrac), so sector is no longer referenced. Why "type" now has the dimension time is beyond me...
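
The suspected mechanism can be reproduced in isolation. This is only a sketch of the hypothesis, assuming each categorical axis sets the coordinates attribute when it is registered; both setter functions are hypothetical and do not reflect the actual CMOR or ece2cmor3 code.

```python
def set_coordinates_overwrite(attrs, name):
    """Suspected behaviour: each new auxiliary coordinate replaces the attribute."""
    attrs["coordinates"] = name


def set_coordinates_append(attrs, name):
    """Desired behaviour: each new auxiliary coordinate is added to the list."""
    names = attrs.get("coordinates", "").split()
    if name not in names:
        names.append(name)
    attrs["coordinates"] = " ".join(names)


# Register the two categorical axes in the order described above:
# first the land use axis (sector), then the singleton axis (type).
broken = {}
fixed = {}
for axis_name in ("sector", "type"):
    set_coordinates_overwrite(broken, axis_name)
    set_coordinates_append(fixed, axis_name)

print(broken)
print(fixed)
```

With overwriting, only the last registered name survives, which matches the observed coordinates = "type". The appended form lists both names; the IPSL file uses the order "type sector", and as far as I know CF does not attach meaning to the order of names in this attribute.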

I hope someone can make some sense of this.

Thanks,
Lars

@zklaus
Author

zklaus commented Feb 7, 2020

Thanks @nierad, your analysis seems correct to me. @treerink, do you have an idea where the creation of two categorical coordinates might lead to problems?

@nierad
Collaborator

nierad commented Feb 7, 2020

@zklaus: It's probably the way they are created. As the IPSL data in your post above shows, there seems to be a way.

@zklaus
Author

zklaus commented Feb 7, 2020

Oh yeah, my question to @treerink was basically: what is the line number in ece2cmor?

@treerink
Collaborator

In lpjg2cmor.py I found the function execute_single_task (line 537), in which the coordinates landUse_axis & vegtype_axis are created when available in the task. execute_single_task is called in execute (line 156). In that function, starting at line 247, the tasks are created as follows:

                # if this is a land use variable create cmor land use axis
                if "landUse" in outdims.split():
                    create_landuse_axis(task, lpjgfile, freq)

                # if this is a pft variable (e.g. landCoverFrac) create cmor vegtype axis
                if "vegtype" in outdims.split():
                    create_vegtype_axis(task, lpjgfile, freq)

                # if this variable has the soil depth dimension sdepth 
                # (NB! not sdepth1 or sdepth10) create cmor sdepth axis
                if "sdepth" in outdims.split():
                    create_sdepth_axis(task, lpjgfile, freq)

                # if this variable has one or more "singleton axes" (i.e. axes 
                # of length 1) which can be those dimensions 
                # named "type*", these will be created here
                for lpjgcol in outdims.split():                    
                    if lpjgcol.startswith("type"): 
                        # THIS SHOULD BE LINKED TO CMIP6_coordinate.json!
                        if lpjgcol == "typenwd":
                            singleton_value = "herbaceous_vegetation"
                        elif lpjgcol == "typepasture":
                            singleton_value = "pastures"
                        else:
                            continue
                        create_singleton_axis(task, lpjgfile, str(lpjgcol), singleton_value)

                # cmorize the current task (variable)
                execute_single_task(dataset, task)

See also create_singleton_axis.

In postproc.py I see a call to the function get_z_axes (defined in cdoapi.py), which reads the z axes. I guess everything that is not lon, lat, or time is treated as a z axis.

From the tables (CMIP6_Emon.json & CMIP6_coordinate.json):

        "nwdFracLut": {
            "frequency": "mon", 
            "modeling_realm": "land", 
            "standard_name": "area_fraction", 
            "units": "%", 
            "cell_methods": "area: mean where land over all_area_types time: mean", 
            "cell_measures": "area: areacella", 
            "long_name": "Non-Woody Vegetation Percentage Cover", 
            "comment": "Percentage of land use tile tile that is non-woody vegetation ( e.g. herbaceous crops)", 
            "dimensions": "longitude latitude landUse time typenwd", 
            "out_name": "nwdFracLut", 
            "type": "real", 
            "positive": "", 
            "valid_min": "", 
            "valid_max": "", 
            "ok_min_mean_abs": "", 
            "ok_max_mean_abs": ""
        }, 
        "typenwd": {
            "standard_name": "area_type", 
            "units": "", 
            "axis": "", 
            "long_name": "Non-Woody Vegetation area type", 
            "climatology": "", 
            "formula": "", 
            "must_have_bounds": "no", 
            "out_name": "type", 
            "positive": "", 
            "requested": "", 
            "requested_bounds": "", 
            "stored_direction": "", 
            "tolerance": "", 
            "type": "character", 
            "valid_max": "", 
            "valid_min": "", 
            "value": "herbaceous_vegetation", 
            "z_bounds_factors": "", 
            "z_factors": "", 
            "bounds_values": "", 
            "generic_level_name": ""
        }, 
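
One way to replace the hard-coded singleton values with a lookup in the coordinate table could look like the sketch below. It assumes the axis definitions sit under a top-level "axis_entry" key in CMIP6_coordinate.json; the `singleton_value` helper is hypothetical, and the inline table contains only a shortened copy of the typenwd entry shown above.

```python
import json

# Shortened stand-in for CMIP6_coordinate.json; the "axis_entry" wrapper
# is an assumption about the file layout.
COORDINATE_TABLE = json.loads("""
{
    "axis_entry": {
        "typenwd": {
            "standard_name": "area_type",
            "long_name": "Non-Woody Vegetation area type",
            "out_name": "type",
            "type": "character",
            "value": "herbaceous_vegetation"
        }
    }
}
""")


def singleton_value(table, dim_name):
    """Return the fixed value of a singleton axis, or None if the table
    has no entry for dim_name or the entry carries no value."""
    entry = table.get("axis_entry", {}).get(dim_name)
    if entry is None or not entry.get("value"):
        return None
    return entry["value"]


for lpjgcol in "longitude latitude landUse time typenwd".split():
    if lpjgcol.startswith("type"):
        print(lpjgcol, singleton_value(COORDINATE_TABLE, lpjgcol))
```

Returning None for unknown names mirrors the `continue` branch in the excerpt from lpjg2cmor.py, so the caller can skip axes that the table does not define.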
