Add more metadata to the output file... #506

Open
ekluzek opened this issue Sep 30, 2019 · 34 comments · Fixed by #513

Comments

@ekluzek

ekluzek commented Sep 30, 2019

The mapping files from OCGIS are pretty bare-bones and need more metadata added to them.

I'd like to see the same sort of metadata that is on the ESMF RegridWeights mapping files, such as:

// global attributes:
:title = "ESMF Offline Regridding Weight Generator" ;
:normalization = "destarea" ;
:map_method = "Conservative remapping" ;
:ESMF_regrid_method = "First-order Conservative" ;
:conventions = "NCAR-CSM" ;
:domain_a = "/glade/p/cesm/cseg/inputdata/lnd/clm2/mappingdata/grids/SCRIPgrid_0.25x0.25_MODIS_c170321.nc" ;
:domain_b = "/glade/p/cesm/cseg/inputdata/lnd/clm2/mappingdata/grids/0.9x1.25_c110307.nc" ;
:grid_file_src = "/glade/p/cesm/cseg/inputdata/lnd/clm2/mappingdata/grids/SCRIPgrid_0.25x0.25_MODIS_c170321.nc" ;
:grid_file_dst = "/glade/p/cesm/cseg/inputdata/lnd/clm2/mappingdata/grids/0.9x1.25_c110307.nc" ;
:CVS_revision = "6.3.0r" ;

We also add the hostname it was run on, the username of the person running it, and the "history" (the date it was done and the exact command that was launched).

@bekozi
Contributor

bekozi commented Sep 30, 2019

Some of this functionality may need to go into ESMF's concurrent weight file write routine. Regardless, this should be a straightforward improvement.

@bekozi
Contributor

bekozi commented Mar 23, 2020

@rokuingh Added filemode option to ESMPy (https://github.com/esmf-org/esmf/tree/ESMPy-filemode). I've started integrating this into the chunked regridding.

Ping @slevisconsulting

@bekozi
Contributor

bekozi commented Mar 23, 2020

@bekozi
Contributor

bekozi commented Mar 24, 2020

This is implemented but will require a beta snapshot of ESMF to work. The weight file output is equivalent to the standard ESMF weight file with auxiliary variables and attributes.

@ekluzek I wanted to follow up on:

"We also add the hostname run on, the user-name of the user doing it, and the "history", so the date done and the exact command that was launched."

We could add arbitrary attributes to the output weight file using a JSON string as an argument to ocli. Is this something that sounds appealing?
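A minimal sketch of the JSON idea, assuming a hypothetical ocli option that takes a JSON object string and turns it into global attributes. No such option exists in the thread, and the function name `parse_extra_attributes` is made up for illustration. It only parses and validates the values; writing them to the netCDF file is left out.

```python
import json
from typing import Dict, Union

AttrValue = Union[str, int, float]


def parse_extra_attributes(json_text: str) -> Dict[str, AttrValue]:
    """Parse a JSON object string into global attributes (hypothetical ocli option)."""
    data = json.loads(json_text)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object mapping attribute names to values")
    attrs: Dict[str, AttrValue] = {}
    for name, value in data.items():
        # Reject booleans explicitly (bool is a subclass of int in Python) and
        # anything that is not a plain string or number, e.g. lists or objects.
        if isinstance(value, bool) or not isinstance(value, (str, int, float)):
            raise ValueError(f"unsupported value for attribute {name!r}: {value!r}")
        attrs[name] = value
    return attrs
```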

@ekluzek
Author

ekluzek commented Mar 25, 2020

@bekozi hmmm. I'm not sure there's much reason to add an arbitrary string as a global attribute to the file, as I can easily add it afterwards with NCO. But what about adding those specific things: hostname, username, and date? The CF conventions define "history" as a standard global attribute; it typically holds the date/time and the command line of the program/script that created the file. Username and hostname could be tacked on as additional attributes. I think all of these are pretty standard things that are useful for documenting the file and how it was created. I've found this kind of documentation to be extremely helpful when you go back later and try to figure out how a file was created. This is something I have to do over and over again; documentation in the global attributes makes it easy, but without it, it can be difficult or impossible.

@bekozi
Contributor

bekozi commented Mar 25, 2020

@ekluzek Got it. Let me cook something up and get back to you with an example.

@bekozi
Contributor

bekozi commented Mar 30, 2020

@ekluzek I added the three attributes. They look like:

created_by_user     :: 'benkoziol'
created_on_hostname :: 'system76-laptop'
created_at_datetime :: '2020-03-30 09:42:24.216163'

The names/values can be adjusted fairly easily. I think the user and hostname retrieval are pretty portable, but they may need some fine-tuning on some platforms.
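For reference, the three attributes above can be gathered with the Python standard library alone. This is a sketch, not the ocgis implementation: the helper name `creation_attributes` is hypothetical, and falling back to "unknown" when `getpass.getuser()` fails is an assumption about how to handle platforms where no login name can be determined.

```python
import datetime
import getpass
import socket
from typing import Dict, Optional


def creation_attributes(now: Optional[datetime.datetime] = None) -> Dict[str, str]:
    """Build created_by_user / created_on_hostname / created_at_datetime (hypothetical helper)."""
    if now is None:
        now = datetime.datetime.now()
    try:
        user = getpass.getuser()
    except Exception:  # getuser() may raise when no login name is available
        user = "unknown"
    return {
        "created_by_user": user,
        "created_on_hostname": socket.gethostname(),
        # str(datetime) gives e.g. '2020-03-30 09:42:24.216163'
        "created_at_datetime": str(now),
    }
```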

@ekluzek
Author

ekluzek commented Mar 30, 2020

@bekozi that's great, that gives me the kind of metadata that I've found to be really useful. One other thing I've found useful is the version of the program or script that created the file. For something checked out under git, I store the output of "git describe".
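A sketch of capturing that version string from Python, assuming git is on the PATH. The `--always` and `--dirty` flags are an added choice (fall back to an abbreviated commit hash when there is no tag, and mark uncommitted changes); the helper name `git_describe` is hypothetical, and it returns `None` when git is missing or the directory is not a repository.

```python
import subprocess
from typing import Optional


def git_describe(repo_dir: str = ".") -> Optional[str]:
    """Return `git describe` output for repo_dir, or None if it cannot be determined."""
    try:
        result = subprocess.run(
            ["git", "describe", "--always", "--dirty"],
            cwd=repo_dir,
            capture_output=True,
            text=True,
            check=True,
        )
    except (OSError, subprocess.CalledProcessError):
        # git not installed, directory missing, or not a git repository
        return None
    return result.stdout.strip() or None
```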

I also want to point you to the CF conventions for attributes. I don't know whether you are trying to follow any specific conventions, but that's a good one to follow. Its history attribute is useful because it records both the creation date and the program that produced the file. If someone then manipulates the file again, that manipulation is added to the history. So history is a good attribute to follow the convention for.

Here's the CF conventions...

http://cfconventions.org/cf-conventions/cf-conventions.html#attribute-appendix

@bekozi
Contributor

bekozi commented Mar 31, 2020

@ekluzek In general, these weight files do not follow a convention (I guess they're SCRIP weight files, but there's no real convention around that). I can add the CF history attribute to the output weight files, no problem. Is this where you'd prefer to have the creation information as well? In other words, would you prefer to have the "created" attributes in addition to the "history" attribute?

@ekluzek
Author

ekluzek commented Mar 31, 2020

The creation date is best kept in the history attribute, because you can then follow any subsequent history. If creation_date is a separate attribute, it's not clear which operation it applies to when there is a chain of manipulations on the file. The user and hostname, however, don't lend themselves easily to "history", so I've put them in separate attributes; you just need to know that they go with the original operation on the file rather than any subsequent ones.
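To illustrate the chaining described above: CF recommends that each line of `history` begin with a timestamp and that tools add a line for each operation they perform. The sketch below (hypothetical helper `append_history`) appends the newest entry at the end; some tools put the newest entry first instead, so treat the ordering as an assumption.

```python
import datetime
from typing import Optional


def append_history(existing: Optional[str], entry: str, when: datetime.datetime) -> str:
    """Append a timestamped line to a CF-style history string (hypothetical helper)."""
    line = f"{when:%Y-%m-%d %H:%M:%S}: {entry}"
    if not existing:
        # First operation on the file: the history starts with the creation entry.
        return line
    # Keep earlier lines intact so the creation entry stays first.
    return existing.rstrip("\n") + "\n" + line
```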

@bekozi
Contributor

bekozi commented Apr 1, 2020

Makes sense to me. I'll take this opportunity to format the ocli command line arguments into the history string. Will be back with an example for review.

@bekozi
Contributor

bekozi commented Apr 1, 2020

@ekluzek How does this look?

// global attributes:
		:created_by_user = "benkoziol" ;
		:created_on_hostname = "system76-laptop" ;
		:history = "2020-04-01 10:02:49.028146: Created by ocgis (v2.1.1) and ESMF (v8.1.0 beta snapshot) with CLI command: ocli chunked-rwg --weightfilemode BASIC --loglvl INFO --no_verbose False --spatial_subset_path /tmp/ocgis_test_p5i8p9n3/spatial_subset.nc --no_ignore_degenerate False --wd /tmp/ocgis_test_p5i8p9n3/chunks --esmf_regrid_method BILINEAR --esmf_dst_type GRIDSPEC --esmf_src_type GRIDSPEC --weight /tmp/ocgis_test_p5i8p9n3/weights.nc --destination /tmp/ocgis_test_p5i8p9n3/destination.nc --source /tmp/ocgis_test_p5i8p9n3/source.nc" ;
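A sketch of how a history string in this shape could be assembled, assuming the command is available as an argument list such as `sys.argv`. `cli_history` is a hypothetical name, the versions are passed in rather than looked up, and `shlex.join` (Python 3.8+) quotes any argument containing spaces. The ocgis output above may build its command string differently (for example from parsed options).

```python
import datetime
import shlex
from typing import Sequence


def cli_history(argv: Sequence[str], ocgis_version: str, esmf_version: str,
                when: datetime.datetime) -> str:
    """Format a history entry like the example above (hypothetical helper)."""
    # shlex.join quotes arguments so the command can be pasted back into a shell.
    command = shlex.join(argv)
    return (f"{when}: Created by ocgis (v{ocgis_version}) and ESMF (v{esmf_version}) "
            f"with CLI command: {command}")
```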

@ekluzek
Author

ekluzek commented Apr 1, 2020

Perfect. Works for me.

@bekozi
Contributor

bekozi commented Apr 3, 2020

Great! I'll work on getting this and the esmf branch merged.

@bekozi
Contributor

bekozi commented Apr 8, 2020

For reference, the associated esmpy PR is: esmf-org/esmf#4

@bekozi
Contributor

bekozi commented Aug 17, 2020

@slevisconsulting - I'm reopening this to address the issue related to writing auxiliary coordinate variables for high resolution grids. I'm planning to enable the appropriate flags in an ESMF branch to confirm this will fix the problem. I'll then add the appropriate parameters to ESMPy and ocgis.

@bekozi bekozi reopened this Aug 17, 2020
@slevis-lmwg

Thank you @bekozi

For my benefit, I'm linking this issue to my PR here.

@bekozi
Contributor

bekozi commented Aug 20, 2020

@rokuingh is adding the 64-bit offset flag to ESMPy. He also identified an issue where the file types were not passed to ESMF routines correctly. I'll bring the offset flag into ocli once it's ready in ESMPy. I tested statically setting the flags for the higher resolution UGRID->SCRIP case using a reproducer from @slevisconsulting, and the operation works with auxiliary coordinates.
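Some background on why the flag matters: the 64-bit offset netCDF format exists to lift the roughly 2 GiB offset limit of the classic format, which is presumably what the high-resolution auxiliary coordinates ran into here. The estimate below is a rough sketch under that assumption: it sums `n_elements * itemsize` for the variables you list, ignores the header and padding, and the helper names are hypothetical.

```python
from typing import Iterable, Tuple

# Classic netCDF stores file offsets as signed 32-bit integers.
CLASSIC_OFFSET_LIMIT = 2**31 - 1


def estimated_bytes(variables: Iterable[Tuple[int, int]]) -> int:
    """Sum n_elements * itemsize over (n_elements, itemsize) pairs."""
    return sum(n * itemsize for n, itemsize in variables)


def should_use_64bit_offset(variables: Iterable[Tuple[int, int]]) -> bool:
    """Rough check: would the data section likely exceed the classic offset limit?"""
    return estimated_bytes(variables) > CLASSIC_OFFSET_LIMIT
```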

bekozi pushed a commit that referenced this issue Aug 21, 2020
- Add 64bit_offset option to ocli
- Pass large_file only when it is True allowing older ESMF versions to work
- Add flag documentation
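The second commit bullet describes a common compatibility pattern: only pass an optional keyword when it is actually requested, so older library versions that do not know the keyword keep working in the default case. A minimal sketch with a hypothetical helper; the keyword names mirror the commit message, but the function is not ocgis code.

```python
from typing import Any, Dict


def weight_file_kwargs(filename: str, large_file: bool = False) -> Dict[str, Any]:
    """Build keyword arguments for a weight-file writer (hypothetical helper)."""
    kwargs: Dict[str, Any] = {"filename": filename}
    if large_file:
        # Only add the keyword when requested, so an older version that does not
        # accept it is never handed an unknown argument by default.
        kwargs["large_file"] = True
    return kwargs
```

The result would be unpacked into the call, e.g. `regrid(src, dst, **weight_file_kwargs(path, large_file=args.large_file))`, where `regrid` stands in for whatever function receives the options.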
@slevis-lmwg

New concern relating to auxiliary data in the context of CTSM's surface data generation (with a piece of very good news):

Running ./mksurfdata_map to generate a surface dataset appears to work now! However, the corresponding log file shows zeros for all variable areas at both the input (raw data) resolutions and the output (surface data) resolution. This is because the auxiliary variables areaa and areab contain all zeros, which makes CTSM's error checking unusable.
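A check of this kind can catch the problem before the surface-data step. This is a minimal sketch using plain Python sequences (in practice the arrays would be read from the weight file); the helper name is hypothetical, and flagging negative areas is an extra assumption beyond what the thread reports.

```python
from typing import List, Sequence


def check_area_variables(areaa: Sequence[float], areab: Sequence[float]) -> List[str]:
    """Return a list of problems found in the source/destination area arrays."""
    problems: List[str] = []
    for name, values in (("areaa", areaa), ("areab", areab)):
        if len(values) == 0:
            problems.append(f"{name} is empty")
        elif all(v == 0 for v in values):
            # The symptom reported above: the variable exists but was never filled.
            problems.append(f"{name} is all zeros")
        elif any(v < 0 for v in values):
            problems.append(f"{name} has negative values")
    return problems
```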

@bekozi
Contributor

bekozi commented Oct 12, 2020

@rokuingh ESMPy's auxiliary variable support will need to be modified to include areas when writing weight files. Is this possible within the current implementation of WITHAUX?

@rokuingh
Contributor

I am no expert on ESMF IO, but it looks like the routine that is responsible for writing the weight files does indeed handle the areas (and fractions). The routine consists of a couple thousand lines of Fortran. A quick pass through the code seems to imply that areas are only written when using the conservative method.

@slevis-lmwg

"it looks like the routine that is responsible for writing the weight files does indeed handle the areas (and fractions). The routine consists of a couple thousand lines of Fortran. A quick pass through the code seems to imply that areas are only written when using the conservative method."

Thank you, @rokuingh
@bekozi if by "conservative method" we mean this option --esmf_regrid_method CONSERVE, then this is what we're doing. So the problem remains that the area variables areaa and areab are all zeros in all the weight files that I've looked at.

@rokuingh
Contributor

I will debug this further later this week. Could one of you please send me the aforementioned reproducer?

@bekozi
Contributor

bekozi commented Oct 14, 2020

I think the trouble is that the areas are difficult to connect to ESMF_OutputScripWeightFile the way esmpy is calling it. Another solution here is to put a Python wrapper on ESMF_RegridWeightGenFile. It does not necessarily look difficult to wrap, but it does look time-consuming. Another option would be to call the CLI RWG to create the weights for each chunk combination and merge them afterwards. What do you think, @rokuingh?
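A sketch of the second option, building one `ESMF_RegridWeightGen` command per chunk pair. It assumes the chunk grid files already exist and uses only the `-s`, `-d`, `-w` and `-m` options; the `weights_<i>.nc` naming is hypothetical. Merging the per-chunk weight files is not shown: chunk-local row/column indices would presumably have to be mapped back to global indices first.

```python
from typing import List, Sequence, Tuple


def rwg_command(src: str, dst: str, weights: str, method: str = "conserve") -> List[str]:
    """Argument list for one ESMF_RegridWeightGen run."""
    return ["ESMF_RegridWeightGen", "-s", src, "-d", dst, "-w", weights, "-m", method]


def chunk_commands(chunk_pairs: Sequence[Tuple[str, str]],
                   method: str = "conserve") -> List[List[str]]:
    """One command per (source chunk, destination chunk) pair; weights_<i>.nc is made up."""
    return [
        rwg_command(src, dst, f"weights_{i}.nc", method)
        for i, (src, dst) in enumerate(chunk_pairs)
    ]
```

Each list could then be run with `subprocess.run(cmd, check=True)`, or under an MPI launcher.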

@slevis-lmwg

"I will debug this further later this week. Could one of you please send me the aforementioned reproducer?"

qsub /glade/work/slevis/ocgis_work/no_subset_20200825_reproducer.sh

@slevis-lmwg

@rokuingh
cc: @bekozi
is there an update regarding the aforementioned debugging?
This issue blocks the use of ocgis in CTSM's mkmapdata tool.

@rokuingh
Contributor

rokuingh commented Feb 4, 2021

@slevisconsulting Sorry for the long wait, but I do have a good idea of how to proceed with this. I am working on the upcoming ESMF 8.1.0 release right now, but I have just been approved to work on this next. I will plan to have a snapshot for you before the end of the month.

@slevis-lmwg

@rokuingh thank you for prioritizing this issue, I appreciate your help.

@rokuingh
Contributor

@slevisconsulting I have been experimenting with this reproducer on Cheyenne, but I have not yet had a successful run, even with a walltime of 1 hour. Would you mind running it again on your end to make sure nothing has changed with the machine or environment that could explain the issues I am having? In the meantime I will move forward with adding the area variables to the weight files.

@slevis-lmwg

@rokuingh I have not run this script in a while (likely since Oct 2020). Thank you for the heads-up about it failing. I will look into it soon.

Meanwhile, thank you for moving forward with adding the area variables to the weight files.

@rokuingh
Contributor

@slevisconsulting I've added the ability to write areas to the weight files generated by ESMPy using FileMode.WITHAUX. It is currently available on the develop branch of ESMF, but I could create a tag if that is more easily accessible, or something else? Also, I just realized that you will probably also need fractions since you are using conservative regridding. Please let me know if that is the case.

@rsdunlapiv

@slevisconsulting have you been able to test the new weights files from ESMPy with areas added?
@rokuingh

@slevis-lmwg

@rsdunlapiv @rokuingh thank you for checking in, and I apologize for not communicating since 4/23.

I'm afraid I haven't tested this. I had hoped to get the CTSM surface-data tool-chain fully working with ocgis while @bekozi was available. At this point I have set that work aside until I hear otherwise from @ekluzek @billsacks @dlawrenncar .

@billsacks

We no longer plan to use OCGIS in the CTSM surface data tool chain so, from the CTSM perspective, it's fine for this issue to be closed.

6 participants