Catalog consistency MDTF and user data catalog #588

aradhakrishnanGFDL · 2024-06-07T19:41:43Z

What problem will this feature solve?

Achieves some level of consistency between the input data catalog (from the GFDL Catalog Builder) and the MDTF intermediate catalogs in PP.

This matters because users who are new to catalogs can then learn a single set of terms and a single spec/template for the data catalog as they get started.

Lets GFDL analysis scripts, whether or not they run through MDTF, use a common catalog, which improves interoperability.

Helps with training material shared across GFDL and CESM, and with model intercomparison projects.

Describe the solution you'd like
Make the following changes to aggregate_columns:
https://github.com/NOAA-GFDL/MDTF-diagnostics/blob/c87746c7e19870806b025c79c90f96cc33c1d173/src/util/catalog.py#L205:L216

Add: chunk_freq
Change: variant_label to member_id (MDTF)
Consider: for recording the “convention”, evaluate reusing the CMIP CV, with “project_id” as the column name. Examples: project_id = CMIP, project_id = dev, project_id = GFDL.

If activity_id is not being used, can it be removed or moved outside of aggregate_columns? It was originally used to filter by “MIP” in CMIP6. It could be an “optional” column rather than part of aggregate_columns.
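To make the proposal concrete, here is a minimal sketch of the three changes applied to a column list. The starting list is hypothetical (the real one is in src/util/catalog.py at the link above), and `update_aggregate_columns` is a helper name made up for illustration, not something in MDTF.

```python
def update_aggregate_columns(columns, renames, additions, optional):
    """Apply the proposed changes to a list of column names.

    Returns (aggregate, moved): the updated aggregate columns in their
    original order, and the columns moved out to "optional".
    """
    aggregate = []
    moved = []
    for name in columns:
        name = renames.get(name, name)  # e.g. variant_label -> member_id
        if name in optional:
            moved.append(name)          # no longer an aggregate column
        elif name not in aggregate:
            aggregate.append(name)
    for name in additions:              # e.g. chunk_freq
        if name not in aggregate:
            aggregate.append(name)
    return aggregate, moved


# Hypothetical starting list, for illustration only.
current = ["activity_id", "source_id", "experiment_id",
           "variant_label", "variable_id"]

aggregate, optional_columns = update_aggregate_columns(
    current,
    renames={"variant_label": "member_id"},
    additions=["chunk_freq"],
    optional={"activity_id"},
)
print(aggregate)
print(optional_columns)
```

The relative order of the existing columns is left untouched, and new columns are appended at the end.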

The ordering of the aggregate columns could also be kept consistent, so that a user who typically queries a dataset with a "key pattern" is less confused.
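A rough sketch of why the ordering matters, assuming dataset keys are built by joining column values with "." in column order. The column names, values, and the `dataset_key` helper are illustrative assumptions, not the actual MDTF or Catalog Builder code.

```python
from fnmatch import fnmatchcase


def dataset_key(row, ordered_columns, sep="."):
    """Join the row's values in column order to form a dataset key."""
    return sep.join(str(row[column]) for column in ordered_columns)


# Hypothetical catalog row, for illustration only.
row = {
    "source_id": "GFDL-CM4",
    "experiment_id": "historical",
    "member_id": "r1i1p1f1",
    "variable_id": "tas",
    "chunk_freq": "5yr",
}

order = ["source_id", "experiment_id", "member_id", "variable_id", "chunk_freq"]
shuffled = ["variable_id", "source_id", "experiment_id", "member_id", "chunk_freq"]

# A key pattern the user has learned against the first ordering.
pattern = "GFDL-CM4.historical.*.tas.*"

key = dataset_key(row, order)
shuffled_key = dataset_key(row, shuffled)

print(key, fnmatchcase(key, pattern))
print(shuffled_key, fnmatchcase(shuffled_key, pattern))
```

The same pattern matches the key built with the first ordering and fails against the shuffled one, which is the kind of confusion a shared ordering would avoid.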

Here is what the GFDL Catalog Builder template looks like (to be merged; more changes pending):
https://github.com/aradhakrishnanGFDL/CatalogBuilder/blob/129-cmip/cats/gfdl_template.json#L79:L88

(Note that modeling_realm will be changed to realm in the above, and temporal_subset will be changed to time_range.)

Describe alternatives you've considered

The alternative considered is to handle part of this on the GFDL Catalog Builder side. The following actions will be taken there to help synchronize the data catalog template with MDTF.
(These are NOT suggested changes for the MDTF framework.)

Change: modeling_realm to realm (GFDL Catalog Builder)
Change: temporal_subset to time_range (GFDL Catalog Builder)
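One way to check that the two templates stay in sync after these renames is sketched below. Both column lists are hypothetical examples, and `compare_columns` is an illustrative helper rather than part of either code base.

```python
# Renames planned on the GFDL Catalog Builder side (from this issue).
BUILDER_RENAMES = {
    "modeling_realm": "realm",
    "temporal_subset": "time_range",
}


def compare_columns(builder_columns, mdtf_columns, renames):
    """Return the columns found on only one side once renames are applied."""
    renamed = [renames.get(name, name) for name in builder_columns]
    only_builder = sorted(set(renamed) - set(mdtf_columns))
    only_mdtf = sorted(set(mdtf_columns) - set(renamed))
    return only_builder, only_mdtf


# Hypothetical column lists, for illustration only.
builder_columns = ["source_id", "experiment_id", "modeling_realm",
                   "temporal_subset", "member_id"]
mdtf_columns = ["source_id", "experiment_id", "realm",
                "time_range", "member_id", "chunk_freq"]

only_builder, only_mdtf = compare_columns(
    builder_columns, mdtf_columns, BUILDER_RENAMES
)
print("only in builder template:", only_builder)
print("only in MDTF catalog:", only_mdtf)
```

Any name reported on only one side is a candidate for discussion here.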

If any of these changes conflict with the framework goals or catalog usage, please raise them so we can discuss further and rethink the solution.

aradhakrishnanGFDL assigned wrongkindofdoctor Jun 7, 2024

wrongkindofdoctor added the "data catalogs" label (Issues related to intake esm data catalogs) Jun 7, 2024

wrongkindofdoctor linked a pull request Jun 10, 2024 that will close this issue

Refactor catalog write #587

Merged

11 tasks

wrongkindofdoctor mentioned this issue Jun 10, 2024

update mdtf output catalog columns #590

Merged

11 tasks

aradhakrishnanGFDL mentioned this issue Jul 23, 2024

schema edits for MDTF interoperability NOAA-GFDL/CatalogBuilder#10

Closed
