
Testing integration of daops as backend for cmip6_preprocessing #55

Open
jbusecke opened this issue Feb 18, 2021 · 1 comment

@jbusecke
Hi everyone,

First of all, I wanted to say thank you for all the effort that has already been put into this framework. I would love to contribute to daops and integrate it more into my workflow.

I am maintaining cmip6_preprocessing and am very interested in migrating some of the fixes I currently apply (in a rather ad-hoc fashion) over here in a more general form.

My primary goal for cmip6_preprocessing is to use it with Python and the scientific Pangeo stack, but I like the idea of documenting the actual problems (needing 'fixes') in a general and language-agnostic way over here. I was very impressed by the demo @agstephens gave a while ago during the CMIP6 cloud meeting and am now thinking of finally getting to work on this.

I am still really unsure how to actually contribute fixes to this repo, though. What I propose is to work my way through this using some fairly simple fixes that are relatively easy to apply and are already documented in the errata.

Specifically, I am currently testing this Python code, which changes some of the metadata necessary to determine the point in time where a dataset was branched off from the parent model run.

def fix_metadata_issues(ds):
    """Overwrite known-bad branch-time attributes on GFDL-CM4 datasets."""
    # https://errata.es-doc.org/static/view.html?uid=2f6b5963-f87e-b2df-a5b0-2f12b6b68d32
    if ds.attrs["source_id"] == "GFDL-CM4" and ds.attrs["experiment_id"] in [
        "1pctCO2",
        "abrupt-4xCO2",
        "historical",
    ]:
        ds.attrs["branch_time_in_parent"] = 91250
    # https://errata.es-doc.org/static/view.html?uid=61fb170e-91bb-4c64-8f1d-6f5e342ee421
    if ds.attrs["source_id"] == "GFDL-CM4" and ds.attrs["experiment_id"] in [
        "ssp245",
        "ssp585",
    ]:
        ds.attrs["branch_time_in_child"] = 60225
    return ds

This function ingests an xarray.Dataset, checks certain conditions on its attributes, and then overwrites attributes accordingly. I could easily split these out into dataset-specific 'fixes'.
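As a minimal illustration of the pattern (using a simple stand-in object with an `.attrs` dict, since the fix only touches attributes; a real `xarray.Dataset` behaves the same way here):

```python
from types import SimpleNamespace

# Stand-in for an xarray.Dataset: the fix above only reads and writes
# ds.attrs, so any object with an .attrs dict exercises the same logic.
ds = SimpleNamespace(attrs={"source_id": "GFDL-CM4", "experiment_id": "historical"})

# Same conditional overwrite as in fix_metadata_issues above.
if ds.attrs["source_id"] == "GFDL-CM4" and ds.attrs["experiment_id"] in [
    "1pctCO2",
    "abrupt-4xCO2",
    "historical",
]:
    ds.attrs["branch_time_in_parent"] = 91250

print(ds.attrs["branch_time_in_parent"])  # 91250
```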

Where exactly could I translate this into a fix within the daops framework? Very happy to start a PR (and then test the implementation from cmip6_preprocessing), but I am afraid I am still a bit unsure about the daops internals. Any pointers would be greatly appreciated.

@agstephens
Collaborator

Hi @jbusecke, it's great that you would like to get more involved with developing daops (and dachar) — that sounds like a suitable approach.

There are multiple components that will be relevant here:

  1. A generic Python function to apply the fix. We populate this module:
    https://github.com/roocs/daops/blob/master/daops/data_utils/attr_utils.py

with some code like:

def set_attributes(ds, **operands):
    """
    :param ds: Xarray Dataset
    :param operands: (dict) Each `key/value` pair will be set as attributes on the Dataset.
    :return: Xarray Dataset
    """
    for key, value in operands.items():
        ds.attrs[key] = value

    return ds
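To sketch how this function would be invoked (the stand-in object below is my own illustration; in daops, `ds` would be a real `xarray.Dataset`):

```python
from types import SimpleNamespace

def set_attributes(ds, **operands):
    """Set each key/value pair in operands as an attribute on the Dataset."""
    for key, value in operands.items():
        ds.attrs[key] = value
    return ds

# Stand-in with the .attrs interface of an xarray.Dataset (illustration only).
ds = SimpleNamespace(attrs={"source_id": "GFDL-CM4"})
ds = set_attributes(ds, branch_time_in_parent=91250)
print(ds.attrs["branch_time_in_parent"])  # 91250
```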
  2. We add a fix class in dachar that represents this fix:

See example: https://github.com/roocs/dachar/blob/master/dachar/fixes/coord_fixes.py

We will need to add:

https://github.com/roocs/dachar/blob/master/dachar/fixes/attr_fixes.py

  • maybe add the class: SetAttributesFix
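A rough, hypothetical sketch of what such a class could look like — the base class, attribute names, and dataset identifier below are all illustrative placeholders, not the actual dachar API; the existing classes in coord_fixes.py are the real template to follow:

```python
class _FixBase:
    """Illustrative placeholder for the real dachar fix base class."""
    def __init__(self, ds_id, **operands):
        self.ds_id = ds_id
        self.operands = operands


class SetAttributesFix(_FixBase):
    """Hypothetical fix class: records the key/value pairs that should be
    overwritten in ds.attrs; daops would apply them via set_attributes()."""
    fix_id = "SetAttributesFix"
    title = "Set attributes on a Dataset"
    required_operands = ["attrs"]


# Illustrative dataset identifier and operands only.
fix = SetAttributesFix(
    "CMIP6.CMIP.NOAA-GFDL.GFDL-CM4.historical.r1i1p1f1.Amon.tas.gr1",
    attrs={"branch_time_in_parent": 91250},
)
print(fix.operands["attrs"])  # {'branch_time_in_parent': 91250}
```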
  3. Gather your collection of Dataset Identifiers and the changes that you want to make.

  4. Working with my colleague @ellesmith88, we can use this command-line tool to work with the fix pipeline:

https://github.com/roocs/dachar/blob/master/dachar/cli.py

  • propose fixes
  • review fixes
  • publish fixes (to ElasticSearch)
  5. Test that the fixes work when you import daops and do some processing on one of those datasets.

We are happy/keen to work with you to prototype this - and to get your feedback on how we make it all more useful and accessible.

jbusecke added a commit to jbusecke/xMIP that referenced this issue Jun 22, 2021
This implements a hardcoded way of fixing faulty metadata. I plan to replace this eventually with roocs/daops#55 (comment)
jbusecke added a commit to jbusecke/xMIP that referenced this issue Jun 23, 2021
* Implement metadata fixing

This implements a hardcoded way of fixing faulty metadata. I plan to replace this eventually with roocs/daops#55 (comment)

* Update test_preprocessing.py

* linter correction

* some more linter fixes

* Update whats-new.rst