drop spatial bounds coordinates if present in cmip6 cleaning #177

emileten
Contributor

@emileten emileten commented Feb 10, 2022

Coordinate values attached to the spatial bounds dimension can cause errors downstream, in particular during distributed regridding. The linked issue shows an example of this kind of problem.

The libraries we use do not need this information (they already work correctly on datasets that lack these coordinates), so I decided to simply drop it during cleaning.

This only drops the coordinate values; the dimension itself, which we need, is kept.
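
To illustrate the difference between dropping the coordinate and dropping the dimension, here is a minimal sketch on a toy dataset. The variable names (`lat_bnds`, `lat`) and the values are hypothetical and not taken from any CMIP6 model, and the sketch uses xarray's `drop_vars` method rather than the exact call in this PR.

```python
import numpy as np
import xarray as xr

# Hypothetical toy dataset: a "lat_bnds" variable on a (lat, bnds) grid,
# plus coordinate values attached to the "bnds" dimension itself.
ds = xr.Dataset(
    data_vars={
        "lat_bnds": (("lat", "bnds"), np.array([[-90.0, 0.0], [0.0, 90.0]]))
    },
    coords={"lat": [-45.0, 45.0], "bnds": [1.0, 2.0]},
)

# Drop only the coordinate variable named "bnds", if it is present.
ds_cleaned = ds
if "bnds" in ds_cleaned.variables:
    ds_cleaned = ds_cleaned.drop_vars("bnds")

# The coordinate is gone, but the dimension is still there because
# "lat_bnds" continues to use it.
print("bnds" in ds_cleaned.variables)
print(dict(ds_cleaned.sizes))
```

After the drop, `bnds` no longer appears among the dataset's variables, while `lat_bnds` keeps its two-column shape along the `bnds` dimension.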

@emileten
Contributor Author

emileten commented Feb 10, 2022

I haven't tested this in an actual workflow; I only checked it in a notebook.

I was hoping to merge this into master so that I can test it in a workflow using dodola:dev and avoid convoluted workflow manipulation.

@brews could you have a look? Let me know if you see any problem with this change. Thank you!

@emileten emileten self-assigned this Feb 10, 2022
@emileten emileten added the bug Something isn't working label Feb 10, 2022
@emileten emileten changed the title drop bounds coordinate values if present in cmip6 cleaning drop spatial bounds coordinates if present in cmip6 cleaning Feb 10, 2022
@emileten emileten requested a review from brews February 10, 2022 09:53
@@ -357,6 +357,12 @@ def standardize_gcm(ds, leapday_removal=True):
coords_to_drop, drop=True
)

# Some models have coordinates values (e.g. [1.0, 2.0]) for the spatial bounds dimension. We don't need this.
if "bnds" in ds_cleaned.variables:
ds_cleaned = ds_cleaned.drop(
Member

Suggested change
- ds_cleaned = ds_cleaned.drop(
+ ds_cleaned = ds_cleaned.drop_vars(

I think .drop() was lightly deprecated in favor of more specific .drop_vars() or .drop_sel() back in pydata/xarray#3475.
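
As a rough illustration of how the two more specific methods differ, the sketch below uses a hypothetical toy dataset (the `tas` variable and its values are made up for the example): `drop_vars` removes a named variable, while `drop_sel` removes index labels along a dimension.

```python
import numpy as np
import xarray as xr

# Hypothetical toy dataset with one data variable on a (time, lat) grid.
ds = xr.Dataset(
    data_vars={"tas": (("time", "lat"), np.zeros((3, 2)))},
    coords={"time": [0, 1, 2], "lat": [-45.0, 45.0]},
)

# drop_vars removes whole variables by name.
without_var = ds.drop_vars("tas")

# drop_sel removes entries by index label along a dimension.
without_label = ds.drop_sel(lat=[-45.0])

print(list(without_var.data_vars))
print(dict(without_label.sizes))
```

In the cleaning step discussed here, the goal is to remove a variable by name, which is the `drop_vars` case.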

Member

...Interesting because xarray's error message says to use .drop():

ValueError: when setting region explicitly in to_zarr(), all variables in the dataset to write must have at least one dimension in common with the region's dimensions ['time'], but that is not the case for some variables here. To drop these variables from this dataset before exporting to zarr, write: .drop(['bnds'])

Might be worth a PR to xarray updating this message.

@brews
Member

brews commented Feb 10, 2022

Thanks for making progress on this, @emileten!

As I look at this, I see there might actually be a better way around this back in the regridding workflowtemplate... let me try something and I'll follow up here or in ClimateImpactLab/downscaleCMIP6#494

@brews
Member

brews commented Feb 14, 2022

I'm going to close this because I think we resolved this problem elsewhere. Please reopen or file a new issue if still needed.

@brews brews closed this Feb 14, 2022
Labels: bug (Something isn't working), invalid (This doesn't seem right)