Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Bug]: Horizontal regridding from limited domain to global #528

Closed
acordonez opened this issue Aug 9, 2023 · 20 comments · Fixed by #613
Closed

[Bug]: Horizontal regridding from limited domain to global #528

acordonez opened this issue Aug 9, 2023 · 20 comments · Fixed by #613
Assignees
Labels
type: bug Inconsistencies or issues which will cause an issue or problem for users or implementors.

Comments

@acordonez
Copy link
Contributor

acordonez commented Aug 9, 2023

What happened?

I was told this might be a bug, so documenting here. I'm regridding LOCA2 downscaled data on a CONUS domain to a global 2.5 degree grid using the xesmf tool and "conservative_normed" method. On the resulting grid, any cells that were not part of the original CONUS domain are populated with zeros.

Original data example:
dataset_mean_original

Regridded example:
dataset_mean_regridded

What did you expect to happen? Are there are possible answers you came across?

Based on this issue shared by @pochedls , it seems like this might be the expected behavior when using the xesmf tool to regrid data to a larger domain? But it would be helpful to have some ability to use NaN's as the fill value in stead of zero for areas outside the original domain.

Minimal Complete Verifiable Example (MVCE)

import xcdat as xc
from xcdat.regridder import grid

# This is not a freely accessible data set, unfortunately
# Same data set shown in attached figures
f="/global/cfs/projectdirs/m3522/cmip6/LOCA2/ACCESS-CM2/0p0625deg/r1i1p1f1/historical/pr/pr.ACCESS-CM2.historical.r1i1p1f1.1950-2014.LOCA_16thdeg_v20220519.nc"

ds=xc.open_dataset(f).isel({"time":slice(0,365)})
# Get regridding errors for this dataset if I don't set these attributes
ds.lat.attrs["standard_name"]="Y"
ds.lon.attrs["standard_name"]="X"

tgrid = grid.create_uniform_grid(-89, 89, 2.0, 0.0, 358., 2.0)
drg = ds.regridder.horizontal("pr", tgrid, tool="xesmf", method="conservative_normed",periodic=True)["pr"]

Relevant log output

N/A

Anything else we need to know?

Here's some Dataset information about my CONUS data:

<xarray.Dataset>
Dimensions:    (lon: 944, lat: 474, time: 365, bnds: 2)
Coordinates:
  * lon        (lon) float32 234.5 234.6 234.7 234.7 ... 293.3 293.3 293.4 293.5
  * lat        (lat) float32 23.91 23.97 24.03 24.09 ... 53.28 53.34 53.41 53.47
  * time       (time) object 1950-01-01 12:00:00 ... 1950-12-31 12:00:00
Dimensions without coordinates: bnds
Data variables:
    pr         (time, lat, lon) float32 nan nan nan nan nan ... nan nan nan nan
    lon_bnds   (lon, bnds) float32 234.5 234.6 234.6 234.6 ... 293.4 293.4 293.5
    lat_bnds   (lat, bnds) float32 23.88 23.94 23.94 24.0 ... 53.44 53.44 53.5
    time_bnds  (time, bnds) object 1950-01-01 00:00:00 ... 1951-01-01 00:00:00
Attributes: (12/84)
    SIOCRD_netCDF_Version:               1.0
    title:                               LOCA statistically downscaled climat...
    history:                             2019-11-09T02:13:29Z ; CMOR rewrote ...
    Conventions:                         CF-1.7 CMIP-6.2
    activity_id:                         CMIP
    branch_method:                       standard
    ...                                  ...
    fname_coarse_obs:                    ../../training_data/LOCA2_training_2...
    fname_gcm_hist:                      ../../Models/ACCESS-CM2/0p5x0p5/r1i1...
    fname_gcm_in:                        ../../Models/ACCESS-CM2/0p5x0p5/r1i1...
    loca_post_ds_bc_id:                  $Id: loca_post_ds_bc.F90,v 1.49 2022...
    loca_post_ds_bc_source:              $Source: /home6/dwpierc2/src/mine/lo...
    LOCA2_version:                       v20220519

Longitude:

<xarray.DataArray 'lon' (lon: 944)>
array([234.53125, 234.59375, 234.65625, ..., 293.34375, 293.40625, 293.46875],
      dtype=float32)
Coordinates:
  * lon      (lon) float32 234.5 234.6 234.7 234.7 ... 293.3 293.3 293.4 293.5
Attributes:
    units:          degreesE
    long_name:      lon
    bounds:         lon_bnds
    standard_name:  X

Latitude

<xarray.DataArray 'lat' (lat: 474)>
array([23.90625, 23.96875, 24.03125, ..., 53.34375, 53.40625, 53.46875],
      dtype=float32)
Coordinates:
  * lat      (lat) float32 23.91 23.97 24.03 24.09 ... 53.28 53.34 53.41 53.47
Attributes:
    units:          degreesN
    long_name:      lat
    bounds:         lat_bnds
    standard_name:  Y

Environment

INSTALLED VERSIONS

commit: None
python: 3.10.12 | packaged by conda-forge | (main, Jun 23 2023, 22:40:32) [GCC 12.3.0]
python-bits: 64
OS: Linux
OS-release: 5.14.21-150400.24.46_12.0.73-cray_shasta_c
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: ('en_US', 'UTF-8')
libhdf5: 1.14.1
libnetcdf: 4.9.2

xarray: 2023.7.0
pandas: 2.0.3
numpy: 1.22.4
scipy: 1.11.1
netCDF4: 1.6.4
pydap: None
h5netcdf: None
h5py: None
Nio: None
zarr: None
cftime: 1.6.2
nc_time_axis: None
PseudoNetCDF: None
iris: None
bottleneck: None
dask: 2023.8.0
distributed: 2023.8.0
matplotlib: 3.7.2
cartopy: 0.21.1
seaborn: 0.12.2
numbagg: None
fsspec: 2023.6.0
cupy: None
pint: None
sparse: 0.14.0
flox: None
numpy_groupies: None
setuptools: 68.0.0
pip: 23.2.1
conda: None
pytest: None
mypy: None
IPython: 8.14.0
sphinx: None

@acordonez acordonez added the type: bug Inconsistencies or issues which will cause an issue or problem for users or implementors. label Aug 9, 2023
@pochedls
Copy link
Collaborator

pochedls commented Aug 9, 2023

Thanks for documenting this @acordonez. I agree that regardless of whether this is a bug/feature request, it would be nice to better handle the out-of-domain values.

@jasonb5 jasonb5 self-assigned this Aug 11, 2023
@jasonb5
Copy link
Collaborator

jasonb5 commented Aug 30, 2023

@acordonez @pochedls
xESMF has an option to set unmapped values to NaN rather than zero by passing unmapped_to_nan=True to the regridder, e.g. ds.regridder.horizontal("pr", tgrid, tool="xesmf", method="conservative_normed",periodic=True, unmapped_to_nan=True).

This brings up a good question, is the expected behavior usually or always that unmapped values should be NaN i.e. missing? If that's the case we could default unmapped_to_nan to True.

We could also add an option that would automatically align the input and output grids, this could be a feature.

Also the active xESMF repo is https://github.com/pangeo-data/xESMF for future reference.

@lee1043
Copy link
Collaborator

lee1043 commented Aug 31, 2023

@jasonb5 thank you for investigating on this. I think it makes more sense to set unmapped value to NaN instead of zero, curious why they set it to zero as default.

@pochedls
Copy link
Collaborator

Maybe before we set this as the default, we could reach out to the appropriate library (pangeo xESMF?) to ask why they didn't set this as the default?

@tomvothecoder
Copy link
Collaborator

Maybe before we set this as the default, we could reach out to the appropriate library (pangeo xESMF?) to ask why they didn't set this as the default?

Yeah, good point. This goes one layer deeper with esmpy defaulting to zero instead of nan.

From the original developer back in 2018 (unmapped_to_nan has since been added):

Currently only zero is used, which is the default behavior in ESMPy. Adding nan as an option is a good suggestion and I am happy to add it in the next version. Defaulting to 0 also has many use cases especially for multi-tile grids (regrid one by one and add up).

A quick and dirty way to get nan right now is:

1. Make sure your input data has no 0. Add a constant like `indata + C`

2. Regrid as usual `outdata = regridder(indata)`

3. Set 0 to nan, by `outdata[outdata==0.0] = np.nan`. It is fine to use `==` for floating point comparison here because the unmapped cell is exactly 0.

4. Scale down the output data `outdata - C`

Notes on ESMPy: The UnmappedAction option can only take ERROR or IGNORE. With ERROR the regridding operation would simply fail if unmapped cell exists. With IGNORE, 0 will be assigned to unmapped cells. But it is not hard to convert zero to nan in xESMF level.

-- JiaweiZhuang/xESMF#15 (comment)

@jasonb5
Copy link
Collaborator

jasonb5 commented Aug 31, 2023

I can default unmapped_to_nan to True, I can align this behavior for all the regridders.

@acordonez
Copy link
Contributor Author

@jasonb5 Thanks for finding the unmapped_to_nan=True setting, I'll use that for now.

@tomvothecoder
Copy link
Collaborator

tomvothecoder commented Sep 13, 2023

From 9/13/23 meeting:

Jason thinks that ESMF represents missing values with 0 because nan does not exist in C. This behavior seems to propagate up to esmpy and xESMF. We will reach out to ESMF support to see confirm if what Jason is saying is correct, or if they have other reasons.

If their reasoning is not strong enough, we can set unmapped_to_nan=True as the default (especially since Xarray represents missing values with nan).

@tomvothecoder tomvothecoder added this to the FY24Q1 (v0.7.0) milestone Sep 27, 2023
@tomvothecoder
Copy link
Collaborator

Steve reached out to ESMF support, they said they don’t see an issue with defaulting unmapped_to_nan=True
Although they didn’t say why they defaulted missing values to 0 in EMSF -- Jason said that C doesn't support nan thinks that 0 was chosen as the default value.

@lee1043
Copy link
Collaborator

lee1043 commented Mar 12, 2024

@acordonez do we have any update on this? Just housekeeping.

@acordonez
Copy link
Contributor Author

@lee1043 I've been using the unmapped_to_nan=True option and that's working for me.

@lee1043
Copy link
Collaborator

lee1043 commented Mar 12, 2024

@acordonez thank you for confirming. Are you okay with closing this issue?

@acordonez
Copy link
Contributor Author

@lee1043 Sure, although it looks like it's associated with some future milestones? Would closing the issue affect that?

@lee1043
Copy link
Collaborator

lee1043 commented Mar 12, 2024

@tomvothecoder should we keep this issue opened?

@tomvothecoder
Copy link
Collaborator

tomvothecoder commented Mar 12, 2024

@tomvothecoder should we keep this issue opened?

I believe the plan was to make unmapped_to_nan=True the default value based on the comment above and because it affects plotting. @jasonb5 should be working on this soon.

@jasonb5
Copy link
Collaborator

jasonb5 commented Mar 12, 2024

I have this and the regrid2 nan fix coming in a PR today.

@lee1043
Copy link
Collaborator

lee1043 commented Mar 12, 2024

@tomvothecoder thanks for reminding me, and @jasonb5 thanks for the PR!

@lee1043
Copy link
Collaborator

lee1043 commented Mar 13, 2024

@acordonez can you rerun your minimal code in #528 (comment) with #613?

@acordonez
Copy link
Contributor Author

@lee1043 I've rerun my minimal example code using the #613 branch and it appears to be fixing this issue:

Screenshot 2024-04-03 at 11 29 01 AM

@lee1043
Copy link
Collaborator

lee1043 commented Apr 3, 2024

@acordonez thanks for checking this!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
type: bug Inconsistencies or issues which will cause an issue or problem for users or implementors.
Projects
Status: Done
6 participants