[BUG/ISSUE] xarray open_mfdataset cannot open multiple GCHP diagnostic/restart files created with MAPL v1.0.0 #26

Closed
yantosca opened this issue Sep 6, 2019 · 4 comments

@yantosca
Contributor

yantosca commented Sep 6, 2019

Describe the bug
The xarray open_mfdataset function raises a MergeError when asked to read more than one file created by GCHP with the new MAPL v1.0.0 (i.e. GCHP 12.5.0 and later versions).

To Reproduce

import xarray as xr

# Two or more netCDF files written by GCHP 12.5.0 or later (MAPL v1.0.0)
filelist = ['/path/to/GCHP/file1.nc', '/path/to/GCHP/file2.nc']
ds = xr.open_mfdataset(filelist)

Expected behavior
The returned dataset should contain the merged data from e.g. file1.nc and file2.nc.

Screenshots
Instead, this error occurs:

  File "./run_1mo_benchmark.py", line 479, in <module>
    ds = xr.open_mfdataset([gchp_vs_gchp_refspc, gchp_vs_gchp_refaod])
  File "/net/seasasfs02/srv/export/seasasfs02/share_root/ryantosca/python/geo/miniconda/envs/geo/lib/python3.6/site-packages/xarray/backends/api.py", line 719, in open_mfdataset
    ids=ids)
  File "/net/seasasfs02/srv/export/seasasfs02/share_root/ryantosca/python/geo/miniconda/envs/geo/lib/python3.6/site-packages/xarray/core/combine.py", line 553, in _auto_combine
    data_vars=data_vars, coords=coords)
  File "/net/seasasfs02/srv/export/seasasfs02/share_root/ryantosca/python/geo/miniconda/envs/geo/lib/python3.6/site-packages/xarray/core/combine.py", line 475, in _combine_nd
    compat=compat)
  File "/net/seasasfs02/srv/export/seasasfs02/share_root/ryantosca/python/geo/miniconda/envs/geo/lib/python3.6/site-packages/xarray/core/combine.py", line 493, in _auto_combine_all_along_first_dim
    data_vars, coords)
  File "/net/seasasfs02/srv/export/seasasfs02/share_root/ryantosca/python/geo/miniconda/envs/geo/lib/python3.6/site-packages/xarray/core/combine.py", line 514, in _auto_combine_1d
    merged = merge(concatenated, compat=compat)
  File "/net/seasasfs02/srv/export/seasasfs02/share_root/ryantosca/python/geo/miniconda/envs/geo/lib/python3.6/site-packages/xarray/core/merge.py", line 532, in merge
    variables, coord_names, dims = merge_core(dict_like_objects, compat, join)
  File "/net/seasasfs02/srv/export/seasasfs02/share_root/ryantosca/python/geo/miniconda/envs/geo/lib/python3.6/site-packages/xarray/core/merge.py", line 451, in merge_core
    variables = merge_variables(expanded, priority_vars, compat=compat)
  File "/net/seasasfs02/srv/export/seasasfs02/share_root/ryantosca/python/geo/miniconda/envs/geo/lib/python3.6/site-packages/xarray/core/merge.py", line 170, in merge_variables
    merged[name] = unique_variable(name, var_list, compat)
  File "/net/seasasfs02/srv/export/seasasfs02/share_root/ryantosca/python/geo/miniconda/envs/geo/lib/python3.6/site-packages/xarray/core/merge.py", line 90, in unique_variable
    % (name, out, var))
xarray.core.merge.MergeError: conflicting values for variable 'anchor' on objects to be combined:
first value: <xarray.Variable (nf: 6, ncontact: 4)>
dask.array<shape=(6, 4, 4), dtype=int32, chunksize=(6, 4, 4)>
Attributes:
    long_name:  anchor point
second value: <xarray.Variable (nf: 6, ncontact: 4)>
dask.array<shape=(6, 4, 4), dtype=int32, chunksize=(6, 4, 4)>
Attributes:
    long_name:  anchor point

Required information:

  • OS: CentOS 7
  • python 3.6.9
  • xarray 0.12.1
  • netcdf-fortran 4.4.4
  • netcdf4 1.4.2
  • dask 2.3.0
  • dask-core 2.3.0
  • numpy 1.16.4
  • numpy-base 1.16.4
  • scipy 1.3.1

Additional context
The problem seems to be caused by a single variable called "anchor", which is the only variable named in the MergeError above.

@lizziel
Contributor

lizziel commented Sep 6, 2019

@yantosca I think this warrants opening an issue with xarray.

@yantosca
Contributor Author

yantosca commented Sep 6, 2019

I just did; pydata/xarray#3286

yantosca added a commit that referenced this issue Sep 6, 2019
According to xarray issues:
   pydata/xarray#3286
   pydata/xarray#1378

The open_mfdataset function has problems in creating a merged
dataset from multiple files in which variables have repeated
dimension names.  The easiest thing to do in this case is to
prevent such variables from being read in.

We have now added the drop_variables keyword to avoid reading
in the "anchor" variable in all calls to open_dataset and
open_mfdataset in both benchmark.py and core.py.  This variable is
only present in GCHP-created netCDF files using MAPL v1.0.0, which
is in GCHP 12.5.0 and later.

This commit should resolve GCPy issue #26:
   #26

Signed-off-by: Bob Yantosca <[email protected]>
@yantosca
Contributor Author

yantosca commented Sep 6, 2019

According to the xarray support team, xarray cannot gracefully handle merging files that contain variables with repeated dimension names (cf. pydata/xarray#3286 and pydata/xarray#1378).
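To check which variables in a file are affected, something like the following sketch may help. It only inspects dimension names, so it works on a plain mapping of variable name to dimension tuple. The function name and the example layout are hypothetical, and the dimensions shown for "anchor" are inferred from the shape (6, 4, 4) printed in the traceback rather than read from an actual file.

```python
from collections import Counter


def vars_with_repeated_dims(dims_by_var):
    """Return the names of variables whose dimension tuple repeats a name."""
    flagged = []
    for name, dims in dims_by_var.items():
        counts = Counter(dims)
        if any(n > 1 for n in counts.values()):
            flagged.append(name)
    return flagged


# Illustrative layout only (assumption): "anchor" repeats "ncontact",
# while "lons" stands in for an ordinary variable with distinct dimensions.
example = {
    "anchor": ("nf", "ncontact", "ncontact"),
    "lons": ("nf", "Ydim", "Xdim"),
}

print(vars_with_repeated_dims(example))
```

For a dataset already opened from a single file, the mapping could be built with `{name: var.dims for name, var in ds.variables.items()}`.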

The easiest solution is to exclude the offending variable (which in this case is the "anchor" variable from GCHP output using MAPL v1.0.0) in all calls to xr.open_dataset and xr.open_mfdataset. This is done by passing the "drop_variables" keyword argument to these functions.
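A minimal sketch of that workaround is shown below. The helper names and the SKIP_THESE_VARS constant are hypothetical and are not necessarily what the commit uses; the only thing taken from this issue is that "anchor" is passed through the drop_variables keyword, which xr.open_mfdataset forwards to xr.open_dataset.

```python
# Hypothetical constant: "anchor" is the only variable known to cause
# the MergeError described in this issue.
SKIP_THESE_VARS = ["anchor"]


def reader_kwargs(**extra):
    """Build keyword arguments for xr.open_dataset / xr.open_mfdataset.

    Any caller-supplied drop_variables list is kept, and the names in
    SKIP_THESE_VARS are appended if they are not already present.
    """
    kwargs = dict(extra)
    dropped = list(kwargs.pop("drop_variables", []))
    for name in SKIP_THESE_VARS:
        if name not in dropped:
            dropped.append(name)
    kwargs["drop_variables"] = dropped
    return kwargs


def open_gchp_files(paths, **extra):
    """Open several GCHP files as one dataset, skipping "anchor"."""
    import xarray as xr  # imported here so reader_kwargs has no dependencies

    return xr.open_mfdataset(paths, **reader_kwargs(**extra))
```

With the file list from the reproduction above, the call would be `ds = open_gchp_files(filelist)`. Keeping the variable names in one place means a single-file read via xr.open_dataset can reuse `reader_kwargs()` as well.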

I have pushed commit 5411f3d to master, which should resolve this issue.

@yantosca yantosca closed this as completed Sep 6, 2019
@yantosca
Contributor Author

yantosca commented Sep 6, 2019

Also pushed commit bfe6504 to fix a typo in commit 5411f3d.

@yantosca yantosca self-assigned this Oct 17, 2019
@yantosca yantosca added category: Bug Something isn't working resolved labels Oct 17, 2019