Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Problem opening unstructured grid ocean forecasts with 4D vertical coordinates #2233

Closed
rsignell-usgs opened this issue Jun 14, 2018 · 15 comments · Fixed by #7989
Closed

Problem opening unstructured grid ocean forecasts with 4D vertical coordinates #2233

rsignell-usgs opened this issue Jun 14, 2018 · 15 comments · Fixed by #7989

Comments

@rsignell-usgs
Copy link

rsignell-usgs commented Jun 14, 2018

We can't open the IOOS New England triangular mesh ocean forecasts with Xarray because it doesn't understand their more complex CF vertical coordinate system.

import xarray as xr
url='http://www.smast.umassd.edu:8080/thredds/dodsC/FVCOM/NECOFS/Forecasts/NECOFS_GOM3_FORECAST.nc'
xr.open_dataset(url)

fails with:

MissingDimensionsError: 'siglay' has more than 1-dimension and the same name as one of its dimensions ('siglay', 'node'). xarray disallows such variables because they conflict with the coordinates used to label dimensions.

If you open this dataset with nc = netCDF4.Dataset(url) you can see what the data variables (e.g. temp) and coordinates (e.g. siglay) look like:

print(nc['temp'])

<class 'netCDF4._netCDF4.Variable'>
float32 temp(time, siglay, node)
    long_name: temperature
    standard_name: sea_water_potential_temperature
    units: degrees_C
    coordinates: time siglay lat lon
    type: data
    coverage_content_type: modelResult
    mesh: fvcom_mesh
    location: node
unlimited dimensions: time
current shape = (145, 40, 53087)

print(nc['siglay'])

<class 'netCDF4._netCDF4.Variable'>
float32 siglay(siglay, node)
    long_name: Sigma Layers
    standard_name: ocean_sigma_coordinate
    positive: up
    valid_min: -1.0
    valid_max: 0.0
    formula_terms: sigma: siglay eta: zeta depth: h
unlimited dimensions: 
current shape = (40, 53087)

So the siglay variable in this dataset specifies the fraction of water column contained in the layer and because this fraction changes over the grid, it has dimensions of siglay and node. The variable siglay is just one of the variables used in the calculation of this CF-compliant vertical coordinate. The actual vertical coordinate (after computation via formula_terms) ends up being 4D.

While we understand that there is no way to represent the vertical coordinate with a one-dimensional coordinate that xarray would like, it would be nice if there way to at least load the variable array data like temp into xarray. We tried:

ds = xr.open_dataset(url,decode_times=False, decode_coords=False, decode_cf=False)

and we get the same error.

Is there any workaround for this?

---------------------------------------------------------------------------
MissingDimensionsError                    Traceback (most recent call last)
<ipython-input-18-723c5c460db2> in <module>()
----> 1 xr.open_dataset(url)

~/miniconda3/envs/pangeo/lib/python3.6/site-packages/xarray/backends/api.py in open_dataset(filename_or_obj, group, decode_cf, mask_and_scale, decode_times, autoclose, concat_characters, decode_coords, engine, chunks, lock, cache, drop_variables, backend_kwargs)
    344             lock = _default_lock(filename_or_obj, engine)
    345         with close_on_error(store):
--> 346             return maybe_decode_store(store, lock)
    347     else:
    348         if engine is not None and engine != 'scipy':

~/miniconda3/envs/pangeo/lib/python3.6/site-packages/xarray/backends/api.py in maybe_decode_store(store, lock)
    256             store, mask_and_scale=mask_and_scale, decode_times=decode_times,
    257             concat_characters=concat_characters, decode_coords=decode_coords,
--> 258             drop_variables=drop_variables)
    259 
    260         _protect_dataset_variables_inplace(ds, cache)

~/miniconda3/envs/pangeo/lib/python3.6/site-packages/xarray/conventions.py in decode_cf(obj, concat_characters, mask_and_scale, decode_times, decode_coords, drop_variables)
    428         vars, attrs, concat_characters, mask_and_scale, decode_times,
    429         decode_coords, drop_variables=drop_variables)
--> 430     ds = Dataset(vars, attrs=attrs)
    431     ds = ds.set_coords(coord_names.union(extra_coords).intersection(vars))
    432     ds._file_obj = file_obj

~/miniconda3/envs/pangeo/lib/python3.6/site-packages/xarray/core/dataset.py in __init__(self, data_vars, coords, attrs, compat)
    363             coords = {}
    364         if data_vars is not None or coords is not None:
--> 365             self._set_init_vars_and_dims(data_vars, coords, compat)
    366         if attrs is not None:
    367             self.attrs = attrs

~/miniconda3/envs/pangeo/lib/python3.6/site-packages/xarray/core/dataset.py in _set_init_vars_and_dims(self, data_vars, coords, compat)
    381 
    382         variables, coord_names, dims = merge_data_and_coords(
--> 383             data_vars, coords, compat=compat)
    384 
    385         self._variables = variables

~/miniconda3/envs/pangeo/lib/python3.6/site-packages/xarray/core/merge.py in merge_data_and_coords(data, coords, compat, join)
    363     indexes = dict(extract_indexes(coords))
    364     return merge_core(objs, compat, join, explicit_coords=explicit_coords,
--> 365                       indexes=indexes)
    366 
    367 

~/miniconda3/envs/pangeo/lib/python3.6/site-packages/xarray/core/merge.py in merge_core(objs, compat, join, priority_arg, explicit_coords, indexes)
    433     coerced = coerce_pandas_values(objs)
    434     aligned = deep_align(coerced, join=join, copy=False, indexes=indexes)
--> 435     expanded = expand_variable_dicts(aligned)
    436 
    437     coord_names, noncoord_names = determine_coords(coerced)

~/miniconda3/envs/pangeo/lib/python3.6/site-packages/xarray/core/merge.py in expand_variable_dicts(list_of_variable_dicts)
    209                     var_dicts.append(coords)
    210 
--> 211                 var = as_variable(var, name=name)
    212                 sanitized_vars[name] = var
    213 

~/miniconda3/envs/pangeo/lib/python3.6/site-packages/xarray/core/variable.py in as_variable(obj, name)
    112                 'dimensions %r. xarray disallows such variables because they '
    113                 'conflict with the coordinates used to label '
--> 114                 'dimensions.' % (name, obj.dims))
    115         obj = obj.to_index_variable()
    116 

MissingDimensionsError: 'siglay' has more than 1-dimension and the same name as one of its dimensions ('siglay', 'node'). xarray disallows such variables because they conflict with the coordinates used to label dimensions.
@rsignell-usgs rsignell-usgs changed the title Can't open these unstructured grid ocean forecasts Problem opening unstructured grid ocean forecasts with 2D vertical coordinates Jun 14, 2018
@rsignell-usgs rsignell-usgs changed the title Problem opening unstructured grid ocean forecasts with 2D vertical coordinates Problem opening unstructured grid ocean forecasts with 4D vertical coordinates Jun 14, 2018
@ocefpaf
Copy link
Contributor

ocefpaf commented Jun 14, 2018

It is not ideal but you can workaround that by dropping the siglay variable.

ds = xr.open_dataset(url, drop_variables='siglay')

@rabernat
Copy link
Contributor

@rsignell-usgs -- could you print the ncdump for the file?

Is there some documentation on these new additions to CF conventions that you can point us to? We need to develop a comprehensive strategy towards supporting whatever is in there from xarray.

@shoyer
Copy link
Member

shoyer commented Jun 14, 2018

We currently enforce the requirement that non-1D variable cannot be assigned in a Dataset with a name that matches one of their dimensions. This is somewhat useful, because it means that dataset.variables[dim] is guaranteed to be a 1D set of coordinate labels, in the form of a xarray.IndexVariable backed by a pandas.Index.

It might make sense to relax this requirements part of the eventual "explicit indexes" refactor (#1603). Then the right way to get out an index will simply be dataset.indexes[dim] (which may or may not exist). However, this would definitely be a breaking change: it is likely that at least some code (inside and outside xarray) relies on this assumption.

@rsignell-usgs
Copy link
Author

@rabernat , this unstructured grid model output follows the UGRID Conventions, which layer on top of the CF Conventions. The issue Xarray is having here is with the vertical coordinate however, so this issue could arise with any CF convention model where the vertical stretching function varies over the domain.

As requested, here is the ncdump of this URL:

jovyan@jupyter-rsignell-2dusgs:~$ ncdump -h http://www.smast.umassd.edu:8080/thredds/dodsC/FVCOM/NECOFS/Forecasts/NECOFS_GOM3_FORECAST.nc
netcdf NECOFS_GOM3_FORECAST {
dimensions:
        time = UNLIMITED ; // (145 currently)
        maxStrlen64 = 64 ;
        nele = 99137 ;
        node = 53087 ;
        siglay = 40 ;
        three = 3 ;
variables:
        float lon(node) ;
                lon:long_name = "nodal longitude" ;
                lon:standard_name = "longitude" ;
                lon:units = "degrees_east" ;
        float lat(node) ;
                lat:long_name = "nodal latitude" ;
                lat:standard_name = "latitude" ;
                lat:units = "degrees_north" ;
        float xc(nele) ;
                xc:long_name = "zonal x-coordinate" ;
                xc:units = "meters" ;
        float yc(nele) ;
                yc:long_name = "zonal y-coordinate" ;
                yc:units = "meters" ;
        float lonc(nele) ;
                lonc:long_name = "zonal longitude" ;
                lonc:standard_name = "longitude" ;
                lonc:units = "degrees_east" ;
        float latc(nele) ;
                latc:long_name = "zonal latitude" ;
                latc:standard_name = "latitude" ;
                latc:units = "degrees_north" ;
        float siglay(siglay, node) ;
                siglay:long_name = "Sigma Layers" ;
                siglay:standard_name = "ocean_sigma_coordinate" ;
                siglay:positive = "up" ;
                siglay:valid_min = -1. ;
                siglay:valid_max = 0. ;
                siglay:formula_terms = "sigma: siglay eta: zeta depth: h" ;
        float h(node) ;
                h:long_name = "Bathymetry" ;
                h:standard_name = "sea_floor_depth_below_geoid" ;
                h:units = "m" ;
                h:coordinates = "lat lon" ;
                h:type = "data" ;
                h:mesh = "fvcom_mesh" ;
                h:location = "node" ;
        int nv(three, nele) ;
                nv:long_name = "nodes surrounding element" ;
                nv:cf_role = "face_node_connnectivity" ;
                nv:start_index = 1 ;
        float time(time) ;
                time:long_name = "time" ;
                time:units = "days since 1858-11-17 00:00:00" ;
                time:format = "modified julian day (MJD)" ;
                time:time_zone = "UTC" ;
                time:standard_name = "time" ;
        float zeta(time, node) ;
                zeta:long_name = "Water Surface Elevation" ;
                zeta:units = "meters" ;
                zeta:standard_name = "sea_surface_height_above_geoid" ;
                zeta:coordinates = "time lat lon" ;
                zeta:type = "data" ;
                zeta:missing_value = -999. ;
                zeta:field = "elev, scalar" ;
                zeta:coverage_content_type = "modelResult" ;
                zeta:mesh = "fvcom_mesh" ;
                zeta:location = "node" ;
        int nbe(three, nele) ;
                nbe:long_name = "elements surrounding each element" ;
        float u(time, siglay, nele) ;
                u:long_name = "Eastward Water Velocity" ;
                u:units = "meters s-1" ;
                u:type = "data" ;
                u:missing_value = -999. ;
                u:field = "ua, scalar" ;
                u:coverage_content_type = "modelResult" ;
                u:standard_name = "eastward_sea_water_velocity" ;
                u:coordinates = "time siglay latc lonc" ;
                u:mesh = "fvcom_mesh" ;
                u:location = "face" ;
        float v(time, siglay, nele) ;
                v:long_name = "Northward Water Velocity" ;
                v:units = "meters s-1" ;
                v:type = "data" ;
                v:missing_value = -999. ;
                v:field = "va, scalar" ;
                v:coverage_content_type = "modelResult" ;
                v:standard_name = "northward_sea_water_velocity" ;
                v:coordinates = "time siglay latc lonc" ;
                v:mesh = "fvcom_mesh" ;
                v:location = "face" ;
        float ww(time, siglay, nele) ;
                ww:long_name = "Upward Water Velocity" ;
                ww:units = "meters s-1" ;
                ww:type = "data" ;
                ww:coverage_content_type = "modelResult" ;
                ww:standard_name = "upward_sea_water_velocity" ;
                ww:coordinates = "time siglay latc lonc" ;
                ww:mesh = "fvcom_mesh" ;
                ww:location = "face" ;
        float ua(time, nele) ;
                ua:long_name = "Vertically Averaged x-velocity" ;
                ua:units = "meters s-1" ;
                ua:type = "data" ;
                ua:missing_value = -999. ;
                ua:field = "ua, scalar" ;
                ua:coverage_content_type = "modelResult" ;
                ua:standard_name = "barotropic_eastward_sea_water_velocity" ;
                ua:coordinates = "time latc lonc" ;
                ua:mesh = "fvcom_mesh" ;
                ua:location = "face" ;
        float va(time, nele) ;
                va:long_name = "Vertically Averaged y-velocity" ;
                va:units = "meters s-1" ;
                va:type = "data" ;
                va:missing_value = -999. ;
                va:field = "va, scalar" ;
                va:coverage_content_type = "modelResult" ;
                va:standard_name = "barotropic_northward_sea_water_velocity" ;
                va:coordinates = "time latc lonc" ;
                va:mesh = "fvcom_mesh" ;
                va:location = "face" ;
        float temp(time, siglay, node) ;
                temp:long_name = "temperature" ;
                temp:standard_name = "sea_water_potential_temperature" ;
                temp:units = "degrees_C" ;
                temp:coordinates = "time siglay lat lon" ;
                temp:type = "data" ;
                temp:coverage_content_type = "modelResult" ;
                temp:mesh = "fvcom_mesh" ;
                temp:location = "node" ;
        float salinity(time, siglay, node) ;
                salinity:long_name = "salinity" ;
                salinity:standard_name = "sea_water_salinity" ;
                salinity:units = "0.001" ;
                salinity:coordinates = "time siglay lat lon" ;
                salinity:type = "data" ;
                salinity:coverage_content_type = "modelResult" ;
                salinity:mesh = "fvcom_mesh" ;
                salinity:location = "node" ;
        int fvcom_mesh ;
                fvcom_mesh:cf_role = "mesh_topology" ;
                fvcom_mesh:topology_dimension = 2 ;
                fvcom_mesh:node_coordinates = "lon lat" ;
                fvcom_mesh:face_coordinates = "lonc latc" ;
                fvcom_mesh:face_node_connectivity = "nv" ;

// global attributes:
                :title = "NECOFS GOM3 (FVCOM) - Northeast US - Latest Forecast" ;
                :institution = "School for Marine Science and Technology" ;
                :source = "FVCOM_3.0" ;
                :Conventions = "CF-1.0, UGRID-1.0" ;
                :summary = "Latest forecast from the FVCOM Northeast Coastal Ocean Forecast System using an newer, higher-resolution GOM3 mesh (GOM2 was the preceding mesh)" ;
    

@rabernat
Copy link
Contributor

Ok I see...the basic problem, as @shoyer says, is that siglay is used as a dimension coordinate, but it siglay itself is two-dimensional: siglay(siglay, node). Xarray can't fit this into its notion of coordinates.

This defines a nice specific problem to solve for the index refactoring.

@stale
Copy link

stale bot commented May 15, 2020

In order to maintain a list of currently relevant issues, we mark issues as stale after a period of inactivity

If this issue remains relevant, please comment here or remove the stale label; otherwise it will be marked as closed automatically

@stale stale bot added the stale label May 15, 2020
@dcherian dcherian removed the stale label May 15, 2020
@angelra
Copy link

angelra commented Jun 9, 2020

I had to go around this issue and not use xarray but pandas instead or plain netdcf4

nc = netCDF4.Dataset(input_file)
h = nc.variables[vname]
times = nc.variables['time']
jd = netCDF4.num2date(times[:],times.units)
hs = pd.Series(h[:,station],index=jd)

I would love to know if there is a way to do it over xarray since it is so nice to use.
Best regards

@jthielen
Copy link
Contributor

jthielen commented Feb 19, 2021

I've seen this issue come up a few more times recently, so I wanted to ask: in lieu of a "full fix" (a la #2405, with deep data model changes holding up the PR), would there be support for a partial coordinate-renaming-based fix? I'd imagine something like the following:

To read in netCDF like the following,

<class 'netCDF4._netCDF4.Dataset'>
root group (NETCDF4 data model, file format HDF5):
    Conventions: CDM-Extended-CF
    history: Written by CFPointWriter
    title: Extract Points data from Grid file /data/ldm/pub/native/grid/NCEP/RR/CONUS_13km/RR_CONUS_13km_20210219_1400.grib2.ncx3#LambertConformal_337X451-40p01N-98p14W
    featureType: timeSeriesProfile
    time_coverage_start: 2021-02-19T16:00:00Z
    time_coverage_end: 2021-02-19T16:00:00Z
    geospatial_lat_min: 42.0295
    geospatial_lat_max: 42.0305
    geospatial_lon_min: -93.6405
    geospatial_lon_max: -93.6395
    dimensions(sizes): profile(1), station(1), isobaric(37), station_name_strlen(10), station_description_strlen(34)
    variables(dimensions): float32 isobaric(station, profile, isobaric), float32 u-component_of_wind_isobaric(station, profile, isobaric), float32 v-component_of_wind_isobaric(station, profile, isobaric), float32 Temperature_isobaric(station, profile, isobaric), float32 Relative_humidity_isobaric(station, profile, isobaric), |S1 station_name(station, station_name_strlen), |S1 station_description(station, station_description_strlen), float64 latitude(station), float64 longitude(station), float64 time(station, profile)
    groups: 

(note the problematic isobaric(station, profile, isobaric)), one could specify a kwarg to xr.open_dataset to rename it,

ds = xr.open_dataset("my_problematic_file.nc", rename_vars={'isobaric': 'isobaric_coord'})

thus giving

<xarray.Dataset>
Dimensions:                       (isobaric: 37, profile: 1, station: 1)
Dimensions without coordinates: isobaric, profile, station
Data variables:
    u-component_of_wind_isobaric  (station, profile, isobaric) float32 ...
    v-component_of_wind_isobaric  (station, profile, isobaric) float32 ...
    Temperature_isobaric          (station, profile, isobaric) float32 ...
    Relative_humidity_isobaric    (station, profile, isobaric) float32 ...
    station_name                  (station) |S10 ...
    station_description           (station) |S34 ...
    latitude                      (station) float64 ...
    longitude                     (station) float64 ...
    time                          (station, profile) datetime64[ns] ...
    isobaric_coord                (station, profile, isobaric) float32 ...
Attributes:
    Conventions:          CDM-Extended-CF
    history:              Written by CFPointWriter
    title:                Extract Points data from Grid file /data/ldm/pub/na...
    featureType:          timeSeriesProfile
    time_coverage_start:  2021-02-19T16:00:00Z
    time_coverage_end:    2021-02-19T16:00:00Z
    geospatial_lat_min:   42.0295
    geospatial_lat_max:   42.0305
    geospatial_lon_min:   -93.6405
    geospatial_lon_max:   -93.6395

@iuryt
Copy link

iuryt commented Mar 14, 2022

Hi

For now, I found a workaround loading and renaming the problematic coordinates with netCDF4.Dataset().
Soon I will post this and other solutions for this model output in iuryt/FVCOMpy.

For now, you could try:

import xarray as xr
from netCDF4 import Dataset

# define year and month to be read
year = 2019
month = 5
# we could use this to run a loop through the years/months we need


# list problematic coordinates
drop_variables = ['siglay','siglev']

# base url for openDAP server
url = "".join(["http://www.smast.umassd.edu:8080/thredds/dodsC/models/fvcom/",
        f"NECOFS/Archive/NECOFS_GOM/{year}/gom4_{year}{month:02d}.nc?"])

# lazy load of the data
ds = xr.open_dataset(url,drop_variables=drop_variables,decode_times=False)

# load data with netCDF4
nc = Dataset(url)
# load the problematic coordinates
coords = {name:nc[name] for name in drop_variables}
# function to extract ncattrs from `Dataset()`
get_attrs = lambda name: {attr:coords[name].getncattr(attr) for attr in coords[name].ncattrs()}
# function to convert from `Dataset()` to `xr.DataArray()`
nc2xr = lambda name: xr.DataArray(coords[name],attrs=get_attrs(name),name=f'{name}_coord',dims=(f'{name}','node'))
# merge `xr.DataArray()` objects
coords = xr.merge([nc2xr(name) for name in coords.keys()])
# reassign to the main `xr.Dataset()`
ds = ds.assign_coords(coords)

Leaving it here in case someone fall into the same problem.

@rsignell-usgs
Copy link
Author

rsignell-usgs commented Mar 24, 2022

#2233 (comment)
Would the new xarray index/coordinate internal refactoring now allow us to address this issue?

cc @kthyng

@dcherian

This comment was marked as outdated.

@shoyer
Copy link
Member

shoyer commented Mar 25, 2022

This is the second follow-up item in #6293

I think we could definitely experiment with relaxing this constraint now, although ideally we would continue to check off auditing all of the methods in that long list first.

@kthyng
Copy link

kthyng commented Jun 30, 2022

I've looked through the github issues associated with the explicit indices, but can't quite tell if I can use them to load FVCOM model output. In any case I just updated and tried without doing anything new and it didn't work:

import xarray as xr
url = 'https://opendap.co-ops.nos.noaa.gov/thredds/dodsC/NOAA/SFBOFS/MODELS/2022/06/30/nos.sfbofs.fields.f014.20220630.t09z.nc'  # this file will not be available in a few days but one for the present day will be available
ds = xr.open_dataset(url, drop_variables='Itime2')

Same error message as before:

MissingDimensionsError: 'siglay' has more than 1-dimension and the same name as one of its dimensions ('siglay', 'node'). xarray disallows such variables because they conflict with the coordinates used to label dimensions.

@benbovy
Copy link
Member

benbovy commented Jun 30, 2022

In any case I just updated and tried without doing anything new and it didn't work

Yes it is not yet implemented, still on the todo list (see 2nd item in #6293).

@kthyng
Copy link

kthyng commented Jun 30, 2022

@benbovy Ah, I see you mean under "Relax all constraints related to “dimension (index) coordinates” in Xarray". Ok, thank you for clarifying that for me! (I wasn't sure what the second item meant in the list of lists.)

dcherian added a commit to dcherian/xarray that referenced this issue Jul 15, 2023
Avoid automatic creating of Index variable
when nD variable shares name with one
of its dimensions.

Closes pydata#2233
dcherian added a commit that referenced this issue Jul 19, 2023
* Warn instead of raising for nD index variable

Avoid automatic creating of Index variable
when nD variable shares name with one
of its dimensions.

Closes #2233

* fix tests

* Remove warning

* Add invariants check

Co-authored-by: Benoit Bovy <[email protected]>

* Add whats-new

---------

Co-authored-by: Benoit Bovy <[email protected]>
dcherian added a commit to dcherian/xarray that referenced this issue Aug 31, 2023
dcherian added a commit to dcherian/xarray that referenced this issue Aug 31, 2023
dcherian added a commit that referenced this issue Sep 22, 2023
* Allow creating DataArrays with nD coordinate variables

Closes #2233
Closes #8106

* more test

more test#	make_aggs.bash

* Fix  test

* Apply suggestions from code review

Co-authored-by: Michael Niklas  <[email protected]>

* Update  test

---------

Co-authored-by: Michael Niklas <[email protected]>
Co-authored-by: Maximilian Roos <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
10 participants