
Allow .chunk for datasets with duplicated dimension names, e.g. Sentinel-3 OLCI files #8579

Closed · fwfichtner opened this issue Jan 2, 2024 · 6 comments · Fixed by #9099
Labels: contrib-help-wanted, topic-chunked-arrays (Managing different chunked backends, e.g. dask)


fwfichtner commented Jan 2, 2024

What is your issue?

Sentinel-3 OLCI files (e.g. taken from the Copernicus Data Space Ecosystem) come with duplicate dimensions, which causes xarray 2023.12.0 to raise an error after #8491. Specifically, instrument_data.nc can no longer be opened:

import xarray as xr

dataset = xr.open_dataset("instrument_data.nc", decode_cf=True, mask_and_scale=True, chunks="auto")

This results in the now-expected ValueError:

ValueError: This function cannot handle duplicate dimensions, but dimensions {'bands'} appear more than once on this object's dims: ('bands', 'bands')
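
For reference, the error can be reproduced without the OLCI file. A minimal sketch (the Dataset construction only warns, while chunking raises):

import numpy as np
import xarray as xr

# stand-in for relative_spectral_covariance(bands, bands); construction
# emits a duplicate-dimension warning but succeeds
ds = xr.Dataset({"cov": (("bands", "bands"), np.eye(21))})
ds.chunk("auto")  # raises the same ValueError on xarray 2023.12.0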

ncdump -h prints:

netcdf instrument_data {
dimensions:
	bands = 21 ;
	columns = 4865 ;
	detectors = 3700 ;
	rows = 1953 ;
variables:
	float FWHM(bands, detectors) ;
		FWHM:_FillValue = -1.f ;
		FWHM:ancillary_variables = "detector_index lambda0" ;
		FWHM:long_name = "OLCI bandwidth (Full Widths at Half Maximum)" ;
		FWHM:units = "nm" ;
		FWHM:valid_max = 650.f ;
		FWHM:valid_min = 0.f ;
	short detector_index(rows, columns) ;
		detector_index:_FillValue = -1s ;
		detector_index:coordinates = "time_stamp altitude latitude longitude" ;
		detector_index:long_name = "Detector index" ;
		detector_index:valid_max = 3699s ;
		detector_index:valid_min = 0s ;
	byte frame_offset(rows, columns) ;
		frame_offset:_FillValue = -128b ;
		frame_offset:long_name = "Re-sampling along-track frame offset" ;
		frame_offset:valid_max = 15b ;
		frame_offset:valid_min = -15b ;
	float lambda0(bands, detectors) ;
		lambda0:_FillValue = -1.f ;
		lambda0:ancillary_variables = "detector_index FWHM" ;
		lambda0:long_name = "OLCI characterised central wavelength" ;
		lambda0:units = "nm" ;
		lambda0:valid_max = 1040.f ;
		lambda0:valid_min = 390.f ;
	float relative_spectral_covariance(bands, bands) ;
		relative_spectral_covariance:_FillValue = NaNf ;
		relative_spectral_covariance:ancillary_variables = "lambda0" ;
		relative_spectral_covariance:long_name = "Relative spectral covariance matrix" ;
	float solar_flux(bands, detectors) ;
		solar_flux:_FillValue = -1.f ;
		solar_flux:ancillary_variables = "detector_index lambda0" ;
		solar_flux:long_name = "In-band solar irradiance, seasonally corrected" ;
		solar_flux:units = "mW.m-2.nm-1" ;
		solar_flux:valid_max = 2500.f ;
		solar_flux:valid_min = 500.f ;

// global attributes:
		:absolute_orbit_number = 29437U ;
		:ac_subsampling_factor = 64US ;
		:al_subsampling_factor = 1US ;
		:comment = " " ;
		:contact = "[email protected]" ;
		:creation_time = "2023-12-20T07:20:24Z" ;
		:history = "  2023-12-20T07:20:24Z: PUGCoreProcessor JobOrder.3302865.xml" ;
		:institution = "PS2" ;
		:netCDF_version = "4.2 of Jan 13 2023 10:05:23 $" ;
		:processing_baseline = "OL__L1_.003.03.01" ;
		:product_name = "S3B_OL_1_EFR____20231220T045944_20231220T050110_20231220T072024_0085_087_290_1980_PS2_O_NR_003.SEN3" ;
		:references = "S3IPF PDS 004.1 - i2r6 - Product Data Format Specification - OLCI Level 1, S3IPF PDS 002 - i1r8 - Product Data Format Specification - Product Structures, S3IPF DPM 002 - i2r9 - Detailed Processing Model - OLCI Level 1" ;
		:resolution = "[ 270 294 ]" ;
		:source = "IPF-OL-1-EO 06.17" ;
		:start_time = "2023-12-20T04:59:43.719978Z" ;
		:stop_time = "2023-12-20T05:01:09.611725Z" ;
		:title = "OLCI Level 1b Product, Instrument Data Set" ;
}

The relative_spectral_covariance variable has duplicate dimensions. What do you suggest doing in such cases?

I guess this is related to #1378.

fwfichtner added the needs triage label on Jan 2, 2024

keewis (Collaborator) commented Jan 2, 2024

You should have received a warning when opening the file, with instructions on what to do (see also the issue you referenced):

In [5]: import xarray as xr
   ...: 
   ...: ds = xr.Dataset({"a": (("x", "x"), [[0, 1], [2, 3]])})
   ...: ds
.../xarray/namedarray/core.py:487: UserWarning: Duplicate dimension names present: dimensions {'x'} appear more than once in dims=('x', 'x'). We do not yet support duplicate dimension names, but we do allow initial construction of the object. We recommend you rename the dims immediately to become distinct, as most xarray functionality is likely to fail silently if you do not. To rename the dimensions you will need to set the ``.dims`` attribute of each variable, ``e.g. var.dims=('x0', 'x1')``.
  warnings.warn(
Out[5]: 
<xarray.Dataset>
Dimensions:  (x: 2)
Dimensions without coordinates: x
Data variables:
    a        (x, x) int64 0 1 2 3

The warning itself is not as helpful for duplicated dimensions on a variable within a dataset, though, since for DataArray objects the dimensions are not mutable. Instead, we can do the operation directly on the variable:

In [6]: ds.variables["a"].dims = ("x0", "x1")
   ...: ds
Out[6]: 
<xarray.Dataset>
Dimensions:  (x: 2)
Dimensions without coordinates: x
Data variables:
    a        (x0, x1) int64 0 1 2 3
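
As a sketch, the same idea can be wrapped in a small helper (hypothetical, not part of xarray) that makes duplicated dims distinct across a whole dataset by suffixing an index, relying on the settable .dims attribute mentioned in the warning:

import xarray as xr

def dedupe_dims(ds: xr.Dataset) -> xr.Dataset:
    # rename in place, e.g. ("x", "x") -> ("x0", "x1"); assumes the
    # suffixed names do not collide with existing dimensions
    for var in ds.variables.values():
        if len(set(var.dims)) < len(var.dims):
            var.dims = tuple(
                f"{d}{i}" if var.dims.count(d) > 1 else d
                for i, d in enumerate(var.dims)
            )
    return ds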

fwfichtner (Author) commented

Alright, thanks! So in this case the chunking fails unless the dimensions are renamed. The solution would therefore be something like:

# open without dask chunking first, then rename the duplicated dims
ds = xr.open_dataset("instrument_data.nc", decode_cf=True, mask_and_scale=True)
ds.variables["relative_spectral_covariance"].dims = ("x0", "x1")
ds = ds.chunk(chunks="auto")  # .chunk returns a new Dataset
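
Alternatively, if the covariance matrix is not needed, open_dataset's drop_variables argument can skip the offending variable so that chunks="auto" works at open time (a sketch, untested on this file):

ds = xr.open_dataset(
    "instrument_data.nc",
    decode_cf=True,
    mask_and_scale=True,
    drop_variables=["relative_spectral_covariance"],  # skip the (bands, bands) variable
    chunks="auto",
)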

djhoese (Contributor) commented Jan 2, 2024

So am I reading this correctly that there is no way to work around this if we want to use open_dataset with dask chunking (e.g. chunks="auto")? There is no real choice but to accept the performance penalty, right?

dcherian (Contributor) commented Jan 2, 2024

I think we can enable .chunk to handle duplicated dimensions. There's only one unambiguous interpretation IIUC: a chunk size given for a dimension name applies to every axis that carries that name. And clearly there's a use case for just opening files successfully.
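
To illustrate, a minimal sketch of that mapping (not xarray's actual implementation; plain dask is used to show the resulting chunk grid):

import dask.array as da
import numpy as np

# a (21, 21) variable with dims ("bands", "bands") and chunks={"bands": 10}:
# the same chunk size must apply to every axis named "bands"
dims = ("bands", "bands")
chunks_by_name = {"bands": 10}
positional = tuple(chunks_by_name.get(d, -1) for d in dims)  # (10, 10); -1 keeps an axis whole

arr = da.from_array(np.zeros((21, 21)), chunks=positional)
print(arr.chunks)  # ((10, 10, 1), (10, 10, 1))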

dcherian reopened this on Jan 2, 2024
dcherian changed the title from "Sentinel-3 OLCI files come with now disallowed duplicate dimensions" to "Allow .chunk for datasets with duplicated dimension names, e.g. Sentinel-3 OLCI files" on Jan 2, 2024
max-sixty added the topic-chunked-arrays label and removed the needs triage label on Feb 26, 2024
