Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Modify export function #525

Closed
mortenwh opened this issue Dec 22, 2022 · 0 comments · Fixed by #528
Closed

Modify export function #525

mortenwh opened this issue Dec 22, 2022 · 0 comments · Fixed by #528

Comments

@mortenwh
Copy link
Contributor

mortenwh commented Dec 22, 2022

GDAL changes the attribute names by prepending "GDAL_" and adds band numbers to the variable names. This should not be done. Also, it is a bit unclear to me if np.nan and netCDF fill values are treated correctly. This seems to be related to #123 as well.

Regarding the attribute and variable names, I suggest adding some "cleaning" functions at the end of the export function.

A PR related to this will also contain some other minor changes that I've made in my own fork.

Regarding the fill values, I have played with the arctic.nc test file. It turns out the fill values in that file are -10000 and not -32767 as given by the metadata:

In [3]: ds = Dataset('data/arctic.nc')

In [4]: ds.variables['Bootstrap']
Out[4]: 
<class 'netCDF4._netCDF4.Variable'>
int16 Bootstrap(time, y, x)
    grid_mapping: polar_stereographic
    description: Sea ice algorithm name: Bootstrap
    short_name: ice_conc
    minmax: 0 100
    colormap: jet
    standard_name: sea_ice_area_fraction
    long_name: Sea Ice Concentration (Bootstrap)
    valid_min: 0
    units: %
    wkv: sea_ice_area_fraction
    valid_max: 100
    name: Bootstrap
unlimited dimensions: 
current shape = (1, 240, 240)
filling on, default _FillValue of -32767 used

In [5]: ds.variables['Bootstrap'][:].data
Out[5]: 
array([[[     0,      0,      0, ...,      0,      0,      0],
        [     0,      0,      0, ...,      0,      0,      0],
        [     0,      0,      0, ...,      0,      0,      0],
        ...,
        [     0,      0,      0, ..., -10000, -10000, -10000],
        [     0,      0,      0, ..., -10000, -10000, -10000],
        [     0,      0,      0, ..., -10000, -10000, -10000]]],
      dtype=int16)

But netCDF4 treats the data correctly:

In [4]: ds.variables['Bootstrap'][:]
Out[4]: 
masked_array(
  data=[[[0, 0, 0, ..., 0, 0, 0],
         [0, 0, 0, ..., 0, 0, 0],
         [0, 0, 0, ..., 0, 0, 0],
         ...,
         [0, 0, 0, ..., --, --, --],
         [0, 0, 0, ..., --, --, --],
         [0, 0, 0, ..., --, --, --]]],
  mask=[[[False, False, False, ..., False, False, False],
         [False, False, False, ..., False, False, False],
         [False, False, False, ..., False, False, False],
         ...,
         [False, False, False, ...,  True,  True,  True],
         [False, False, False, ...,  True,  True,  True],
         [False, False, False, ...,  True,  True,  True]]],
  fill_value=-32767,
  dtype=int16)

Opening the file with nansat using python3.7 and GDAL3.0.4:

In [11]: n = Nansat('nansat/tests/data/arctic.nc')
HDF5-DIAG: Error detected in HDF5 (1.10.4) thread 140474394756928:
  #000: H5F.c line 509 in H5Fopen(): unable to open file
    major: File accessibilty
    minor: Unable to open file
  #001: H5Fint.c line 1400 in H5F__open(): unable to open file
    major: File accessibilty
    minor: Unable to open file
  #002: H5Fint.c line 1700 in H5F_open(): unable to read superblock
    major: File accessibilty
    minor: Read failed
  #003: H5Fsuper.c line 411 in H5F__super_read(): file signature not found
    major: File accessibilty
    minor: Not an HDF5 file

In [12]: n['Bootstrap']
Out[12]: 
array([[     0,      0,      0, ..., -10000, -10000, -10000],
       [     0,      0,      0, ..., -10000, -10000, -10000],
       [     0,      0,      0, ..., -10000, -10000, -10000],
       ...,
       [     0,      0,      0, ...,      0,      0,      0],
       [     0,      0,      0, ...,      0,      0,      0],
       [     0,      0,      0, ...,      0,      0,      0]], dtype=int16)

And we don't have information to mask the invalid data in Nansat

In [13]: n.get_metadata(band_id='Bootstrap', key='_FillValue')
Out[13]: '-32767.0'

With python3.9 and GDAL3.4.1:

In [13]: n = Nansat('data/arctic.nc')

In [14]: n['Bootstrap']
Out[14]: 
array([[     0,      0,      0, ..., -32767, -32767, -32767],
       [     0,      0,      0, ..., -32767, -32767, -32767],
       [     0,      0,      0, ..., -32767, -32767, -32767],
       ...,
       [     0,      0,      0, ...,      0,      0,      0],
       [     0,      0,      0, ...,      0,      0,      0],
       [     0,      0,      0, ...,      0,      0,      0]], dtype=int16)

And here we have the same fill value in nansat:

In [6]: n.get_metadata(band_id='Bootstrap', key='_FillValue')
Out[6]: '-32767.0'

The nansat code is the same but I'm wondering if gdal v3.4.1 masks values outside the interval valid_min:valid_max.

What to do, except being careful when using nansat?

mortenwh added a commit to mortenwh/nansat that referenced this issue Dec 23, 2022
mortenwh added a commit to mortenwh/nansat that referenced this issue Dec 23, 2022
mortenwh added a commit to mortenwh/nansat that referenced this issue Dec 23, 2022
mortenwh added a commit to mortenwh/nansat that referenced this issue Dec 25, 2022
mortenwh added a commit to mortenwh/nansat that referenced this issue Dec 25, 2022
@mortenwh mortenwh changed the title Make new export function that uses xarray instead of gdal Modify export function Dec 25, 2022
mortenwh added a commit to mortenwh/nansat that referenced this issue Dec 25, 2022
… issue when fill values are not correctly picked up
mortenwh added a commit to mortenwh/nansat that referenced this issue Dec 26, 2022
mortenwh added a commit to mortenwh/nansat that referenced this issue Dec 27, 2022
mortenwh added a commit to mortenwh/nansat that referenced this issue Jan 4, 2023
…'CF-1.5' which is obviously wrong. This is now deleted. Same goes for the history. That should also be handled in the processing software rather then by the gdal netcdf driver.
@akorosov akorosov mentioned this issue Jan 5, 2023
akorosov pushed a commit that referenced this issue Jan 5, 2023
akorosov pushed a commit that referenced this issue Jan 5, 2023
akorosov pushed a commit that referenced this issue Jan 5, 2023
akorosov pushed a commit that referenced this issue Jan 5, 2023
akorosov pushed a commit that referenced this issue Jan 5, 2023
…ch is obviously wrong. This is now deleted. Same goes for the history. That should also be handled in the processing software rather then by the gdal netcdf driver.
akorosov pushed a commit that referenced this issue Jan 5, 2023
akorosov pushed a commit that referenced this issue Jan 5, 2023
akorosov pushed a commit that referenced this issue Jan 5, 2023
akorosov pushed a commit that referenced this issue Jan 5, 2023
akorosov pushed a commit that referenced this issue Jan 5, 2023
…ch is obviously wrong. This is now deleted. Same goes for the history. That should also be handled in the processing software rather then by the gdal netcdf driver.
mortenwh added a commit to mortenwh/nansat that referenced this issue Jan 5, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
1 participant