Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Wrong error with flag_masks containing only one element #914

Closed
jdemaria opened this issue Apr 4, 2022 · 7 comments · Fixed by #915
Closed

Wrong error with flag_masks containing only one element #914

jdemaria opened this issue Apr 4, 2022 · 7 comments · Fixed by #915

Comments

@jdemaria
Copy link

jdemaria commented Apr 4, 2022

Version of compliance checker running: latest git master version at this time: 79c67c4

Describe the checker this affects: CF

Attach a minimal CDL or NetCDF file which is able to reproduce the issue
flag_masks_one_elem.txt

netcdf flag_masks_one_elem {
dimensions:
	pixels = 10 ;
variables:
	byte flags(pixels) ;
		flags:long_name = "Flags" ;
		flags:standard_names = "status_flag" ;
		flags:flag_meanings = "LAND" ;
		flags:flag_masks = 1b ;

// global attributes:
		:Conventions = "CF-1.8" ;
		:title = "title" ;
		:history = "history" ;
}

To Reproduce:

ncgen flag_masks_one_elem.txt -o flag_masks_one_elem.nc
cchecker.py -t cf flag_masks_one_elem.nc

Describe the issue below:
flag_masks attribute containing only one element seems not well supported as it should be and gives this error:

                                     Errors                                     
--------------------------------------------------------------------------------
§3.5 Flags
* flags's flag_masks must be an array of values not int8

With at least two elements in flag_masks there is no error.
Maybe this is related to the fact that the netCDF4 or xarray Python modules return a scalar numpy.int8 if the attribute has only one elements but a numpy.ndarray if it has more than one, which is not always easy to handle.

@jdemaria
Copy link
Author

jdemaria commented Apr 4, 2022

Maybe related to #792

@benjwadams
Copy link
Contributor

benjwadams commented Apr 5, 2022

Hi, flag_masks here is a scalar, as you indicated.

It's been a while since I've reviewed implementation of the flags (although I am am recently re-reviewing the conformance pages). In the CF 1.6 conventions flags section, there is the following snippet (emphasis mine):

The flag_masks and flag_meanings attributes describe a number of independent Boolean conditions using bit field notation by setting unique bits in each flag_masks value. The flag_masks attribute is the same type as the variable to which it is attached, and contains a list of values matching unique bit fields.

@jdemaria
Copy link
Author

jdemaria commented Apr 6, 2022

Hi, do you mean that it is not allowed with CF to have a flags variable containing only one bit field? What is the reason? This seems not very logical to me.

@jdemaria
Copy link
Author

jdemaria commented Apr 6, 2022

Note also that the other NetCDF checker https://github.com/cedadev/cf-checker accepts scalar flag_masks.

@benjwadams
Copy link
Contributor

Hi, it looks like netCDF4-Python casts one-element 1D arrays to the corresponding scalar type of the first (and only) element. I checked both the NetCDF User Guide and the netCDF4 Python Documentation to see if there was something describing this behavior. It looks like it may be related to the NetCDF4 classic data model wherein attributes are a 1D array of values: https://www.unidata.ucar.edu/software/netcdf/workshops/2011/datamodels/NcClassicModel.html . If this is the case, this will be a bug on our end, although the behavior could be made a little clearer in the Python NetCDF4 documentation. Usually files with flag_values and/or flag_masks will have multiple values defined, which may explain why this situation has not occurred in the past.

(Pdb) dataset.variables["time"].leap_month = np.array([], dtype=np.uint8)
(Pdb) dataset.variables["time"].leap_month
array([], dtype=uint8)
(Pdb) dataset.variables["time"].leap_month = np.array([1], dtype=np.uint8)
(Pdb)
(Pdb) dataset.variables["time"].leap_month
1
(Pdb) dataset.variables["time"].leap_month = np.array([1], dtype=np.uint8)
(Pdb) dataset.variables["time"].leap_month = np.array([], dtype=np.uint8)
(Pdb) print(dataset.variables["time"].leap_month)
[]
(Pdb) dataset.variables["time"].leap_month = np.array([2, 5], dtype=np.uint8)
(Pdb) print(dataset.variables["time"].leap_month)
[2 5]
(Pdb) type(dataset.variables["time"].leap_month)
<class 'numpy.ndarray'>
(Pdb) single_element_1d = np.array([2], dtype=np.uint8)
(Pdb) dataset.variables["time"].leap_month = single_element_1d
(Pdb) dataset.variables["time"].leap_month == single_element_1d
array([ True])
(Pdb) #uh
*** SyntaxError: unexpected EOF while parsing
(Pdb) type(dataset.variables["time"].leap_month)
<class 'numpy.uint8'>
(Pdb) type(single_element_1d)
<class 'numpy.ndarray'>
(Pdb) single_element_1d.dtype
dtype('uint8')
(Pdb) dataset.variables["time"].leap_month.dtype
dtype('uint8')

@benjwadams
Copy link
Contributor

@jdemaria, I tried your file generated from ncgen. Please confirm that the merged fix works off of master branch and I will close this issue.

@jdemaria
Copy link
Author

jdemaria commented Apr 8, 2022

@benjwadams, I confirm that your fix works on my side too, thanks a lot for the quick analysis and fix!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants