
diagnostics (and more) broken by string-valued array, cdscan broken by 0-d array #2145

Open
painter1 opened this issue Dec 1, 2016 · 8 comments


@painter1
Contributor

painter1 commented Dec 1, 2016

There are two problems with the new UV-CDAT 2.8 which I have had to fix in order to get the diagnostics (uvcmetrics) to work.

First, suppose you have several related NetCDF files, and you run cdscan to produce an xml file describing them all. Usually you can't! Generally the variables in the files will have a missing_value attribute. This will be a 0-d array (thus representing a scalar), so the cdscan function cleanupAttrs() raises an exception (a TypeError) when it tries to compute len(attval). The solution is to catch the exception; I'll paste a working version of this function below.
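
Here is a small standalone illustration of that failure, assuming only NumPy (no cdscan or cdms2 involved). A 0-d array has `size == 1` but no length, so `len()` raises `TypeError`, while `.item()` unwraps both the 0-d and the one-element 1-d case. The helper name `scalar_attr` is hypothetical, not anything in cdscan.

```python
import numpy

def scalar_attr(attval):
    """Hypothetical helper: unwrap a one-element array attribute.

    Returns a Python scalar for 0-d arrays and for 1-element 1-d arrays,
    and returns anything else unchanged.
    """
    if isinstance(attval, numpy.ndarray) and attval.size == 1:
        # .item() works for shape () and shape (1,); len() fails on shape ().
        return attval.item()
    return attval

missing = numpy.array(1.0e20)  # 0-d array, like a scalar missing_value attribute
print(missing.ndim, missing.size)  # 0 1

try:
    len(missing)
except TypeError as err:
    print("len() failed:", err)

print(scalar_attr(missing))                # 1e+20
print(scalar_attr(numpy.array([-999.0])))  # -999.0
```

Note that this sketch keeps the numeric value, whereas the patched cleanupAttrs() below falls back to `str(attval)` for a 0-d array; which of the two cdscan really wants is a judgment call.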

Second, if a NumPy masked array is string-valued, NumPy assigns it a default fill_value of 'N/A'. (It's a stupid choice, but we have to live with it.) In UV-CDAT, if you read a NetCDF file containing a variable which is a string-valued array, that fill_value gets propagated into the missing_value attribute of the variable. So far, so good: that is harmless to the diagnostics, although it is troublesome in general. For example, you can't use UV-CDAT to read such a variable.
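
A quick way to see where 'N/A' comes from, assuming a reasonably recent NumPy: `numpy.ma` picks a default fill value from the dtype, and for string dtypes that default is 'N/A'. (cdms2 variables are built on `numpy.ma` masked arrays, which is presumably how it ends up as missing_value.)

```python
import numpy.ma as ma

# A string-valued masked array: the default fill value is chosen by dtype kind.
names = ma.array(["abc", "def"])
print(names.fill_value)  # N/A

# A numeric masked array gets a numeric default instead.
values = ma.array([1.5, 2.5])
print(values.fill_value)  # 1e+20

# ma.default_fill_value() exposes the same per-dtype defaults.
print(ma.default_fill_value(names))  # N/A
```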

The problem for the diagnostics arises when there are several NetCDF files, each containing a string-valued array. Again, run cdscan to produce an xml file. This works, if cdscan has been patched as described here. Now open that xml file. It breaks. When you call cdms2.open(), all variables are made available, including the string-valued variables even if you don't want them. Look in avariable.py, at AbstractVariable.__init__(). This has a test "numpy.isnan(self.missing_value)". But numpy.isnan() fails on strings (it raises TypeError), and our self.missing_value is 'N/A', a string! The local solution is to test for a string first. I'll include sample code at the end.
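
To make the ordering problem concrete, here is a standalone sketch of the string-first test, assuming only NumPy. `normalize_missing_value` is a hypothetical name; it follows the same logic as the AbstractVariable patch quoted at the end, returning None for an absent, string, or NaN value.

```python
import numpy

def normalize_missing_value(value):
    """Hypothetical helper mirroring the local fix in AbstractVariable.

    Strings are screened out *before* numpy.isnan(), which raises
    TypeError on them.
    """
    if value is None:
        return None
    if isinstance(value, (str, bytes)):  # e.g. NumPy's 'N/A' default
        return None
    if numpy.isnan(value):
        return None
    return value

try:
    numpy.isnan("N/A")
except TypeError as err:
    print("numpy.isnan('N/A') raised TypeError:", err)

print(normalize_missing_value("N/A"))         # None
print(normalize_missing_value(float("nan")))  # None
print(normalize_missing_value(1.0e20))        # 1e+20
```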

But note that I called this the "local" solution. I think that a better solution is to replace that 'N/A' with None right away, when it comes in as a fill_value attribute from Numpy. The patch below is just an easy way to get things going for uvcmetrics.
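
A rough sketch of that "better" solution, under the assumption that the conversion happens wherever a masked array's fill_value is copied into missing_value: decide from the array's dtype rather than from the value, so string-valued arrays never pick up NumPy's 'N/A' default. `missing_value_for` is a hypothetical helper, not part of cdms2.

```python
import numpy.ma as ma

def missing_value_for(data):
    """Hypothetical helper: derive a missing_value from a masked array.

    String-valued arrays (bytes 'S' or unicode 'U') get None instead of
    NumPy's default 'N/A'; other arrays keep their fill_value.
    """
    if data.dtype.kind in "SU":
        return None
    return data.fill_value

print(missing_value_for(ma.array(["abc", "def"])))  # None
print(missing_value_for(ma.array([1.0, 2.0])))      # 1e+20
```

This drops any fill value on a string array, even one set deliberately; a narrower variant could drop only the literal 'N/A'.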

Here is the replacement function for cleanupAttrs() in cdscan.py:

import numpy

def cleanupAttrs(attrs):
    for attname in list(attrs.keys()):
        attval = attrs[attname]
        if type(attval) is numpy.ndarray:
            try:
                if len(attval) == 1:
                    attrs[attname] = attval[0]
                else:
                    attrs[attname] = str(attval)
            except TypeError:
                # len() raises TypeError when attval is a 0-d array, i.e. really
                # a scalar. Such an array has no length, though attval.size == 1.
                attrs[attname] = str(attval)
    if 'missing_value' in attrs and attrs['missing_value'] is None:
        del attrs['missing_value']

Here is an improved if...elif clause from the end of AbstractVariable.__init__():

        if not hasattr(self, 'missing_value'):
            self.missing_value = None
        elif isinstance(self.missing_value, str):  # numpy.ma sets fill_value to 'N/A' if the data are strings
            self.missing_value = None
        elif numpy.isnan(self.missing_value):
            self.missing_value = None
@doutriaux1
Contributor

@painter1 can you please attach a zip of the failing files here? @mcenerney1 gave us some that work on my system without patching. It is VERY disturbing if cdscan behaves differently on the same files depending on the system.

@painter1
Contributor Author

painter1 commented Jan 3, 2017

T42.zip

Unzip this little file and run
cdscan -x T42.xml T42_ORO_ANN_climo.nc
It won't run without the above patched version of cleanupAttrs(), or something similar.

I'll attach a demo file for the 'N/A' problem in my next message.

@doutriaux1
Copy link
Contributor

@painter1 on my Ubuntu machine:

(2.8) doutriaux1@omar:[Downloads]:[21099]> cdscan -x crap.xm T42_ORO_ANN_climo.nc 
Finding common directory ...
Common directory: 
Scanning files ...
T42_ORO_ANN_climo.nc
crap.xm written

without your full patch

@painter1
Contributor Author

painter1 commented Jan 3, 2017

This zip file contains a NetCDF file cam_dw.nc. It has only one variable, date_written.

cam_dw.zip

This will give you an error:

>>> f=cdms2.open('cam_dw.nc')
>>> f('date_written')

That's not the error I had complained about, though.

Or you can run cdscan first:

cdscan -x cam_dw.xml cam_dw.nc
python
>>> f=cdms2.open('cam_dw.xml')
>>> f('date_written')

Now the error message is about 'N/A', roughly as I had described.
In my more complicated setting, it wasn't necessary to explicitly open the variable 'date_written'.

@painter1
Contributor Author

painter1 commented Jan 3, 2017

@doutriaux1 What exactly is your function cleanupAttrs() which worked for you?

@painter1
Contributor Author

painter1 commented Jan 3, 2017

I just looked at the cdscan.py on GitHub, UV-CDAT/cdms, master branch. It has a better cleanupAttrs() than the one I got with my UV-CDAT 2.8, one which should work. I see that @dnadeau4 fixed it two weeks ago. So I suppose that you, @doutriaux1, are running the very latest UV-CDAT, while I was running the 2.8 release version.

@doutriaux1
Contributor

@painter1 this is actually using cdscan before any patch...

@doutriaux1
Contributor

@painter1 OK, I get the same error as you on crunchy with the older version of cdscan. I'm going to try applying your patch and see whether that makes any difference.
