Incorrect conversion of fillval to datetime #230

Open
tien-vo opened this issue Sep 3, 2023 · 6 comments

@tien-vo

tien-vo commented Sep 3, 2023

Using the MMS example of cdf_to_xarray in the documentation,

import xarray as xr
import os
import urllib.request
from cdflib.xarray import cdf_to_xarray

fname = 'mms2_fgm_srvy_l2_20160809_v4.47.0.cdf'
url = ("https://lasp.colorado.edu/maven/sdc/public/data/sdc/web/cdflib_testing/mms2_fgm_srvy_l2_20160809_v4.47.0.cdf")
if not os.path.exists(fname):
    urllib.request.urlretrieve(url, fname)

ds = cdf_to_xarray(fname, to_datetime=True, fillval_to_nan=True)

ds.Epoch.attrs["FILLVAL"] has the value "1707-09-22T12:12:10.961224", but it should be "9999-12-31T23:59:59.99..." for MMS data. The VALIDMIN and VALIDMAX attributes seem to be converted correctly, though. It's not a huge issue, but I think this bug should still be fixed.

@dstansby
Contributor

dstansby commented Sep 3, 2023

Just looking at the Epoch variable, the attributes are:

>>> print(cdf.varattsget("Epoch"))
{
'CATDESC': 'Interval centered time tag (TBC)',
'FIELDNAM': 'Time since Jan 1, 1958',
'FILLVAL': -9223372036854775808,
'LABLAXIS': 'mms2_fgm_srvy_Epoch',
'UNITS': 'ns',
'VALIDMIN': 315576066184000000,
'VALIDMAX': 946728068183000000,
'VAR_TYPE': 'support_data',
'SCALETYP': 'linear',
'MONOTON': 'INCREASE',
'TIME_BASE': 'J2000',
'TIME_SCALE': 'Terrestrial Time',
'REFERENCE_POSITION': 'Rotating Earth Geoid',
'SI_CONVERSION': '1.0e-9>s',
'DELTA_PLUS_VAR': 'mms2_fgm_bdeltahalf_srvy_l2',
'DELTA_MINUS_VAR': 'mms2_fgm_bdeltahalf_srvy_l2'
}

It looks like 1707-09-22T12:12:10.961224 is a reasonable value given a FILLVAL of -9223372036854775808?
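To make that arithmetic concrete, here is a rough sketch using only the standard library. It assumes the raw value is a count of nanoseconds from J2000 (2000-01-01T12:00:00, per the TIME_BASE attribute above), truncates to microseconds, and ignores leap seconds. The 32.184 s shift is the fixed TT minus TAI offset; whether cdflib applies exactly this step is an assumption, but subtracting it does reproduce the value reported in this issue.

```python
from datetime import datetime, timedelta

FILLVAL = -9223372036854775808          # int64 minimum, as printed by varattsget
J2000_TT = datetime(2000, 1, 1, 12, 0)  # J2000 epoch, treated as a naive datetime

# Python's datetime has microsecond resolution, so drop the last three digits.
naive = J2000_TT + timedelta(microseconds=FILLVAL // 1000)

# Assumption: remove the fixed 32.184 s TT-TAI offset, no leap-second handling.
shifted = naive - timedelta(milliseconds=32184)

print(naive.isoformat())    # 1707-09-22T12:12:43.145224
print(shifted.isoformat())  # 1707-09-22T12:12:10.961224
```

So the 1707 date is simply what the most negative 64-bit integer looks like when treated as a real timestamp.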

@tien-vo
Author

tien-vo commented Sep 4, 2023

Hm, then the issue must be in the parsing of the CDF file rather than in the datetime conversion. Using cdfbrowse (CDF v3.8.1) to inspect mms2_fgm_srvy_l2_20160809_v4.47.0.cdf directly, the FILLVAL values are

+------------------------ vAttribute "FILLVAL" Entries ------------------------+
|0 rEntries, 15 zEntries                                                       |
|VarName          DataSpec  Value(s)                                           |
+------------------------------------------------------------------------------+
|"Epoch"          TT2000/1  9999-12-31T23:59:59.999999999                      |
|"mms2_fgm_b_..." REAL4/1   -1.0e+31                                           |
|"mms2_fgm_b_..." REAL4/1   -1.0e+31                                           |
|"mms2_fgm_b_..." REAL4/1   -1.0e+31                                           |
|"mms2_fgm_b_..." REAL4/1   -1.0e+31                                           |
|"mms2_fgm_fl..." UINT4/1   4294967295                                         |
|"Epoch_state"    TT2000/1  9999-12-31T23:59:59.999999999                      |
|"mms2_fgm_r_..." REAL4/1   -1.0e+31                                           |
|"mms2_fgm_r_..." REAL4/1   -1.0e+31                                           |
|"mms2_fgm_hi..." UINT1/1   255                                                |
|"mms2_fgm_bd..." FLOAT/1   -1.0e+31                                           |
|"mms2_fgm_st..." FLOAT/1   nan                                                |
|"mms2_fgm_et..." FLOAT/1   nan                                                |
|"mms2_fgm_mo..." FLOAT/1   -1.0e+31                                           |
+------------------------------------------------------------------------ Top -+
|Select: <Return>   Exit: <Ctrl-E>   Help: <Ctrl-K>   Next attribute: <Ctrl-J> |
+------------------------------------------------------------------------------+

So what might lead to this inconsistency?

@dstansby
Contributor

I had a read of some CDF documents, and I think it's just convention (not a requirement) that FILLVAL time values are represented as 9999-12-31T23:59:59.999999999. It seems like a reasonable request though. Do you think you'd be able to make the change yourself in the cdflib code and open a pull request?

@bryan-harter
Collaborator

Yes, I went looking in the C code (cdftt2000.c), and it literally just says that if (nanoSecSinceJ2000 == FILLED_TT2000_VALUE), they call it 9999-12-31T23:59:59.999999999. I assume this is just for readability.

cdflib is actually treating that number as a count of nanoseconds before J2000 and converting it, which lands on 1707-09-22 (which they also note in a code comment is the real minimum value).

So thankfully it's not a broader issue about time conversions. Perhaps that if statement should live somewhere in our code as well.
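For illustration, such a special case might look roughly like the sketch below. The function name and the fallback conversion are hypothetical (this is not cdflib's API or the C library's code), and the fallback is deliberately simplified: it ignores leap seconds and the TT offset and truncates to microseconds.

```python
from datetime import datetime, timedelta

FILLED_TT2000_VALUE = -9223372036854775808  # int64 minimum
FILLED_TT2000_STRING = "9999-12-31T23:59:59.999999999"
J2000 = datetime(2000, 1, 1, 12, 0)


def tt2000_to_string_sketch(ns_since_j2000):
    """Hypothetical helper: format a TT2000 value, special-casing the fill value."""
    # The branch described above: show the conventional string for the fill value.
    if ns_since_j2000 == FILLED_TT2000_VALUE:
        return FILLED_TT2000_STRING
    # Simplified stand-in for the real conversion (assumption, see lead-in).
    return (J2000 + timedelta(microseconds=ns_since_j2000 // 1000)).isoformat()


print(tt2000_to_string_sketch(FILLED_TT2000_VALUE))
print(tt2000_to_string_sketch(0))
```

Returning a string sidesteps the question of which datetime type could hold the result.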

@tien-vo
Author

tien-vo commented Sep 18, 2023

That's a satisfactory enough answer for me. Thanks for looking into this! I'd prefer not to force an arbitrary value, but to do the actual conversion as cdflib does right now, so no change is necessary. But some people might prefer to use the CDF binaries, like I do, to browse the files before processing because it's faster. So maybe some documentation on this issue will suffice?

@jeandet
Contributor

jeandet commented Sep 19, 2023

I just discovered the same issue with pycdfpp, so I asked the NASA CDF support mailing list. The answer I got was that TT2000 has 3 predefined special values:

  • FILLED_TT2000_VALUE = -9223372036854775808 (0x8000000000000000) encoded as 9999-12-31T23:59:59.999999999
  • ILLEGAL_TT2000_VALUE = -9223372036854775805 (0x8000000000000003) encoded as 9999-12-31T23:59:59.999999999
  • DEFAULT_TT2000_PADVALUE = -9223372036854775807 (0x8000000000000001) encoded as 0000-01-01T00:00:00.000000000

Since it's not well documented, I'll leave it here.

The main issue is that we can't represent all these values in Python 😅: with numpy datetime64 at ns resolution they are out of reach, and Python's datetime doesn't accept year 0. Not sure what to do with pycdfpp...
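A quick standard-library check of the datetime side of this (a sketch only; the numpy limits are not exercised here): year 0 is rejected outright, and the latest representable datetime stops at microsecond resolution, so neither of the conventional strings listed above round-trips exactly.

```python
from datetime import datetime, MINYEAR

# Year 0 (the pad-value string) is below datetime.MINYEAR and raises ValueError.
try:
    datetime(0, 1, 1)
    year_zero_ok = True
except ValueError:
    year_zero_ok = False

# The largest datetime has only six fractional digits, not the nine in
# 9999-12-31T23:59:59.999999999.
latest = datetime.max

print(MINYEAR, year_zero_ok)
print(latest.isoformat())
```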
