Need field for specifying session start time accuracy #321

rly · 2019-10-31T20:30:30Z

The NWBFile dataset session_start_time is designed to represent the real-world start time of the session. Based on discussions with @oruebel, I see the value in this field is that you can tell what time of day a dataset was collected, if it occurred at the same time as an earthquake, etc.

But, most systems are not going to store times totally accurately with respect to the world clock at atomic resolution. How do we represent the inaccuracy of the session start time? Most datasets supply precision down to the second. Some datasets, for one reason or another, only know what day that the dataset was collected and thus enter 00:00:00 for the time, e.g. for this DataJoint to NWB converter:
https://github.com/vathes/DJ-NWB-Li-Daie-2015-2016/blob/ae69335a23e6a3c0c3809a54de5480baf8848712/pipeline/export/datajoint_to_nwb.py#L23

@oruebel and I think we should add a float attribute on session_start_time called accuracy that represents the +/- amount of time to which the session_start_time is estimated to be inaccurate. e.g., if the user thinks the value is accurate down to the second, then they should enter 0.5 for the accuracy. If they only know the day, then they should enter noon of that day as the start time and 43200 (12x60x60) for the accuracy. It would be optional, the default value would be np.nan, and we should encourage users to specify this value in tutorials and our best practices document.

This is a minor issue, but we should address it, or else, the session_start_time field will not have a consistent usage and meaning across files.

The text was updated successfully, but these errors were encountered:

rly · 2019-10-31T20:31:05Z

@ajtritt @bendichter what do you think about this idea?

t-b · 2019-10-31T20:39:57Z

session_start_time is supposed to be an ISO8601 string since NeurodataWithoutBorders/pynwb#641 got merged.

I'm using something like

DATASET "/session_start_time" {
   DATATYPE  H5T_STRING {
         STRSIZE H5T_VARIABLE;
         STRPAD H5T_STR_NULLTERM;
         CSET H5T_CSET_UTF8;
         CTYPE H5T_C_S1;
      }
   DATASPACE  SIMPLE { ( 1 ) / ( 1 ) }
   DATA {
      "2019-09-23T19:10:03.435Z"
   }
}
}

for NWBv1 currently. And I will use the same for NWBv2. The accuracy could be just implemented by only writing out the digits you trust or?

rly · 2019-10-31T21:10:03Z

That would work if accuracy aligned with the segments of an ISO8601 string, but not if you know that the recording happened around 9:30am +/- 10 minutes because your acquisition computer is not connected to the internet and the clock is off by about that much.

t-b · 2019-10-31T21:32:30Z

@rly ISO8601 supports durations so you could say that the starting time was 9:20 to 9:40.

But I'm actually not sure that this is a viable route as most date parsing tools only support dates and times and not durations. E.g. https://dateutil.readthedocs.io/en/stable/parser.html does not seem to support durations.

So why not have a accuracy attribute. We could also call it resolution as we have that already. And having it as fractional seconds (or NaN) sounds good as well.

rly · 2019-10-31T22:00:14Z

@t-b Ah, I see what you mean. Sorry for misreading.

Pandas supports ISO8601 durations (link), and we already import pandas, so we could use that to parse the durations. Representing an inaccuracy range of 10 minutes as either "P0DT0H10M0S" or 600.0 would be fine with me.

Although we do have resolution already, I think that does not mean the same thing as accuracy here. Perhaps inaccuracy_range is a better term.

rly · 2019-11-20T02:31:40Z

Just to be explicit, the proposal on the table is now:

Support a new data type in the schema, perhaps called isoduration (we already have isodatetime). The value would be an ISO 8601 extended formatted string for durations, e.g., "P0DT0H10M0S"
In PyNWB, this would be parsed using pandas or the isodate Python module.
The dataset session_start_time of NWBFile would change to:

- name: session_start_time
  dtype: isodatetime
  doc: ...
  attributes:
  - name: uncertainty
    dtype: isoduration
    doc: The uncertainty of the session start time. The actual session start time may be within 
      plus or minus the uncertainty value from the reported session start time. For example, if 
      it is known only that the session start time was sometime on July 15, 2019 in UTC-7, the 
      session start time should be July 15, 2019 at noon in UTC-7 ("2019-07-15T12:00:00-07:00"),
      and the uncertainty should be 12 hours ("P0DT12H0M0S").
    required: false

oruebel · 2019-11-20T21:32:55Z

CC @ln-vidrio

bendichter · 2019-11-20T21:36:38Z

I feel like the natural solution to the session start time accuracy issue is to let users just supply a date or a time without ms. I think the accuracy thing is overkill and will rarely be used correctly.

t-b · 2019-11-20T22:12:30Z

I would prefer to use isodatetime module for parsing instead of pandas as pandas supports also non-ISO8601 durations.

rly · 2019-11-20T23:36:13Z

@bendichter Yeah, I also think this would rarely be used correctly. 99.9% of cases will be either:

the user supplies the time down to the second based on the computer's clock, and they don't know how accurate that is relative to the world clock.
the user supplies the time down to the day because they don't remember what time they ran the experiment and have no record of it.

In both cases, I think most users would not bother to enter an uncertainty value, and if they don't know the time or the second, it is most likely not important to the study.

Also, acquisition systems do not know the accuracy of their clocks, so any files they create will have unknown uncertainty.

I like @bendichter and @t-b 's suggestion of just supporting datetimes with reduced accuracy. ISO 8601 allows it - "For reduced accuracy, any number of values may be dropped from any of the date and time representations". You can't encode uncertainty values of +/- 3 hours, but it will cover 99.9% of use cases. Unfortunately, all datetime parsers confound missing values with zero values. We could write our own modified datetime reader/writer to differentiate those cases, so the uncertainty is documented in the file (non read/write uses of the datetimes would use 0s for missing values), but that could get messy. Thoughts?

Thanks for all the feedback! I consider this a very minor issue. I say: if there is a need from users, then let's move forward with one option or the other (I lean toward creating a modified datetime string reader/writer). In the meantime, I'll close the issue.

nandchandravadia · 2019-12-11T01:38:46Z

Hi,

Is it possible to make session_start_time optional? Since we plan on releasing our data publicly, one issue we currently have is with regards to protecting PHI (patient health information) by not putting the actual date of the experiment. Instead, we have been putting an arbitrary date (e.g., 1-1-1900), which is obviously incorrect and can be misleading. Another possibility would be to allow the user to specify the desired resolution of the session_start_time. In our case, we would just input the year and month.

bendichter · 2023-06-16T15:36:17Z

@rly I don't really see this as a minor issue, as it comes us for us all the time in different ways:

A number of dandisets use midnight clearly as a place-holder. It would be better to just have the date, indicating that time time is unknown.
It is common that we can pull out date but not time from metadata, which requires us to fill in data which is often just a midnight placeholder.
It is common that we can pull out datetime, but not timezone, which is filled as the local timezone, which may or may not be correct.
If you don't really know the time, and you are just filling in midnight then the timezone is completely pointless and just extra work for no reason. On top of that, the syntax for timezone is not straightforward.

In the end, these requirements force us to enter data which is often wrong. It would be so much better and simpler to allow:

datetime with timezone
datetime without timezone
date

If using date, the timestamp_reference_time can still use midnight of that day.

rly · 2023-07-01T08:52:34Z

This sounds good to me. @oruebel any thoughts?

bendichter · 2023-07-04T14:37:06Z

@lawrence-mbf What do you think? Do you think it would be possible to relax this on the MatNWB side as well?

oruebel · 2023-07-04T19:18:40Z

The option to allow datetime with timezone and date sound reasonable. I'm not sure what the option to allow datetime without timezone adds.

CodyCBakerPhD · 2023-07-05T06:23:26Z

I'm not sure what the option to allow datetime without timezone adds.

I get the logic of the principle of how it could be seen as required (experiment had to take place somewhere; if institution is in the NWB file you could even infer what timezone it is in from that)

I think what we take more issue with is that requiring it means the APIs like PyNWB (IDK what MatNWB does) then auto-attach it to the datetime.datetime object if it is missing, and it sets it to the timezome on the local device creating the NWB file, which from experience is not always the same as the device the experiment was performed on, nor is it always performed by the same lab, or necessarily in the same timezone

The syntax for attaching a proper ZoneInfo object to a datetime.datetime is also not entirely trivial or super commonly well known (though we use it liberally throughout the NeuroConv galleries in an attempt to educate) - though helper functions could of course be made for this

So overall it might just be better to loosen it to be an optional detail to include along with the other loosened resolutions of the session_start_time, so that fewer potentially false assumptions are made at time of file creation. Either that or, if we want to keep it required, stop assuming that it's the 'local timezone' according to Python.

oruebel · 2023-07-05T08:25:43Z

I think what we take more issue with is that requiring it means the APIs like PyNWB (IDK what MatNWB does) then auto-attach it to the datetime.datetime object if it is missing

Thanks for the clarification @CodyCBakerPhD, that makes sense. It sounds like independent of this schema issue, there is an issue in PyNWB in that we: a) zone information should not be automatically added and b) specifying date information should be simpler. For a) we'd probably need to add/update logic in the NWBFile class in PyNWB. For b) we could look at the ZoneInfo module that was added in Python 3.9 to see if that helps simplify the code.

If we decide to allow "datetime without timezone" then I would still suggest that we look at enhancing PyNWB (NeuroConv, GUIDE etc.) to make it easier and encourage the use of proper timezone information.

lawrence-mbf · 2023-07-05T14:04:59Z

This should be doable on MatNWB side. datetime can be represented as all three forms.

bendichter · 2023-07-06T23:00:04Z

Linked PRs:

rly self-assigned this Nov 20, 2019

rly closed this as completed Nov 20, 2019

bendichter reopened this Dec 11, 2019

bendichter mentioned this issue Jun 16, 2023

allow a datetime.date object to be used instead of datetime.datetime for isodatetime data hdmf-dev/hdmf#874

Merged

6 tasks

rly closed this as completed in hdmf-dev/hdmf#874 Jul 10, 2023

lawrence-mbf mentioned this issue Jul 11, 2023

[Feature]: Support Truncated Time Zones and Times for isodatetime NeurodataWithoutBorders/matnwb#512

Closed

2 tasks

bendichter mentioned this issue May 23, 2024

timezone-less NWB files are failing DANDI-side validation dandi/dandi-archive#1944

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Need field for specifying session start time accuracy #321

Need field for specifying session start time accuracy #321

rly commented Oct 31, 2019 •

edited

Loading

rly commented Oct 31, 2019

t-b commented Oct 31, 2019

rly commented Oct 31, 2019

t-b commented Oct 31, 2019

rly commented Oct 31, 2019

rly commented Nov 20, 2019

oruebel commented Nov 20, 2019

bendichter commented Nov 20, 2019

t-b commented Nov 20, 2019

rly commented Nov 20, 2019

nandchandravadia commented Dec 11, 2019 •

edited

Loading

bendichter commented Jun 16, 2023

rly commented Jul 1, 2023

bendichter commented Jul 4, 2023

oruebel commented Jul 4, 2023 •

edited by CodyCBakerPhD

Loading

CodyCBakerPhD commented Jul 5, 2023

oruebel commented Jul 5, 2023

lawrence-mbf commented Jul 5, 2023

bendichter commented Jul 6, 2023 •

edited

Loading

Need field for specifying session start time accuracy #321

Need field for specifying session start time accuracy #321

Comments

rly commented Oct 31, 2019 • edited Loading

rly commented Oct 31, 2019

t-b commented Oct 31, 2019

rly commented Oct 31, 2019

t-b commented Oct 31, 2019

rly commented Oct 31, 2019

rly commented Nov 20, 2019

oruebel commented Nov 20, 2019

bendichter commented Nov 20, 2019

t-b commented Nov 20, 2019

rly commented Nov 20, 2019

nandchandravadia commented Dec 11, 2019 • edited Loading

bendichter commented Jun 16, 2023

rly commented Jul 1, 2023

bendichter commented Jul 4, 2023

oruebel commented Jul 4, 2023 • edited by CodyCBakerPhD Loading

CodyCBakerPhD commented Jul 5, 2023

oruebel commented Jul 5, 2023

lawrence-mbf commented Jul 5, 2023

bendichter commented Jul 6, 2023 • edited Loading

rly commented Oct 31, 2019 •

edited

Loading

nandchandravadia commented Dec 11, 2019 •

edited

Loading

oruebel commented Jul 4, 2023 •

edited by CodyCBakerPhD

Loading

bendichter commented Jul 6, 2023 •

edited

Loading