Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Bug]: Required NWBFile fields are not required by MatNWB #588

Open
2 tasks done
bendichter opened this issue Sep 9, 2024 · 4 comments · May be fixed by #600
Open
2 tasks done

[Bug]: Required NWBFile fields are not required by MatNWB #588

bendichter opened this issue Sep 9, 2024 · 4 comments · May be fixed by #600
Assignees

Comments

@bendichter
Copy link
Contributor

bendichter commented Sep 9, 2024

What happened?

identifier, session_description and session_start_time are required by the NWB schema but this is not being enforced by MatNWB.

The official source of truth here is the NWB schema. All neuro-specific types are in /core, and generic types are in the submodule /hdmf-common-schema. These yaml files define all neurodata types used in NWB using the NWB schema language defined here. The information you want is here, where the fields you mentioned are defined in the schema within the NWBFile neurodata type. We are looking for a quantity field of these datasets, which is not defined for any of them, so we go with the default, 1, as defined here in the schema language documentation. This means that all three fields are required.

Steps to Reproduce

Here, session_start_time is missing

nwb = NwbFile( ...
'session_description', 'mouse in open exploration',...
'identifier', 'Mouse5_Day3', ...
'general_experimenter', 'Last, First', ... % optional
'general_session_id', 'session_1234', ... % optional
'general_institution', 'University of My Institution', ... % optional
'general_related_publications', {'DOI:10.1016/j.neuron.2016.12.011'}); % optional
nwbExport(nwb, 'test.nwb')

MatNWB will happily instantiate this object and write it even though this is not a valid NWB file and cannot be opened by PyNWB

Error Message

No response

Operating System

macOS

Matlab Version

2024a

Code of Conduct

@ehennestad
Copy link
Collaborator

Add methods to deal with missing required properties: d7eba0d

@ehennestad
Copy link
Collaborator

ehennestad commented Sep 17, 2024

Here are a few edge cases I could not quite figure out how to deal with:

For TimeSeries:
timestamps and starting_time are optional datasets. However the rate attribute of starting_time is required, but only if timestamps are missing?

For ImageSeries:
data is required, but not when external_files are specified, ref this:
NeurodataWithoutBorders/nwb-schema#462

@ehennestad
Copy link
Collaborator

ehennestad commented Sep 17, 2024

Failing tests after enforcing required properties on export

  • tests.unit.anonTest/testAnonDataset
  • tests.unit.linkTest/testExternalResolution
  • tests.unit.multipleConstrainedTest/testRoundabout

@ehennestad ehennestad self-assigned this Sep 17, 2024
@bendichter
Copy link
Contributor Author

For TimeSeries:
timestamps and starting_time are optional datasets. However the rate attribute of starting_time is required, but only if timestamps are missing?

The rule is: TimeStamps must have (timestamps) xor (starting_time & rate). If starting_time exists then rate must also exist, which is why it is a required property of starting_time. The schema language does not have a way of expressing an xor condition, so we don't have a way to formally indicate that you must have EITHER timestamps OR starting_time. That constraint must be specified manually by the API.

For ImageSeries:
data is required, but not when external_files are specified, ref this:
NeurodataWithoutBorders/nwb-schema#462

Yes, this is a tricky edge-case. Since ImageSeries isa child of TimeSeries, the data field is required (children cannot un-require a field). However, when you are using external mode it does not make sense to add data. This would ideally be fixed by reorganizing the parent classes in a way where you wouldn't require data when you don't want to, but the current work-around is to create an empty or near-empty dataset.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
2 participants