Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Renamed failing test, added prospective use case for NWBI warning. #1162

Merged
merged 5 commits into from
Nov 24, 2022

Conversation

TheChymera
Copy link
Contributor

@TheChymera TheChymera commented Nov 21, 2022

Addressing #1158

  • Properly rename the test which is now testing that we properly fail on a critical error
  • Formulate a new test for checking that we properly display non-critical errors (i.e. WARNINGs)

The issue with the latter is that for validation via DANDI there is only one none-critical type of error, namely check_data_orientation. Though going by the comment that too appears to be temporary.

@CodyCBakerPhD (1) do you know how I could create a test case for this if this will likely be long-term-stable behaviour or (2) is there anything else we could add as BEST_PRACTICE_VIOLATION (apparently the second-highest criterion for NWBI errors) so that we can check warning reporting functionality?

@TheChymera TheChymera marked this pull request as draft November 21, 2022 11:48
@codecov
Copy link

codecov bot commented Nov 21, 2022

Codecov Report

Base: 88.22% // Head: 88.23% // Increases project coverage by +0.01% 🎉

Coverage data is based on head (d0ef688) compared to base (b11835a).
Patch coverage: 100.00% of modified lines in pull request are covered.

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #1162      +/-   ##
==========================================
+ Coverage   88.22%   88.23%   +0.01%     
==========================================
  Files          73       73              
  Lines        8804     8817      +13     
==========================================
+ Hits         7767     7780      +13     
  Misses       1037     1037              
Flag Coverage Δ
unittests 88.23% <100.00%> (+0.01%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
dandi/files/bases.py 76.72% <ø> (ø)
dandi/cli/tests/test_cmd_validate.py 100.00% <100.00%> (ø)
dandi/tests/fixtures.py 98.25% <100.00%> (+0.06%) ⬆️
dandi/download.py 88.04% <0.00%> (-0.34%) ⬇️
dandi/cli/cmd_validate.py 73.86% <0.00%> (+2.27%) ⬆️

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

☔ View full report at Codecov.
📢 Do you have feedback about the report comment? Let us know in this issue.

@CodyCBakerPhD
Copy link
Contributor

The code at L504-L509 was designed to be extremely minimal on the DANDI side; in particular, prior to the introduction of dandi warning levels (this new severity thing) the Inspector was only capable of returning items that ought to prevent validation of files (CRITICAL and above with the "dandi" config). But the Inspector is capable of running many more (40+) checks for warnings/hints.

In the relevant dandi-side code:

                for error in inspect_nwb(
                    nwbfile_path=self.filepath,
                    skip_validate=True,
                    config=load_config(filepath_or_keyword="dandi"),
                    importance_threshold=Importance.CRITICAL,
                ):

The idea being, previously, that someone could merely run nwbinspector prior to or separate from dandi validate in order to get complete feedback. But now that DANDI is going more in this direction, I guess the goal is to output all output from the Inspector.

So to achieve this, all you need to do is not specify the importance_threshold argument above (default is to include all levels), and you'll get the entire output stream.

For reference, you can find the config maintained here: https://github.com/NeurodataWithoutBorders/nwbinspector/blob/dev/src/nwbinspector/internal_configs/dandi.inspector_config.yaml

@CodyCBakerPhD
Copy link
Contributor

CodyCBakerPhD commented Nov 21, 2022

Formulate a new test for checking that we properly display non-critical errors (i.e. WARNINGs)

Note that a bare-minimal NWBFile (one with only a sesssion_start_time and a couple other fields; see https://github.com/NeurodataWithoutBorders/nwbinspector/blob/dev/src/nwbinspector/tools.py#L16-L18 for a nice helper for generating...) will trigger many such 'hints' regarding various missing metadata (BEST_PRACTICE_SUGGESTIONS).

BEST_PRACTICE_VIOLATIONS (your 'warnings' I guess) would likely require adding a 'bad' type of data object at some point in the file; typically a TimeSeries in acquisition is used for simplicity. One example could be a mismatch between the zero-axis shape and the length of the timestamps field.

@TheChymera
Copy link
Contributor Author

TheChymera commented Nov 21, 2022

@CodyCBakerPhD thank you for the description of the mechanics, I think I understand what the main issue is now :) Sadly, as soon as I set importance_threshold to Importance.BEST_PRACTICE_SUGGESTION that also exposes additional errors (of higher severity), the following being the command outputs I get:

dandi/cli/tests/test_cmd_validate.py::test_validate_nwb_path_grouping BEST_PRACTICE_SUGGESTION
ERROR
/tmp/pytest-of-chymera/pytest-9/simple21/simple2.nwb: 2 issues detected.
  [NWBI.check_description] Description is missing.
  [NWBI.During io.read() - <class 'TypeError'>: 'InspectorMessage' object is not iterable] Traceback (most recent call last):
  File "/usr/lib/python3.10/site-packages/nwbinspector/nwbinspector.py", line 538, in inspect_nwb
    for inspector_message in run_checks(nwbfile=nwbfile, checks=checks):
  File "/usr/lib/python3.10/site-packages/nwbinspector/nwbinspector.py", line 576, in run_checks
    for x in output:
TypeError: 'InspectorMessage' object is not iterable

/tmp/pytest-of-chymera/pytest-9/simple21/simple2.nwb: 2 issues detected.
  [NWBI.check_description] Description is missing.
  [NWBI.During io.read() - <class 'TypeError'>: 'InspectorMessage' object is not iterable] Traceback (most recent call last):
  File "/usr/lib/python3.10/site-packages/nwbinspector/nwbinspector.py", line 538, in inspect_nwb
    for inspector_message in run_checks(nwbfile=nwbfile, checks=checks):
  File "/usr/lib/python3.10/site-packages/nwbinspector/nwbinspector.py", line 576, in run_checks
    for x in output:
TypeError: 'InspectorMessage' object is not iterable

Do you perhaps have an inkling what this might be about, or should I upload a separate branch with the actual code that produces it?

In any case I can't trigger any BEST_PRACTICE_VIOLATION errors thus far, since there appears to be only the one check that triggers it for DANDI. I'll go ahead and tackle that next, but the fact that higher-severity errors get exposed specifically when a lower-severity threshold is specified is in any case also worth exploring, though admittedly less urgent, from the DANDI end at least.

@TheChymera
Copy link
Contributor Author

@CodyCBakerPhD I have added a50c4fe which should exemplify the status of the code that produces the issue mentioned above. I'll switch it back to BEST_PRACTICE_VIOLATION in the next one. Regarding that, do you perhaps know how I can create an NWB file which has a timeseries and therefore can produce the BEST_PRACTICE_VIOLATION importance level without at the same time missing a subject? The code examples I found for timeseries do not have a subject, and I wasn't able to add a timeseries to an NWB object with subject info:

@pytest.fixture(scope="session")
def simple4_nwb(
    simple1_nwb_metadata: Dict[str, Any], tmp_path_factory: pytest.TempPathFactory
) -> str:
    """
    With, subject, subject_id, species, but including data orientation ambiguity,
    the only currently non-critical issue in the dandi schema for nwbinspector validation:
    https://github.com/NeurodataWithoutBorders/nwbinspector/blob/
        54ac2bc7cdcb92802b9251e29f249f155fb1ff52
        /src/nwbinspector/internal_configs/dandi.inspector_config.yaml#L10
    """
    from datetime import datetime, timezone
    start_time = datetime(2017, 4, 3, 11, tzinfo=timezone.utc)
    create_date = datetime(2017, 4, 15, 12, tzinfo=timezone.utc)
    time_series=pynwb.TimeSeries(
            name="test_time_series",
            unit="test_units",
            data=np.zeros(shape=(2, 100)),
            rate=1.0,
    )
    nwbfile = NWBFile(
        session_description="demonstrate external files",
        identifier="NWBE4",
        session_start_time=start_time,
        file_create_date=create_date,
        age="P1D/",
        sex="O",
        species="Mus musculus",
    )
    nwbfile.add_acquisition(time_series)
    filename = str(tmp_path_factory.mktemp("simple4") / "simple4.nwb")
    with pynwb.NWBHDF5IO(filename, "w") as io:
        io.write(nwbfile, cache_spec=False)
    return filename

gives me:

E   TypeError: NWBFile.__init__: unrecognized argument: 'age', unrecognized argument: 'sex', unrecognized argument: 'species'

@CodyCBakerPhD
Copy link
Contributor

Subjects are added as

    nwbfile = NWBFile(
        session_description="demonstrate external files",
        identifier="NWBE4",
        session_start_time=start_time,
        subject=Subject(
            age="P1D/",
            sex="O",
            species="Mus musculus",
       )
    )

I'm looking into those other issues now

also improved test function docstrings
there must be a better way of putting longer links in docstrings...
@TheChymera
Copy link
Contributor Author

@CodyCBakerPhD thanks again for your help :) I think I finally got a test case nailed down. As for the aforementioned threshold issue, that's not strictly related to this PR, for now we're setting importance_threshold=Importance.BEST_PRACTICE_VIOLATION, though at some point in the future it would be cool if we could go even lower and get all the hints through as well. I added a comment to that effect.

@TheChymera TheChymera marked this pull request as ready for review November 22, 2022 13:43
dandi/tests/fixtures.py Outdated Show resolved Hide resolved
@TheChymera TheChymera merged commit 0439caf into master Nov 24, 2022
@TheChymera TheChymera deleted the error_fix branch November 24, 2022 16:56
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants