[FIX] Rewrite inheritance principle #946

Lestropie · 2021-12-05T23:57:30Z

Draft PR in search of feedback.

This change relates to #102, #259, #482; am escalating due to re-invigoration of BEP016, in particular bids-standard/bids-bep016#32.

IMO, there are multiple issues with the current description of the inheritance principle:

1. Some details are reiterated in different ways, which shouldn't be necessary if the principle is stated unambiguously;
2. Despite the availability & use of filesystem tree examples, many aspects of the principle are described through hypothetical modifications rather than presenting such;
3. Some instances of point 2 result in reference to either files that don't exist or would need to be renamed in order to satisfy the change suggested;
4. There is a persistent blurring between official description of the principle that should be interpreted literally, and recommendation / guidance;
5. Inheritance is described with respect to the lower-level JSON file, rather than with respect to a data file to which multiple JSONs may be applicable;
6. The logic by which software tools should classify which JSONs are or are not applicable to a given data file is not adequately clear.

Given the breadth of these issues I think a re-write is warranted. I am completely open to proposals for modifications or proposal of outright alternatives, but the ball needs to start rolling somewhere.

Full disclosure: as per #259, I would like to pursue the prospect of modifying the inheritance principle itself, not just the description of such; specifically removal of the preclusion of having multiple applicable JSONs at one level of the hierarchy. However here I am focusing solely on modifying the text describing the principle as it currently stands, which would be applicable to a patch update. Changing the principle itself should IMO be deferred to a minor or potentially major update; in addition, the isolation of that specific aspect and modification thereof should be somewhat facilitated by the clarification of the current state of the principle following these changes.

Remi-Gau · 2021-12-06T07:09:24Z

pinging @yarikoptic and @VisLab as they both have expressed interest in this issue in the past

Remi-Gau · 2021-12-06T07:15:16Z

src/02-common-principles.md

+Any metadata file (such as `.json`, `.bvec` or `.tsv`) MAY be defined at any
+directory level. For any given data file, any metadata file at that directory
+level or higher that does not include any entities absent from the name of the
+data file and possesses the same suffix are applicable to that data file. Such
+files are loaded from the top of the directory hierarchy downwards, such that
+values from the top level are inherited by all data files at lower levels to
+which it is applicable unless overridden by a value for the same key present
+in another metadata file at a lower level (though it is RECOMMENDED to minimise
+the extent of such overrides). There is no notion of "unsetting" a
+key/value pair.


Suggested change

- Any metadata file (such as `.json`, `.bvec` or `.tsv`) MAY be defined at any directory level.
- For a given data file, any metadata file at that directory level or higher
  is applicable to that data file if:
  - the metadata and the data filenames possess the same suffix,
  - the metadata filename does not include any entity absent from the data filename.
- Such files are loaded from the top of the directory hierarchy downwards,
  such that values from the top level are inherited by all data files
  at lower levels to which it is applicable unless overridden
  by a value for the same key present in another metadata file at a lower level
  (though it is RECOMMENDED to minimise the extent of such overrides).
- There is no notion of "unsetting" a key/value pair.
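
For readers who want to see the applicability conditions above in executable form, here is a minimal Python sketch. It is not taken from the specification or from any validator: the helper names (`parse_name`, `is_applicable`, `load_order`) are hypothetical, paths are assumed to be relative to the dataset root, and an "entity" is assumed to mean the whole `key-value` chunk of the filename.

```python
from pathlib import PurePosixPath


def parse_name(path):
    """Split a BIDS-style filename into (set of entities, suffix)."""
    stem = PurePosixPath(path).name.split(".", 1)[0]  # drop every extension
    *entities, suffix = stem.split("_")
    return set(entities), suffix


def is_applicable(metadata_path, data_path):
    """Check the directory, suffix and entity conditions for one pair of files."""
    meta_entities, meta_suffix = parse_name(metadata_path)
    data_entities, data_suffix = parse_name(data_path)
    meta_dir = PurePosixPath(metadata_path).parent
    data_dir = PurePosixPath(data_path).parent
    # "at that directory level or higher": same directory or one of its ancestors
    in_scope = meta_dir == data_dir or meta_dir in data_dir.parents
    return (
        in_scope
        and meta_suffix == data_suffix
        and meta_entities <= data_entities  # no entity absent from the data filename
    )


def load_order(metadata_paths, data_path):
    """Return the applicable metadata files, top of the hierarchy first."""
    applicable = [m for m in metadata_paths if is_applicable(m, data_path)]
    return sorted(applicable, key=lambda m: len(PurePosixPath(m).parent.parts))
```

With hypothetical files `task-rest_bold.json` at the dataset root and `sub-01/func/sub-01_task-rest_bold.json`, calling `load_order` for `sub-01/func/sub-01_task-rest_bold.nii.gz` would list the root-level file first, so that the lower-level file is loaded last.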

I think this is an improvement. However, there is still some confusion because the notion of a 'key' makes sense for json but I don't think it makes sense for tsv files. I have no idea how to apply this to bvec files.

Yes, I'd contemplated bullet-pointing this myself; or potentially even enumerating so that datasets violating specific requirements / software code can be cross-referenced more precisely?

@VisLab: Good catch. There's going to need to be a hard split between JSON and other metadata files. For .bvec / .bval / .tsv, I think it simply needs to be selection of whichever applicable file is lowest in the directory tree; anything above it gets ignored, and there's no reason to describe the contents of such as being "overwritten" by the content of files lower down. The only edge case that comes to mind would be .tsv files that happen to contain exactly the same columns (including their titles), but even then it's hard to justify choosing between a concatenation and a replacement. Are there any other metadata types that are worth being familiar with for the sake of getting the language / logic of the inheritance principle right?
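
The selection behaviour proposed in this comment for non-JSON metadata can be sketched as follows. This only illustrates the suggestion as worded here, not settled specification text; `select_lowest` and `depth` are hypothetical names, and the input is assumed to be the list of files already found applicable to one data file, as paths relative to the dataset root.

```python
from pathlib import PurePosixPath


def depth(path):
    """Number of directories between the dataset root and the file."""
    return len(PurePosixPath(path).parent.parts)


def select_lowest(applicable_paths):
    """Pick the applicable file deepest in the tree; files above it are ignored."""
    return max(applicable_paths, key=depth, default=None)
```

Unlike the JSON case, nothing is merged: the contents of any higher-level file are simply never read.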

Good idea about using an ordered list and referring to those points in the example.

So am I understanding correctly that for JSON the resolution is top-down, with top (closest to the root) having precedence?
For the others it is bottom-up, with the bottom (farthest from the root) having precedence?

The notions of "lowest" and "highest" level in the tree aren't necessarily clear. Does lowest mean farthest from the root?

I honestly thought only JSON files got inherited in the sense that there could be multiple applicable ones for the same suffix.

I think the only "official" case of TSV inheritance we have around is having a single task-foo_events.tsv in the root of the directory in case all subjects / runs had the same design:

see from the BIDS examples

╰─⠠⠵ ls -l */*task*tsv
-rw-rw-r-- 1 remi remi  143 Nov 10 11:33 ds114/task-covertverbgeneration_events.tsv
-rw-rw-r-- 1 remi remi  280 Nov 10 11:33 ds114/task-fingerfootlips_events.tsv
-rw-rw-r-- 1 remi remi  143 Nov 10 11:33 ds114/task-overtverbgeneration_events.tsv
-rw-rw-r-- 1 remi remi  127 Nov 10 11:33 ds114/task-overtwordrepetition_events.tsv
-rw-rw-r-- 1 remi remi 1054 Nov 10 11:33 eeg_ds000117/task-facerecognition_channels.tsv
-rw-rw-r-- 1 remi remi 1807 Nov 10 08:42 synthetic/task-nback_events.tsv

We probably need to expand the discussion of events inheritance. Inherited events.tsv files are needed at several levels.

For example, because BIDS forces multi-modality data to be separated into separate directories, simultaneously recorded MEG/EEG or fMRI/EEG will have the same event files for each run.

In the following example from hed-examples, EEG and MEG were recorded simultaneously, but separated into separate modality directories.

ds_003654s:
    task-FacePerception_events.json
    sub-02:
        sub-002_task-FacePerception_events.json
        sub-002_task-FacePerception_run-1_events.tsv
        eeg:
            sub-002_task-FacePerception_run-1_eeg.set
        meg:
            sub-002_task-FacePerception_run-1_meg.fif
    . . .

Right now we are assuming that multiple events.json files can be applicable to a given events.tsv file and that the keys are resolved from directory root down in the JSON files. I think this is the right behavior for JSON files, but it needs to be clarified in the specification.

We have only made a single events.tsv applicable to any given recording, but that events.tsv file could appear anywhere in the directory hierarchy.

Events are a special case, and I think it might be feasible/useful to allow multiple events files at different levels to apply to a given run. The events are the union of the rows/join of columns. This might be too complicated.

The issue also comes up with derivatives. Here the derivatives might have their own set of events. If the events from the raw data also apply do they need to always be copied into the events files? There was some discussion of this in the past, but there didn't seem to be a clear resolution or consensus on this when the discussion occurred.

I also have a general question about inheritance. Is the matching of entity components and their names case sensitive?

The notions of "lowest" and "highest" level in the tree aren't necessarily clear. Does lowest mean farthest from the root?

I always have to cross-reference elsewhere in the spec to remind myself which is which. Does anyone know if there's a robust reference-able definition somewhere? Or does this need to be defined upfront in the spec?

Is the matching of entity components and their names case sensitive?

I think the case sensitivity issue was solved in #858

Co-authored-by: Remi Gau <[email protected]>

- Change primary rules from dot points to enumerations, and reference rules by number where applicable.
- Move from subsequent text into this list rules relating to improper placement of metadata files within the directory structure, and not permitting multiple applicable files at one level of the hierarchy.
- Add "corollaries" section incorporating text relating to interpretation and consequences of the rules, separating them from subsequent examples.

As the inheritance principle makes reference to tabular files and JSON key-value dictionaries, but there are no such references in the opposite direction, these files should be defined prior to introduction of the Inheritance Principle.

Lestropie · 2021-12-13T00:04:25Z

Lots of changes (not expecting this to be a quick PR). I think that it is by necessity transitioning from the original text being along the lines of "here's how you should do this thing" to what is really required here for the sake of stringent enforcement and validation being "here's the rules", with the downstream consequences of such expressed in a more user-friendly way separate to the rules themselves (what's currently called "corollaries" here could conceivably be expanded quite a lot).

Note that while @Remi-Gau's change proposal above was merged, I've explicitly unresolved the conversation as there is a lot of content there unrelated to that specific proposal that has not been resolved. I would suggest branching separate chains of thought into separate comment threads tied to the relevant lines; this one is invariably going to be messy, so best to mitigate it where we can.

Having a metadata file applicable to multiple data files can occur not only at one level of the directory hierarchy, but additionally to many data files at lower levels of the hierarchy.

VisLab · 2021-12-13T12:57:31Z

I have reviewed the commits and the bullet points make things much clearer. Thank you!

yarikoptic

I really like it, well done. IMHO should be taken out of the draft! ;)

Left some minor comments

src/02-common-principles.md

yarikoptic · 2021-12-18T04:38:08Z

src/02-common-principles.md

+1.  There MUST NOT be multiple metadata files applicable to a data file at one level
+    of the directory hierarchy.
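
As an illustration of how a tool might check this rule, the following Python sketch groups the metadata files already found applicable to one data file by directory and reports any directory holding more than one. The name `find_conflicts` is hypothetical, and grouping per file extension (so that a `.json` and a `.tsv` in the same directory do not count as a clash) is an assumption made for this sketch, not something stated in the rule.

```python
from collections import defaultdict
from pathlib import PurePosixPath


def find_conflicts(applicable_paths):
    """Map (directory, extension) to the applicable files sharing that level."""
    groups = defaultdict(list)
    for path in applicable_paths:
        p = PurePosixPath(path)
        groups[(str(p.parent), p.suffix)].append(path)
    # Only levels with more than one applicable file violate the rule
    return {key: sorted(paths) for key, paths in groups.items() if len(paths) > 1}
```

For a hypothetical data file `sub-01/func/sub-01_task-rest_bold.nii.gz`, having both `sub-01/func/sub-01_bold.json` and `sub-01/func/sub-01_task-rest_bold.json` would be reported, whereas one file at the dataset root plus one in `sub-01/func` would not.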


FWIW: it is such a nice concise rule which helps to avoid ambiguity and the workaround in Example 3 is quite cute ;)

OH -- I remembered what bothered me in my original #102 and outlined in Edit 1 there. Citing an example from there:

I placed myself into a corner with an example of having e.g.

sub-1_task-task1_run-1_bold.json and sub-1_task-task1_acq-X_run-1_bold.json

per subject (should be ok), and then trying to aggregate over them while retaining also _acq- if defined.

and there are many other entities/use-cases which would fall into a similar situation. What I am thinking is to add a clarification as a subitem here or as a separate rule:

A metadata file at the data files level of hierarchy MUST NOT be considered for inheritance if there is a data file with matching entities.

This would make sub-1_task-task1_run-1_bold.json ineligible to provide metadata to enrich sub-1_task-task1_acq-X_run-1_bold.json, and the examples below would stay valid (we might want to add an example for this rule though).

it is such a nice concise rule which helps to avoid ambiguity and the workaround in Example 3 is quite cute ;)

Well, actually (from first post):

Full disclosure: as per #259, I would like to pursue the prospect of modifying the inheritance principle itself, not just the description of such; specifically removal of the preclusion of having multiple applicable JSONs at one level of the hierarchy.

😬

The proposed changes here do at least I think demonstrate what would need to change in order to facilitate such. Any text I propose for such will need to be very clear for both users and software creators exactly what is permitted vs. not permitted and in what order JSONs should be loaded. But as I said in the original post, this is not a change in descriptive language but a change in prescriptive permissible structure, and so IMO necessitates either a minor or major version change rather than just a patch.

Gotcha. Given the currently loose formulation of the principle, depending on the changes to it, I think it might be feasible to get it done within a minor revision. So it would be great to see this PR finalized/merged soon.

src/02-common-principles.md

Resolves bids-standard#946 with bids-standard#962. Co-authored-by: Yaroslav Halchenko <[email protected]>

sappelhoff

This is wonderful work, thanks a lot! I left a few comments

src/02-common-principles.md

sappelhoff · 2021-12-21T10:24:13Z

src/02-common-principles.md

+When reading image `sub-01/func/sub-01_task-rest_acq-default_bold.nii.gz`, only
+metadata file `task-rest_bold.json` is read; file
+`sub-01/func/sub-01_task-rest_acq-longtr_bold.json` is inapplicable as it contains
+entity "`acq-longtr`" that is absent from the image path (rule 2.3). When reading image


Rule 2.c, see my comment above.

I see that below this would be "5.b.ii" instead of 5.2.2, so maybe we need to discuss my suggestion - as 5.b.ii is kind of ugly.

Revised by 7bc7ad5, but not yet resolving in case the enumeration formatting needs to be discussed further.

Looking at it again, I don't find it ugly anymore :-) Thanks for making the change.

src/02-common-principles.md

Co-authored-by: Stefan Appelhoff <[email protected]>

The rule regarding the fact that it is not possible to "unset" a key-value pair from a JSON file from higher in the filesystem hierarchy is here moved to the "corollaries" section. This is because this behaviour is a natural consequence of loading consecutive JSON files using a simple merge operation, and the absence of an equivalent to Python's "None" in the JSON specification.
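
The "simple merge operation" mentioned in this commit message can be pictured with a short Python sketch. The function name `merge_top_down` and the `EchoTime` / `RepetitionTime` values are hypothetical, chosen only for illustration; the input is assumed to be the already-parsed JSON dictionaries, ordered from the top of the hierarchy downwards.

```python
def merge_top_down(dicts_in_load_order):
    """Merge JSON dictionaries so that later (lower-level) files win per key."""
    merged = {}
    for content in dicts_in_load_order:
        merged.update(content)  # overrides matching keys, never removes any
    return merged


top_level = {"RepetitionTime": 2.0, "EchoTime": 0.03}
lower_level = {"RepetitionTime": 3.0}
result = merge_top_down([top_level, lower_level])
```

Here `result` keeps `EchoTime` from the top-level file even though the lower-level file omits it: leaving a key out lower down does not remove it, which is the corollary described above.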

Lestropie · 2022-01-11T02:37:37Z

Invested parties please also see Lestropie#3, which proposes an alteration to the contents of this PR.

sappelhoff · 2022-01-23T15:09:06Z

@Lestropie is there anything you wanted to discuss or get into this PR? Or should we do a final review?

Lestropie · 2022-01-23T23:42:19Z

@sappelhoff I'm done tinkering, content as it currently stands. Will have a go at #259 in a separate PR as discussed.

"linkchecker" CI test failed, naively it looks like it might have been a one-off failure that just requires a re-run since it doesn't seem to me to relate to the proposed changes?

sappelhoff · 2022-01-24T08:31:13Z

"linkchecker" CI test failed, naively it looks like it might have been a one-off failure that just requires a re-run since it doesn't seem to me to relate to the proposed changes?

That issue was solved in master, so I merged those changes back here - should all be fine now.

I'm done tinkering, content as it currently stands.

awesome, let's try to get a final approval by a bunch of people and merge this!

VisLab · 2022-01-24T19:05:37Z

There is a problem with the bullet point numbering when I do the view file on GitHub. The first set of bullet points uses 2 (with subcategories i, ii, and iii), but the text refers to 2 (a, b, c).

Similarly in the corollaries, rule three refers to "per rule 5.2"

effigies

Overall looks good. Two minor issues.

effigies · 2022-01-19T00:21:58Z

src/02-common-principles.md

+
+1.  It is permissible for a single metadata file to be applicable to multiple data
+    files at that level of the hierarchy or below. Where such metadata content is consistent
+    across multiple data files, it is RECOMMENDED to store metadata in this


RECOMMENDED

This is a best practice written in the language of a validatable rule. Some prefer this approach, others prefer to have their sidecars per-data-file. I think we should not make this recommendation, as validation will be tricky and annoying to users who would prefer the alternative.

I'm okay with taking out the recommended here.

This is a best practice written in the language of a validatable rule.

This seems to extend beyond the RFC2119 definition. Is this an established BIDS-specific interpretation that is documented somewhere?

Would the same criticism not also apply to 5.2?

For JSON files, key-values are loaded from files from the top of the directory hierarchy downwards, such that key-values from the top level are inherited by all data files at lower levels to which it is applicable unless overridden by a value for the same key present in another metadata file at a lower level (though it is RECOMMENDED to minimise the extent of such overrides).

Issuing a validator warning regarding the presence of individual key-value overrides may be unexpected.

Is this an established BIDS-specific interpretation that is documented somewhere?

AFAIK this is not documented anywhere, but among maintainers we have often discussed that RECOMMENDED corresponds to a "warning" level in the validator, and MUST corresponds to an "error" level in the validator. The main point of this consideration is to not design a specification that we cannot reasonably validate.

Having said that, there are many cases in the spec where we RECOMMEND or require (MUST), where validation is currently not happening and might be difficult to implement --- as you correctly point out in your second point.

Personally I am fine with having some recommendations that we cannot "warn" about (see especially 5.b here, but maybe also the point raised by Chris.).

src/02-common-principles.md


Lestropie · 2022-01-31T03:08:21Z

There is a problem with the bullet point numbering when I do the view file on GitHub.

I think the most important correspondence here is going to be not with a GitHub rendering, but with the website / PDF renderings. I couldn't find an existing example; perhaps if someone could duplicate this branch on the main repo, then set up ReadTheDocs to render that branch, we could check what the current behaviour is there?

sappelhoff · 2022-01-31T09:19:26Z

perhaps if someone could duplicate this branch on the main repo, then set up ReadTheDocs to render that branch, we could check what the current behaviour is there?

We should have all the needed checks available from the CI:

pdf
html

src/02-common-principles.md

Co-authored-by: Stefan Appelhoff <[email protected]>

sappelhoff · 2022-02-01T18:14:00Z

Thanks for leading this effort @Lestropie! I think it's a huge improvement in terms of clarity.

Relates to discussion in bids-standard#259, and was also raised in bids-standard#946. Instead of enforcing a unique order of metadata file inheritance via filesystem hierarchy only, the logic behind the inheritance principle is generalised to permit multiple files per directory whilst still ensuring that the order of metadata file loading is still unique.

Lestropie added 2 commits December 6, 2021 10:38

[FIX] Rewrite inheritance principle

c230301

[FIX] Fix trailing space src/02_common_principles.md bids-standard#946

2202697

Remi-Gau added the inheritance label Dec 6, 2021

Remi-Gau reviewed Dec 6, 2021


Lestropie and others added 3 commits December 13, 2021 10:11

Inheritance principle: Use bullet points

0abc930

Co-authored-by: Remi Gau <[email protected]>

Lestropie added 2 commits December 13, 2021 11:04

Inheritance principle: Formatting fixes

399a907

Inheritance principle: Tweak to corollary

8091e3c

Having a metadata file applicable to multiple data files can occur not only at one level of the directory hierarchy, but additionally to many data files at lower levels of the hierarchy.

yarikoptic approved these changes Dec 18, 2021


Common principles: Harmonize "file name" -> "filename"

1839b58

Resolves bids-standard#946 with bids-standard#962. Co-authored-by: Yaroslav Halchenko <[email protected]>

Lestropie added a commit to Lestropie/bids-specification that referenced this pull request Dec 19, 2021

02-common-principles.md: Formatting fixes for bids-standard#946

8549e00

Lestropie added a commit to Lestropie/bids-specification that referenced this pull request Dec 19, 2021

02-common-principles.md: Formatting fixes for bids-standard#946

e91cb98

Lestropie force-pushed the rewrite_inheritance_principle branch from 8549e00 to e91cb98 Compare December 19, 2021 23:19

Lestropie added a commit to Lestropie/bids-specification that referenced this pull request Dec 19, 2021

02-common-principles.md: Formatting fixes for bids-standard#946

24e6489

Lestropie force-pushed the rewrite_inheritance_principle branch from e91cb98 to 24e6489 Compare December 19, 2021 23:21

02-common-principles.md: Formatting fixes for bids-standard#946

01aa790

Lestropie force-pushed the rewrite_inheritance_principle branch from 24e6489 to 01aa790 Compare December 20, 2021 00:43

Lestropie mentioned this pull request Dec 20, 2021

Inheritance principle: Include enumeration indices in source Lestropie/bids-specification#2

Open

Lestropie marked this pull request as ready for review December 20, 2021 01:13

Inheritance principle: Fix JSON formatting

7ba50a6

sappelhoff reviewed Dec 21, 2021


Inheritance principle: Minor changes and formatting

7bc7ad5

Co-authored-by: Stefan Appelhoff <[email protected]>

Lestropie force-pushed the rewrite_inheritance_principle branch from 8e808a1 to 7bc7ad5 Compare January 5, 2022 06:18

Lestropie mentioned this pull request Jan 11, 2022

Filesystem paths for diffusion models bids-standard/bids-bep016#32

Closed

Merge branch 'master' into rewrite_inheritance_principle

97d319b

sappelhoff requested review from Remi-Gau, VisLab, bendhouseart, effigies, ericearl and tsalo January 24, 2022 08:35

Remi-Gau approved these changes Jan 24, 2022


effigies reviewed Jan 25, 2022


VisLab approved these changes Jan 26, 2022


Inheritance principle: Revise EchoTime and RepetitionTime exemplar va…

33cc25a

…lues

sappelhoff reviewed Jan 31, 2022


sappelhoff reviewed Jan 31, 2022


Lestropie and others added 2 commits February 1, 2022 11:51

Inheritance principle: Use American spelling

7af3e05

Co-authored-by: Stefan Appelhoff <[email protected]>

Inheritance principle: Fix enumeration cross-reference

525ab87

Co-authored-by: Stefan Appelhoff <[email protected]>

effigies merged commit ec8f98a into bids-standard:master Feb 1, 2022

bids-maintenance added a commit that referenced this pull request Feb 1, 2022

[DOC] Auto-generate changelog entry for PR #946

c0cacdc

Lestropie mentioned this pull request Feb 7, 2022

[ENH] Inheritance principle: Relaxation allowing multiple files per directory #1003

Closed

sappelhoff mentioned this pull request Apr 9, 2022

make inheritance applicability explicit for scan.json #789

Closed

Remi-Gau mentioned this pull request Jun 9, 2022

Inheritance principle: clarify the procedure of which files would be considered #102

Closed

Lestropie mentioned this pull request Apr 26, 2024

[ENH] Describe Inheritance Principle in Common Principles #1807

Merged

Lestropie mentioned this pull request May 21, 2024

Incomplete metadata consensus Lestropie/IP-freely#7

Open

