Filesystem paths for diffusion models #32

Lestropie · 2021-11-22T05:46:07Z

This Issue addresses the fundamental question of the file names (& potentially sub-directory placement) of diffusion model derivatives should be specified.

This is highly important for DWI models since the outcomes of fitting a model to the data typically does not result in the generation of a single output file, but many files, and quite often the individual files are not very interpretable in isolation.

Single fixed file suffix

This is the strategy that was initially proposed for BEP016 before I took a major hacksaw to things in [ENH] Restructuring of DWI derivatives #5, but this particular question was not exhaustively considered due to the large sweeping changes there.
```
<pipeline_name>/
    sub-<participant_label>/
        dwi/
            <source_keywords>[_space-<space>][_desc-<label>]_model-<label>[_parameter-<parameter>]_diffmodel.nii[.gz]
            <source_keywords>[_space-<space>][_desc-<label>]_model-<label>_diffmodel.json
```
Advantage: very easy for the BIDS validator to check derivative data for conformity, since there's a single suffix across all possible diffusion models.

Disadvantage: file names do become a bit longer than for other options, since for multiple files in the directory you will have "_model-<label> ... _diffmodel" appearing.
Unique file suffix per model

This is what is currently shown as of [ENH] Restructuring of DWI derivatives #5.
```
<pipeline_name>/
    sub-<participant_label>/
        dwi/
            <source_keywords>[_space-<space>][_desc-<label>]_parameter-<intparam1>_<model>.nii[.gz]
            <source_keywords>[_space-<space>][_desc-<label>]_parameter-<intparam2>_<model>.nii[.gz]
            <source_keywords>[_space-<space>][_desc-<label>]_<model>.json

            [<source_keywords>[_space-<space>][_desc-<label>]_parameter-<extparam1>_<model>.nii[.gz]]
            [<source_keywords>[_space-<space>][_desc-<label>]_parameter-<extparam1>_<model>.json]
            [<source_keywords>[_space-<space>][_desc-<label>]_parameter-<extparam2>_<model>.nii[.gz]]
            [<source_keywords>[_space-<space>][_desc-<label>]_parameter-<extparam2>_<model>.json]
```
Advantage: Primary identifying data contents of files are shown at end of file basename, whereas if as per 1. above this suffix is simply "diffmodel" you then have to read back through the prior key-value fields to find the "_model-" item to find out what you're looking at; fine for a machine, slightly annoying for a human.

Disadvantage: Slightly harder for the validator. The current text suggests that the specified model suffixes are compulsory only if it is one of those models that is used. This means that if the validator encounters a suffix it doesn't know, it can only produce a warning, not an error. This therefore has potentially very large ramifications for BIDS derivatives more generally (assuming it is intended that such be used for BIDS derivatives).

Use filesystem hierarchy

It may be possible for a BIDS derivatives dwi/ directory to become exceptionally full if one performs a lot of operations and generates a lot of derivative data. The ability to "group" those files corresponding to a specific model fit into a sub-directory of the modality directory would make things easier to navigate for users.

Consider an example App that both performs DWI pre-processing and fits the tensor model. Currently this might look like:

<pipeline_name>/
    sub-<participant_label>/
        dwi/
            sub-<participant_label>_desc-preproc_dwi.nii[.gz]
            sub-<participant_label>_desc-preproc_dwi.bvals
            sub-<participant_label>_desc-preproc_dwi.bvecs
            sub-<participant_label>_desc-preproc_dwi.json
            sub-<participant_label>_parameter-all_dti.nii[.gz]
            sub-<participant_label>_parameter-bzero_dti.nii[.gz]
            sub-<participant_label>_parameter-fa_dti.nii[.gz]
            sub-<participant_label>_parameter-md_dti.nii[.gz]
            sub-<participant_label>_parameter-ad_dti.nii[.gz]
            sub-<participant_label>_parameter-rd_dti.nii[.gz]

The model derivatives could instead be placed into their own sub-directory. For example (noting that there are a number of possible alternatives here):

<pipeline_name>/
    sub-<participant_label>/
        dwi/
            sub-<participant_label>_desc-preproc_dwi.nii[.gz]
            sub-<participant_label>_desc-preproc_dwi.bvals
            sub-<participant_label>_desc-preproc_dwi.bvecs
            sub-<participant_label>_desc-preproc_dwi.json
            dti/
                sub-<participant_label>_dti.nii[.gz]
                sub-<participant_label>_bzero.nii[.gz]
                sub-<participant_label>_fa.nii[.gz]
                sub-<participant_label>_md.nii[.gz]
                sub-<participant_label>_ad.nii[.gz]
                sub-<participant_label>_rd.nii[.gz]

This might be beneficial in the longer term as derivatives become more widespread & complex, in order to not clog up a basic modality directory structure with large amounts of unrelated data.

As far as the validator is concerned, this would mean that it would know to give freedom to file basename suffixes where a file resides in a sub-directory of the modality directory. It could also check that the names of those sub-directories conform. E.g. Consider the following where there are multiple tensor fits performed using different methods:

<pipeline_name>/
    sub-<participant_label>/
        dwi/
            sub-<participant_label>_desc-preproc_dwi.nii[.gz]
            sub-<participant_label>_desc-preproc_dwi.bvals
            sub-<participant_label>_desc-preproc_dwi.bvecs
            sub-<participant_label>_desc-preproc_dwi.json
            desc-linear_dti/
                sub-<participant_label>_dti.nii[.gz]
                sub-<participant_label>_bzero.nii[.gz]
                sub-<participant_label>_fa.nii[.gz]
                sub-<participant_label>_md.nii[.gz]
                sub-<participant_label>_ad.nii[.gz]
                sub-<participant_label>_rd.nii[.gz]
            desc-loglinear_dti/
                sub-<participant_label>_dti.nii[.gz]
                sub-<participant_label>_bzero.nii[.gz]
                sub-<participant_label>_fa.nii[.gz]
                sub-<participant_label>_md.nii[.gz]
                sub-<participant_label>_ad.nii[.gz]
                sub-<participant_label>_rd.nii[.gz]
            desc-iwls_dti/
                sub-<participant_label>_dti.nii[.gz]
                sub-<participant_label>_bzero.nii[.gz]
                sub-<participant_label>_fa.nii[.gz]
                sub-<participant_label>_md.nii[.gz]
                sub-<participant_label>_ad.nii[.gz]
                sub-<participant_label>_rd.nii[.gz]

The sub-directories here should not I think include "sub-<participant_label>", but should at least confirm to the basic key-value structure with a final identifier (please let me know if the BIDS community has settled on nomenclature for such things so that we can speak the same language).

The text was updated successfully, but these errors were encountered:

Lestropie · 2021-11-30T03:42:03Z

One possibility to consider regarding option 1 is that if suffix "_model" were used, this would potentially be applicable across a very wide range of model derivatives across many modalities (potentially even beyond MRI).

Lestropie · 2021-11-30T04:44:50Z

Another thought regarding option 1: With the current state of the proposal (which is option 2), there is no stringent need to separate between intrinsic and extrinsic parameters; the only difference is whether or not the data are compulsory in order to interpret the model. However if option 1 were instead adopted, one would then have to question whether it makes sense to have something like an FA parametric map stored with a file name "_model.nii[.gz]", or whether an explicit separation between intrinsic and extrinsic parameters would be necessary at the suffix.

(P.S. "Intrinsic" vs. "Extrinsic" isn't necessarily the best nomenclature and could be changed; it's maybe something more like "model parameters" vs. "model-derived parameters")

Lestropie · 2021-12-01T23:46:59Z

Will consider implementing a PR (following merge of #40) showing what option 1 might look like in reality.

Lestropie · 2021-12-02T04:49:39Z

Making an attempt at reverting back to solution 1 reminded me of one of the issues faced there.

With each unique model having a pre-defined suffix and set of parameters (i.e. option 2 & the current state of the draft), the data representation of each image is established in the specification, and so just a single JSON file can be used to store all metadata related to the model.

With option 1, with every DWI model falling under the suffix of "_model", there are two options:

Dictate in the specification the names of parameters for specific models, and those parameter names would need to appear in the file names somewhere (e.g. "_param-"). The specification would then inform as to the data representation used for each of those parameters.
Each parameter image has associated with it a JSON file, and within that JSON file there is a field that specifies from a dictionary the data representation of that image.

Personally I would prefer option 2. That would actually allow a specification that is agnostic to the nature of specific DWI models, and (with adequate foresight) any new DWI model could be stored in a BIDS Derivatives compliant way without ever having to modify the specification to integrate that model.

The problem however is the inheritance principle. The data representation being unique for each model parameter means that there needs to be a unique JSON for each model parameter image. There is however other information that apply to the model as a whole, rather than to specific parameters; e.g. model fitting parameters, mask image, that sort of thing. Rather than duplicate this information across all model parameter JSONs, it would be preferable to instead have one JSON for the model as a whole that provides this information, and one JSON per parameter that provides information relevant to that parameter only. But that would require multiple applicable JSONs at one level of the filesystem hierarchy, which is not permitted.

The way I see it, there are two possible solutions to this:

Modify the inheritance principle. My suspicion is that the reason behind not permitting multiple JSONs to be applicable in a single directory is about forbidding a situation where there is ambiguity about either which JSONs are applicable to a given image, or in what order they should be loaded (and therefore in the presence of conflicting contents, which takes precedence). I do however think that the mandate here could be more specific to the problem, in a way that prevents such ambiguity but that permits what we would require here.

For instance, here's a DWI model with two applicable JSONs in a single directory:
```
<pipeline_name>/
    sub-<participant_label>/
        dwi/
            <source_keywords>[_space-<space>][_desc-<label>]_parameter-<param1>_model.nii[.gz]
            <source_keywords>[_space-<space>][_desc-<label>]_parameter-<param1>_model.json
            <source_keywords>[_space-<space>][_desc-<label>]_parameter-<param2>_model.nii[.gz]
            <source_keywords>[_space-<space>][_desc-<label>]_parameter-<param2>_model.json
            <source_keywords>[_space-<space>][_desc-<label>]_model.json
```
There is no ambiguity here: for either image, <source_keywords>[_space-<space>][_desc-<label>]_model.json should be loaded first (if desired), and contains all information relating to the fitting of the model to the DWI data; <source_keywords>[_space-][_desc-]_parameter-<param#>_model.json` is loaded second and contains information relating to the derivation of that parameter from the model, and to the data representation of that image.

A hypothetical case like this would however be ambiguous:
```
<pipeline_name>/
    sub-<participant_label>/
        dwi/
            <source_keywords>_space-X_desc-Y_model.nii[.gz]
            <source_keywords>_space-X_desc-Y_model.json
            <source_keywords>_space-X_model.json
            <source_keywords>_desc-Y_model.json
```
If interested in all JSON contents related to the image above, there is no reasonable rule regarding which of the two other JSONs should be loaded first.

So the appropriate rule would be that, for any image, multiple JSONs within a single directory must have a strict ordering defined by subsets of the filename entities, with the load order being the reverse of such.

Utilize directory structure, as per Option 3 in the first post of this thread.

This might look something like:

<pipeline_name>/
    sub-<participant_label>/
        dwi/
            sub-<participant_label>_desc-preproc_dwi.nii[.gz]
            sub-<participant_label>_desc-preproc_dwi.bvals
            sub-<participant_label>_desc-preproc_dwi.bvecs
            sub-<participant_label>_desc-preproc_dwi.json
            sub-<participant_label>_desc-tensor_model/
                sub-<participant_label>_parameter-all_model.nii[.gz]
                sub-<participant_label>_parameter-all_model.json
                sub-<participant_label>_parameter-bzero_model.nii[.gz]
                sub-<participant_label>_parameter-bzero_model.json
                sub-<participant_label>_parameter-fa_mdp.nii[.gz]
                sub-<participant_label>_parameter-fa_mdp.json
                sub-<participant_label>_parameter-md_mdp.nii[.gz]
                sub-<participant_label>_parameter-md_mdp.json
                sub-<participant_label>_parameter-ad_mdp.nii[.gz]
                sub-<participant_label>_parameter-ad_mdp.json
                sub-<participant_label>_parameter-rd_mdp.nii[.gz]
                sub-<participant_label>_parameter-rd_mdp.json
            sub-<participant_label>_desc-tensor_model.json

Or, as per option 3, it could instead be tarballed:

<pipeline_name>/
    sub-<participant_label>/
        dwi/
            sub-<participant_label>_desc-preproc_dwi.nii[.gz]
            sub-<participant_label>_desc-preproc_dwi.bvals
            sub-<participant_label>_desc-preproc_dwi.bvecs
            sub-<participant_label>_desc-preproc_dwi.json
            sub-<participant_label>_desc-tensor_model.tar[.gz]
            sub-<participant_label>_desc-tensor_model.json

Lestropie · 2021-12-02T21:53:53Z

I originally posted about the above issue in bids-standard/bids-specification#259.

I think what I need to do is first write a proposal PR for bids-standard/bids-specification#482 to reduce ambiguity about the current state of the inheritance principle, and then write a second PR proposing specific language for bids-standard/bids-specification#259, i.e. what the language would need to be in order to facilitate what I'm looking for here.

francopestilli · 2022-01-09T20:05:54Z

It looks like this one is a priority. If we move on with this we are left with a single PR is that right?

Lestropie · 2022-01-11T00:26:21Z

This is the primary one on which I want to get feedback from BIDS maintainers before proceeding much further. bids-standard/bids-specification#946 is ready for review; I then need to, on top of that, propose a solution for bids-standard/bids-specification#259 in order to facilitate option 1, which I think is my personal long-term preference. It was indicated in a comment here that maintainers may be open to doing so as a minor rather than major revision, though this would require dialogue with validator maintainers as well.

Lestropie · 2024-05-15T02:34:38Z

With rejection of bids-standard/bids-specification#1003 and bids-standard/bids-specification#1280 and acceptance of bids-standard/bids-specification#1602, proceeding with option 1 "single file suffix" "_dwimap" and no modification to either inheritance or directory structure.

This was referenced Nov 22, 2021

Major derivatives decisions #29

Open

Storage of non-conforming derivatives #33

Closed

Lestropie mentioned this issue Dec 2, 2021

BEP016: Initial draft of common filename suffix "model" #46

Merged

This was referenced Dec 2, 2021

BEP016: Rename "dti" to "tensor" #47

Merged

[FIX] Rewrite inheritance principle bids-standard/bids-specification#946

Merged

Lestropie mentioned this issue Feb 7, 2022

[ENH] Inheritance principle: Relaxation allowing multiple files per directory bids-standard/bids-specification#1003

Closed

Lestropie mentioned this issue Jun 20, 2022

Decisions on BIDS derivatives structure #50

Closed

yarikoptic mentioned this issue Aug 10, 2022

Inheritance: Further Relaxation allowing multiple files per directory bids-standard/bids-specification#1195

Open

Lestropie mentioned this issue Sep 21, 2022

Debating need for advanced inheritance #70

Closed

Lestropie mentioned this issue Jun 29, 2023

Change modality directories to entity format bids-standard/bids-2-devel#55

Open

Lestropie mentioned this issue Jan 10, 2024

BEP016 finalization #89

Open

4 tasks

Lestropie closed this as completed May 15, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Filesystem paths for diffusion models #32

Filesystem paths for diffusion models #32

Lestropie commented Nov 22, 2021

Lestropie commented Nov 30, 2021

Lestropie commented Nov 30, 2021

Lestropie commented Dec 1, 2021

Lestropie commented Dec 2, 2021

Lestropie commented Dec 2, 2021

francopestilli commented Jan 9, 2022

Lestropie commented Jan 11, 2022

Lestropie commented May 15, 2024

Filesystem paths for diffusion models #32

Filesystem paths for diffusion models #32

Comments

Lestropie commented Nov 22, 2021

Lestropie commented Nov 30, 2021

Lestropie commented Nov 30, 2021

Lestropie commented Dec 1, 2021

Lestropie commented Dec 2, 2021

Lestropie commented Dec 2, 2021

francopestilli commented Jan 9, 2022

Lestropie commented Jan 11, 2022

Lestropie commented May 15, 2024