[WIP] introduce BIDSLayoutV2 #863

erdalkaraca · 2022-05-28T08:38:25Z

In agreement with @gkiar, the following approach allows for a better migration to pybids-core:

- introduce a new interface BIDSLayoutV2 which uses ancpBIDS to mimic functionality from the BIDSLayout legacy interface
- ancpBIDS will be dynamically imported at run time, and a warning will be raised if it is not installed
- users can decide which interface to use and experiment with BIDSLayoutV2 without breaking their existing code
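
A minimal sketch of the optional-import pattern described in the list above. It is not the code in this PR: `load_optional` and `make_unavailable_stub` are hypothetical helper names, and only the standard library is used.

```python
import importlib
import warnings


def load_optional(module_name):
    """Import a module at run time; warn and return None if it is missing."""
    try:
        return importlib.import_module(module_name)
    except ImportError:
        warnings.warn(f"{module_name} is not installed; BIDSLayoutV2 is unavailable")
        return None


def make_unavailable_stub(module_name):
    """Build a placeholder callable that fails only when someone calls it."""
    def BIDSLayoutV2(*args, **kwargs):
        raise ImportError(f"BIDSLayoutV2 requires the '{module_name}' package")
    return BIDSLayoutV2
```

With this shape, importing the package never fails; users only see an error if they actually try to construct the new layout without the optional dependency.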

Once we get sufficient user feedback such as bugs or critical implementation gaps, the BIDSLayoutV2 can be renamed to BIDSLayout while the old implementation may exist as BIDSLayoutLegacy within the next releases.

erdalkaraca · 2022-05-28T08:40:17Z

Note:
there is a new test module: test_layout_v2.py

I have not yet fixed all unit tests in that module as this is work-in-progress

codecov · 2022-05-28T08:43:03Z

Codecov Report

Base: 86.24% // Head: 83.96% // Decreases project coverage by -2.28% ⚠️

Coverage data is based on head (7f08cdb) compared to base (de95cb5).
Patch coverage: 12.09% of modified lines in pull request are covered.

Additional details and impacted files

@@            Coverage Diff             @@
##           master     #863      +/-   ##
==========================================
- Coverage   86.24%   83.96%   -2.29%     
==========================================
  Files          32       33       +1     
  Lines        3904     4028     +124     
  Branches      947      966      +19     
==========================================
+ Hits         3367     3382      +15     
- Misses        346      455     +109     
  Partials      191      191

Impacted Files	Coverage Δ
bids/layout/layout_v2.py	`7.62% <7.62%> (ø)`
bids/layout/__init__.py	`100.00% <100.00%> (ø)`

adelavega · 2022-05-31T15:24:39Z

Awesome. I think a full integration test using BIDSLayoutV2 would be most informative.

That is, those of us who use BIDSLayout heavily should try using BIDSLayoutV2 instead in our more complex apps. I will try to set aside some time for this soon.

bids/layout/__init__.py

define BIDSLayoutV2 as function if ancpbids package not installed Co-authored-by: Chris Markiewicz <[email protected]>

gkiar · 2022-06-16T14:44:08Z

Hey @erdalkaraca are you around this week to discuss?

effigies · 2022-07-22T16:00:26Z

To me this feels like a big enough change to warrant a 1.0 release when we can make it happen. I won't have any time to work on this until at the absolute earliest August 22, but I do think this is a pretty important thing to push on as people are able.

Could have a ~30 minute call next week if people are around and want to discuss strategy.

erdalkaraca · 2022-07-22T17:20:07Z

Depends on ANCPLabOldenburg/ancp-bids#60

adelavega

Hello!

I finally had time to run these tests myself. Honestly, all in all the API differences between ancpbids and pybids seem minimal at this point, and I think we can get this done soon.

The main issues I identified were:

- ancpbids seems to load all derivatives eagerly, even if derivatives=True is not set. This is actually fairly important, as you may specifically not want to load derivatives. We may want to discuss whether it's better to "filter" derivatives prior to instantiation of a layout, or at the time of a query. I think the latter is more flexible and generic, but there may be times you really don't want to index a derivative.
- ancpbids currently uses the "literal" name of entities (e.g. "ses" instead of "session")
- some __repr__s are different and could be improved. They don't need to be identical, but it's something to make a decision on
- get_fieldmap is not working. Looks like a bug in ancpbids's getattr method

The rest seem like small issues that are very doable to fix, and I've commented on all known test failures.

adelavega · 2022-11-18T20:06:43Z

bids/layout/tests/test_layout_v2.py

+        )
+        assert len(unvalidated.get()) == 4
+        with pytest.raises(ValueError):
+            unvalidated.get(desc="preproc")


Looks like ancpbids returns files here, because desc is not a valid entity.

By default, BIDSLayout does not allow unrecognized entities, and raises an error.

You can allow them using invalid_filters='allow', but even then, since nothing matches, it returns an empty list: []
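
A rough sketch of the two behaviours described above (raise on unknown filters by default, or allow them and simply match nothing). The `query` function and the list-of-dicts file model are hypothetical stand-ins, not the pybids or ancpbids implementation.

```python
def query(files, known_entities, invalid_filters="error", **filters):
    """Filter a list of entity dicts; illustrative only."""
    unknown = sorted(set(filters) - set(known_entities))
    if unknown and invalid_filters == "error":
        raise ValueError(f"Unrecognized filter(s): {unknown}")
    # With 'allow', unknown filters are still applied, so no file matches them.
    return [
        f for f in files
        if all(f.get(key) == value for key, value in filters.items())
    ]


files = [
    {"subject": "01", "suffix": "bold"},
    {"subject": "02", "suffix": "bold"},
]
known = {"subject", "suffix"}
```

Here `query(files, known, desc="preproc")` raises, while `query(files, known, invalid_filters="allow", desc="preproc")` returns an empty list.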

adelavega · 2022-11-18T20:16:01Z

bids/layout/tests/test_layout_v2.py

+
+    def test_dataset_missing_generatedby_fails_validation(self):
+        dataset_path = Path("ds005_derivs", "format_errs", "no_pipeline_description")
+        with pytest.raises(BIDSDerivativesValidationError):


Expected error:

*** bids.exceptions.BIDSDerivativesValidationError: Every valid BIDS-derivatives dataset must have a GeneratedBy.Name field set inside 'dataset_description.json'. Example: {'GeneratedBy': [{'Name': 'Example pipeline'}]}

That seems like an easy shim
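
One possible shape for such a shim, as a sketch only. `DerivativesValidationError` is a local stand-in for `bids.exceptions.BIDSDerivativesValidationError`, `check_generated_by` is a hypothetical name, and the older `PipelineDescription` key that appears elsewhere in this thread is deliberately not handled.

```python
class DerivativesValidationError(ValueError):
    """Local stand-in for bids.exceptions.BIDSDerivativesValidationError."""


def check_generated_by(description):
    """Raise if a derivatives dataset_description lacks GeneratedBy[*].Name."""
    generated_by = description.get("GeneratedBy") or []
    if isinstance(generated_by, dict):
        # Tolerate a single object instead of a list of objects.
        generated_by = [generated_by]
    if not any(isinstance(e, dict) and e.get("Name") for e in generated_by):
        raise DerivativesValidationError(
            "Every valid BIDS-derivatives dataset must have a GeneratedBy.Name "
            "field set inside 'dataset_description.json'."
        )
    return generated_by
```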

adelavega · 2022-11-18T20:27:42Z

bids/layout/tests/test_layout_v2.py

+    target = 'sub-01/ses-1/func/sub-01_ses-1_task-rest_acq-fullbrain_run-1_bold.nii.gz'
+    target = target.split('/')
+    result = layout_7t_trt.get_metadata(
+        join(layout_7t_trt.root, *target), include_entities=True)


include_entities doesn't seem to work due to:

ANCPLabOldenburg/ancp-bids#66

adelavega · 2022-11-18T20:29:38Z

bids/layout/tests/test_layout_v2.py

+    with pytest.raises(TargetError) as exc:
+        layout_7t_trt.get(target='unicorn')
+    msg = str(exc.value)
+    assert 'subject' in msg and 'reconstruction' in msg and 'proc' in msg


The difference here is ancpbids is using the literal entity value, and pybids uses the full name.

ancpbids:

"Unknown target 'unicorn'. Valid targets are: ['task', 'acq', 'run', 'sub', 'ses']"

pybids:

"Unknown target 'unicorn'. Valid targets are: ['subject', 'session', 'sample', 'task', 'acquisition', 'ceagent', 'staining', 'tracer', 'reconstruction', 'direction', 'run', 'proc', 'modality', 'echo', 'flip', 'inv', 'mt', 'part', 'recording', 'space', 'chunk', 'suffix', 'scans', 'fmap', 'datatype', 'extension', 'EchoTime2', 'EchoTime1', 'IntendedFor', 'CogAtlasID', 'EchoTime', 'EffectiveEchoSpacing', 'PhaseEncodingDirection', 'RepetitionTime', 'SliceEncodingDirection', 'SliceTiming', 'TaskName', 'StartTime', 'SamplingFrequency', 'Columns']"

I also see that pybids lists all of the metadata keys, not just core entities. This may cause problems elsewhere (although I'm not suggesting we need to keep this logic)

pybids also returning metadata keys is a bit confusing, as they are not entities in terms of the spec... not sure, but maybe there is an overlap between the BIDS spec "entity" and the sqlalchemy DB table name "Entity"

I will raise an issue about this in pybids-refactor, but it is useful functionality to be able to query by meta-data keys
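
A small sketch of how the literal keys could be translated to the long names and back. The mapping is intentionally incomplete and follows the BIDS entity table; the helper names are hypothetical.

```python
# Short key -> long name, for the entities that appear in the ancpbids message.
SHORT_TO_LONG = {
    "sub": "subject",
    "ses": "session",
    "acq": "acquisition",
    "task": "task",
    "run": "run",
}
LONG_TO_SHORT = {long: short for short, long in SHORT_TO_LONG.items()}


def to_long_names(entities):
    """Rename short keys to long names, leaving unknown keys untouched."""
    return {SHORT_TO_LONG.get(k, k): v for k, v in entities.items()}


def to_short_names(filters):
    """Rename long names back to short keys before delegating a query."""
    return {LONG_TO_SHORT.get(k, k): v for k, v in filters.items()}
```

Translating in both directions at the interface boundary would let the wrapper accept `session=...` filters and report `subject`-style names without changing how entities are stored underneath.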

adelavega · 2022-11-18T20:35:47Z

bids/layout/tests/test_layout_v2.py

+
+
+def test_get_bvals_bvecs(layout_ds005):
+    dwifile = layout_ds005.get(subject="01", datatype="dwi")[0]


Entities associated with the target file:

ancpbids: [{'key': 'sub', 'value': '01'}]
pybids: {'datatype': 'dwi', 'extension': '.nii.gz', 'subject': '01', 'suffix': 'dwi'}

the ancpbids repr also lists: {'name': 'sub-01_dwi.nii.gz', 'extension': '.nii.gz', 'suffix': 'dwi'}

Not sure why ancpbids misses the datatype entity? Is it related to the new schema?

BTW, this also makes me realize we currently treat extension as an "entity" even though it's technically not. This will for sure cause problems in many pipelines that use the BIDSImageFile.entities attribute (cc: @effigies)

Okay so in general datatype seems to not be a valid entity in ancpbids

Yes, datatype is not listed as an entity in the schema. Maybe it makes sense to treat it the same way as extension/suffix, i.e. as an additional property of the Artifact class.
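
To illustrate the idea of carrying datatype as an extra property next to suffix and extension, here is a sketch that derives it from the file's parent directory. That derivation rule and the `describe` helper are assumptions for illustration, not how ancpbids builds its Artifact objects.

```python
from pathlib import PurePosixPath


def describe(relpath):
    """Split a BIDS-style relative path into entities, suffix, extension, datatype."""
    path = PurePosixPath(relpath)
    stem, _, extension = path.name.partition(".")
    *pairs, suffix = stem.split("_")
    entities = dict(pair.split("-", 1) for pair in pairs)
    # Assumption: the datatype is the name of the file's parent directory.
    datatype = path.parent.name or None
    return {
        "entities": entities,
        "suffix": suffix,
        "extension": "." + extension if extension else "",
        "datatype": datatype,
    }
```

For `sub-01/dwi/sub-01_dwi.nii.gz` this keeps `sub` as the only entity and reports `dwi` as both suffix and datatype.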

adelavega · 2022-11-18T20:50:21Z

bids/layout/tests/test_layout_v2.py

+    bold_files = layout_ds005.get(suffix='bold', run=1, subject='01', session='*')
+    assert not bold_files
+    bold_files = layout_ds005.get(suffix='bold', run=1, subject='01')
+    assert len(bold_files) == 1


This fails because ancpbids returns:

[ {'name': 'sub-01_task-mixedgamblestask_run[...]', 'extension': '.nii.gz', 'suffix': 'bold'}, {'name': 'sub-01_task-mixedgamblestask_run[...]', 'extension': '.nii.gz', 'suffix': 'bold'}, {'name': 'sub-01_task-mixedgamblestask_run[...]', 'extension': '.json', 'suffix': 'bold'} ]

which map to these three filenames:

[ 'sub-01_task-mixedgamblestask_run-01_bold.nii.gz', 'sub-01_task-mixedgamblestask_run-01_space-MNI152NLin2009cAsym_desc-preproc_bold.nii.gz', 'sub-01_task-mixedgamblestask_run-01_space-MNI152NLin2009cAsym_desc-preproc_bold.json' ]

Looks like pybids was not indexing the last two files as they were "not valid", whereas ancpbids does index them, and returns them here. This seems like a pretty big change in behavior we'll have to address, as a lot of applications may be using the fact that invalid files are not indexed as a form of filtering on files.

Also, ancpbids's repr makes the first two files look identical. I would suggest truncating less aggressively, as the filename can be quite important

Note that layout_ds005 is instantiated without derivatives=True so IMO those files should not be returned (unless we want to deliberately change this behavior)

Hm, syntactically, the files comply with the BIDS naming scheme and are considered "valid".

It turns out this is only because ancpbids indexes derivatives by default.
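
A sketch of the query-time alternative: keep everything indexed but drop derivative files unless they are requested. The helpers are hypothetical, and the rule "anything under a top-level `derivatives` directory is a derivative" is an assumption made for the example.

```python
from pathlib import PurePosixPath


def is_derivative(relpath):
    """True if the path sits under a top-level 'derivatives' directory."""
    parts = PurePosixPath(relpath).parts
    return len(parts) > 1 and parts[0] == "derivatives"


def select(relpaths, include_derivatives=False):
    """Query-time filter: drop derivative files unless explicitly requested."""
    return [p for p in relpaths if include_derivatives or not is_derivative(p)]
```

Filtering at query time keeps the default result set the same as before while still allowing a caller to opt in; skipping derivatives at indexing time would instead save the cost of walking them at all.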

adelavega · 2022-11-18T20:56:15Z

bids/layout/tests/test_layout_v2.py

+def test_layout_with_derivs(layout_ds005_derivs):
+    assert layout_ds005_derivs.root == join(get_test_data_path(), 'ds005')
+    assert isinstance(layout_ds005_derivs.files, dict)
+    assert len(layout_ds005_derivs.derivatives) == 1


Again, the issue here is all derivatives are loaded, whereas in the pybids tests only one set of derivs (events) is deliberately loaded

adelavega · 2022-11-18T20:58:26Z

bids/layout/tests/test_layout_v2.py

+    assert dd['Name'] == 'Mixed-gambles task'
+    dd = layout_ds005_derivs.get_dataset_description('all', True)
+    assert isinstance(dd, list)
+    assert len(dd) == 2


Again, the issue here is loading 2 vs 3 derivs.

I think the fact that the object returned by ancpbids is different is acceptable, although it is a breaking change.

ancpbids:

[{'name': 'dataset_description.json', 'Name': 'Mixed-gambles task', 'BIDSVersion': '1.0.0rc2', 'License': 'This dataset is made available u[...]', 'ReferencesAndLinks': 'Tom, S.M., Fox, C.R., Trepel, C.[...]'}, {'name': 'dataset_description.json', 'Name': 'Mixed-gambles task', 'BIDSVersion': '1.0.0rc2', 'License': 'This dataset is made available u[...]', 'ReferencesAndLinks': 'Tom, S.M., Fox, C.R., Trepel, C.[...]'}, {'name': 'dataset_description.json', 'Name': 'fMRIPrep - fMRI PREProcessing wo[...]', 'BIDSVersion': '1.4.0', 'DatasetType': 'derivative', 'License': 'CC0', 'HowToAcknowledge': 'Please cite our paper (https://d[...]'}]

pybids:

{'PipelineDescription': {'Name': 'events'}, 'BIDSVersion': '1.0.0rc2', 'License': 'This dataset is made available under the Public Domain Dedication and License \nv1.0, whose full text can be found at \nhttp://www.opendatacommons.org/licenses/pddl/1.0/. \nWe hope that all users will follow the ODC Attribution/Share-Alike \nCommunity Norms (http://www.opendatacommons.org/norms/odc-by-sa/); \nin particular, while not legally required, we hope that all users \nof the data will acknowledge the OpenfMRI project and NSF Grant \nOCI-1131441 (R. Poldrack, PI) in any publications.', 'Name': 'Mixed-gambles task', 'ReferencesAndLinks': 'Tom, S.M., Fox, C.R., Trepel, C., Poldrack, R.A. (2007). The neural basis of loss aversion in decision-making under risk. Science, 315(5811):515-8'}

adelavega · 2022-11-18T21:01:07Z

bids/layout/tests/test_layout_v2.py

+    l = layout_ds005
+    # Raise error with suggestions
+    with pytest.raises(ValueError, match='session'):
+        l.get(subject='12', ses=True, invalid_filters='error')


Two issues:

Literal vs display name (e.g. ses vs session)

invalid_filters doesn't seem to be implemented

adelavega · 2022-11-18T21:01:46Z

bids/layout/tests/test_layout_v2.py

+    for run in (1, "1", "01"):
+        res = layout_ds005.get(subject="01", task="mixedgamblestask",
+                               run=run, extension=".nii.gz")
+        assert len(res) == 1


Again, derivatives are loaded when they should not be
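
Separately from the derivatives problem, the test above expects `1`, `"1"` and `"01"` to select the same run. A minimal sketch of one way to get that behaviour, with hypothetical helper names:

```python
def normalize_index(value):
    """Coerce 1, "1" and "01" to the int 1; leave non-numeric values alone."""
    if isinstance(value, int):
        return value
    if isinstance(value, str) and value.isdecimal():
        return int(value)
    return value


def matches(entity_value, query_value):
    """Compare an indexed entity value with a query value after normalizing both."""
    return normalize_index(entity_value) == normalize_index(query_value)
```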

adelavega · 2022-11-18T23:28:15Z

Update: all tests in test_layout_on_example are passing w/ ancpbids.

adelavega · 2022-11-19T00:32:19Z

Additional observations from trying to test functionality in other modules:

Missing functionality:

- BIDSLayoutFile.get_df
- get_dict
- get_entities
- path and relpath (ancpbids Artifact only has 'name')
- layout.regex_search
- .get function on derivatives (probably it can be accessed in a different way)
- BIDSLayout.build_path, write_to_file and copy_file
- As @gkiar pointed out, BIDSLayout.files means all files that are valid (if validation=True), but it has a different meaning in ancpbids. I also recommend renaming to extra_files, and adding a shim or another way to access all files in a flat list.
- validation seems to produce slightly different results. One difference is that pybids simply skips invalid files, whereas ancpbids seems to throw an error. @erdalkaraca? In particular this causes issues on these files on ds005: `models/ds-005_type interceptonlyrunlevel_model.json`

There are also many functions that look like get_{entity} but are not, and they need to be added (I'm open to adding them with a new, less confusing name):

- get_collections: load variable collections

Indexing discrepancies:

- participants.tsv is not indexed (as @gkiar previously said, this is ill-defined in the schema)

@erdalkaraca WDYT about functions like get_collections? Should we rename them, or add exceptions to __getattr__?
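
For context on the `__getattr__` question: Python only calls `__getattr__` when normal attribute lookup fails, so a method that is actually defined on the class takes precedence over a dynamic `get_<entity>` handler. The toy class below is illustrative and not ancpbids code; its plural handling is deliberately naive.

```python
class Layout:
    """Toy layout; the class and its data are illustrative only."""

    def __init__(self, files):
        self._files = files  # list of entity dicts

    def get_collections(self):
        # A real method: normal lookup finds it first, so __getattr__
        # is never consulted for this name.
        return "collections"

    def __getattr__(self, name):
        # Only reached when normal lookup fails, e.g. for get_subjects.
        if name.startswith("get_"):
            entity = name[len("get_"):]
            # Naive plural handling: 'subjects' -> 'subject'.
            if entity.endswith("s"):
                entity = entity[:-1]
            return lambda: sorted({f[entity] for f in self._files if entity in f})
        raise AttributeError(name)
```

Under this reading, explicitly defining `get_collections` would be enough to exempt it from the dynamic handler, without adding a special case inside `__getattr__`.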

erdalkaraca · 2022-12-05T21:33:59Z

I am not really familiar with the variable concept, but sounds like a higher level API. What is the source of the variables that pybids extracts those from?

adelavega · 2022-12-05T22:45:05Z

@erdalkaraca yes it is a higher-level API that requires reading in the _events.tsv files.

It is independent of the querying/indexing functionality. I will admit that in hindsight it's a bit messy to have this type of functionality mixed in with the core querying functionality in the same object. That said, we could address that later in a backwards-breaking change.

In the meantime, we will implement these functions to allow for deeper integration testing.

I see you already saw, but we will continue development in https://github.com/bids-standard/pybids-refactor/ so that we can have multiple PRs by different authors, and then merge that fork to pybids

erdalkaraca · 2022-12-06T08:59:03Z

It is possible to mix in the functionality from a different class to better separate the APIs (monkey-patching, plugin) without a breaking change.

adelavega · 2022-12-06T22:16:13Z

Yes, that might be the way to go. Maybe your BIDSLayout object should be called BIDSBaseLayout or something similar, to indicate it's the core indexing/querying engine, as opposed to the additional high-level API we will add to make it backwards compatible.
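
A sketch of what that split could look like with a mixin. All class bodies here are hypothetical: `BIDSBaseLayout` is the name floated above, `CollectionsMixin` is invented for the example, and `get_collections` is a stand-in that does not actually read any `_events.tsv` files.

```python
class BIDSBaseLayout:
    """Hypothetical core engine: indexing and querying only."""

    def __init__(self, files):
        self.files = list(files)

    def get(self, **filters):
        return [
            f for f in self.files
            if all(f.get(k) == v for k, v in filters.items())
        ]


class CollectionsMixin:
    """Hypothetical high-level API that builds on `get`."""

    def get_collections(self, suffix="events"):
        # Stand-in for loading variable collections from _events.tsv files.
        return [f["name"] for f in self.get(suffix=suffix)]


class BIDSLayout(CollectionsMixin, BIDSBaseLayout):
    """Backwards-compatible facade combining both APIs."""
```

The core class stays free of the higher-level methods, while existing code that calls `BIDSLayout(...).get_collections()` keeps working.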

erdalkaraca added 2 commits May 28, 2022 10:30

new BIDSLayoutV2 interface to be used in parallel with legacy BIDSLayout

9378c8f

Merge branch 'bids-standard:master' into master

80ccdfb

adelavega mentioned this pull request May 31, 2022

FIX: Match only within relative path when indexer is validating #859

Merged

effigies reviewed May 31, 2022

erdalkaraca and others added 3 commits May 31, 2022 23:24

Update bids/layout/__init__.py

8fe9ad7

define BIDSLayoutV2 as function if ancpbids package not installed Co-authored-by: Chris Markiewicz <[email protected]>

get_<entity>() with fuzzy matching entity name

7b4881d

Merge branch 'bids-standard:master' into master

324bd9d

erdalkaraca mentioned this pull request Jun 11, 2022

Pre-submission enquiry openjournals/joss#1090

Closed

erdalkaraca added 2 commits June 18, 2022 21:20

WIP: unit tests stabilization, added missing functionality

c1a5a98

WIP: return int instead of str for index values of entities (for example, run)

f50f657

effigies added this to the 1.0.0 milestone Jul 22, 2022

Remi-Gau mentioned this pull request Oct 22, 2022

look into ancp bids PennLINC/CuBIDS#233

Open

Merge branch 'bids-standard:master' into master

7f08cdb

effigies mentioned this pull request Nov 18, 2022

Lightweight BIDS Layouts for all brainhackorg/global2022#87

Open

2 tasks

adelavega requested changes Nov 18, 2022

This was referenced Dec 5, 2022

Datatype entity is missed bids-standard/pybids-refactor#11

Open

Invalid filters are ignored / not parameterizable bids-standard/pybids-refactor#12

Open

adelavega mentioned this pull request Feb 7, 2023

Poor performance of get_subjects() even when using indexing without metadata #940

Closed
