[WIP] introduce BIDSLayoutV2 #863
base: master
Note: I have not yet fixed all unit tests in that module, as this is a work in progress.
Codecov Report: Base: 86.24% // Head: 83.96% // Decreases project coverage by 2.29%.
Additional details and impacted files:
@@ Coverage Diff @@
## master #863 +/- ##
==========================================
- Coverage 86.24% 83.96% -2.29%
==========================================
Files 32 33 +1
Lines 3904 4028 +124
Branches 947 966 +19
==========================================
+ Hits 3367 3382 +15
- Misses 346 455 +109
Partials 191 191
Awesome. I think a full integration test using That is, those of us who use
define BIDSLayoutV2 as function if ancpbids package not installed Co-authored-by: Chris Markiewicz <[email protected]>
Hey @erdalkaraca, are you around this week to discuss?
To me, this feels like a big enough change to warrant a 1.0 release when we can make it happen. I won't have any time to work on this until August 22 at the absolute earliest, but I do think this is an important thing to push on as people are able. We could have a ~30-minute call next week if people are around and want to discuss strategy.
Depends on ANCPLabOldenburg/ancp-bids#60
Hello!
I finally had time to run these tests myself. Honestly, all in all the API differences between ancpbids and pybids seem minimal at this point, and I think we can get this done soon.
The main issues I identified were:
- ancpbids seems to load all derivatives eagerly, even if derivatives=True is not set. This is actually fairly important, as you may specifically not want to load derivatives. We may want to discuss whether it's better to "filter" derivatives prior to instantiating a layout, or at query time. I think the latter is more flexible and generic, but there may be times you really don't want to index a derivative.
- ancpbids currently uses the "literal" name of entities (e.g. "ses" instead of "session").
- Some __repr__s are different and could be improved. They don't need to be identical, but it's something to make a decision on.
- get_fieldmap is not working. This looks like a bug in ancpbids's getattr method.
The rest seem like small issues that are very doable to fix, and I've commented on all known test failures.
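The two derivative-filtering strategies discussed in the first point can be contrasted in a few lines. A toy sketch, assuming each indexed file simply carries an is_derivative flag; IndexedFile, build_index and query are hypothetical names, not ancpbids or pybids API:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IndexedFile:
    path: str
    is_derivative: bool


def build_index(files, derivatives=False):
    """Option A: filter at instantiation; unwanted derivatives are never indexed."""
    return [f for f in files if derivatives or not f.is_derivative]


def query(index, include_derivatives=False):
    """Option B: index everything up front and filter per query."""
    return [f for f in index if include_derivatives or not f.is_derivative]


files = [
    IndexedFile("sub-01/func/sub-01_task-rest_bold.nii.gz", False),
    IndexedFile(
        "derivatives/fmriprep/sub-01/func/sub-01_task-rest_desc-preproc_bold.nii.gz",
        True,
    ),
]
```

Option B is the more flexible of the two, but it pays the indexing cost for every derivative tree on disk, which is exactly the case where you "really don't want to index a derivative".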
)
assert len(unvalidated.get()) == 4
with pytest.raises(ValueError):
    unvalidated.get(desc="preproc")
Looks like ancpbids returns files here because desc is not a valid entity.
By default, BIDSLayout does not allow unrecognized entities and raises an error.
You can allow them using invalid_filters='allow', but even then, since nothing matches, it returns an empty list: []
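To make the expected behavior concrete, here is a toy query that mimics the invalid_filters modes of pybids' get(): 'error' (the default) raises on unknown entities, 'allow' keeps the unknown filter so nothing matches, and 'drop' silently ignores it. This is a sketch over plain dicts; toy_get is a hypothetical helper, not the pybids implementation:

```python
def toy_get(files, valid_entities, invalid_filters="error", **filters):
    """Filter a list of entity dicts, handling unknown filter keys per invalid_filters."""
    unknown = [key for key in filters if key not in valid_entities]
    if unknown and invalid_filters == "error":
        raise ValueError(f"Unknown entities: {unknown}")
    if invalid_filters == "drop":
        # Silently ignore filters on unrecognized entities.
        filters = {k: v for k, v in filters.items() if k not in unknown}
    # With "allow", the unknown filter stays and simply matches nothing.
    return [f for f in files if all(f.get(k) == v for k, v in filters.items())]


files = [{"subject": s, "suffix": "bold"} for s in ("01", "02", "03", "04")]
valid = {"subject", "suffix"}
```

Under this reading, the ancpbids behavior in the test (returning all files for desc="preproc") corresponds to 'drop' rather than to the default 'error'.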
def test_dataset_missing_generatedby_fails_validation(self):
    dataset_path = Path("ds005_derivs", "format_errs", "no_pipeline_description")
    with pytest.raises(BIDSDerivativesValidationError):
Expected error:
*** bids.exceptions.BIDSDerivativesValidationError: Every valid BIDS-derivatives dataset must have a GeneratedBy.Name field set inside 'dataset_description.json'.
Example: {'GeneratedBy': [{'Name': 'Example pipeline'}]}
That seems like an easy shim.
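A shim of that kind could check dataset_description.json before a derivative dataset is accepted. A minimal sketch, assuming the description has already been parsed into a dict; the exception class is a local stand-in so the snippet runs without pybids (a real shim would raise bids.exceptions.BIDSDerivativesValidationError), and check_generated_by is a hypothetical name:

```python
class BIDSDerivativesValidationError(ValueError):
    """Local stand-in for bids.exceptions.BIDSDerivativesValidationError."""


def check_generated_by(description):
    """Return the first GeneratedBy[].Name, or raise if none is declared."""
    generated_by = description.get("GeneratedBy") or []
    names = [
        entry.get("Name")
        for entry in generated_by
        if isinstance(entry, dict) and entry.get("Name")
    ]
    if not names:
        raise BIDSDerivativesValidationError(
            "Every valid BIDS-derivatives dataset must have a GeneratedBy.Name "
            "field set inside 'dataset_description.json'.\n"
            "Example: {'GeneratedBy': [{'Name': 'Example pipeline'}]}"
        )
    return names[0]
```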
target = 'sub-01/ses-1/func/sub-01_ses-1_task-rest_acq-fullbrain_run-1_bold.nii.gz'
target = target.split('/')
result = layout_7t_trt.get_metadata(
    join(layout_7t_trt.root, *target), include_entities=True)
include_entities doesn't seem to work due to:
with pytest.raises(TargetError) as exc:
    layout_7t_trt.get(target='unicorn')
msg = str(exc.value)
assert 'subject' in msg and 'reconstruction' in msg and 'proc' in msg
The difference here is that ancpbids uses the literal entity name (e.g. ses), whereas pybids uses the full name (session).
ancpbids:
"Unknown target 'unicorn'. Valid targets are: ['task', 'acq', 'run', 'sub', 'ses']"
pybids:
"Unknown target 'unicorn'. Valid targets are: ['subject', 'session', 'sample', 'task', 'acquisition', 'ceagent', 'staining', 'tracer', 'reconstruction', 'direction', 'run', 'proc', 'modality', 'echo', 'flip', 'inv', 'mt', 'part', 'recording', 'space', 'chunk', 'suffix', 'scans', 'fmap', 'datatype', 'extension', 'EchoTime2', 'EchoTime1', 'IntendedFor', 'CogAtlasID', 'EchoTime', 'EffectiveEchoSpacing', 'PhaseEncodingDirection', 'RepetitionTime', 'SliceEncodingDirection', 'SliceTiming', 'TaskName', 'StartTime', 'SamplingFrequency', 'Columns']"
I also see that pybids lists all of the metadata keys, not just core entities. This may cause problems elsewhere (although I'm not suggesting we need to keep this logic)
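One way to reconcile the two messages is a translation layer from the literal keys ancpbids reports to the full names pybids exposes. A minimal sketch: the dictionary is hand-written from the names in the pybids message above and is deliberately incomplete (a real implementation would presumably derive it from the BIDS schema), and SHORT_TO_LONG and to_display_names are hypothetical names:

```python
# Literal filename keys -> full names as they appear in the pybids message above.
SHORT_TO_LONG = {
    "sub": "subject",
    "ses": "session",
    "task": "task",
    "acq": "acquisition",
    "ce": "ceagent",
    "rec": "reconstruction",
    "dir": "direction",
    "run": "run",
    "proc": "proc",
}


def to_display_names(entities):
    """Rename literal entity keys to pybids-style names; unknown keys pass through."""
    return {SHORT_TO_LONG.get(key, key): value for key, value in entities.items()}
```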
pybids also returning metadata keys is a bit confusing, as they are not entities in terms of the spec. I'm not sure, but maybe there is an overlap between the BIDS spec "entity" and the SQLAlchemy DB table name "Entity".
I will raise an issue about this in pybids-refactor, but being able to query by metadata keys is useful functionality.
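As the valid-targets list above suggests, pybids accepts metadata keys such as RepetitionTime as query filters alongside entities. A toy sketch of that lookup over plain dicts; query_with_metadata is a hypothetical name, and letting entities take precedence over metadata keys with the same name is an assumption:

```python
def query_with_metadata(files, **filters):
    """Return files whose entities or sidecar metadata match every filter."""

    def lookup(record, key):
        # Assumption: entities take precedence over metadata keys with the same name.
        if key in record["entities"]:
            return record["entities"][key]
        return record["metadata"].get(key)

    return [f for f in files if all(lookup(f, k) == v for k, v in filters.items())]


files = [
    {"entities": {"subject": "01", "run": 1}, "metadata": {"RepetitionTime": 2.0}},
    {"entities": {"subject": "01", "run": 2}, "metadata": {"RepetitionTime": 3.0}},
]
```

Keeping metadata in a separate field, rather than listing it among the entities, would address the confusion raised above while preserving the query feature.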
def test_get_bvals_bvecs(layout_ds005):
    dwifile = layout_ds005.get(subject="01", datatype="dwi")[0]
Entities associated with the target file:
ancpbids: [{'key': 'sub', 'value': '01'}]
pybids: {'datatype': 'dwi', 'extension': '.nii.gz', 'subject': '01', 'suffix': 'dwi'}
The ancpbids repr also lists: {'name': 'sub-01_dwi.nii.gz', 'extension': '.nii.gz', 'suffix': 'dwi'}
Not sure why ancpbids misses the datatype entity. Is it related to the new schema?
BTW, this also makes me realize we currently treat extension as an "entity" even though it technically isn't. This will surely cause problems in many pipelines that use the BIDSImageFile.entities attribute (cc: @effigies).
Okay, so in general datatype seems not to be a valid entity in ancpbids.
Yes, datatype is not listed as an entity in the schema. Maybe it makes sense to treat it the same way as extension/suffix, i.e. as an additional property of the Artifact class.
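Treating datatype like suffix and extension, as a property derived from the path rather than an entity, could look like the sketch below. Artifact here is a toy dataclass, not ancpbids' actual Artifact class; taking datatype from the parent directory name is an assumption that holds for files inside sub-*/[ses-*/]<datatype>/ folders, and DATATYPES is only a subset of the spec's datatypes:

```python
from dataclasses import dataclass
from pathlib import PurePosixPath

# Subset of BIDS datatype directory names; a real implementation would read them from the schema.
DATATYPES = {"anat", "func", "dwi", "fmap", "perf", "beh"}


@dataclass
class Artifact:
    path: str

    @property
    def name(self):
        return PurePosixPath(self.path).name

    @property
    def datatype(self):
        # The enclosing folder, if it is a known datatype directory.
        parent = PurePosixPath(self.path).parent.name
        return parent if parent in DATATYPES else None

    @property
    def suffix(self):
        # Text between the last underscore and the first dot of the filename.
        stem = self.name.split(".", 1)[0]
        return stem.rsplit("_", 1)[-1]

    @property
    def extension(self):
        # Everything from the first dot on, so ".nii.gz" stays intact.
        name = self.name
        return name[name.index("."):] if "." in name else ""
```

This keeps the entities dict limited to schema entities while still letting pybids-style queries filter on datatype, suffix and extension.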
bold_files = layout_ds005.get(suffix='bold', run=1, subject='01', session='*')
assert not bold_files
bold_files = layout_ds005.get(suffix='bold', run=1, subject='01')
assert len(bold_files) == 1
This fails because ancpbids returns:
[
{'name': 'sub-01_task-mixedgamblestask_run[...]', 'extension': '.nii.gz', 'suffix': 'bold'},
{'name': 'sub-01_task-mixedgamblestask_run[...]', 'extension': '.nii.gz', 'suffix': 'bold'},
{'name': 'sub-01_task-mixedgamblestask_run[...]', 'extension': '.json', 'suffix': 'bold'}
]
which map to these three filenames:
[
'sub-01_task-mixedgamblestask_run-01_bold.nii.gz',
'sub-01_task-mixedgamblestask_run-01_space-MNI152NLin2009cAsym_desc-preproc_bold.nii.gz',
'sub-01_task-mixedgamblestask_run-01_space-MNI152NLin2009cAsym_desc-preproc_bold.json'
]
Looks like pybids was not indexing the last two files as they were "not valid", whereas ancpbids does index them, and returns them here. This seems like a pretty big change in behavior we'll have to address, as a lot of applications may be using the fact that invalid files are not indexed as a form of filtering on files.
Also, ancpbids's repr makes the first two files look identical. I would suggest not truncating so aggressively, as the filename can be quite important.
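The collision is easy to reproduce: both filenames share the same first 32 characters, which is where the repr output above is cut. A small sketch contrasting that truncation with a repr that keeps the full filename; truncated and FileRecord are hypothetical names, and the 32-character width is inferred from the output above:

```python
def truncated(text, width=32):
    """Mimic the cut-and-ellipsis style visible in the ancpbids output above."""
    return text if len(text) <= width else text[:width] + "[...]"


class FileRecord:
    def __init__(self, filename):
        self.filename = filename

    def __repr__(self):
        # Show the full filename; it is usually the most informative field.
        return f"<FileRecord filename={self.filename!r}>"


raw = "sub-01_task-mixedgamblestask_run-01_bold.nii.gz"
preproc = "sub-01_task-mixedgamblestask_run-01_space-MNI152NLin2009cAsym_desc-preproc_bold.nii.gz"
```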
Note that layout_ds005 is instantiated without derivatives=True, so IMO those files should not be returned (unless we want to deliberately change this behavior).
Hm, syntactically, the files comply with the BIDS naming scheme and are considered "valid".
It turns out this is only because ancpbids indexes derivatives by default.
def test_layout_with_derivs(layout_ds005_derivs):
    assert layout_ds005_derivs.root == join(get_test_data_path(), 'ds005')
    assert isinstance(layout_ds005_derivs.files, dict)
    assert len(layout_ds005_derivs.derivatives) == 1
Again, the issue here is that all derivatives are loaded, whereas in the pybids tests only one set of derivatives (events) is deliberately loaded.
assert dd['Name'] == 'Mixed-gambles task'
dd = layout_ds005_derivs.get_dataset_description('all', True)
assert isinstance(dd, list)
assert len(dd) == 2
Again, the issue here is loading 2 vs 3 derivatives.
I think the fact that the object returned by ancpbids is different is acceptable, although it is a breaking change.
ancpbids:
[{'name': 'dataset_description.json', 'Name': 'Mixed-gambles task', 'BIDSVersion': '1.0.0rc2', 'License': 'This dataset is made available u[...]', 'ReferencesAndLinks': 'Tom, S.M., Fox, C.R., Trepel, C.[...]'}, {'name': 'dataset_description.json', 'Name': 'Mixed-gambles task', 'BIDSVersion': '1.0.0rc2', 'License': 'This dataset is made available u[...]', 'ReferencesAndLinks': 'Tom, S.M., Fox, C.R., Trepel, C.[...]'}, {'name': 'dataset_description.json', 'Name': 'fMRIPrep - fMRI PREProcessing wo[...]', 'BIDSVersion': '1.4.0', 'DatasetType': 'derivative', 'License': 'CC0', 'HowToAcknowledge': 'Please cite our paper (https://d[...]'}]
pybids:
{'PipelineDescription': {'Name': 'events'}, 'BIDSVersion': '1.0.0rc2', 'License': 'This dataset is made available under the Public Domain Dedication and License \nv1.0, whose full text can be found at \nhttp://www.opendatacommons.org/licenses/pddl/1.0/. \nWe hope that all users will follow the ODC Attribution/Share-Alike \nCommunity Norms (http://www.opendatacommons.org/norms/odc-by-sa/); \nin particular, while not legally required, we hope that all users \nof the data will acknowledge the OpenfMRI project and NSF Grant \nOCI-1131441 (R. Poldrack, PI) in any publications.', 'Name': 'Mixed-gambles task', 'ReferencesAndLinks': 'Tom, S.M., Fox, C.R., Trepel, C., Poldrack, R.A. (2007). The neural basis of loss aversion in decision-making under risk. Science, 315(5811):515-8'}
l = layout_ds005
# Raise error with suggestions
with pytest.raises(ValueError, match='session'):
    l.get(subject='12', ses=True, invalid_filters='error')
Two issues:
- Literal vs display name (e.g. ses vs session)
- invalid_filters doesn't seem to be implemented
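The test expects the error for ses=True to mention session. A sketch of how such a suggestion could be produced: literal keys are looked up in an alias table first, and anything else falls back to fuzzy matching with difflib.get_close_matches from the standard library. VALID_ENTITIES, LITERAL_TO_FULL and check_filters are hypothetical names, and the message wording is not pybids':

```python
import difflib

# Hypothetical tables; a real layout would derive both from the BIDS schema.
VALID_ENTITIES = ["subject", "session", "task", "acquisition", "run"]
LITERAL_TO_FULL = {"sub": "subject", "ses": "session", "acq": "acquisition"}


def check_filters(filters, invalid_filters="error"):
    """Raise a ValueError naming likely intended entities for unknown filter keys."""
    if invalid_filters != "error":
        return
    for key in filters:
        if key in VALID_ENTITIES:
            continue
        if key in LITERAL_TO_FULL:
            suggestions = [LITERAL_TO_FULL[key]]
        else:
            suggestions = difflib.get_close_matches(key, VALID_ENTITIES, n=3)
        hint = f" Did you mean {' or '.join(suggestions)}?" if suggestions else ""
        raise ValueError(f"'{key}' is not a recognized entity.{hint}")
```

Checking the alias table first avoids relying on fuzzy-match cutoffs for short keys like ses, whose similarity to session sits right at difflib's default threshold.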
for run in (1, "1", "01"):
    res = layout_ds005.get(subject="01", task="mixedgamblestask",
                           run=run, extension=".nii.gz")
    assert len(res) == 1
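Separately from the derivatives issue, this test expects 1, "1" and "01" to be interchangeable as run values. One simple way to get that behavior is to compare runs numerically while keeping the zero-padded label from the filename for display. A sketch; normalize_run and run_matches are hypothetical helpers:

```python
def normalize_run(value):
    """Coerce run labels such as 1, "1" or "01" to an int for comparison."""
    return int(value)


def run_matches(query, stored_label):
    # stored_label is the label as written in the filename, e.g. "01".
    return normalize_run(query) == normalize_run(stored_label)
```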
Again, derivatives are loaded when they should not be
Update: all tests in
Additional observations from trying to test functionality in other modules:
Missing functionality:
There are also many functions that look like
Indexing discrepancies
@erdalkaraca WDYT about functions like
I am not really familiar with the variable concept, but it sounds like a higher-level API. What is the source from which pybids extracts those variables?
@erdalkaraca Yes, it is a higher-level API that requires reading in the It is independent of the querying/indexing functionality. I will admit that in hindsight it's a bit messy to have this type of functionality mixed in with the core querying functionality in the same object. That said, we could address that later in a backwards-incompatible change. In the meantime, we will implement these functions to allow for deeper integration testing. As you've already seen, we will continue development in https://github.com/bids-standard/pybids-refactor/ so that we can have multiple PRs by different authors, and then merge that fork to
It is possible to mix in the functionality from a different class to better separate the APIs (monkey-patching, plugins) without a breaking change.
Yes, that might be the way to go. Maybe your
In agreement with @gkiar, the following approach allows for a better migration to pybids-core:
Once we get sufficient user feedback, such as bug reports or critical implementation gaps, BIDSLayoutV2 can be renamed to BIDSLayout, while the old implementation may remain available as BIDSLayoutLegacy for the next few releases.
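After such a rename, code written against the preview name would keep working if BIDSLayoutV2 stays around as a thin subclass that emits a DeprecationWarning. A sketch of that transition under those assumptions; the class bodies are placeholders and the warning text is hypothetical:

```python
import warnings


class BIDSLayout:
    """Stand-in for the ancpbids-backed implementation after the rename."""

    def __init__(self, root):
        self.root = root


class BIDSLayoutLegacy:
    """Stand-in for the previous implementation, kept for a few releases."""

    def __init__(self, root):
        self.root = root


class BIDSLayoutV2(BIDSLayout):
    """Transitional alias so code written against the preview name keeps working."""

    def __init__(self, *args, **kwargs):
        warnings.warn(
            "BIDSLayoutV2 has been renamed to BIDSLayout.",
            DeprecationWarning,
            stacklevel=2,
        )
        super().__init__(*args, **kwargs)
```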