-
Notifications
You must be signed in to change notification settings - Fork 122
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ENH: enable partial metadata indexing #560
Conversation
Codecov Report
@@ Coverage Diff @@
## master #560 +/- ##
==========================================
+ Coverage 82.97% 83.08% +0.11%
==========================================
Files 23 23
Lines 2966 2980 +14
Branches 749 753 +4
==========================================
+ Hits 2461 2476 +15
Misses 323 323
+ Partials 182 181 -1
Continue to review full report at Codecov.
|
bids/layout/layout.py
Outdated
@@ -184,10 +184,13 @@ class BIDSLayout(object): | |||
in the root argument is reindexed. If False, indexing will be | |||
skipped and the existing database file will be used. Ignored if | |||
database_path is not provided. | |||
index_metadata : bool | |||
index_metadata : bool or dict |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess I would rather keep this simple, and if users want to do partial indexing, make them set this to False
, and then use the index_metadata
function later.
The docstring for the layout is already getting pretty long, and I think this is not a super common use case.
As an aside, maybe in the docs there should be a section on things you can do to improve BIDSLayout
performance, and using partial indexing would be one of them.
bids/layout/index.py
Outdated
filters['regex_search'] = True | ||
# ensure extension argument is a list | ||
if isinstance(filters.get(ext_key), str): | ||
filters[ext_key] = [filters[ext_key]] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can use the listify
function from bids.utils
without the need for the if
clause
filters[ext_key] = [filters[ext_key]] | |
filters[ext_key] = listify(filters[ext_key]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nice, thank you for the suggestion.
bids/layout/index.py
Outdated
else: | ||
ext_key = 'extension' | ||
msg = ( | ||
"You should explicitly set the extension argument. " |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why does the extension
key need to be set to that regex? It seems in BIDSLayout.__init__
setting it to None works
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From my use of pybids
setting a key to None
means do not match any files that contain this key, so my interpretation of extension=None
would be to match files without an extension. When I set extensions to None and ran the tests, when this test would fail since none of the nifti files were indexed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, yeah I guess there's a difference between setting a key to None
versus not setting a key at all. So what happens if you don't set it at all?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just saying because if filters
is None, then kwargs
does not have extension as a key, yet all meta-data is indexed, correct?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's a good idea, looks like it indexes the correct information, allowing me to remove the weird logic around what should be specified in the extension
key.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That would be nice because then we could cut this whole if
clause out.
bids/layout/index.py
Outdated
"""Index metadata for all files in the BIDS dataset. """ | ||
if filters: | ||
default_ext = ['[.]+'] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
default_ext = ['[.]+'] |
Minor: put this close to where it's used
bids/layout/index.py
Outdated
if filters.get('extension'): | ||
ext_key = 'extension' | ||
elif filters.get('extensions'): | ||
ext_key = 'extensions' | ||
else: | ||
ext_key = 'extension' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Following suggestion replaces this whole section
bids/layout/index.py
Outdated
# ensure we are returning objects | ||
filters['return_type'] = 'object' | ||
# until 0.11.0, user can specify extension or extensions | ||
if filters.get('extension'): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Simplify this a bit. I was going to suggest just renaming extensions to extension if you found it to simplify it further, but then you wouldn't get the DeprecationWarning that .get issues.
if filters.get('extension'): | |
ext_key = 'extensions' if 'extensions' in filters else 'extension' | |
if not filters.get(ext_key): | |
default_ext = ['[.]+'] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thank you, this is much cleaner
Co-Authored-By: Alejandro de la Vega <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good, but see minor comments.
bids/layout/index.py
Outdated
@@ -164,10 +165,21 @@ def index_files(self): | |||
"""Index all files in the BIDS dataset. """ | |||
self._index_dir(self.root, self.config) | |||
|
|||
def index_metadata(self): | |||
def index_metadata(self, **filters): | |||
"""Index metadata for all files in the BIDS dataset. """ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Docstring should mention the filters
argument explicitly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks for pointing this out, added one in.
# ensure we are returning objects | ||
filters['return_type'] = 'object' | ||
# until 0.11.0, user can specify extension or extensions | ||
ext_key = 'extensions' if 'extensions' in filters else 'extension' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
.get()
will handle the extensions/extension thing for you, so you don't need to worry about it here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, he only does because he has to add stuff to the extensions
argument, and its not clear which one the user will pass.
…bids into partial_metadata_index
Thanks! |
Hey @jdkent is this the correct usage:
I didn't realize when reviewing this PR that Might make sense to add that actually because this is pretty unintuitive to use (confused myself!!) |
Yeah, that's how I would use it, (similar usage here)
|
closes #558
adds the ability to do partial metadata indexing (to help speed up creating a layout).