
ENH: enable partial metadata indexing #560

Merged: 10 commits, merged Jan 9, 2020


jdkent (Contributor) commented Dec 27, 2019

Closes #558.

Adds the ability to do partial metadata indexing (indexing metadata for only a subset of files) to help speed up creating a layout.

jdkent changed the title from "ENH: partial metadata index" to "ENH: partial metadata indexing" on Dec 27, 2019
codecov bot commented Dec 27, 2019

Codecov Report

Merging #560 into master will increase coverage by 0.11%.
The diff coverage is 100%.


@@            Coverage Diff             @@
##           master     #560      +/-   ##
==========================================
+ Coverage   82.97%   83.08%   +0.11%     
==========================================
  Files          23       23              
  Lines        2966     2980      +14     
  Branches      749      753       +4     
==========================================
+ Hits         2461     2476      +15     
  Misses        323      323              
+ Partials      182      181       -1

Flag                             Coverage Δ
#unittests                       83.08% <100%> (+0.11%)  ⬆️

Impacted Files                   Coverage Δ
bids/layout/index.py             86.72% <100%> (+0.42%)  ⬆️
bids/variables/entities.py       88.29% <0%> (ø)         ⬆️
bids/layout/layout.py            85.03% <0%> (+0.17%)    ⬆️
bids/variables/kollekshuns.py    84.24% <0%> (+0.21%)    ⬆️
bids/variables/variables.py      88.13% <0%> (+0.25%)    ⬆️

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data

Powered by Codecov. Last update 35e1296...8e46903.

jdkent changed the title from "ENH: partial metadata indexing" to "ENH: enable partial metadata indexing" on Dec 27, 2019

@@ -184,10 +184,13 @@ class BIDSLayout(object):
in the root argument is reindexed. If False, indexing will be
skipped and the existing database file will be used. Ignored if
database_path is not provided.
- index_metadata : bool
+ index_metadata : bool or dict
Collaborator

I guess I would rather keep this simple: if users want to do partial indexing, make them set this to False and then call the index_metadata function later.

The docstring for the layout is already getting pretty long, and I think this is not a super common use case.

As an aside, maybe the docs should have a section on things you can do to improve BIDSLayout performance, and partial indexing would be one of them.
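The two-step workflow suggested here (build the layout without metadata, then index metadata only for the files you need) can be illustrated with a toy, pure-Python model. This is not pybids code: `ToyFile`, `build_index`, and `index_metadata_subset` are hypothetical names, and the sidecar lookup is simplified to a single `.json` file next to each image.

```python
import json
import os
import tempfile
from dataclasses import dataclass, field


@dataclass
class ToyFile:
    """Hypothetical stand-in for an indexed file (not a pybids class)."""
    path: str
    entities: dict
    metadata: dict = field(default_factory=dict)


def build_index(files):
    """Step 1: record paths and entities only; no JSON sidecars are read."""
    return [ToyFile(path, entities) for path, entities in files]


def index_metadata_subset(index, **filters):
    """Step 2: load sidecar metadata only for files matching every filter.

    With no filters, all() over an empty iterable is True, so every file
    gets its metadata indexed.
    """
    for f in index:
        if all(f.entities.get(key) == value for key, value in filters.items()):
            # Simplified sidecar lookup: same name, .json instead of .nii.gz
            sidecar = f.path.replace(".nii.gz", ".json")
            if os.path.exists(sidecar):
                with open(sidecar) as fh:
                    f.metadata = json.load(fh)
    return index


# Usage: two images with sidecars; only the bold sidecar is actually read.
with tempfile.TemporaryDirectory() as root:
    bold = os.path.join(root, "sub-01_task-rest_bold.nii.gz")
    t1w = os.path.join(root, "sub-01_T1w.nii.gz")
    for path, meta in [(bold, {"RepetitionTime": 2.0}), (t1w, {"Manufacturer": "X"})]:
        with open(path.replace(".nii.gz", ".json"), "w") as fh:
            json.dump(meta, fh)
    index = build_index([(bold, {"suffix": "bold"}), (t1w, {"suffix": "T1w"})])
    index_metadata_subset(index, suffix="bold")
    bold_meta, t1w_meta = index[0].metadata, index[1].metadata
```

The speedup in this model comes entirely from skipping file reads for non-matching entries, which is the same motivation the PR gives for partial indexing.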

filters['regex_search'] = True
# ensure extension argument is a list
if isinstance(filters.get(ext_key), str):
    filters[ext_key] = [filters[ext_key]]
@adelavega (Collaborator), Jan 7, 2020

You can use the listify function from bids.utils, which removes the need for the if clause.

Suggested change
- filters[ext_key] = [filters[ext_key]]
+ filters[ext_key] = listify(filters[ext_key])

Contributor Author

nice, thank you for the suggestion.
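For readers unfamiliar with the helper, here is a minimal sketch of what a listify-style function does. It is a hypothetical re-implementation, not a copy of bids.utils.listify, and passing None through unchanged is an assumption. Note one behavioural difference from the original if-clause: that clause only wrapped strings, while this helper wraps any value that is not already a list or tuple.

```python
def listify(obj):
    """Wrap anything that is not already a list or tuple in a list.

    Hypothetical re-implementation for illustration; None is passed
    through unchanged (an assumption about the desired behaviour).
    """
    return obj if isinstance(obj, (list, tuple, type(None))) else [obj]


filters = {"extension": "nii.gz", "suffix": "bold"}
ext_key = "extension"
# One call replaces the isinstance(..., str) check from the diff.
filters[ext_key] = listify(filters[ext_key])
```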

else:
    ext_key = 'extension'
    msg = (
        "You should explicitly set the extension argument. "
Collaborator

Why does the extension key need to be set to that regex? It seems that in BIDSLayout.__init__ setting it to None works.

Contributor Author

From my use of pybids, setting a key to None means "do not match any files that contain this key", so my interpretation of extension=None would be to match only files without an extension. When I set extensions to None and ran the tests, this test failed because none of the NIfTI files were indexed.

Collaborator

Ah, yeah I guess there's a difference between setting a key to None versus not setting a key at all. So what happens if you don't set it at all?

@adelavega (Collaborator), Jan 8, 2020

Just saying because if filters is None, kwargs does not have extension as a key, yet all metadata is indexed, correct?

Contributor Author

That's a good idea. It looks like it indexes the correct information, which lets me remove the weird logic around what should be specified in the extension key.

Collaborator

That would be nice because then we could cut this whole if clause out.
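The distinction this thread settles on (a key set to None versus a key not passed at all) can be shown with a toy matcher. The semantics below follow jdkent's description (None means the file must not carry that entity; an absent key imposes no constraint); `matches` is a hypothetical function, not pybids' implementation.

```python
def matches(entities, **filters):
    """Toy filter matcher using the semantics described in this thread."""
    for key, wanted in filters.items():
        if wanted is None:
            # key=None: the file must NOT carry this entity at all
            if key in entities:
                return False
        else:
            allowed = wanted if isinstance(wanted, list) else [wanted]
            if entities.get(key) not in allowed:
                return False
    # Keys that were never passed impose no constraint.
    return True


files = {
    "sub-01_task-rest_bold.nii.gz": {"suffix": "bold", "extension": "nii.gz"},
    "dataset_description.json": {"extension": "json"},
    "README": {},
}
# No extension filter: every file matches.
no_ext_filter = [name for name, ents in files.items() if matches(ents)]
# extension=None: only files WITHOUT an extension match.
ext_none = [name for name, ents in files.items() if matches(ents, extension=None)]
```

Under these semantics, leaving the extension key out of the filters is what "index everything" needs, which is why the regex default could be dropped.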

"""Index metadata for all files in the BIDS dataset. """
if filters:
    default_ext = ['[.]+']
Collaborator

Suggested change
- default_ext = ['[.]+']

Minor: put this close to where it's used.

Comment on lines 175 to 180
if filters.get('extension'):
    ext_key = 'extension'
elif filters.get('extensions'):
    ext_key = 'extensions'
else:
    ext_key = 'extension'
@adelavega (Collaborator), Jan 7, 2020

The following suggestion replaces this whole section.

# ensure we are returning objects
filters['return_type'] = 'object'
# until 0.11.0, user can specify extension or extensions
if filters.get('extension'):
Collaborator

Simplify this a bit. I was going to suggest just renaming extensions to extension if you find it there, to simplify further, but then you wouldn't get the DeprecationWarning that .get issues.

Suggested change
- if filters.get('extension'):
+ ext_key = 'extensions' if 'extensions' in filters else 'extension'
+ if not filters.get(ext_key):
+     default_ext = ['[.]+']

Contributor Author

thank you, this is much cleaner
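Putting the reviewed pieces together, the filter preparation might look like the sketch below. `prepare_filters` is a hypothetical helper written for illustration, not the merged code; it combines the `return_type` override, the membership test that picks `ext_key`, and list-wrapping. It only touches the extension value when the caller actually supplied one, so an absent key stays absent, in line with the "unset means no constraint" conclusion reached earlier in the review.

```python
def prepare_filters(**filters):
    """Hypothetical helper mirroring the filter handling discussed in review."""
    # Ask the query to return file objects rather than paths.
    filters["return_type"] = "object"
    # Callers may use either spelling (until 0.11.0), so write back to
    # whichever key they passed rather than guessing.
    ext_key = "extensions" if "extensions" in filters else "extension"
    value = filters.get(ext_key)
    if value is not None and not isinstance(value, (list, tuple)):
        filters[ext_key] = [value]
    return filters, ext_key


old_style, old_key = prepare_filters(extensions="json", suffix="bold")
no_ext, no_ext_key = prepare_filters(suffix="bold")
```

Choosing the key by membership (`'extensions' in filters`) rather than truthiness means an explicitly empty value still counts as the user's chosen spelling.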

bids/layout/index.py (outdated, resolved)
Co-Authored-By: Alejandro de la Vega <[email protected]>
@tyarkoni (Collaborator) left a comment

Looks good, but see minor comments.

@@ -164,10 +165,21 @@ def index_files(self):
"""Index all files in the BIDS dataset. """
self._index_dir(self.root, self.config)

- def index_metadata(self):
+ def index_metadata(self, **filters):
"""Index metadata for all files in the BIDS dataset. """
Collaborator

Docstring should mention the filters argument explicitly.

Contributor Author

thanks for pointing this out, added one in.
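As an illustration of what such a docstring could say, here is a numpydoc-style sketch on a hypothetical stub class. The wording is an assumption, not the docstring that was actually added in the PR; the "no filters means all files" sentence reflects the behaviour described earlier in the review.

```python
class DocumentedIndexer:
    """Hypothetical stand-in for BIDSLayoutIndexer, used only to show a docstring."""

    def index_metadata(self, **filters):
        """Index metadata for files in the BIDS dataset.

        Parameters
        ----------
        **filters
            Optional keyword arguments selecting the subset of files whose
            metadata is indexed (e.g. ``suffix='bold'`` or
            ``extension=['nii', 'nii.gz']``). If no filters are passed,
            metadata is indexed for every file.
        """
```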

# ensure we are returning objects
filters['return_type'] = 'object'
# until 0.11.0, user can specify extension or extensions
ext_key = 'extensions' if 'extensions' in filters else 'extension'
Collaborator

.get() will handle the extensions/extension thing for you, so you don't need to worry about it here.

Collaborator

Yeah, he only does it because he has to add values to the extensions argument, and it's not clear which key the user will pass.
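A toy query function with a deprecated alias shows both sides of this exchange: the query can translate `extensions` into `extension` (with a DeprecationWarning), but a caller that writes its own extension values into the filters first must use whichever key the user supplied, or a naive alias translation can silently overwrite one list with the other. `toy_get` is hypothetical; pybids' actual `get()` may resolve such a clash differently.

```python
import warnings


def toy_get(**filters):
    """Hypothetical query accepting 'extensions' as a deprecated alias."""
    if "extensions" in filters:
        warnings.warn(
            "'extensions' is deprecated; use 'extension' instead",
            DeprecationWarning,
            stacklevel=2,
        )
        # Naive alias handling: this overwrites any existing 'extension'.
        filters["extension"] = filters.pop("extensions")
    return filters


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    # Old spelling alone: translated, with a warning.
    normalized = toy_get(extensions=["json"])
    # Caller added 'extension' while the user passed 'extensions':
    # the caller's ["nii.gz"] is silently lost.
    clobbered = toy_get(extensions=["json"], extension=["nii.gz"])
```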

bids/layout/index.py (outdated, resolved)
@tyarkoni (Collaborator) commented Jan 9, 2020

Thanks!

tyarkoni merged commit d115964 into bids-standard:master on Jan 9, 2020
@adelavega (Collaborator) commented Jun 6, 2020

Hey @jdkent, is this the correct usage?

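from bids import BIDSLayout
from bids.layout.index import BIDSLayoutIndexer

# bids_dir: path to the root of the BIDS dataset
bids_layout = BIDSLayout(bids_dir, validate=False, index_metadata=False)

indexer = BIDSLayoutIndexer(bids_layout)
indexer.index_metadata(extension='nii.gz', datatype='func')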

When I made the comment about keeping the top-level API simple while reviewing this PR, I didn't realize that index_metadata was not a method of BIDSLayout.

Might make sense to add that actually because this is pretty unintuitive to use (confused myself!!)
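One way to act on this suggestion is a thin convenience method on the layout that forwards filters to the indexer. The sketch below uses hypothetical `ToyLayout` and `ToyIndexer` classes (not pybids' `BIDSLayout` or `BIDSLayoutIndexer`) and only demonstrates the delegation pattern; no files are read.

```python
class ToyIndexer:
    """Hypothetical indexer that records which filters it was asked to index."""

    def __init__(self, layout):
        self.layout = layout
        self.calls = []

    def index_metadata(self, **filters):
        self.calls.append(filters)


class ToyLayout:
    """Hypothetical layout exposing index_metadata directly."""

    def __init__(self, root, index_metadata=True):
        self.root = root
        self._indexer = ToyIndexer(self)
        if index_metadata:
            # Default behaviour: index metadata for all files up front.
            self._indexer.index_metadata()

    def index_metadata(self, **filters):
        """Index metadata for files matching ``filters`` (all files if none)."""
        self._indexer.index_metadata(**filters)


# Same workflow as the snippet above, without constructing an indexer by hand.
layout = ToyLayout("/data/bids", index_metadata=False)
layout.index_metadata(extension="nii.gz", datatype="func")
```

Keeping the constructor flag a plain bool and exposing partial indexing as a separate method keeps the BIDSLayout docstring short, which was the concern raised in the earlier review comment.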

@jdkent (Contributor Author) commented Jun 6, 2020

Yeah, that's how I would use it (similar usage here):

from bids import BIDSLayout

# reading in derivatives and bids inputs as queryable database-like objects
layout = BIDSLayout(bids_dir,
                    derivatives=derivatives_pipeline_dir,
                    index_metadata=False,
                    database_file=database_path,
                    reset_database=reset_database)

# only index bold file metadata
if reset_database:
    # BIDSLayoutIndexerPatch is project-specific (not part of pybids),
    # presumably a patched variant of BIDSLayoutIndexer
    indexer = BIDSLayoutIndexerPatch(layout)
    metadata_filter = {
        'extension': ['nii', 'nii.gz', 'json'],
        'suffix': 'bold',
    }
    indexer.index_metadata(**metadata_filter)


Successfully merging this pull request may close these issues.

partial metadata indexing?
3 participants