Add in vitro support #431

CodyCBakerPhD · 2024-01-30T18:04:23Z

Motivation

From DANDI discussion: dandi/helpdesk#115 (comment) and ensuing internal Slack discussion with DANDI team from around that time...

There is a group wanting to upload NWB-formatted in vitro data to the archive; since so much of the current validation structure assumes in vivo, a workaround is needed to control certain requirements in this case

I propose that the simplest workaround is to specify the subject_id using the pattern in_vitro_< some identifier >; the NWB Inspector can then recognize this pattern and decide to automatically suppress certain nonsensical checks from being run, while allowing a human-readable filename ('sub-in-vitro-< some identifier >_ses-...') on the archive, making it obvious how the file content differs from the norm. As previously discussed, the existence of a 'participant'/'subject' and some ID for it is non-negotiable on the archive side since it's a core part of the metadata, and though the 'subject' of the in vitro experiments is merely a protein, that can still count to this effect

If this becomes more common, the better longer-term solution would likely be an extension of the core Subject data type specific to in vitro setups, but since they have already created their files this is currently a more costly solution

cc: @yarikoptic @satra @rly @oruebel @bendichter

If this sounds OK, I'll write some Best Practice docs on this PR and send instructions to the uploader on how to proceed from here

yarikoptic · 2024-01-30T21:50:12Z

I would have chosen "shorter" and without any delimiters invitroNUMERICID, e.g. invitro1, invitro2 etc since we are not aiming here for identifiers with some global scope, so that

filenames are shorter
no ambiguity due to us replacing _ with - since _ is used to separate entities etc.

but overall we simply cannot rely on any id there to carry semantic load -- it is merely an identifier primarily for a human consumption. Thus I would have been more specific in the identifier, e.g. protein1. The actual question would come to how to describe it in nwb and our dandischema model https://github.com/dandi/dandi-schema/blob/5b50a110421a370f1da0be66d4f46f898a6f25cb/dandischema/models.py#L1177 ???

CodyCBakerPhD · 2024-01-30T22:05:55Z

The actual question would come to how to describe it in nwb and our dandischema model

If this link works: https://dandiarchive.slack.com/archives/GMRLT5RQ8/p1698938094950049?thread_ts=1698670803.322679&cid=GMRLT5RQ8

it should lead to the original discussion where the conclusion is to 'hack with some data elements'; the DANDI Participant model has the same issues with its field as the NWB Subject in that it potentially has many optional fields that are meaningless for this particular case, but we take that now to just mean 'do not specify them'

I agree that the proper solution is to (i) extend the consideration of a Participant on DANDI side, (ii) make NWB extension compatible with DANDI schema. But allocating resources to those are non-trivial and might not be worth if it if this is just a one-time thing

I would have chosen "shorter" and without any delimiters invitroNUMERICID, e.g. invitro1, invitro2 etc since we are not aiming here for identifiers with some global scope, so that

So in this case, you would prefer invitroCaMPARI3?

yarikoptic · 2024-01-31T02:45:52Z

I agree that the proper solution is to (i) extend the consideration of a Participant on DANDI side, (ii) make NWB extension compatible with DANDI schema. But allocating resources to those are non-trivial and might not be worth if it if this is just a one-time thing

the last famous words for a hack which became too popular at the end ;) yes -- it would take (bigger) effort but we should do that anyways looking forward

I would have chosen "shorter" and without any delimiters invitroNUMERICID, e.g. invitro1, invitro2 etc since we are not aiming here for identifiers with some global scope, so that

So in this case, you would prefer invitroCaMPARI3?

Yes, I think it is ok.

CodyCBakerPhD · 2024-01-31T03:15:41Z

the last famous words for a hack which became too popular at the end ;) yes -- it would take (bigger) effort but we should do that anyways looking forward

Agreed; I would be happy to spearhead this going into the next cycle

Yes, I think it is ok.

Great, I made that change and added related docs

Need further thoughts and/or approval from @bendichter @rly @oruebel to push this backdoor forward for the user

rly · 2024-02-02T09:40:06Z

I agree that the proper solution is to (i) extend the consideration of a Participant on DANDI side, (ii) make NWB extension compatible with DANDI schema. But allocating resources to those are non-trivial and might not be worth if it if this is just a one-time thing

the last famous words for a hack which became too popular at the end ;) yes -- it would take (bigger) effort but we should do that anyways looking forward

I agree.

Some in vitro data (most?) come from tissue slices from animals, so the normal Subject can apply there. What do you think about being more specific in the naming here, like "proteinCaMPARI3"? Pro: slice work will not accidentally use the "invitroID" hack instead of a proper Subject. Con: maybe there are other non-protein in vitro data that we would want to use this hack for? The only other case I can think of is organoids but that's a whole different kind of subject which should probably have different required metadata than proteins.

CodyCBakerPhD · 2024-02-02T15:57:49Z

Con: maybe there are other non-protein in vitro data that we would want to use this hack for?

We have gotten that before actually; a group wanted to upload electrode impedance measurements from a glass vial as a calibration/demonstration of device construction. Not that I'd say that's altogether super necessary data, or requires dandi infrastructure to share, but just saying it was requested previously

CodyCBakerPhD · 2024-02-02T16:00:20Z

proteinCaMPARI3

I'm OK with this; @yarikoptic are you OK with protein<protein ID> being the pattern?

yarikoptic · 2024-02-02T17:25:27Z

sure - looks good to me. Ideally then whoever prepares BIDS dataset with this fills out some extra columns within participants.tsv to provide details on that protein.

CodyCBakerPhD · 2024-02-02T18:26:01Z

@rly Swapped to "protein"

rly · 2024-02-02T18:32:44Z

We have gotten that before actually; a group wanted to upload electrode impedance measurements from a glass vial as a calibration/demonstration of device construction. Not that I'd say that's altogether super necessary data, or requires dandi infrastructure to share, but just saying it was requested previously

Interesting. That makes sense. I guess these are both characterizations of sensors, so we could generalize the naming to "sensorX". But we have only two examples, and "proteinX" is nicely specific and would work. Best not to overthink this hack.

tests/test_inspector.py

Co-authored-by: Ryan Ly <[email protected]>

CodyCBakerPhD added 3 commits January 30, 2024 12:12

add injection and test

38e6edd

extend form to allow other subject ID value for DANDI organization

ce6b1ed

adjust test docstring

5635db8

CodyCBakerPhD added category: enhancement improvements of code or code behavior priority: high impacts proper operation or use of feature important to most users labels Jan 30, 2024

CodyCBakerPhD requested review from rly, bendichter and oruebel January 30, 2024 18:04

CodyCBakerPhD self-assigned this Jan 30, 2024

expand docs and changelog

9d97d1d

remove mock until we move to new pynwb permanently

d5f26a3

yarikoptic mentioned this pull request Jan 31, 2024

Homogenize "Subject" Nomenclature bids-standard/bids-2-devel#14

Open

swap from invitro to protein

8fdc022

rly reviewed Feb 2, 2024

View reviewed changes

tests/test_inspector.py Outdated Show resolved Hide resolved

Update tests/test_inspector.py

bf4be6b

Co-authored-by: Ryan Ly <[email protected]>

CodyCBakerPhD requested a review from rly February 2, 2024 18:39

rly approved these changes Feb 2, 2024

View reviewed changes

CodyCBakerPhD enabled auto-merge (squash) February 2, 2024 18:41

CodyCBakerPhD merged commit de341fa into dev Feb 2, 2024
24 checks passed

CodyCBakerPhD deleted the add_in_vitro branch February 2, 2024 18:48

CodyCBakerPhD mentioned this pull request Feb 2, 2024

BF: do or to "" to ensure that we do not end up treating None as str #433

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add in vitro support #431

Add in vitro support #431

CodyCBakerPhD commented Jan 30, 2024 •

edited

Loading

yarikoptic commented Jan 30, 2024

CodyCBakerPhD commented Jan 30, 2024

yarikoptic commented Jan 31, 2024

CodyCBakerPhD commented Jan 31, 2024

rly commented Feb 2, 2024

CodyCBakerPhD commented Feb 2, 2024

CodyCBakerPhD commented Feb 2, 2024

yarikoptic commented Feb 2, 2024

CodyCBakerPhD commented Feb 2, 2024

rly commented Feb 2, 2024

Add in vitro support #431

Add in vitro support #431

Conversation

CodyCBakerPhD commented Jan 30, 2024 • edited Loading

Motivation

yarikoptic commented Jan 30, 2024

CodyCBakerPhD commented Jan 30, 2024

yarikoptic commented Jan 31, 2024

CodyCBakerPhD commented Jan 31, 2024

rly commented Feb 2, 2024

CodyCBakerPhD commented Feb 2, 2024

CodyCBakerPhD commented Feb 2, 2024

yarikoptic commented Feb 2, 2024

CodyCBakerPhD commented Feb 2, 2024

rly commented Feb 2, 2024

CodyCBakerPhD commented Jan 30, 2024 •

edited

Loading