Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add in vitro support #431

Merged
merged 7 commits into from
Feb 2, 2024
Merged

Add in vitro support #431

merged 7 commits into from
Feb 2, 2024

Conversation

CodyCBakerPhD
Copy link
Collaborator

@CodyCBakerPhD CodyCBakerPhD commented Jan 30, 2024

Motivation

From DANDI discussion: dandi/helpdesk#115 (comment) and ensuing internal Slack discussion with DANDI team from around that time...

There is a group wanting to upload NWB-formatted in vitro data to the archive; since so much of the current validation structure assumes in vivo, a workaround is needed to control certain requirements in this case

I propose that the simplest workaround is to specify the subject_id using the pattern in_vitro_< some identifier >; the NWB Inspector can then recognize this pattern and decide to automatically suppress certain nonsensical checks from being run, while allowing a human-readable filename ('sub-in-vitro-< some identifier >_ses-...') on the archive, making it obvious how the file content differs from the norm. As previously discussed, the existence of a 'participant'/'subject' and some ID for it is non-negotiable on the archive side since it's a core part of the metadata, and though the 'subject' of the in vitro experiments is merely a protein, that can still count to this effect

If this becomes more common, the better longer-term solution would likely be an extension of the core Subject data type specific to in vitro setups, but since they have already created their files this is currently a more costly solution

cc: @yarikoptic @satra @rly @oruebel @bendichter

If this sounds OK, I'll write some Best Practice docs on this PR and send instructions to the uploader on how to proceed from here

@CodyCBakerPhD CodyCBakerPhD added category: enhancement improvements of code or code behavior priority: high impacts proper operation or use of feature important to most users labels Jan 30, 2024
@CodyCBakerPhD CodyCBakerPhD self-assigned this Jan 30, 2024
@yarikoptic
Copy link
Contributor

I would have chosen "shorter" and without any delimiters invitroNUMERICID, e.g. invitro1, invitro2 etc since we are not aiming here for identifiers with some global scope, so that

  • filenames are shorter
  • no ambiguity due to us replacing _ with - since _ is used to separate entities etc.

but overall we simply cannot rely on any id there to carry semantic load -- it is merely an identifier primarily for a human consumption. Thus I would have been more specific in the identifier, e.g. protein1. The actual question would come to how to describe it in nwb and our dandischema model https://github.com/dandi/dandi-schema/blob/5b50a110421a370f1da0be66d4f46f898a6f25cb/dandischema/models.py#L1177 ???

@CodyCBakerPhD
Copy link
Collaborator Author

The actual question would come to how to describe it in nwb and our dandischema model

If this link works: https://dandiarchive.slack.com/archives/GMRLT5RQ8/p1698938094950049?thread_ts=1698670803.322679&cid=GMRLT5RQ8

it should lead to the original discussion where the conclusion is to 'hack with some data elements'; the DANDI Participant model has the same issues with its field as the NWB Subject in that it potentially has many optional fields that are meaningless for this particular case, but we take that now to just mean 'do not specify them'

I agree that the proper solution is to (i) extend the consideration of a Participant on DANDI side, (ii) make NWB extension compatible with DANDI schema. But allocating resources to those are non-trivial and might not be worth if it if this is just a one-time thing

I would have chosen "shorter" and without any delimiters invitroNUMERICID, e.g. invitro1, invitro2 etc since we are not aiming here for identifiers with some global scope, so that

So in this case, you would prefer invitroCaMPARI3?

@yarikoptic
Copy link
Contributor

I agree that the proper solution is to (i) extend the consideration of a Participant on DANDI side, (ii) make NWB extension compatible with DANDI schema. But allocating resources to those are non-trivial and might not be worth if it if this is just a one-time thing

the last famous words for a hack which became too popular at the end ;) yes -- it would take (bigger) effort but we should do that anyways looking forward

I would have chosen "shorter" and without any delimiters invitroNUMERICID, e.g. invitro1, invitro2 etc since we are not aiming here for identifiers with some global scope, so that

So in this case, you would prefer invitroCaMPARI3?

Yes, I think it is ok.

@CodyCBakerPhD
Copy link
Collaborator Author

the last famous words for a hack which became too popular at the end ;) yes -- it would take (bigger) effort but we should do that anyways looking forward

Agreed; I would be happy to spearhead this going into the next cycle

Yes, I think it is ok.

Great, I made that change and added related docs

Need further thoughts and/or approval from @bendichter @rly @oruebel to push this backdoor forward for the user

@rly
Copy link
Contributor

rly commented Feb 2, 2024

I agree that the proper solution is to (i) extend the consideration of a Participant on DANDI side, (ii) make NWB extension compatible with DANDI schema. But allocating resources to those are non-trivial and might not be worth if it if this is just a one-time thing

the last famous words for a hack which became too popular at the end ;) yes -- it would take (bigger) effort but we should do that anyways looking forward

I agree.

Some in vitro data (most?) come from tissue slices from animals, so the normal Subject can apply there. What do you think about being more specific in the naming here, like "proteinCaMPARI3"? Pro: slice work will not accidentally use the "invitroID" hack instead of a proper Subject. Con: maybe there are other non-protein in vitro data that we would want to use this hack for? The only other case I can think of is organoids but that's a whole different kind of subject which should probably have different required metadata than proteins.

@CodyCBakerPhD
Copy link
Collaborator Author

Con: maybe there are other non-protein in vitro data that we would want to use this hack for?

We have gotten that before actually; a group wanted to upload electrode impedance measurements from a glass vial as a calibration/demonstration of device construction. Not that I'd say that's altogether super necessary data, or requires dandi infrastructure to share, but just saying it was requested previously

@CodyCBakerPhD
Copy link
Collaborator Author

proteinCaMPARI3

I'm OK with this; @yarikoptic are you OK with protein<protein ID> being the pattern?

@yarikoptic
Copy link
Contributor

sure - looks good to me. Ideally then whoever prepares BIDS dataset with this fills out some extra columns within participants.tsv to provide details on that protein.

@CodyCBakerPhD
Copy link
Collaborator Author

@rly Swapped to "protein"

@rly
Copy link
Contributor

rly commented Feb 2, 2024

We have gotten that before actually; a group wanted to upload electrode impedance measurements from a glass vial as a calibration/demonstration of device construction. Not that I'd say that's altogether super necessary data, or requires dandi infrastructure to share, but just saying it was requested previously

Interesting. That makes sense. I guess these are both characterizations of sensors, so we could generalize the naming to "sensorX". But we have only two examples, and "proteinX" is nicely specific and would work. Best not to overthink this hack.

tests/test_inspector.py Outdated Show resolved Hide resolved
@CodyCBakerPhD CodyCBakerPhD enabled auto-merge (squash) February 2, 2024 18:41
@CodyCBakerPhD CodyCBakerPhD merged commit de341fa into dev Feb 2, 2024
24 checks passed
@CodyCBakerPhD CodyCBakerPhD deleted the add_in_vitro branch February 2, 2024 18:48
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
category: enhancement improvements of code or code behavior priority: high impacts proper operation or use of feature important to most users
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants