Restructuring organization for participant level grouping #43
Comments
Since this proposal is for 2.0, would this issue perhaps be a better fit for bids-standard/bids-2-devel? BTW, I know that there are a couple of issues there that also propose massive restructuring (e.g., #28, #37). |
i don't have the authorization to transfer, but i think it would be a good place for this to go. |
to me the distinction between participant and aggregate seems equally
contrived. for example, we aggregate across individual images to analyze a
timeseries, aggregate across runs or sessions within participant, etc. I
agree that raw vs. derived is also a bit contrived, but seems to fit better
with the usual researcher's workflow. perhaps better to think of it as they
do in Psych-DS, where there are "source data" i.e. data that came directly
from the measurement instrument, and then various levels of derivation from
that, some of which are "primary" (e.g. nifti images derived from dicoms)
and others are derived (e.g. fmriprep outputs). it seems inevitable that
some of these concepts will be contrived, since they are meant
to reflect as well as possible the usual scientist's workflow. the bigger
challenge is that as BIDS expands, there is a broader set of scientists
with a broader range of workflows, so the "usual" scientist becomes a
contrived notion as well.
On Mon, Aug 17, 2020 at 8:13 AM Satrajit Ghosh wrote:
In BIDS thus far the notion of source data and derived data is a little
contrived/vague. For example a multi-echo T1-weighted recon that comes out of
the scanner from a MEMPRAGE sequence is considered source data, while the
FA image that comes out is not considered source data.
As scanners and other instruments get more advanced and start generating
what we traditionally call derivatives (think GPU based processing on the
scanner), this will lead to questions of where data goes.
To simplify consideration, the possibility I would like the BIDS community
to consider is to separate data not by source vs derivatives, but by
participant vs aggregate. As examples:
Participant
1. source dicoms
2. freesurfer recon
3. fmriprep output
4. meg windows around individual stimuli
5. average ERP response
...
Aggregate
1. Templates
2. group statistical maps
3. (partial) correlations
...
This makes it, in my opinion, simpler to consider with regard to both
metadata and with respect to provenance.
Would love to hear thoughts on this potential reframing.
--
Russell A. Poldrack
Albert Ray Lang Professor of Psychology
Building 420
Stanford University
Stanford, CA 94305
[email protected]
http://www.poldracklab.org/
|
@poldrack - this phrase is exactly the reason i wrote this.
we don't treat these things consistently (see the MEMPRAGE and FA example above). and with new tools developing that do significant processing in the scanner itself (e.g., label regions and compute volumes), we would have to as part of the source processing make determinations as to where things would go.
but these are still individual-specific. perhaps |
I believe that BEP001 does propose symlinking scanner-computed "derivatives" (like FA maps) from the "raw" dataset to the derivatives folder. This isn't a complete solution, but it does explicitly support derivatives coming directly from the scanner. |
I agree with @poldrack that any organization we try to impose is going to be intuitive for some applications and problematic for others. I don't feel I have a good sense of which of these two schemes would be preferable, and I'd suggest that we stack these kinds of proposals and then at some point do a UX survey/study asking people what they (think they) prefer. That said, as a practical matter, I think we should try to maintain backwards compatibility with BIDS 1.0 wherever possible, unless we have a really good reason not to. So, e.g., if 80% of users say that @satra's proposal would make their life considerably easier, then sure, let's break the BIDS 1.0 structure. But if, say, 55% prefer @satra's proposal and 45% prefer the existing scheme, I'd argue that that doesn't really justify having to introduce major changes to the entire tooling ecosystem, break people's habits, etc. |
@tsalo - i think using symlinks is not a good option moving forward as storage providers move more towards object stores (so won't work on s3 for example). @tyarkoni - in general i have always seen bids as a view, and a darn useful organized view, on a more complex underlying information flow model. so yes, there is no perfect view, just a pragmatic one that addresses a large set of use cases. i really like the idea of doing some A/B testing, but in general before we even implement something like this, i would like a discussion of considerations as to how many folks would find the view useful. so here are some use cases where the participant-centered view can be useful.
ps. i haven't yet commented on the hierarchy principle issue, but will do so sometime soon. it's a complex issue and relates to this proposal as well. |
Sorry -- my reply came out long, but I think the issue is touching on many largely orthogonal issues and should be broken into separate ones. So I added some sectioning
raw-vs-derived -- everything is derived!
I can only repeat an idiom I think BIDS should just accept and promote: any BIDS data(set) is derived data(set). Accepting it would IMHO resolve aforementioned contradiction.
participant-vs-aggregate -- orthogonal issue, can be BIDS 1.x compatible
I think it is largely an orthogonal aspect to raw-vs-derived (again -- everything in BIDS is derived IMHO ;)). Even though hardware ATM does not produce "aggregates", I do not see why it hypothetically couldn't and my wild prediction would be that at some point it might produce population templates per study etc. So I would have added it as an additional "feature" explicitly (again annotated for in
Composition -- yet for BIDS to standardize a bit more
Another aspect which I think is discussed above without giving it an explicit name is "composition": we have not reached an ultimate agreement and thus have not provided a definite guidance on how BIDS datasets are composed together. Yes -- it was improved significantly with common derivatives adding a 2nd "alternative" composition in common-principles. But IMHO
Note that a "study" even emerged naturally while preparing fmriprep Nature protocols paper, where there was
Linking+Provenance -- platform specific features should be avoided in BIDS but "acknowledged"
Decision on how to "compose" would affect "provenance" and thus possibility/fragility to any type of "linking" across datasets/modules. E.g. under YODA principles, all necessary components for dataset generation should be reachable "under" that dataset boundary/directory. So you could make a cut at
Aforementioned composition talks about "dataset(s)" level. Discussions on "symlinking" (e.g. relevant non-completed discussion in BEP001) probably could be addressed by
Many of aforementioned aspects do not even need to wait for 2.0 IMHO, i.e. could be introduced in backward compatible way. |
+1 for "everything is derived"! The others are too subtle to simply give a positive or negative vote. |
on the one hand I agree that technically everything is derived. on the
other hand, most researchers in the field are comfortable with the idea of
"raw" data - i.e. the data as they are delivered by the measurement
device. We need to balance technical accuracy in the terminology with
usability - I think this issue definitely requires more discussion with
users in addition to developers.
On Thu, Aug 27, 2020 at 12:04 AM Robert Oostenveld wrote:
+1 for "everything is derived"!
The others are too subtle to simply give a positive or negative vote.
|
@satra
and others slated for BIDS 2.0 in https://github.com/orgs/bids-standard/projects/10 . If not resolved/sufficiently covered by other issues -- what specific changes would you propose? |
@yarikoptic - i think the intent of this issue was primarily asking if some aspects of organization are participant/session/cohort/group specific. some of it would indeed benefit from simply having the provenance, but others would need some notion of separating grouping of derivatives, e.g. something like a group average connectome would be different from individual connectomes. i think you note all of these in your response above, but i'm not sure they are mapped to specific other issues. |
In BIDS thus far the notion of source data and derived data is a little contrived/vague. For example a multi-echo T1-weighted recon that comes out of the scanner from a MEMPRAGE sequence is considered source data, while the FA image that comes out is not considered source data.
As scanners and other instruments get more advanced and start generating what we traditionally call derivatives (think GPU based processing on the scanner), this will lead to questions of where data goes.
To simplify consideration, the possibility I would like the BIDS community to consider is to separate data not by source vs derivatives, but by participant vs aggregate/non-individual. As examples:
Participant
...
Aggregate/Non-individual
...
This makes it, in my opinion, simpler to consider with regard to both metadata and with respect to provenance.
Would love to hear thoughts on this potential reframing.
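As a rough illustration of the participant vs aggregate split, here is a minimal sketch that sorts BIDS-style relative paths by whether any path component carries a `sub-<label>` entity. The helper name `grouping_level` and the example paths are hypothetical and not part of any BIDS tool or of this proposal; the rule itself (presence of `sub-`) is only one possible assumption about how the two groups could be told apart.

```python
from pathlib import PurePosixPath


def grouping_level(relative_path: str) -> str:
    """Classify a dataset-relative path as 'participant' or 'aggregate'.

    Assumption (not from the BIDS spec): a path is participant-level
    if any directory or file name in it starts with "sub-".
    """
    for part in PurePosixPath(relative_path).parts:
        # Directory names like "sub-01" and file names like
        # "sub-01_T1w.nii.gz" both start with the "sub-" entity.
        if part.startswith("sub-"):
            return "participant"
    return "aggregate"


# Hypothetical example paths, made up for illustration only.
examples = [
    "sub-01/anat/sub-01_T1w.nii.gz",
    "derivatives/fmriprep/sub-01/func/sub-01_task-rest_bold.nii.gz",
    "derivatives/group/tpl-study_T1w.nii.gz",
]

for path in examples:
    print(f"{grouping_level(path):<12} {path}")
```

Under this assumption, source dicoms, freesurfer recons and fmriprep outputs would all land in the participant group regardless of whether they are "raw" or "derived", while templates and group maps would not.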