[ENH] Clarify that data files must be uniquely identified by entities/suffix #1508
Thinking out loud about corner cases where duplicates could be viewed as useful: a project might rely on two pieces of software, one reading only TIFF files and the other reading only JPEG files. It might seem useful to keep both formats of the same image in the dataset. The purpose would be not only to "cache" both formats to avoid regenerating them, but also to "archive" both formats in case the conversion software is lost.
But then, just use different names. On the plus side, you'd know which image is the original and which one was converted from it:
sub-01_ses-01_sample-A_original.tif
sub-01_ses-01_sample-A_converted.jpg
I really cannot find a useful case; on the contrary, duplicate file names just bring confusion.
I feel like this doesn't quite hit the mark for me, and I agree with @DimitriPapadopoulos that the focus on identical data seems too narrow. Maybe it's worth introducing terms:
Data files MUST be uniquely identified by entities+suffix. Only metadata files (format-specified or sidecar) may share entities and suffix with a data file. We can then go on to say that this implies that data files MUST NOT have the same entities and suffix, and that using shared entities and suffix to indicate the same data encoded in multiple formats is not supported. I don't know if it's worth bringing in the additional concept of:
The same rules apply to associated data as data, and the photo is a good example here.
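To make the proposed rule concrete, here is a minimal sketch of how a tool might flag data files that share entities and suffix. It is not the BIDS validator's implementation: the helper names `split_name` and `find_conflicts`, the `photo` file names, and the choice to treat only `.json` files as metadata are assumptions made for illustration.

```python
from collections import defaultdict

# Assumption for illustration: only JSON sidecars count as metadata files here.
METADATA_EXTENSIONS = {".json"}


def split_name(filename):
    """Split a BIDS-style file name into (entities, suffix, extension)."""
    # Everything after the first dot is treated as the extension (e.g. ".nii.gz").
    stem, dot, ext = filename.partition(".")
    # The last underscore-separated part is the suffix; the rest are entities.
    *entity_parts, suffix = stem.split("_")
    return tuple(entity_parts), suffix, dot + ext


def find_conflicts(filenames):
    """Return groups of data files that share the same entities and suffix."""
    groups = defaultdict(list)
    for name in filenames:
        entities, suffix, ext = split_name(name)
        if ext in METADATA_EXTENSIONS:
            # Metadata files may share entities and suffix with a data file.
            continue
        groups[(entities, suffix)].append(name)
    return {key: names for key, names in groups.items() if len(names) > 1}


# Hypothetical file names: same entities and suffix, two data formats.
shared = [
    "sub-01_ses-01_sample-A_photo.tif",
    "sub-01_ses-01_sample-A_photo.jpg",
    "sub-01_ses-01_sample-A_photo.json",
]
# Distinct suffixes, as suggested earlier in this thread.
distinct = [
    "sub-01_ses-01_sample-A_original.tif",
    "sub-01_ses-01_sample-A_converted.jpg",
]

print(find_conflicts(shared))
print(find_conflicts(distinct))
```

Under these assumptions the first call reports the `.tif`/`.jpg` pair as a conflict while ignoring the sidecar, and the second call returns an empty result because the two files differ in suffix.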
Were you thinking of adding those to the definitions?
I think we need some way of distinguishing these concepts to explain what overlaps are permitted and what aren't, but I'm not committed to these terms or necessarily making them "terms" in the sense of getting schema entries (though maybe). If we can write one or two paragraphs that make it clear, then cool. But I guess first: Do you all agree with this breakdown?
I agree with this breakdown. I suspect examples will help. I would suggest having the definitions in the spec first, but giving ourselves time before we move them into the schema (in another PR).
I do, except for this special case of a "Data file" that I can think of: The EEGLAB file format has But that's just something to keep in mind if we were to make lists of extensions and classify them under Chris' three categories ... which I don't see any need for. (same argument also goes for
I think that'd make sense. These files are not really data files in their own right (one wouldn't have a BIDS dataset with just such files), so it'd be good to distinguish them.
How about this:
Sounds great to me ✅ Feel free to push a commit; otherwise I'll push a commit with you as a co-author tomorrow, @effigies.
Codecov Report
Patch and project coverage have no change.
Additional details and impacted files
@@           Coverage Diff           @@
##           master    #1508   +/-   ##
=======================================
  Coverage   87.80%   87.80%
=======================================
  Files          14       14
  Lines        1287     1287
=======================================
  Hits         1130     1130
  Misses        157      157
I approve of my own text...
I approve of @effigies approving his own text
related to #1487