-
Notifications
You must be signed in to change notification settings - Fork 161
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[FIX] consistent units description between EEG/MEG/iEEG. Clarify (derived) SI units + prefixes #391
Conversation
The common principles say that the prefixed unit MUST be specified, not that the prefix MUST be specified. I read the current specification as: if the units are in SI without prefix (e.g. m, V, T, but also T/m) they do not have to be specified. If the units are in prefixed SI (e.g. mm, µV, but also fT/cm) they MUST be specified. Also, if units are non-SI, like Gauss (instead of T) or inch (instead of m), they MUST be specified. Note that the unit being non-standard is orthogonal to the unit being prefixed. I.e. there exist also prefixed non-standard units like milli-Gauss, abbreviated as mG. |
yes, I also read it like that. But I see no point in having to specify |
But in that case you don’t know what unit e.g. a measure of distance (as coded in the form, qform or homogenous matrix in a nifti file) is expressed in. I.e. you would not be able to distinguish an elephant from a mouse brain based on the size. Perhaps more important than discussing whether it should be specified in the json, is whether it should be specified somewhere at all. E.g. if the raw data file specifies it explicitly (as e.g. with EDF), I don't see the necessity to have it duplicated in the metadata. But if the raw data file does not specify it (as e.g. with NIFTI afaik), I believe that the dataset producer/curator should specify the units (and hence in that case it must go into the metadata). So rather than focussing on the table in the appendix, I suggest to focus on the problem it relates to (data measured in units that are unknown to the future user of that data). |
I am not sure we are talking about the same thing. I say that it's enough (but necessary) to specify the unit in the TSV alone ... without an additional entry in the sidecar JSON ... as long as the unit is standard (standard SI or derived + optional standard standard prefix). Whereas the spec currently says that:
I want to relax point 2. |
I am also not sure ... I don't care too much whether a unit that is not specified in the datafile goes in the TSV or in the JSON, as long as the data provider specifies it somewhere. The JSON allows less (obvious) flexibility for units in case there are multiple "things" in the datafile (like channels) with potentially different units, whereas a TSV caters easily for the multiple channels, so between them it is a convenience argument. I see no reason to specify it twice (regardless of whether it is prefixed or not, or standard or not). If units are not specified in the datafile itself, nor in the corresponding JSON or a corresponding TSV, I would say that the user of the data can assume units to be SI without some disaster happening. If units are not SI, i.e. either prefixed, or nonstandard, or prefixed nonstandard, it MUST be shared in the datafile or in the metadata. |
a couple of thoughts:
the work we are doing with NIDM is to ensure that there is consistency of attributes, i.e. when as a machine i look up the key i think it would be prudent in BIDS to ensure that units are handled with care and not left to the whims of the data provider. so where there is an opportunity to specify units it should be a in controlled space (like an associated TSV or JSON with a specific key units, and should follow a consistent parseable form - see 2)
|
Re MIXF: #151 |
in electrophys extensions the units are stored as a TSV column in example:
With the current changes in bids-validator, custom units are no longer accepted -- that was issue 2 in my original post NOTE: if we have a normal SI unit, an accompanying description in a
Do you have an example file of units being in the header of a TSV file? With header I assume you mean the first row of the table? I don't think I have seen something like this. --> I expect units to be handled like I outlined in the beginning of this post.
True, as described at this point in the spec. I don't think the validator accounts for that right now, which is a point for improvement. cc @rwblair ... just to add to a list you might have.
Let's keep the discussion for that in #151, because this PR would neither oppose nor favor the adoption of MIXF formatting. This leaves me with my issue 1 from my original post. All changes in this PR are "housekeeping". EXCEPT for these lines right above my comment anchor: https://github.com/bids-standard/bids-specification/pull/391/files#r365739390 Perhaps my changes are not very clear and I am open to suggestions on the wording. What I want to say is:
|
Regarding Unicode, I'm not sure we can put the genie back in the bottle in BIDS 1.x, because there already are Unicode code points explicitly permitted: ° and µ If we would like to restrict the expansion of Unicode points, then I think we might want to attend to bids-standard/bids-validator#897, which is proposing superscript characters ( Is it worth explicitly describing how derived units should be formatted, including spaces, parentheses and multiplication/division/exponentiation symbols? |
Also, I've just read this, and I'm extremely confused. What does the following mean, practically?
What would a TSV that specifies SI units look like? Or do you mean that the spec can say "Columns with this name have unit XYZ, unless an alternative unit is specified in |
Here is an example. See the We have an SI unit with a prefix taken from the BIDS appendix: |
Regarding previous comment from @sappelhoff and adding some historical perspective: if EEG/MEG/iEEG files would have a single channel type (e.g. EEG) with data expressed in a single unit (e.g. µV), then it would have been convenient to have this only once specified in a JSON file. But typical EEG/MEG/iEEG files often have channels of different types and thereby different units (e.g. EEG and trigger, or MEG and EEG and trigger, or EEG and AUX channels), therefore in developing the EEG/MEG/iEEG specification it was decided this information is more conveniently represented in tabular format and hence it goes in the (channels) TSV file. I think that the JSON/TSV distinction should not distract from the "clarify derived units and standard prefixes" issue, but the resolution should indeed be clear wrt the concrete implementation/documentation. |
With regards to the proposed changes in this PR: I consider them an improvement, even though (apparently) not all confusion is solved with this. I do not see conflicts wrt the current standard (i.e. technically there is no change in the standard) and therefore I recommend to merge this. I also propose that the confusion remaining after merging this is discussed in a separate issue. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
PR looks mostly clarification, so good to go for me after addressing my question.
@sappelhoff I'm updating the validator to not require age and sex unit definitions in participants.json, as it currently throws a warning for these. Is there any wonkiness that we know about regarding the validator that would cause it to be in conflict with this PR? |
sounds fine to me. 👍 |
@robertoostenveld can you merge this PR then? there has been no opposition to it and we have two approving reviews.
yes, #151 is the main issue for that. See also this comment bids-standard/bids-examples#175 (comment) from another issue thread. |
@sappelhoff and @robertoostenveld - since age is a SHOULD and therefore no way to guarantee the units, should we say age is an exception. also should we update the specs to say that if you encode age in anything other than years, you have to include the unit in the participants.json file. regarding the other common demographics (e.g., height, weight), should an exception be made, or should SI be insisted on. seems like this should be a community discussion. and if an exception is made, again units should be included. finally should all durations also be excepted? for a different project we adopted the period convention from iso8601. |
@satra I think your most recent point in #391 (comment) deserves its own issue. I am going to rebase and merge this PR now as a minor improvement of wording. More clarification work (and definitions) will be done in #411 |
fe8bc9e
This PR is on units and is related to bids-standard/bids-validator#893
While working on this PR I stumbled over two more discussion points. @robertoostenveld @jasmainak
standard prefixes MUST be further described in JSON?
In my opinion we should change this. Standard prefixes like those described in the appendix V should NOT HAVE TO be specified in the JSON.
If the prefixes are non-standard, then the unit is non-standard and then it would HAVE TO BE described in the JSON, as outlined in the paragraph in the spec before.
Non-standard units are allowed (if defined in JSON)
I think that's a good thing, but with the current changes to the bids-validator, we are stopping to allow this bids-standard/bids-validator#777 bids-standard/bids-validator#897. cc @rwblair