Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[FIX] consistent units description between EEG/MEG/iEEG. Clarify (derived) SI units + prefixes #391

Merged
merged 5 commits into from
Mar 16, 2020

Conversation

sappelhoff
Copy link
Member

This PR is on units and is related to bids-standard/bids-validator#893

  1. Add Wikipedia links to SI and SI-derived units where I thought they are helpful
  2. Make clear in iEEG description of units that derived SI units are permissible in BIDS
  3. Copy over iEEG description of units to EEG and MEG for consistency across the spec

While working on this PR I stumbled over two more discussion points. @robertoostenveld @jasmainak

standard prefixes MUST be further described in JSON?

In case prefixes are added to SI or non-SI units (e.g., mm), the prefixed units MUST be specified in the JSON file

In my opinion we should change this. Standard prefixes like those described in the appendix V should NOT HAVE TO be specified in the JSON.

If the prefixes are non-standard, then the unit is non-standard and then it would HAVE TO BE described in the JSON, as outlined in the paragraph in the spec before.

Non-standard units are allowed (if defined in JSON)

In case there are valid reasons to deviate from SI units or SI
derived units, the units MUST be specified in the sidecar JSON file.

I think that's a good thing, but with the current changes to the bids-validator, we are stopping to allow this bids-standard/bids-validator#777 bids-standard/bids-validator#897. cc @rwblair

@robertoostenveld
Copy link
Collaborator

robertoostenveld commented Jan 10, 2020

In case prefixes are added to SI or non-SI units (e.g., mm), the prefixed units MUST be specified in the JSON file

In my opinion we should change this. Standard prefixes like those described in the appendix V should NOT HAVE TO be specified in the JSON.

The common principles say that the prefixed unit MUST be specified, not that the prefix MUST be specified. I read the current specification as: if the units are in SI without prefix (e.g. m, V, T, but also T/m) they do not have to be specified. If the units are in prefixed SI (e.g. mm, µV, but also fT/cm) they MUST be specified. Also, if units are non-SI, like Gauss (instead of T) or inch (instead of m), they MUST be specified.

Note that the unit being non-standard is orthogonal to the unit being prefixed. I.e. there exist also prefixed non-standard units like milli-Gauss, abbreviated as mG.

@sappelhoff
Copy link
Member Author

sappelhoff commented Jan 10, 2020

I read the current specification as: if the units are in SI without prefix (e.g. m, V, T, but also T/m) they do not have to be specified. If the units are in prefixed SI (e.g. mm, µV, but also fT/cm) they MUST be specified.

yes, I also read it like that. But I see no point in having to specify mm or µV (i.e., the m or µ prefixes) as long as they are taken from our list of "standard prefixes" --> https://bids-specification.readthedocs.io/en/stable/99-appendices/05-units.html#prefixes

@robertoostenveld
Copy link
Collaborator

robertoostenveld commented Jan 10, 2020

But in that case you don’t know what unit e.g. a measure of distance (as coded in the form, qform or homogenous matrix in a nifti file) is expressed in. I.e. you would not be able to distinguish an elephant from a mouse brain based on the size.

Perhaps more important than discussing whether it should be specified in the json, is whether it should be specified somewhere at all. E.g. if the raw data file specifies it explicitly (as e.g. with EDF), I don't see the necessity to have it duplicated in the metadata. But if the raw data file does not specify it (as e.g. with NIFTI afaik), I believe that the dataset producer/curator should specify the units (and hence in that case it must go into the metadata).

So rather than focussing on the table in the appendix, I suggest to focus on the problem it relates to (data measured in units that are unknown to the future user of that data).

@sappelhoff
Copy link
Member Author

But in that case you don’t know what unit e.g. a measure of distance (as coded in the form, qform or homogenous matrix in a nifti file) is expressed in. I.e. you would not be able to distinguish an elephant from a mouse brain based on the size.

I am not sure we are talking about the same thing.

I say that it's enough (but necessary) to specify the unit in the TSV alone ... without an additional entry in the sidecar JSON ... as long as the unit is standard (standard SI or derived + optional standard standard prefix).

Whereas the spec currently says that:

  1. any SI unit does not need a specific JSON entry
  2. but as soon as a prefix (even a standard one) is added, a JSON entry becomes mandatory

I want to relax point 2.

@robertoostenveld
Copy link
Collaborator

I am not sure we are talking about the same thing.

I am also not sure ...

I don't care too much whether a unit that is not specified in the datafile goes in the TSV or in the JSON, as long as the data provider specifies it somewhere. The JSON allows less (obvious) flexibility for units in case there are multiple "things" in the datafile (like channels) with potentially different units, whereas a TSV caters easily for the multiple channels, so between them it is a convenience argument.

I see no reason to specify it twice (regardless of whether it is prefixed or not, or standard or not).

If units are not specified in the datafile itself, nor in the corresponding JSON or a corresponding TSV, I would say that the user of the data can assume units to be SI without some disaster happening. If units are not SI, i.e. either prefixed, or nonstandard, or prefixed nonstandard, it MUST be shared in the datafile or in the metadata.

@satra
Copy link
Collaborator

satra commented Jan 10, 2020

a couple of thoughts:

  1. across different modalities in BIDS there is little consistency of where units are stored.
  • in NWB for iEEG, and NIfTI for MR, the units are in the file itself;
  • for TSVs the units are sometimes in the header (really bad idea imo) and sometimes in an associated json.
  • some units are enforced (e.g., RepetitionTime is in seconds and Age is in years)

the work we are doing with NIDM is to ensure that there is consistency of attributes, i.e. when as a machine i look up the key RepetitionTime from BIDS. i know the units associated with the values.

i think it would be prudent in BIDS to ensure that units are handled with care and not left to the whims of the data provider. so where there is an opportunity to specify units it should be a in controlled space (like an associated TSV or JSON with a specific key units, and should follow a consistent parseable form - see 2)

  1. I thought there was a discussion and a plan to adopt the MIXF formatting for units (i can't seem to find it right now). i would very much stay away from unicode characters, and strongly recommend ascii for units - https://people.csail.mit.edu/jaffer/MIXF/MIXF-10

@effigies
Copy link
Collaborator

Re MIXF: #151

@sappelhoff
Copy link
Member Author

sappelhoff commented Jan 13, 2020

across different modalities in BIDS there is little consistency of where units are stored.

in electrophys extensions the units are stored as a TSV column in channels.tsv files. If a "custom" (i.e., non-SI) unit were to be used in the channels.tsv file, we would expect an accompanying channels.json file with a key-value pair

example:

"unit": { 
    "Levels": {
        "custom_unit": "description of custom unit"
    }
}

With the current changes in bids-validator, custom units are no longer accepted -- that was issue 2 in my original post

NOTE: if we have a normal SI unit, an accompanying description in a channels.json file is OPTIONAL, specifying it in channels.tsv is enough.


for TSVs the units are sometimes in the header (really bad idea imo) and sometimes in an associated json.

Do you have an example file of units being in the header of a TSV file? With header I assume you mean the first row of the table? I don't think I have seen something like this.

--> I expect units to be handled like I outlined in the beginning of this post.


some units are enforced (e.g., RepetitionTime is in seconds and Age is in years)

True, as described at this point in the spec. I don't think the validator accounts for that right now, which is a point for improvement. cc @rwblair ... just to add to a list you might have.


I thought there was a discussion and a plan to adopt the MIXF formatting for units

Let's keep the discussion for that in #151, because this PR would neither oppose nor favor the adoption of MIXF formatting.


This leaves me with my issue 1 from my original post.

All changes in this PR are "housekeeping". EXCEPT for these lines right above my comment anchor: https://github.com/bids-standard/bids-specification/pull/391/files#r365739390

Perhaps my changes are not very clear and I am open to suggestions on the wording.

What I want to say is:

  1. SI units can be specified in a TSV ... an accompanying description of column levels in a channels.json is NOT REQUIRED
    1. this includes SI units that are either derived and/or prefixed by one of the prefixes taken from our appendix V
  2. if units are custom, they REQUIRE an accompanying description in a channels.json

@effigies
Copy link
Collaborator

Regarding Unicode, I'm not sure we can put the genie back in the bottle in BIDS 1.x, because there already are Unicode code points explicitly permitted: ° and µ

If we would like to restrict the expansion of Unicode points, then I think we might want to attend to bids-standard/bids-validator#897, which is proposing superscript characters ([⁻⁰¹²³⁴⁵⁶⁷⁸⁹]) for exponents.

Is it worth explicitly describing how derived units should be formatted, including spaces, parentheses and multiplication/division/exponentiation symbols?

@effigies
Copy link
Collaborator

Also, I've just read this, and I'm extremely confused. What does the following mean, practically?

SI units can be specified in a TSV

What would a TSV that specifies SI units look like? Or do you mean that the spec can say "Columns with this name have unit XYZ, unless an alternative unit is specified in channels.json"?

@sappelhoff
Copy link
Member Author

sappelhoff commented Jan 15, 2020

What does the following mean, practically?

Here is an example. See the units column of this *_channels.tsv file:

image

We have an SI unit with a prefix taken from the BIDS appendix: µV. This would be fine IMO and would NOT REQUIRE a *_channels.json further explaining the units column. (however, obviously, such a *_channels.json file could optionally exist to explain some other arbitrary column).

@robertoostenveld
Copy link
Collaborator

Regarding previous comment from @sappelhoff and adding some historical perspective: if EEG/MEG/iEEG files would have a single channel type (e.g. EEG) with data expressed in a single unit (e.g. µV), then it would have been convenient to have this only once specified in a JSON file. But typical EEG/MEG/iEEG files often have channels of different types and thereby different units (e.g. EEG and trigger, or MEG and EEG and trigger, or EEG and AUX channels), therefore in developing the EEG/MEG/iEEG specification it was decided this information is more conveniently represented in tabular format and hence it goes in the (channels) TSV file.

I think that the JSON/TSV distinction should not distract from the "clarify derived units and standard prefixes" issue, but the resolution should indeed be clear wrt the concrete implementation/documentation.

@robertoostenveld
Copy link
Collaborator

With regards to the proposed changes in this PR: I consider them an improvement, even though (apparently) not all confusion is solved with this. I do not see conflicts wrt the current standard (i.e. technically there is no change in the standard) and therefore I recommend to merge this.

I also propose that the confusion remaining after merging this is discussed in a separate issue.

jasmainak
jasmainak previously approved these changes Jan 17, 2020
Copy link
Collaborator

@jasmainak jasmainak left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PR looks mostly clarification, so good to go for me after addressing my question.

@rwblair
Copy link
Member

rwblair commented Jan 23, 2020

@sappelhoff I'm updating the validator to not require age and sex unit definitions in participants.json, as it currently throws a warning for these.

Is there any wonkiness that we know about regarding the validator that would cause it to be in conflict with this PR?

@sappelhoff
Copy link
Member Author

@sappelhoff I'm updating the validator to not require age and sex unit definitions in participants.json, as it currently throws a warning for these.

sounds fine to me. 👍

@sappelhoff
Copy link
Member Author

With regards to the proposed changes in this PR: I consider them an improvement, even though (apparently) not all confusion is solved with this. I do not see conflicts wrt the current standard (i.e. technically there is no change in the standard) and therefore I recommend to merge this.

@robertoostenveld can you merge this PR then? there has been no opposition to it and we have two approving reviews.

I also propose that the confusion remaining after merging this is discussed in a separate issue.

yes, #151 is the main issue for that. See also this comment bids-standard/bids-examples#175 (comment) from another issue thread.

@satra
Copy link
Collaborator

satra commented Mar 5, 2020

@sappelhoff and @robertoostenveld - since age is a SHOULD and therefore no way to guarantee the units, should we say age is an exception. also should we update the specs to say that if you encode age in anything other than years, you have to include the unit in the participants.json file.

regarding the other common demographics (e.g., height, weight), should an exception be made, or should SI be insisted on. seems like this should be a community discussion. and if an exception is made, again units should be included.

finally should all durations also be excepted? for a different project we adopted the period convention from iso8601.

@sappelhoff
Copy link
Member Author

@satra I think your most recent point in #391 (comment) deserves its own issue.

I am going to rebase and merge this PR now as a minor improvement of wording. More clarification work (and definitions) will be done in #411

@sappelhoff sappelhoff changed the title FIX: Clarify that derived SI units and standard prefixes are allowed [FIX] consistent units description between EEG/MEG/iEEG. Clarify (derived) SI units + prefixes Mar 16, 2020
@sappelhoff sappelhoff merged commit 7b26ee8 into bids-standard:master Mar 16, 2020
@sappelhoff sappelhoff deleted the units branch March 16, 2020 14:12
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants