[ENH] Add 'type' field to json sidecar #221

nbeliy · 2019-05-07T07:50:26Z

Tabular files such as scans.tsv can contain a lot of useful user-defined information. However, there is no restriction on the format of that information, so anyone who needs to extract it from the files has to hardcode how each column's string values should be interpreted.

However, if the type of each entry is stored in the corresponding sidecar JSON file, extraction scripts will know how to parse the column values. It would also allow the validator to check the values in tabular files.

Possible types could be:
text, int, uint, float, date, time and datetime
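
To illustrate how an extraction script or validator might use these type names, here is a minimal sketch. The `CONVERTERS` mapping and the `parse_cell` and `is_valid` helpers are hypothetical names made up for this example, not part of BIDS or of any validator. It assumes dates and times are written in ISO 8601 form and that `n/a` marks a missing value.

```python
from datetime import date, datetime, time


def parse_uint(text: str) -> int:
    """Parse an integer and reject negative values."""
    value = int(text)
    if value < 0:
        raise ValueError(f"expected a non-negative integer, got {text!r}")
    return value


# Hypothetical mapping from the proposed type names to Python converters.
CONVERTERS = {
    "text": str,
    "int": int,
    "uint": parse_uint,
    "float": float,
    "date": date.fromisoformat,
    "time": time.fromisoformat,
    "datetime": datetime.fromisoformat,
}


def parse_cell(text: str, type_name: str):
    """Convert one TSV cell to a Python value; 'n/a' becomes None."""
    if text == "n/a":
        return None
    return CONVERTERS[type_name](text)


def is_valid(text: str, type_name: str) -> bool:
    """Validator-style check: does the cell parse as the declared type?"""
    try:
        parse_cell(text, type_name)
    except ValueError:
        return False
    return True
```

With something like this, a script only needs the type name from the sidecar to turn a cell into a usable value, and a validator could flag cells for which `is_valid` returns `False`.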

sappelhoff · 2019-05-07T11:21:50Z

This is usually dealt with through upstream tools such as [pandas](https://pandas.pydata.org/) for Python. You read the columns, and each column's value type becomes one that can hold all of its values.

For example, cells of type "int" are represented as "float" if the column also contains cells of type "float" that cannot be represented as "int". To me this seems unproblematic.

Can you perhaps provide some examples to illustrate your problem? And also examples of how your JSON and TSV files would look with the change that you are proposing?
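
The widening idea can be sketched with the standard library alone. This is only an illustration of the principle, not how pandas implements its type inference; `cell_type` and `infer_column_type` are hypothetical helper names.

```python
def cell_type(cell: str) -> str:
    """Return the narrowest of 'int', 'float', 'text' that can hold one cell."""
    for name, convert in (("int", int), ("float", float)):
        try:
            convert(cell)
            return name
        except ValueError:
            continue
    return "text"


def infer_column_type(cells) -> str:
    """Return the narrowest type that can hold every cell in the column."""
    order = ["int", "float", "text"]  # from narrowest to widest
    widest = 0
    for cell in cells:
        widest = max(widest, order.index(cell_type(cell)))
    return order[widest]
```

A column of `["1", "2", "3"]` is inferred as "int", while a single "2.5" in the column widens the whole column to "float".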

nbeliy · 2019-05-07T11:41:51Z

It is not really a problem.
I am trying to build an SQL database from a BIDS dataset. The values from the TSV files will go into a table, so that I can get a list of recordings for subjects weighing between 80 and 90 kg (a purely hypothetical example).

But I don't want to create the table manually; I want to create it by scanning the TSV file and the JSON file. Given a column "weight", I will check its description in the JSON and create an SQL column of the corresponding type.

The TSV will be unchanged, and the JSON will become something like:

{
  "test": {
    "LongName": "Education level",
    "Description": "Education level, self-rated by participant",
    "Type": "uint",
    "Levels": {
      "1": "Finished primary school",
      "2": "Finished secondary school",
      "3": "Student at university",
      "4": "Has degree from university"
    }
  },
  "bmi": {
    "LongName": "Body mass index",
    "Units": "kilograms per square meter",
    "Type": "float",
    "TermURL": "http://purl.bioontology.org/ontology/SNOMEDCT/60621009"
  }
}

I think it is a very minor but helpful change.
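
A minimal sketch of the workflow described above, using SQLite from the Python standard library. The "Type" key is the field proposed in this issue, not an existing BIDS field; the `SQL_TYPES` mapping, the `create_table` helper, the table name, and the sample rows are all made up for illustration. It assumes `n/a` marks a missing value and that columns without a declared type are stored as text.

```python
import csv
import io
import json
import sqlite3

# Hypothetical mapping from the proposed "Type" values to SQLite column types.
SQL_TYPES = {
    "text": "TEXT", "int": "INTEGER", "uint": "INTEGER", "float": "REAL",
    "date": "TEXT", "time": "TEXT", "datetime": "TEXT",
}


def create_table(conn, table, sidecar, tsv_text):
    """Create and fill a table whose column types come from the sidecar."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    columns = reader.fieldnames
    # Columns with no sidecar entry or no "Type" fall back to TEXT.
    defs = ", ".join(
        f'"{name}" {SQL_TYPES.get(sidecar.get(name, {}).get("Type", "text"), "TEXT")}'
        for name in columns
    )
    conn.execute(f'CREATE TABLE "{table}" ({defs})')
    placeholders = ", ".join("?" for _ in columns)
    for row in reader:
        # "n/a" is stored as NULL; SQLite's column affinity converts
        # numeric-looking strings in INTEGER and REAL columns to numbers.
        values = [None if row[name] == "n/a" else row[name] for name in columns]
        conn.execute(f'INSERT INTO "{table}" VALUES ({placeholders})', values)


# Made-up sidecar and TSV content, trimmed to the fields used here.
sidecar = json.loads('{"test": {"Type": "uint"}, "bmi": {"Type": "float"}}')
tsv = (
    "participant_id\ttest\tbmi\n"
    "sub-01\t3\t22.5\n"
    "sub-02\t1\t31.0\n"
    "sub-03\t4\tn/a\n"
)

conn = sqlite3.connect(":memory:")
create_table(conn, "participants", sidecar, tsv)
rows = conn.execute(
    "SELECT participant_id FROM participants WHERE bmi BETWEEN 20 AND 25"
).fetchall()
```

Because the column type comes from the sidecar, the range query compares numbers rather than strings, and no column has to be declared by hand.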

sappelhoff · 2019-05-07T12:13:32Z

I don't see how it solves a problem that couldn't be solved otherwise, but I agree that it's a minor change that might be helpful in some situations. Let's see what the community says about it.

In any case, I would change the suggested "Type" field to "DataType".

sappelhoff added the "enhancement" and "opinions wanted" labels May 7, 2019

This was referenced Aug 21, 2020

Enhance "JSON" documentations with new columns: "requirement level", "data type" #533

Closed

Sidecar fields should specify their data type (e.g. float or int) if not a string #350

Closed

sappelhoff mentioned this issue Sep 17, 2020

[FIX] Clarify data types and requirement levels for all JSON files #605

Merged

sappelhoff closed this as completed in #605 Sep 19, 2020
