[ENH] Add 'type' field to json sidecar #221

Closed
nbeliy opened this issue May 7, 2019 · 3 comments · Fixed by #605
Labels
enhancement New feature or request opinions wanted Please read and offer your opinion on this matter

Comments

@nbeliy

nbeliy commented May 7, 2019

Tabular files such as scans.tsv can contain a lot of useful user-defined information. However, there is no restriction on the format of that information, so anyone who needs to extract it from the files has to hardcode how each column's string values are parsed.

If the type of each entry were stored in the corresponding sidecar JSON file, extraction scripts would know how to parse the values in each column. It would also allow the validator to check the values in tabular files.

Possible types could be:
text, int, uint, float, date, time and datetime
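A minimal sketch of how an extraction script or validator might use such type names. The `PARSERS` mapping and the helper names are hypothetical, and the sketch assumes ISO 8601 strings for date, time and datetime; neither is defined by the proposal above. Cells holding "n/a" are treated as missing values.

```python
from datetime import date, datetime, time


def parse_uint(text: str) -> int:
    """Parse an integer and reject negative values."""
    value = int(text)
    if value < 0:
        raise ValueError(f"expected an unsigned integer, got {text!r}")
    return value


# Hypothetical mapping from the proposed type names to Python parsers.
# ISO 8601 formatting for date/time/datetime is an assumption.
PARSERS = {
    "text": str,
    "int": int,
    "uint": parse_uint,
    "float": float,
    "date": date.fromisoformat,
    "time": time.fromisoformat,
    "datetime": datetime.fromisoformat,
}


def parse_cell(text: str, type_name: str):
    """Convert one TSV cell to a Python value; "n/a" becomes None."""
    if text == "n/a":
        return None
    return PARSERS[type_name](text)


def is_valid(text: str, type_name: str) -> bool:
    """Return True if the cell can be parsed as the declared type."""
    try:
        parse_cell(text, type_name)
    except ValueError:
        return False
    return True
```

With this, `is_valid("-1", "uint")` is false while `is_valid("-1", "int")` is true, which is the kind of check a validator could run per column.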

@sappelhoff
Member

This is usually dealt with by upstream tools such as [pandas](https://pandas.pydata.org/) for Python. When you read a column, its values are given the one type that can hold all of them.

For example, cells of type "int" are represented as "float" if the column also contains cells of type "float" that cannot be represented as "int". To me this seems unproblematic.
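The idea of picking the one type that can hold every value in a column can be sketched with the standard library alone. This is a simplified illustration, not how pandas implements its inference, and `infer_column_type` is a hypothetical helper name.

```python
def infer_column_type(cells):
    """Return the narrowest of "int", "float", "text" that fits every cell."""

    def fits(parser):
        # A type fits only if every cell in the column parses with it.
        try:
            for cell in cells:
                parser(cell)
        except ValueError:
            return False
        return True

    if fits(int):
        return "int"
    if fits(float):
        return "float"
    return "text"


all_ints = infer_column_type(["1", "2", "3"])
mixed_numbers = infer_column_type(["1", "2.5", "3"])
mixed_text = infer_column_type(["1", "abc"])
```

A single non-integer cell is enough to widen the whole column, which is the behavior described above.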

Can you perhaps provide some examples to illustrate your problem? And also examples how your JSON and TSV files would look with the change that you are proposing?

@nbeliy
Author

nbeliy commented May 7, 2019

It is not really a problem.
I am trying to build an SQL database from a BIDS dataset. The values from the TSV files will go into tables, so that I can, for example, get a list of recordings for subjects weighing between 80 and 90 kg (a purely hypothetical example).

But I don't want to create the table manually; I want to create it by scanning the TSV file and the JSON file. Given a column "weight", I will check its description in the JSON and create an SQL column of the corresponding type.

The TSV would be unchanged, and the JSON would become something like:

{
  "test": {
    "LongName": "Education level",
    "Description": "Education level, self-rated by participant",
    "Type": "uint",
    "Levels": {
      "1": "Finished primary school",
      "2": "Finished secondary school",
      "3": "Student at university",
      "4": "Has degree from university"
    }
  },
  "bmi": {
    "LongName": "Body mass index",
    "Units": "kilograms per squared meters",
    "Type": "float",
    "TermURL": "http://purl.bioontology.org/ontology/SNOMEDCT/60621009"
  }
}

I think it is a very minor but helpful change.
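A rough sketch of the use case, using SQLite from the Python standard library. The `SQL_TYPES` mapping, the `create_and_fill` helper, and the sample participants data are all made up for illustration; columns without a "Type" entry in the sidecar fall back to TEXT, and "n/a" cells are stored as NULL.

```python
import csv
import io
import json
import sqlite3

# Assumed mapping from the proposed type names to SQLite column types.
SQL_TYPES = {
    "text": "TEXT", "int": "INTEGER", "uint": "INTEGER", "float": "REAL",
    "date": "TEXT", "time": "TEXT", "datetime": "TEXT",
}


def create_and_fill(conn, table, tsv_text, sidecar):
    """Create a table whose column types come from the sidecar, then load the TSV."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    columns = reader.fieldnames
    defs = ", ".join(
        f'"{c}" {SQL_TYPES.get(sidecar.get(c, {}).get("Type", "text"), "TEXT")}'
        for c in columns
    )
    conn.execute(f'CREATE TABLE "{table}" ({defs})')
    placeholders = ", ".join("?" for _ in columns)
    for row in reader:
        # "n/a" marks a missing value and is stored as NULL.
        values = [None if row[c] == "n/a" else row[c] for c in columns]
        conn.execute(f'INSERT INTO "{table}" VALUES ({placeholders})', values)


# Made-up sidecar and TSV content for the weight example.
sidecar = json.loads('{"weight": {"Units": "kg", "Type": "float"}}')
tsv = "participant_id\tweight\nsub-01\t85.5\nsub-02\t72\nsub-03\tn/a\n"

conn = sqlite3.connect(":memory:")
create_and_fill(conn, "participants", tsv, sidecar)
matches = conn.execute(
    "SELECT participant_id FROM participants WHERE weight BETWEEN 80 AND 90"
).fetchall()
```

Because the weight column is declared REAL, SQLite converts the numeric strings on insert, so the range query compares numbers rather than text.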

@sappelhoff added the "enhancement" and "opinions wanted" labels May 7, 2019
@sappelhoff
Member

I don't see how it solves a problem that couldn't be solved otherwise, but I agree that it's a minor change that might be helpful in some situations. Let's see what the community says about it.

In any case, I would change the suggested "Type" field to "DataType".
