Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Clarify usage of "dot notation" for sub-objects #841

Closed
Remi-Gau opened this issue Aug 4, 2021 · 16 comments · Fixed by #1097
Closed

Clarify usage of "dot notation" for sub-objects #841

Remi-Gau opened this issue Aug 4, 2021 · 16 comments · Fixed by #1097
Labels
discussion ongoing discussion schema Issues related to the YAML schema representation of the specification. Patch version release.

Comments

@Remi-Gau
Copy link
Collaborator

Remi-Gau commented Aug 4, 2021

The genetics object in dataset_description has 3 sub-objects:

  • Dataset
  • Database
  • Descriptors

https://bids-specification.readthedocs.io/en/stable/04-modality-specific-files/08-genetic-descriptor.html#dataset-description

Currently the schema has a single yaml file for each sub-object but there is no yaml file for the Genetics.
Moreover the nested objects is not captured by the schema.

And the way the sub-objects are mentioned in the rendred table involve a dot notation Genetics.Dataset that appears nowhere else in the spec.

Suggestion

- create yaml file for Genetics

  • rewrite the part of the the Genetics to remove the dot notation from the table

Possible issues

Not sure how to formalize the nested structure in the schema itself:

Should the sub-objects of Genetics be defined in the yaml of Genetics?

Should we keep the yaml files for each sub-object, mention in the README that this dot notation means they are sub-objects?

I don't think it makes sense to have pieces of metadata just called Dataset, Database, Descriptors when they only relate to Genetics.

Tagging @tsalo and @CPernet for ideas and feedback on this.

@Remi-Gau Remi-Gau added the schema Issues related to the YAML schema representation of the specification. Patch version release. label Aug 4, 2021
@CPernet
Copy link
Collaborator

CPernet commented Aug 5, 2021

{
  "Name": "Human Connectome Project",
  "BIDSVersion":  "1.3.0",
  "License": "CC0",
  "Authors": ["1st author", "2nd author"],
  "Funding": "P41 EB015894/EB/NIBIB NIH HHS/United States",
  "Genetics": {
     "Dataset": "https://www.ncbi.nlm.nih.gov/projects/gap/cgi-bin/study.cgi?study_id=phs001364.v1.p1",
     "Database": "https://www.ncbi.nlm.nih.gov/gap/",
     "Descriptors": "doi:10.1016/j.neuroimage.2013.05.041"
     }
  "DatasetDOI": "doi:myimagaing data"
}

could be changed without subfields if that's the direction we are taking ; whatever is easier

{
  "Name": "Human Connectome Project",
  "BIDSVersion":  "1.3.0",
  "License": "CC0",
  "Authors": ["1st author", "2nd author"],
  "Funding": "P41 EB015894/EB/NIBIB NIH HHS/United States",
  "GeneticsDataset": "https://www.ncbi.nlm.nih.gov/projects/gap/cgi-bin/study.cgi?study_id=phs001364.v1.p1",
   "GeneticsDatabase": "https://www.ncbi.nlm.nih.gov/gap/",
   "GeneticsDescriptors": "doi:10.1016/j.neuroimage.2013.05.041"
  "DatasetDOI": "doi:myimagaing data"
}

@Remi-Gau
Copy link
Collaborator Author

Remi-Gau commented Aug 5, 2021

That would be ideal but would break backward compatibility, no?
So that option would be for BIDS 2.0.

@tsalo
Copy link
Member

tsalo commented Aug 5, 2021

I think we could use something like the following for Genetics.yaml:

---
name: iEEGCoordinateSystem
description: |
Defines the coordinate system for the iEEG sensors.
See
[Appendix VIII](/99-appendices/08-coordinate-systems.html)
for a list of restricted keywords for coordinate systems.
If `"Other"`, provide definition of the coordinate system in
`iEEGCoordinateSystemDescription`.
If positions correspond to pixel indices in a 2D image
(of either a volume-rendering, surface-rendering, operative photo, or
operative drawing), this MUST be `"Pixels"`.
For more information, see the section on
[2D coordinate systems](/04-modality-specific-files/04-intracranial\
-electroencephalography.html#allowed-2d-coordinate-systems).
anyOf:
- $ref: _iEEGCoordSys.yaml
- $ref: _StandardTemplateCoordSys.yaml
- $ref: _StandardTemplateDeprecatedCoordSys.yaml

Then Genetics.Database.yaml, Genetics.Dataset.yaml, and Genetics.Descriptors.yaml could be renamed to Database_genetics.yaml, Dataset_genetics.yaml, and Descriptors_genetics.yaml, respectively, much like EchoTime_fmap.yaml. Or we could put the definitions directly in Genetics.yaml.

We'll need to update the rendering code to render sub-objects in tables though.

@Remi-Gau
Copy link
Collaborator Author

Remi-Gau commented Aug 6, 2021

Thanks for the pointer: I will give this a try.

@Remi-Gau
Copy link
Collaborator Author

Remi-Gau commented Aug 6, 2021

Hum... There is a Genetics yaml file that already implements your suggestion @tsalo...

🤦

OK will only get rid of the dot notation then.

@Remi-Gau Remi-Gau changed the title Genetics subobject in the schema Remove "dot notation" Genetics subobject in the schema Aug 6, 2021
@Remi-Gau Remi-Gau changed the title Remove "dot notation" Genetics subobject in the schema Remove "dot notation" from Genetics subobject in the schema Aug 6, 2021
@sappelhoff
Copy link
Member

I actually like the dot notation because it shows that the item after the dot is a member of the object before the dot 🤔 similar to how you refer to attributes in Python objects.

@Remi-Gau
Copy link
Collaborator Author

Remi-Gau commented Aug 6, 2021

I actually like the dot notation because it shows that the item after the dot is a member of the object before the dot thinking similar to how you refer to attributes in Python objects.

I kind of agree though we would have to clearly formalize somewhere in the schema that this is how sub-objects should be "encoded".

FYI: I raised the issue because given how those yml files are named at the moment, I had BIDS-matlab complain because it was trying to create a structure with a fieldname Genetics.Dataset which is not valid because fieldnames can't have dot in them.

@sappelhoff
Copy link
Member

sappelhoff commented Aug 6, 2021

FYI: I raised the issue because given how those yml files are named at the moment, I had BIDS-matlab complain because it was trying to create a structure with a fieldname Genetics.Dataset which is not valid because fieldnames can't have dot in them.

I see 🤔 so we need to:

  1. agree that we want the dot notation as a convention to indicate membership in an object
  2. formalize it
  3. apply it to the spec

(in order for downstream packages like bids-matlab to reliably work with this)

Let's also ask @effigies

@satra
Copy link
Collaborator

satra commented Aug 6, 2021

to me this seems like a big decision and has much wider ramifications. in particular it takes json (and json schema), which is perfectly capable of nested structures and arrays and imposes a key convention that every tool has to implement. personally i think nesting should be left, with yaml describing "Genetics" in this case as type object.

i think this specific issue is more than just the flattening of things. does the descriptor allow for multiple genetic datasets? the paper that it references does. and different omic data could sit in different databases. further this is also about consistency of terms. a genetics dataset is perhaps no different from a bids dataset, so having the terms inside be aligned with terms at the bids dataset level will allow reuse of the same terminology that's in bids without having to create a prefix (which is effectively what the dot or underscore notation is doing).

@sappelhoff sappelhoff changed the title Remove "dot notation" from Genetics subobject in the schema Remove "dot notation" from Genetics subobject in the schema? Aug 6, 2021
@sappelhoff sappelhoff added the discussion ongoing discussion label Aug 6, 2021
@tsalo
Copy link
Member

tsalo commented Aug 6, 2021

Hum... There is a Genetics yaml file that already implements your suggestion @tsalo...

🤦

😆 In my defense I was traveling yesterday, so I was operating on like 50% mental capacity.

I kind of agree though we would have to clearly formalize somewhere in the schema that this is how sub-objects should be "encoded".

When you say "encoded", do you mean how they're shown in the specification or how they're actually specified in metadata? Like making Genetics.Database an actual valid key, rather than just a visual convention for Genetics["Database"] within the metadata table?

If it's just a matter of rendering, then I think we just need a make_metadata_subobject_table() function that operates similarly to make_metadata_table() but adds rows for subobjects. Does that sound right, or am I misunderstanding the issue?

@CPernet
Copy link
Collaborator

CPernet commented Aug 9, 2021

+1 on all of @satra comments, we chose attributes of the genetics object (indeed like python @sappelhoff) with multiple omics possible indeed

@erdalkaraca
Copy link
Collaborator

I would also expect the spec to formalize the semantics of the dot notation for tree like structures.

I kind of agree though we would have to clearly formalize somewhere in the schema that this is how sub-objects should be "encoded".

@Remi-Gau
Copy link
Collaborator Author

OK we talked with the maintainers about this.

We agreed that this is "mostly" a rendering issue at the moment.

So we'll keep the metadata yml filename and metadata name as they were.

For example: Genetics.Dataset

---
name: Genetics.Dataset
description: |
  [URI](/02-common-principles.html#uniform-resource-indicator)
  where data can be retrieved.
type: string
format: uri

The idea is to create a new macro to render sub-objects of a an object in a separate table without the dot notation.

Ideally we would want to formalize that this is how subobjects metadata should be encoded in the schema and systematize this rendering in the spec.

@Remi-Gau Remi-Gau changed the title Remove "dot notation" from Genetics subobject in the schema? Clarify usage of "dot notation" for sub-objects Aug 18, 2021
@CPernet
Copy link
Collaborator

CPernet commented Aug 18, 2021

FWI our reasoning was that genetics was a single 'object' added to the BIDS descriptor so any things related to that would be 'attributes' to that object - maybe that a good enough approach to define how to use dot notation ...

@Remi-Gau
Copy link
Collaborator Author

yup makes sense.
just trying to make this more general and potentially apply to new objects with specific attributes that could be added in the spec in the future.

@tsalo
Copy link
Member

tsalo commented Feb 1, 2022

Once #919 is merged and we implement a macro for generating tables for individual objects, I think we will be able to get rid of the Genetics.*.yaml files.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
discussion ongoing discussion schema Issues related to the YAML schema representation of the specification. Patch version release.
Projects
None yet
6 participants