ncihtan/data-models

HTAN Data Models

The HTAN Data Model can be explored at https://humantumoratlas.org/standards. This repository contains the files representing the data model.

Citation

DOI
To cite the latest data model, use the Zenodo Concept DOI above (it resolves to the latest version). DOIs for specific versions can be found on Zenodo.

Major files

This repository contains three major files:

  1. HTAN.model.csv: The CSV representation of the HTAN data model. This file is created through the collective effort of data curators and annotators from the community (in this case HTAN) and is used to create the JSON-LD representation of the data model. Imaging data captured by the model adheres to the Minimum Information about Tissue Imaging (MITI) standard.

  2. HTAN.model.jsonld: The JSON-LD representation of the HTAN data model, which is automatically created from the CSV data model using the schematic CLI. More details on how to convert the CSV data model to the JSON-LD data model can be found here. This is the central schema (data model) used to generate metadata manifest templates for various data types (e.g., scRNA-seq Level 1).

  3. config.yml: The schematic-compatible configuration file, which allows users to specify values for application-specific keys (e.g., the path to the Synapse configuration file) and project-specific keys (e.g., the Synapse fileview for the community project). A description of the keys in this file can be found in the schematic installation guide.
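
The CSV to JSON-LD conversion that this repository automates can also be run locally. The sketch below assumes the schematic CLI is installed and on your PATH, and that your installed version exposes the `schema convert` subcommand. The wrapper function `convert_model` is a hypothetical helper for illustration, not part of this repository.

```shell
# Hypothetical helper: convert a CSV data model to JSON-LD with schematic.
# Assumes the schematic CLI is installed; returns 127 if it cannot be found.
convert_model() {
  csv="$1"
  if ! command -v schematic >/dev/null 2>&1; then
    echo "schematic not found; install it before converting" >&2
    return 127
  fi
  # Assumption: by default the JSON-LD file is written next to the CSV.
  schematic schema convert "$csv"
}

# Example call (run from the repository root):
#   convert_model HTAN.model.csv
```

In normal use you should not need to run this yourself, since the GitHub action regenerates HTAN.model.jsonld on each push; a local run is mainly useful for catching conversion errors before pushing.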

Updating the data model

  1. Create and check out a new branch from main. We suggest you work in a branch of this repo rather than on a fork.
    git checkout -b <your-feature-branch>
    
    • Ensure the branch name is descriptive, e.g., fix-imaging-dimensions, rfx-rppa-level-4-v2 or add-days-validation.
    • Move the issue status to 'In progress'.
  2. Locally edit HTAN.model.csv to add new features, ensuring careful transcription from an RFC or issue if appropriate.
  3. If you have created a new component, ensure it is added to dca-template-config.json.
  4. Push the change with an informative commit message:
    git add -A
    git commit -m "update data model"
    git push origin <your-feature-branch>
    
  5. Check that the GitHub actions that verify model integrity and update HTAN.model.jsonld have launched, completed, and committed into your branch.
    • The actions can take ~7 mins to run.
    • Monitor the actions on GitHub.
    • Ensure the actions have all completed with a green tick ✅.
    • Ensure that the github-actions user has committed into your branch with the message auto convert to .jsonld.
    • If the model integrity tests fail, review the errors and implement changes in your branch.
  6. Make a new PR from your feature branch to main
    • Assign @adamjtaylor as reviewer
    • Link the PR to the issue
    • Move the issue status to 'Ready for review'.
  7. Merging is blocked until the review is approved.
    • During the review process, update the branch from main if needed to keep it aligned with the latest data model.
  8. Once merged...
    • Delete the branch to keep things tidy (should be automatic)
    • Move the issue status to 'In staging'
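
Because the GitHub action commits the regenerated HTAN.model.jsonld into your branch, your local copy falls behind the remote after each push. A minimal sketch of keeping a feature branch in sync, both with that automated commit and with main during review, might look like the following. The function name `sync_branch` is hypothetical, and merging (rather than rebasing) is an assumption; use whichever strategy your team prefers.

```shell
# Hypothetical helper: bring a feature branch up to date.
# Usage: sync_branch <your-feature-branch>
sync_branch() {
  branch="$1"
  git checkout "$branch" &&
    # Pick up the "auto convert to .jsonld" commit made by github-actions.
    git pull --no-rebase --no-edit origin "$branch" &&
    # Bring in any data model changes merged to main since branching.
    git fetch origin main &&
    git merge --no-edit origin/main &&
    # Publish the result; this push triggers the GitHub actions again.
    git push origin "$branch"
}
```

If the merge from main reports conflicts in HTAN.model.jsonld, one option is to resolve HTAN.model.csv by hand and let the action regenerate the JSON-LD file after the next push.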

Data release process

Data releases are made during the "Close out party" at the end of our approximately monthly sprints.
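
Release tags follow the CalVer pattern given in step 1 below (`v<YY>.<MM>.MINOR`, e.g. v24.5.2). As an illustration only, the sketch below builds and checks such a tag. Both function names are hypothetical. The check accepts months with or without a leading zero, which is an assumption: the documented example uses an unpadded month, while the pattern itself is written as `<MM>`.

```shell
# Hypothetical helper: build a tag from a four-digit year, a month number
# and the release counter for that month.
# Pass the month without a leading zero (e.g. 5, not 05).
make_release_tag() {
  year="$1"; month="$2"; minor="$3"
  printf 'v%02d.%d.%d\n' "$((year % 100))" "$month" "$minor"
}

# Hypothetical helper: succeed only if the argument looks like v<YY>.<MM>.MINOR.
is_release_tag() {
  printf '%s\n' "$1" | grep -Eq '^v[0-9]{2}\.(0?[1-9]|1[0-2])\.[0-9]+$'
}

# Example: the second release made in May 2024.
make_release_tag 2024 5 2   # prints v24.5.2
```

A check like `is_release_tag` could be run before creating the tag to catch slips such as a missing `v` prefix or an out-of-range month.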

  1. Draft a new release
    • Create a new tag following the CalVer format v<YY>.<MM>.MINOR, e.g., v24.5.2 for the second release made in May 2024.
    • Set the target to the main branch.
    • Generate release notes
    • Save the draft release
  2. Review the release with at least one other team member and agree to release
    • Set as the latest release.
    • Issue the release
  3. Update Data Curator Config: