Develop and implement HTAN/CDS seq template #396
Discussed on 2024.05.07 HTAN DCC Ops call:
@aditigopalan and @clarisse-lau can you please work together on this? Please let me know if it would be helpful to set up a time to work on this further. THANK YOU! |
Thank you @aclayton555 and @aditigopalan! (Small correction: the mapping file is in the ncihtan/cds_dbgap repo.) I'm available Wednesday 10:30-11:30am or 12:30-1:30 PT. |
Sorry, I missed this! Are you still available 10:30am PT tomorrow? @clarisse-lau |
No worries! 10:30 tomorrow works |
Just sent you an invite! |
A thought on this... Component is one of the source attributes used in the CDS mapping file (to map As we cannot have Component twice in a template, we could instead include the |
Here is the template for now; I replaced one of the components with "Data Type". Also, here are the attributes: "Last Known Disease Status, Primary Diagnosis, Fixative Type, Treatment Outcome, SizeX, age_at_diagnosis_years, Genomic Reference, Race, NominalMagnification, Days to Recurrence, Morphology, Filename, SizeY, pi_last, Library Selection Method, Days to Last Known Disease Status, file_url_in_cds, pi_first, HTAN_Center, Biospecimen Type, Sequencing Platform, Tseries, Ethnicity, Tissue or Organ of Origin, File Format, SizeZ, PhysicalSizeY, channel_metadata_url, Microscope, Days to Last Follow up, HTAN Data File ID, LensNA, PhysicalSizeX, HTAN Participant ID, Library Layout, Vital Status, Software and Version, Treatment Type, Tumor Tissue Type, Objective, Pyramid, HTAN Biospecimen ID, SizeT, SizeC, Protocol Link, pi_email, Imaging Assay Type, Component, cancer_type, Site of Resection or Biopsy, WorkingDistance, md5, Immersion, Gender, File_Size, Zstack, Progression or Recurrence, Tumor Grade" Should we rearrange the fields for clarity? Would it also help to have definitions for some fields (e.g. SizeZ), or would users be familiar with these names? @clarisse-lau let me know what you think! |
Thank you @aditigopalan !
These changes should simplify the CDS template quite a bit, and as it would only include existing data model elements, users will have access to definitions for each field from the HTAN data model. Some rearranging to align with HTAN template conventions can be done at the implementation stage (i.e. using |
Just had a chat with Ashley & Adam. We'd like to subset the attribute list even further to include only sequencing attributes (plus the descriptor columns: Component, Filename, File Format, HTAN Parent Biospecimen ID, HTAN Data File ID, Data Type). Clinical/biospecimen fields will be annotated separately by the center and pulled in from those templates respectively (as is currently done in the metadata generation scripts). |
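For illustration, the subsetting described above could be sketched as follows. The descriptor columns are the ones listed in this comment. `ASSUMED_SEQUENCING_ATTRIBUTES` is a hypothetical pick from the attribute list earlier in this thread, not a confirmed selection, and `subset_template` is an illustrative helper rather than anything in the data model tooling.

```python
# Descriptor columns named in the comment above.
DESCRIPTOR_COLUMNS = [
    "Component",
    "Filename",
    "File Format",
    "HTAN Parent Biospecimen ID",
    "HTAN Data File ID",
    "Data Type",
]

# Assumption: an illustrative guess at which attributes count as
# sequencing attributes; the real list is decided by the team.
ASSUMED_SEQUENCING_ATTRIBUTES = {
    "Library Selection Method",
    "Library Layout",
    "Sequencing Platform",
    "Genomic Reference",
}


def subset_template(all_attributes, sequencing_attributes):
    """Return the descriptor columns followed by the sequencing
    attributes found in all_attributes, in their original order."""
    kept = [
        name
        for name in all_attributes
        if name in sequencing_attributes and name not in DESCRIPTOR_COLUMNS
    ]
    return DESCRIPTOR_COLUMNS + kept


if __name__ == "__main__":
    full_list = [
        "Primary Diagnosis",
        "Library Layout",
        "Filename",
        "Race",
        "Sequencing Platform",
        "Component",
    ]
    print(subset_template(full_list, ASSUMED_SEQUENCING_ATTRIBUTES))
```

Clinical and biospecimen fields such as "Primary Diagnosis" and "Race" drop out of the result, since they would be pulled in from their own templates.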
@aditigopalan just checking on this and if there is anything you need the team to review at this stage. We are aiming to have this implemented and available for the Stanford center to test with the close out of our 24-5 sprint |
Thanks for checking in! Please let me know if this needs to be subsetted further @aclayton555 @adamjtaylor |
Thanks @aditigopalan! I think we only need the attributes that actually come from the sequencing technology, as the others will come from our Biospecimen and Clinical elements. So let's drop those and keep:
Plus the minimal HTAN columns for a component
|
@aditigopalan if you can open a draft PR and link to this issue that would be useful. Thank you! |
Add "CDS" prefix to all attributes for this template |
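A minimal sketch of the renaming, assuming the prefix is joined to each attribute name with a single space. The exact naming format is not stated in this thread, so the `separator` parameter and the `add_prefix` helper are hypothetical.

```python
def add_prefix(attributes, prefix="CDS", separator=" "):
    """Prefix each attribute name, leaving already-prefixed names unchanged.

    Assumption: the separator between prefix and name is a single space.
    """
    marker = prefix + separator
    prefixed = []
    for name in attributes:
        if name.startswith(marker):
            prefixed.append(name)  # already prefixed, keep as-is
        else:
            prefixed.append(marker + name)
    return prefixed


if __name__ == "__main__":
    print(add_prefix(["Library Layout", "Sequencing Platform"]))
```

Skipping names that already carry the prefix makes the function safe to run twice on the same list.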
Merged! @aditigopalan if you could generate the template using schematic or staging DCA and report if it looks sensible that would be great. |
@adamjtaylor tested using dca-staging! Looks alright to me. |
AMAZING (and radical) COLLABORATION ON THIS! |
Not quite out of the woods yet! @aditigopalan is chasing down an errant loop in the DAG that is dragging some extra attributes into the template. |
Historically, we have used the "Other Assay" template as a catch-all for data types for which an assay-specific RFC and component are not yet available in our data model. This has allowed contributors to proceed with data submission and annotation under "Other Assay." Later, once the RFC has been performed and the assay-specific component has been implemented in the data model, the HTAN DCC has re-engaged with contributors to update their annotations from the "Other Assay" template to the respective assay-specific template.
As we approach the end of HTAN 1.0, we have data types that need to be submitted for which we currently do not have templates in place (e.g. bulk ATACseq). We could have these submitted using the "Other Assay" template; however, we are trying to move away from this template to the extent possible through the end of HTAN 1.0 and have as much data as possible annotated according to assay-specific templates. Furthermore, the "Other Assay" template does not provide sufficient information to allow for mapping and transfer to CDS.
This ticket proposes developing a data level-agnostic, minimal sequencing data template as a catch-all for the remaining expected sequencing data. This template would capture relevant metadata per the existing HTAN data model and map readily to the existing CDS seq metadata template, enabling transfer of data submitted under this template to CDS (through the remainder of HTAN 1.0).
Re: level-agnostic, the approach here is to create a low-lift template for contributors to complete for their various file types. Once the files are received, the HTAN DCC will determine how they should be organized according to existing levels and tiers of controlled access (e.g. if a fastq is submitted, assume L1 and controlled access).
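The fastq rule above could be sketched roughly as follows. Only the "fastq implies L1 and controlled access" rule comes from this ticket. The list of fastq suffixes and the `assign_level_and_access` helper are assumptions for illustration, and any other file type is deliberately left undecided for the DCC.

```python
from pathlib import Path

# Assumption: file suffixes treated as fastq for this sketch.
FASTQ_SUFFIXES = (".fastq", ".fq", ".fastq.gz", ".fq.gz")


def assign_level_and_access(filename):
    """Return level and access tier for a submitted file.

    Returns None when no rule applies, leaving the decision to the DCC.
    """
    name = Path(filename).name.lower()
    if name.endswith(FASTQ_SUFFIXES):
        # Rule from the ticket: fastq is assumed L1 and controlled access.
        return {"level": "L1", "access": "controlled"}
    return None


if __name__ == "__main__":
    for submitted in ["sample_R1.fastq.gz", "aligned.bam"]:
        print(submitted, assign_level_and_access(submitted))
```

Returning `None` for unrecognized files keeps the sketch from guessing at levels that this ticket does not define.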