This program converts a simple TSV file into a HuBMAP ASCT+B table.
The included file "demo-input.txt" was generated by Excel using the "demo-input.xlsx" file (Save As "Tab delimited Text"). The generated output will be a TSV file, although the "demo-output.xlsx" file included in this repository is an Excel file.
This program has only been tested on macOS with Python 3, although it should also work on Linux.
To process the demo input file and generate a TSV file that Excel can open, run the program as follows (general form first, then the demo command):
process.py "<name of top level entity>" <number of anatomical structure levels> <input TSV file> <output TSV file>
process.py "organ" 3 demo-input.txt demo-output.xls
The tab delimited file should contain the following twelve columns:
NAME (REF DOI) LABEL (REF DETAILS) ID (REF NOTES) TYPE CHILDREN GENES PROTEINS PROTEOFORMS LIPIDS METABOLITES FTUs REFERENCES
The Type value must be "AS" for anatomical structures and "CT" for cell types. Any other Type value may be used for the remaining items (for example, "Protein" or "Reference"), as long as it is neither AS nor CT.
Children is a comma separated list of child objects. These children need to be either anatomical structures (AS) or cell types (CT). The Genes, Proteins, Proteoforms, etc. fields should be comma separated lists of the appropriate objects (e.g., Genes should be a comma separated list of relevant genes). In all cases, refer to each object by the value in its NAME (REF DOI) column.
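As a rough illustration of the list fields described above, the sketch below splits a comma separated cell and reports children that do not resolve to an AS or CT entry. The function names `split_list` and `bad_children` and the `types` dictionary are hypothetical and are not taken from process.py.

```python
def split_list(cell):
    """Split a comma separated cell into trimmed, non-empty names."""
    return [item.strip() for item in cell.split(",") if item.strip()]


def bad_children(types, children_cell):
    """Return the listed children that are unknown or are not AS/CT.

    `types` maps each object's name to its Type value (an assumption
    made for this sketch).
    """
    return [c for c in split_list(children_cell)
            if types.get(c) not in ("AS", "CT")]


# Illustrative data only.
types = {"hilum of ovary": "AS", "hilar cell": "CT", "mesothelin": "Protein"}
parsed = split_list("ovarian artery, ovarian vein , ")
problems = bad_children(types, "hilar cell, mesothelin, rete ovarii")
print(parsed)
print(problems)
```

Here "mesothelin" is reported because its Type is neither AS nor CT, and "rete ovarii" is reported because it has no row in the sample data.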
If an anatomical structure contains child structures or cell types, then it cannot be assigned biomarkers (e.g., genes, proteins, etc.). Biomarkers and references can only be applied to the lowest level of anatomical structures and to cell types.
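The rule above can be checked before running the conversion. The following is a minimal sketch, assuming each row has already been parsed into a dictionary with lowercase keys. Those key names and the function `violates_leaf_rule` are hypothetical. FTUs are left out of the check because the text does not say whether they count as biomarkers.

```python
# Hypothetical key names for the biomarker and reference columns.
RESTRICTED_FIELDS = ["genes", "proteins", "proteoforms", "lipids",
                     "metabolites", "references"]


def violates_leaf_rule(row):
    """True if an AS row lists children and also biomarkers or references."""
    is_structure = row.get("type", "") == "AS"
    has_children = bool(row.get("children", "").strip())
    has_extras = any(row.get(f, "").strip() for f in RESTRICTED_FIELDS)
    return is_structure and has_children and has_extras


# Illustrative rows only.
ok_row = {"name": "ovary", "type": "AS", "children": "central ovary"}
bad_row = {"name": "ovary", "type": "AS", "children": "central ovary",
           "proteins": "mesothelin"}
leaf_row = {"name": "hilar cell", "type": "CT", "proteins": "calretinin"}
print(violates_leaf_rule(ok_row), violates_leaf_rule(bad_row),
      violates_leaf_rule(leaf_row))
```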
The first line in the input file is assumed to contain a header and is ignored.
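For readers who want to inspect an input file before converting it, here is a small sketch of reading the twelve columns while skipping the header line. It is not the code used by process.py. The field names in `FIELDS` and the function `read_rows` are assumptions made for this example, as is the padding of short rows.

```python
import csv
import io

# Hypothetical short names for the twelve columns, in file order.
FIELDS = ["name", "label", "id", "type", "children", "genes", "proteins",
          "proteoforms", "lipids", "metabolites", "ftus", "references"]


def read_rows(handle):
    """Return one dict per data row, skipping the header line."""
    reader = csv.reader(handle, delimiter="\t")
    next(reader, None)  # the first line is a header and is ignored
    rows = []
    for raw in reader:
        if not any(cell.strip() for cell in raw):
            continue  # skip blank lines
        # A row may have fewer than twelve cells, so pad it (assumption).
        padded = (raw + [""] * len(FIELDS))[:len(FIELDS)]
        rows.append({k: v.strip() for k, v in zip(FIELDS, padded)})
    return rows


# Illustrative input only; a real file would be opened with open(path, newline="").
sample = ("NAME\tLABEL\tID\tTYPE\tCHILDREN\n"
          "ovary\t\tUBERON:0000992\tAS\tmesovarium, ovarian ligament\n"
          "mesovarium\t\tUBERON:0001342\tAS\n")
rows = read_rows(io.StringIO(sample))
print(rows[0]["name"], rows[0]["type"], rows[0]["children"])
```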
The following example is incomplete and is included only to illustrate the field values and their usage:
NAME (REF DOI) LABEL (REF DETAILS) ID (REF NOTES) TYPE CHILDREN GENES PROTEINS PROTEOFORMS LIPIDS METABOLITES FTU REFERENCES (NAME/DOI)
ovary UBERON:0000992 AS central ovary, lateral ovary, medial ovary, mesovarium, ovarian ligament, hilum of ovary
central ovary AS central inferior ovary, central superior ovary
lateral ovary AS lateral inferior ovary, lateral superior ovary
medial ovary AS medial inferior ovary, medial superior ovary
mesovarium UBERON:0001342 AS
ovarian ligament UBERON:0008847 AS
hilum of ovary AS ovarian artery, ovarian vein, pampiniform plexus, rete ovarii, hilar cell
corona radiata CL:0000713 CT doi:10.1093/oxfordjournals.humrep.a136365
hilar cell CL:0002095 CT alkaline phosphatase, acid phosphatase, non-specific esterase, inhibin, calretinin, melan-A, cholesterol esters McKay et al 1961, Boss et al 1965, Mills et al 2020, Jungbluth et al 1998, Pelkey et al 1998
mural granulosa cell CT doi:10.1093/oxfordjournals.humrep.a136365
primary oocyte CL:0000654 CT doi:10.1093/oxfordjournals.humrep.a136365
secondary oocyte CL:0000655 CT doi:10.1093/oxfordjournals.humrep.a136365
columnar ovarian surface epithelial columnar cell CT calretinin, mesothelin Mills et al 2020, Reeves et al 1971, Hummitzsch et al 2013, Blaustein et al 1979, McKay et al 1961
flattened cuboidal ovarian surface epithelial cell CT oviduct-specific glycoprotein-1, E-cadherin Mills et al 2020, Reeves et al 1971, Hummitzsch et al 2013, Blaustein et al 1979, McKay et al 1961
oviduct-specific glycoprotein-1 Protein
mesothelin Protein
E-cadherin Protein
doi:10.1093/oxfordjournals.humrep.a136365 PMID: 3558758 Reference
McKay et al 1961 McKay, D., Pinkerton, J., Hertig, A. & Danziger, S. (1961). The Adult Human Ovary: A Histochemical Study. Obstetrics & Gynecology, 18(1), 13-39. Reference
- The user needs to know the number of anatomical structure levels in advance, or at least supply an overestimate of it.
- The program doesn't insert a header line in the output file.
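One way to work around the first limitation is to estimate the number of levels from the input itself. The sketch below computes the longest chain of AS entries starting at a given structure, counting the starting structure as level 1. Whether process.py counts levels the same way is an assumption. Since an overestimate is acceptable, rounding the result up is safe. The function `as_depth` and both dictionaries are hypothetical.

```python
def as_depth(name, types, children, seen=()):
    """Length of the longest chain of AS entries starting at `name`."""
    if types.get(name) != "AS" or name in seen:
        return 0  # cell types, unknown names, and cycles add no level
    below = [as_depth(child, types, children, seen + (name,))
             for child in children.get(name, [])]
    return 1 + max(below, default=0)


# Illustrative data only; the Type of "rete ovarii" is made up for this sketch.
types = {"ovary": "AS", "hilum of ovary": "AS",
         "rete ovarii": "AS", "hilar cell": "CT"}
children = {"ovary": ["hilum of ovary"],
            "hilum of ovary": ["rete ovarii", "hilar cell"]}
levels = as_depth("ovary", types, children)
print(levels)
```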