Dan Fornika edited this page Oct 14, 2018 · 2 revisions

Project Background

Prior Sample/Data Processing

  • Samples collected from marine environments around the world
  • Water samples collected
  • Filtered to isolate a viral fraction and a bacterial (+ viral) fraction
  • Selected for dsDNA viruses only
  • Fractions underwent shotgun metagenomic sequencing
  • Assembled into contigs
  • Generated RPKM values
  • Compared to metabolomics databases (MetaCyc, COG, KEGG, 3 more) using FAST - the RPKM values for all enzymes in a pathway were combined using a set of rules specific to each pathway
  • Output was RPKM value for enriched pathways for every sample
  • EBI Link: https://www.ebi.ac.uk/metagenomics/studies/ERP001736
  • Metadata Link: https://www.ebi.ac.uk/ena/submit/tara-oceans-checklist

Data Contents

  • RPKM - After the contigs were assembled, the reads were aligned back to the contigs - this was used to generate RPKM (reads per kilobase of transcript per million mapped reads; here, essentially a normalized genomic abundance metric)
  • Sample IDs (df_MASTERTABLE SAMPLE field):
    • Sample IDs that start with 'c' include both bacterial and viral fractions
    • Sample IDs that start with 'ERR' are from samples that were passed through a finer filter, so are viral only
  • Type (df_MASTERTABLE TYPE field)
    • SINGLE includes only one fraction
    • MULTI includes the viral and bacterial fraction data in a single analysis
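
As a rough illustration of the two points above, the sketch below computes RPKM with the standard formula (reads × 10^9 / (feature length in bp × total mapped reads)) and classifies a sample ID by its prefix. The function names `rpkm` and `fraction_from_sample_id` are hypothetical, and the pipeline that actually produced df_MASTERTABLE may have differed in its details.

```python
def rpkm(read_count, feature_length_bp, total_mapped_reads):
    """Reads per kilobase per million mapped reads (standard formula).

    Assumption: feature_length_bp is the length of the contig or gene
    the reads were aligned to; the original pipeline is not documented here.
    """
    if feature_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("length and total mapped reads must be positive")
    return read_count * 1e9 / (feature_length_bp * total_mapped_reads)


def fraction_from_sample_id(sample_id):
    """Classify a SAMPLE value by prefix, following the notes above."""
    if sample_id.startswith("ERR"):
        return "viral"            # passed through the finer filter
    if sample_id.startswith("c"):
        return "bacterial+viral"
    return "unknown"              # prefix not described in these notes


# Example: 500 reads on a 2,000 bp feature, 10 million mapped reads in total
print(rpkm(500, 2000, 10_000_000))
print(fraction_from_sample_id("ERR000000"))  # made-up ID for illustration
```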

Collection Date information exists - Simon Rao

  • See comment below for more details
  • It's okay to make the data public

Notes From Steve Hallam Visit

  • Tara Oceans Project - project that underwent the sampling expedition: http://ocean-microbiome.embl.de/companion.html
  • International consortium of oceanographers/marine biologists - made a standardized sample collection process (so data is comparable)
  • First expedition - photic samples - didn’t sample very deep
  • Pathway Tools - prediction engine - needs a licence - MetaCyc identifiers were made from this
  • PathoLogic - has harmonized names
  • New idea - metabolically functional genes encoded in viruses - more widespread than imagined before
  • Talked about this paper: http://www.pnas.org/content/108/39/E757.short
  • Cyanobacteria normally have fast turnover - they slow down and halt photosynthesis in response to viruses (sequestering them and protecting neighbouring cells) - but the virus carries genes that are part of the photosystem, which overcome the defence mechanism and promote photosynthesis and cellular division
  • Pathway Tools - KEGG Atlas - have diagrams for metabolism - recommended using these
  • Envisions this turning into a manuscript - Nature Scientific Data publication
  • Heatmap with distribution of pathways good starting point (something similar to KEGG atlas ideal though)
  • Want to be able to do things like compare samples in the Indian Ocean to x Ocean
  • Pathways by location heatmap
  • Metavirome - attracted to certain pathways - want to visualize the pathways that are affected
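
One possible way to prepare data for the pathways-by-location heatmap mentioned above is to average RPKM per (location, pathway) pair and lay the result out as a matrix. This is only a sketch: the function name `pathway_by_location_matrix`, the record layout, and the choice of the mean as the summary statistic are all assumptions, and the pathway names in the example are placeholders rather than real MetaCyc identifiers.

```python
from collections import defaultdict


def pathway_by_location_matrix(records):
    """Build a location x pathway matrix of mean RPKM values.

    records: iterable of (location, pathway, rpkm) tuples (hypothetical layout).
    Returns (locations, pathways, matrix), where matrix[i][j] is the mean RPKM
    of pathways[j] across samples from locations[i], or 0.0 if never observed.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for location, pathway, value in records:
        sums[(location, pathway)] += value
        counts[(location, pathway)] += 1

    locations = sorted({loc for loc, _ in counts})
    pathways = sorted({pw for _, pw in counts})
    matrix = [
        [
            sums[(loc, pw)] / counts[(loc, pw)] if (loc, pw) in counts else 0.0
            for pw in pathways
        ]
        for loc in locations
    ]
    return locations, pathways, matrix


# Placeholder records; real rows would come from the master table.
example = [
    ("Indian Ocean", "PWY-A", 2.0),
    ("Indian Ocean", "PWY-A", 4.0),
    ("Pacific Ocean", "PWY-B", 1.0),
]
locations, pathways, matrix = pathway_by_location_matrix(example)
```

The returned row and column labels can be passed to whichever plotting library ends up drawing the heatmap.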

MetaCyc Notes

  • Reference database of enzymes and metabolic pathways
  • Mostly small molecule pathways (but updated versions add macromolecular metabolic pathways)
  • Reference that the PathoLogic tool uses to predict the metabolic network of an organism from annotated genome files - this generates Pathway/Genome databases - BioCyc stores the databases generated by SRI
  • Used to generate organism-specific pathway/genome databases
  • Curated from experimentally validated results/academic papers

Website Goals

  • Map
    • Data points are plotted to the map with latitude and longitude values
    • Want to be able to query by location, depth, other metadata (temperature, salinity, etc.)
    • Clicking on a sample should pull up data on sample information, pathway information, etc. (Want some figures to make data visual - likely by metabolic category) and link out to MetaCyc information
  • Analysis Functionality
    • Want differential comparison of metabolic pathway activation for samples with given set of characteristics
    • Want to have a way to filter out pathways that are generally present everywhere
  • Interactive KEGG Atlas-like Visualization (if time permits)
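
The query and analysis goals above could be prototyped along the following lines. This is a minimal sketch, not a settled design: the function names, the metadata field names (`region`, `depth_m`), the 90% prevalence cut-off, and the use of a simple difference of group means in place of a proper statistical test are all assumptions made for illustration.

```python
def select_samples(metadata, predicate):
    """Return IDs of samples whose metadata dict satisfies predicate."""
    return [sid for sid, meta in metadata.items() if predicate(meta)]


def drop_ubiquitous(rpkm, max_prevalence=0.9):
    """Remove pathways detected (RPKM > 0) in more than max_prevalence of samples.

    rpkm: dict of sample_id -> dict of pathway -> RPKM value.
    """
    n = len(rpkm)
    pathways = {pw for values in rpkm.values() for pw in values}
    keep = {
        pw
        for pw in pathways
        if sum(1 for v in rpkm.values() if v.get(pw, 0.0) > 0) / n <= max_prevalence
    }
    return {
        sid: {pw: val for pw, val in values.items() if pw in keep}
        for sid, values in rpkm.items()
    }


def mean_difference(rpkm, group_a, group_b):
    """Per-pathway mean RPKM in group_a minus mean RPKM in group_b."""
    if not group_a or not group_b:
        raise ValueError("both groups must contain at least one sample")
    pathways = {pw for sid in group_a + group_b for pw in rpkm[sid]}

    def mean(group, pw):
        # Pathways missing from a sample are treated as RPKM 0.0
        return sum(rpkm[sid].get(pw, 0.0) for sid in group) / len(group)

    return {pw: mean(group_a, pw) - mean(group_b, pw) for pw in pathways}


# Placeholder data; sample IDs, fields, and pathway names are made up.
metadata = {
    "s1": {"region": "Indian Ocean", "depth_m": 5},
    "s2": {"region": "Pacific Ocean", "depth_m": 5},
    "s3": {"region": "Pacific Ocean", "depth_m": 200},
}
rpkm = {
    "s1": {"PWY-A": 4.0, "PWY-B": 1.0},
    "s2": {"PWY-A": 2.0, "PWY-B": 0.0},
    "s3": {"PWY-A": 6.0},
}
indian = select_samples(metadata, lambda m: m["region"] == "Indian Ocean")
pacific = select_samples(metadata, lambda m: m["region"] == "Pacific Ocean")
diff = mean_difference(drop_ubiquitous(rpkm), indian, pacific)
```

Filtering ubiquitous pathways before the comparison keeps the result focused on pathways that actually vary between sample groups; a real implementation would likely replace the mean difference with an appropriate statistical test.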