This repository provides an overview of the pipelines and tools developed to identify cobionts in the Tree of Life Programme. See https://cobiontid.github.io/ for more information!

| Tool | Description | Application | Language |
|---|---|---|---|
| kmer-counter | Fast k-mer counter for large read sets | Count tetranucleotides | Rust |
| unique-kmers | Count distinct k-mers in sequences | Calculate k-mer diversity | Rust |
| hexamer | Detect likely coding regions | Estimate coding density | C |
| fastk-medians | Calculate the median number of times the large k-mers in a sequence occur across the read set (a modified version of Profex from the original FASTK library) | Approximate k-mer coverage | C |
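
The computations named in the table are easy to state, even though doing them quickly on large read sets is not. As a rough illustration only, the Rust sketch below shows naive, in-memory versions of three of them: a 256-bin tetranucleotide count, a distinct k-mer count, and the median occurrence count of a sequence's k-mers across a read set. The function names are hypothetical and are not the tools' APIs. The sketch reads k-mers from the forward strand only and skips tetranucleotide windows containing non-ACGT bases. The real tools may handle reverse complements, hashing and memory quite differently. For an even number of k-mers it returns the lower middle value, which is an assumption, not documented `fastk-medians` behaviour.

```rust
// Hypothetical, simplified sketch: NOT the implementation used by
// kmer-counter, unique-kmers or fastk-medians. Forward strand only.
use std::collections::{HashMap, HashSet};

/// Map a nucleotide to a 2-bit code; any non-ACGT byte yields None.
fn encode(base: u8) -> Option<usize> {
    match base.to_ascii_uppercase() {
        b'A' => Some(0),
        b'C' => Some(1),
        b'G' => Some(2),
        b'T' => Some(3),
        _ => None,
    }
}

/// Count the 4^4 = 256 possible tetranucleotides in `seq`.
/// Windows containing a non-ACGT base are skipped.
fn tetranucleotide_counts(seq: &[u8]) -> [u64; 256] {
    let mut counts = [0u64; 256];
    'windows: for window in seq.windows(4) {
        let mut idx = 0;
        for &base in window {
            match encode(base) {
                Some(code) => idx = idx * 4 + code,
                None => continue 'windows,
            }
        }
        counts[idx] += 1;
    }
    counts
}

/// Number of distinct k-mers in `seq` (exact byte comparison).
fn distinct_kmers(seq: &[u8], k: usize) -> usize {
    assert!(k > 0, "k must be positive");
    seq.windows(k).collect::<HashSet<&[u8]>>().len()
}

/// Count how often every k-mer occurs across a whole read set.
fn count_kmers<'a>(reads: &[&'a [u8]], k: usize) -> HashMap<&'a [u8], u64> {
    assert!(k > 0, "k must be positive");
    let mut table = HashMap::new();
    for &read in reads {
        for kmer in read.windows(k) {
            *table.entry(kmer).or_insert(0) += 1;
        }
    }
    table
}

/// Median read-set count of the k-mers in `seq`; None if `seq` is shorter than k.
/// With an even number of k-mers, the lower of the two middle values is returned.
fn median_kmer_count(seq: &[u8], k: usize, table: &HashMap<&[u8], u64>) -> Option<u64> {
    assert!(k > 0, "k must be positive");
    let mut counts: Vec<u64> = seq
        .windows(k)
        .map(|kmer| table.get(kmer).copied().unwrap_or(0))
        .collect();
    if counts.is_empty() {
        return None;
    }
    counts.sort_unstable();
    Some(counts[(counts.len() - 1) / 2])
}

fn main() {
    // A toy "read set"; real input would be parsed from FASTA/FASTQ files.
    let reads: [&[u8]; 2] = [b"ACGTACGT", b"ACGTAAAA"];
    let tetra = tetranucleotide_counts(reads[0]);
    println!("tetranucleotide windows: {}", tetra.iter().sum::<u64>());
    println!("distinct 4-mers: {}", distinct_kmers(reads[0], 4));
    let table = count_kmers(&reads, 4);
    for read in reads {
        println!("median 4-mer count: {:?}", median_kmer_count(read, 4, &table));
    }
}
```

Running `main` reports 5 tetranucleotide windows and 4 distinct 4-mers for the first toy read, then median 4-mer counts of `Some(2)` and `Some(1)` for the two reads. A `HashMap` keyed on byte slices keeps the sketch short. Counting real read sets would likely call for a more compact representation, such as 2-bit packed k-mers.
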

A demo of the interactive dashboard for exploring read sets is available here. You can also try running the demo on Gitpod. A Colab notebook with a more limited feature set, along with instructions, is available here.

Related publications:

- [Disentangling Cobionts and Contamination in Long-Read Genomic Data using Sequence Composition](https://academic.oup.com/g3journal/advance-article/doi/10.1093/g3journal/jkae187/7734044) (*G3*)
- [Kudoa genomes from contaminated hosts reveal extensive gene order conservation and rapid sequence evolution](https://www.biorxiv.org/content/10.1101/2024.11.01.621499v1) (*bioRxiv*)
- [Phylogenomic analysis of Wolbachia genomes from the Darwin Tree of Life biodiversity genomics project](https://journals.plos.org/plosbiology/article?id=10.1371/journal.pbio.3001972) (*PLOS Biology*)
- [MarkerScan: Separation and assembly of cobionts sequenced alongside target species in biodiversity genomics projects](https://doi.org/10.12688/wellcomeopenres.20730.1) (*Wellcome Open Research*)