CobiontID

This repository provides an overview of the pipelines and tools developed to identify cobionts in the Tree of Life Programme. See https://cobiontid.github.io/ for more information!

Software

Standalone tools

| Tool | Description | Application | Language |
| --- | --- | --- | --- |
| kmer-counter | Fast k-mer counter for large read sets | Get tetranucleotide counts | Rust |
| unique-kmers | Count distinct k-mers in sequences | Calculate k-mer diversity | Rust |
| hexamer | Detect likely coding regions | Estimate coding density | C |
| fastk-medians | Calculate median number of times each large k-mer in a sequence occurs across the set (modified version of Profex from the original FASTK library) | Approximate k-mer coverage | C |
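
To make the terms "tetranucleotide counts" and "distinct k-mers" concrete, here is a minimal Python sketch of both quantities for a single sequence. It is an illustration only and is not how kmer-counter or unique-kmers are implemented; the function names `kmer_counts` and `distinct_kmers` are hypothetical, and the sketch assumes that windows containing non-ACGT characters are skipped and that reverse complements are not merged.

```python
from collections import Counter

VALID_BASES = set("ACGT")


def kmer_counts(seq: str, k: int = 4) -> Counter:
    """Count every overlapping k-mer in seq (k=4 gives tetranucleotides).

    Assumption for this sketch: windows with characters other than
    A, C, G, T (for example N) are skipped, and a k-mer is not merged
    with its reverse complement.
    """
    seq = seq.upper()
    counts = Counter()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if set(kmer) <= VALID_BASES:
            counts[kmer] += 1
    return counts


def distinct_kmers(seq: str, k: int = 4) -> int:
    """Number of different k-mers observed in seq, a simple diversity measure."""
    return len(kmer_counts(seq, k))


example = "ACGTACGTNACGT"
counts = kmer_counts(example)
print(counts["ACGT"])           # occurrences of the tetranucleotide ACGT
print(distinct_kmers(example))  # number of distinct tetranucleotides
```

The standalone tools compute these kinds of statistics over whole read sets, which is why they are written in Rust and C rather than in an interpreted language.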

Dashboard

A demo of the interactive dashboard for exploring read sets is available here. You can also try running the demo on Gitpod. A Colab notebook with a more limited feature set, along with instructions, is available here.

Associated publications

Disentangling Cobionts and Contamination in Long-Read Genomic Data using Sequence Composition https://academic.oup.com/g3journal/advance-article/doi/10.1093/g3journal/jkae187/7734044

Kudoa genomes from contaminated hosts reveal extensive gene order conservation and rapid sequence evolution https://www.biorxiv.org/content/10.1101/2024.11.01.621499v1

Phylogenomic analysis of Wolbachia genomes from the Darwin Tree of Life biodiversity genomics project https://journals.plos.org/plosbiology/article?id=10.1371/journal.pbio.3001972

MarkerScan: Separation and assembly of cobionts sequenced alongside target species in biodiversity genomics projects https://doi.org/10.12688/wellcomeopenres.20730.1