[MRG] workflow for gtdb-rs214 genomic #3

Merged: 14 commits, merged Apr 15, 2024

@bluegenes (Contributor) commented May 5, 2023

This PR adapts the GenBank release process to build databases from wort signatures.

Components:

  • README.md - instructions for running the workflow
  • environment.yml - conda environment file
  • config.yml - configuration file containing all settings specific to gtdb-rs214
  • make-gtdb-taxonomy.py - Python script that builds a taxonomic lineages file from GTDB's bacterial and archaeal metadata files. This script was written during a session with @mr-eyes, @ccbaumler, and @jeanzzhao.
  • Snakefile:
    • downloads the GTDB metadata and builds the taxonomic lineages file (including 'representative' information)
    • builds database zipfiles (with abundance) by checking for signatures in wort
  • releases.smk:
    • builds non-abundance zip, SBT, and LCA databases from the zipfiles
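The PR description does not show the contents of make-gtdb-taxonomy.py. As a rough illustration only, the sketch below shows one way to turn GTDB metadata into a lineages CSV. It assumes the GTDB metadata TSVs carry `accession`, `gtdb_taxonomy`, and `gtdb_representative` ('t'/'f') columns, that accessions carry an `RS_`/`GB_` prefix, and that the output uses sourmash-style rank columns; the function names and the extra `gtdb_representative` output column are hypothetical, not taken from the actual script.

```python
import csv
import io

# Output rank columns (assumed sourmash-style lineage header).
RANKS = ["superkingdom", "phylum", "class", "order", "family", "genus", "species"]


def parse_gtdb_metadata(handle):
    """Yield (ident, lineage_names, is_representative) from a GTDB metadata TSV.

    Assumes columns `accession`, `gtdb_taxonomy`, and `gtdb_representative`
    ('t'/'f'), with accessions prefixed by 'RS_' or 'GB_'.
    """
    reader = csv.DictReader(handle, delimiter="\t")
    for row in reader:
        accession = row["accession"]
        # Drop the RefSeq/GenBank source prefix, e.g. 'RS_GCF_...' -> 'GCF_...'.
        if accession[:3] in ("RS_", "GB_"):
            accession = accession[3:]
        # 'd__Bacteria;p__...;s__Escherichia coli' -> ['Bacteria', ..., 'Escherichia coli']
        names = [field.split("__", 1)[1] for field in row["gtdb_taxonomy"].split(";")]
        yield accession, names, row["gtdb_representative"] == "t"


def write_lineages(metadata_handles, out_handle):
    """Write one lineage row per genome, plus a hypothetical representative flag."""
    writer = csv.writer(out_handle)
    writer.writerow(["ident", *RANKS, "gtdb_representative"])
    # Bacterial and archaeal metadata files are simply concatenated.
    for handle in metadata_handles:
        for ident, names, is_rep in parse_gtdb_metadata(handle):
            writer.writerow([ident, *names, "t" if is_rep else "f"])
```

Passing both the bacterial and archaeal metadata handles to `write_lineages` yields a single combined file. Keeping the representative flag as a column lets a downstream step select the "reps" subset without re-reading the metadata.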

Notes:

@bluegenes (Contributor, Author)

Resources used to build the release databases from the abundance zipfile.

Full

| db | ksize | type | wall time (h:m:s) | max_vms (MB) |
|---|---|---|---|---|
| gtdb-rs214 | 21 | zip | 0:56:07 | 13286.50 |
| gtdb-rs214 | 31 | zip | 0:56:07 | 13255.00 |
| gtdb-rs214 | 51 | zip | 0:56:11 | 13286.28 |
| gtdb-rs214 | 21 | lca | 0:54:09 | 25892.92 |
| gtdb-rs214 | 31 | lca | 0:53:48 | 28379.27 |
| gtdb-rs214 | 51 | lca | 0:54:34 | 29132.74 |
| gtdb-rs214 | 21 | sbt | 1:56:33 | 46175.95 |
| gtdb-rs214 | 31 | sbt | 1:55:51 | 46084.80 |
| gtdb-rs214 | 51 | sbt | 1:56:24 | 46200.57 |

Reps

| db | ksize | type | wall time (h:m:s) | max_vms (MB) |
|---|---|---|---|---|
| gtdb-rs214-reps | 21 | zip | 0:10:15 | 2581.86 |
| gtdb-rs214-reps | 31 | zip | 0:10:19 | 2582.69 |
| gtdb-rs214-reps | 51 | zip | 0:10:16 | 2581.24 |
| gtdb-rs214-reps | 21 | lca | 0:11:21 | 7991.74 |
| gtdb-rs214-reps | 31 | lca | 0:11:58 | 11023.45 |
| gtdb-rs214-reps | 51 | lca | 0:12:09 | 11433.45 |
| gtdb-rs214-reps | 21 | sbt | 0:15:04 | 8962.50 |
| gtdb-rs214-reps | 31 | sbt | 0:15:17 | 9108.70 |
| gtdb-rs214-reps | 51 | sbt | 0:15:16 | 9097.23 |
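To compare the database types at a glance, the sketch below transcribes the rows from the two benchmark tables above and reports the worst case (longest wall time, largest max_vms) per database and type across ksizes. It assumes max_vms is reported in MB, as in Snakemake `benchmark:` files; the helper names are hypothetical.

```python
# Benchmark rows transcribed from the tables above:
# (db, ksize, type, wall time as h:m:s, max_vms in MB).
BENCHMARKS = [
    ("gtdb-rs214", 21, "zip", "0:56:07", 13286.50),
    ("gtdb-rs214", 31, "zip", "0:56:07", 13255.00),
    ("gtdb-rs214", 51, "zip", "0:56:11", 13286.28),
    ("gtdb-rs214", 21, "lca", "0:54:09", 25892.92),
    ("gtdb-rs214", 31, "lca", "0:53:48", 28379.27),
    ("gtdb-rs214", 51, "lca", "0:54:34", 29132.74),
    ("gtdb-rs214", 21, "sbt", "1:56:33", 46175.95),
    ("gtdb-rs214", 31, "sbt", "1:55:51", 46084.80),
    ("gtdb-rs214", 51, "sbt", "1:56:24", 46200.57),
    ("gtdb-rs214-reps", 21, "zip", "0:10:15", 2581.86),
    ("gtdb-rs214-reps", 31, "zip", "0:10:19", 2582.69),
    ("gtdb-rs214-reps", 51, "zip", "0:10:16", 2581.24),
    ("gtdb-rs214-reps", 21, "lca", "0:11:21", 7991.74),
    ("gtdb-rs214-reps", 31, "lca", "0:11:58", 11023.45),
    ("gtdb-rs214-reps", 51, "lca", "0:12:09", 11433.45),
    ("gtdb-rs214-reps", 21, "sbt", "0:15:04", 8962.50),
    ("gtdb-rs214-reps", 31, "sbt", "0:15:17", 9108.70),
    ("gtdb-rs214-reps", 51, "sbt", "0:15:16", 9097.23),
]


def to_seconds(hms):
    """Convert an 'h:m:s' string to a number of seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def worst_case_by_type(rows):
    """Return {(db, type): (max wall seconds, max max_vms)} across all ksizes."""
    summary = {}
    for db, _ksize, kind, hms, max_vms in rows:
        prev_secs, prev_vms = summary.get((db, kind), (0, 0.0))
        summary[(db, kind)] = (max(prev_secs, to_seconds(hms)), max(prev_vms, max_vms))
    return summary


if __name__ == "__main__":
    for (db, kind), (secs, vms) in sorted(worst_case_by_type(BENCHMARKS).items()):
        # Divide by 1024 for an approximate GB figure (assuming MB means MiB).
        print(f"{db:16} {kind:4} {secs / 3600:5.2f} h  {vms / 1024:6.1f} GB")
```

Read this way, the SBT build for the full database takes roughly twice the wall time of the zip or LCA builds and peaks near 46,200 MB. For the representatives, LCA peaks slightly above SBT in memory at k=31 and k=51.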

@bluegenes changed the title from "[WIP] Release gtdb-rs214 genomic" to "[MRG] workflow for gtdb-rs214 genomic" on May 22, 2023
@bluegenes (Contributor, Author)

OK @sourmash-bio/devs, ready for review.

I can also commit the full benchmark table here if desired.

@bluegenes added a commit to sourmash-bio/sourmash that referenced this pull request on May 23, 2023
@bluegenes merged commit f831068 into main on Apr 15, 2024