TaxIDs and integration of prokaryotic and eukaryotic information #7

Open
jaygut opened this issue Oct 12, 2021 · 1 comment
jaygut commented Oct 12, 2021

Hello,

I've been exploring Struo2 and found some pretty cool improvements over Struo1. However, because the pipeline is designed essentially for prokaryotes, I wonder whether it could also be applied straightforwardly to eukaryotic genomes. In the past, I tried to integrate prokaryotic and eukaryotic genome information for marine plankton communities, but I ended up getting a bunch of errors. Could you suggest a way to bypass the need for a GTDB taxonomy and instead run Struo2 with NCBI taxonomy information? I'm planning to do this as follows:

  1. Run Struo2 on a set of prokaryotic genomes to generate a HUMAnN3-compatible database.
  2. Update the database with eukaryotic gene sequences predicted via BUSCO+Augustus.

Any hints on how to best achieve this using Struo2?

Any feedback would be greatly appreciated!

@nick-youngblut
Contributor

The main challenges are integrating the gene data generated via BUSCO+Augustus and creating a hybrid taxonomy. https://github.com/nick-youngblut/gtdb_to_taxdump can possibly help with the taxonomy. I don't have experience with BUSCO or Augustus, so I'd have to see if the output can be formatted to conform with the existing pipeline.
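One practical concern when building a hybrid taxonomy is that tax IDs from two taxdumps (for example, one from gtdb_to_taxdump and one from NCBI) may overlap. The following is a minimal sketch, not part of Struo2 or gtdb_to_taxdump, that parses `nodes.dmp`-style lines and reports IDs present in both. The function names are hypothetical, and it assumes both taxonomies use tax ID 1 as the root.

```python
def parse_nodes(lines):
    """Map tax_id -> (parent_tax_id, rank) from nodes.dmp-style lines.

    nodes.dmp fields are separated by "\t|\t"; only the first three are used.
    """
    nodes = {}
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        fields = [f.strip() for f in line.split("|")]
        tax_id, parent_id, rank = int(fields[0]), int(fields[1]), fields[2]
        nodes[tax_id] = (parent_id, rank)
    return nodes


def colliding_taxids(nodes_a, nodes_b, shared=(1,)):
    """Return tax IDs found in both taxonomies, ignoring IDs that are
    expected to be shared (assumed here: the root, tax ID 1)."""
    return sorted((set(nodes_a) & set(nodes_b)) - set(shared))
```

Any IDs this reports would need to be renumbered in one of the two taxonomies before the `names.dmp` and `nodes.dmp` files are concatenated.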

Creating a hybrid Kraken database isn't so hard, given that it does not require gene calling. One just needs to provide all genomes (bacteria, eukaryotes, etc.) and a complete taxonomy (taxdump).
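As a rough illustration of the "provide all genomes plus a taxdump" step, assuming Kraken2 is the target: Kraken2 can take the taxon of a custom sequence from a `kraken:taxid|<id>` tag in the FASTA sequence ID. The sketch below adds that tag to every header of a genome. The function name is hypothetical, and the tax ID passed in must exist in the hybrid taxdump.

```python
def tag_fasta_with_taxid(fasta_lines, taxid):
    """Yield FASTA lines with `|kraken:taxid|<taxid>` appended to each sequence ID.

    The description after the first space in a header, if any, is preserved.
    """
    for line in fasta_lines:
        if line.startswith(">"):
            header = line[1:].rstrip("\n")
            seq_id, _, desc = header.partition(" ")
            tagged = f">{seq_id}|kraken:taxid|{taxid}"
            if desc:
                tagged += f" {desc}"
            yield tagged + "\n"
        else:
            yield line  # sequence lines pass through unchanged
```

The tagged files can then be added to the database library alongside the merged `names.dmp` and `nodes.dmp`. The same function works for prokaryotic and eukaryotic genomes, since no gene calling is involved.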
