[WIP] update utils for building databases #18

Open
wants to merge 2 commits into
base: master


bluegenes
Contributor

Joint effort to update methods for building databases

@mr-eyes @ccbaumler

@ccbaumler

Here are some notes I took -> https://hackmd.io/RkYRWP8gS1Kad0oENNJc7Q?both

The most relevant part is the new way database updates are being handled:

  1. get the two metadata files (ar and bac) from https://data.gtdb.ecogenomic.org/releases/release214/214.0/
  2. use the updated make-gtdb-taxonomy.py (now renamed to get-gtdb-release-diff.py) to create a CSV file containing ident (GCA_#) and taxonomy
  3. compare with the previous database using the ident columns (this is now also done by the get-gtdb-release-diff.py script)
    • /group/ctbrowngrp/sourmash-db/gtdb-rs207/gtdb-res207.taxonomy.csv
    • wort manifest
  4. use the Snakefile download-sketch.smk on the new accessions that need sketching
  5. ...
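
For anyone following along, the comparison in step 3 can be sketched roughly as below. This is not the code in get-gtdb-release-diff.py; it is a minimal illustration that assumes both taxonomy CSVs have an `ident` column, and the function names (`read_idents`, `diff_idents`) and the sample accessions are hypothetical.

```python
import csv
import io


def read_idents(handle, column="ident"):
    """Return the set of identifiers found in one column of a CSV file."""
    # Assumption: the CSV has a header row that includes `column`.
    return {row[column] for row in csv.DictReader(handle)}


def diff_idents(old_handle, new_handle, column="ident"):
    """Split identifiers into added, removed, and shared between two releases."""
    old = read_idents(old_handle, column)
    new = read_idents(new_handle, column)
    return {
        "added": sorted(new - old),      # in the new release only: need sketching
        "removed": sorted(old - new),    # in the previous release only
        "shared": sorted(old & new),     # present in both releases
    }


# Tiny in-memory stand-ins for the previous and new taxonomy CSVs
# (made-up accessions, for illustration only).
old_csv = io.StringIO(
    "ident,taxonomy\n"
    "GCA_000000001.1,d__Bacteria\n"
    "GCA_000000002.1,d__Archaea\n"
)
new_csv = io.StringIO(
    "ident,taxonomy\n"
    "GCA_000000002.1,d__Archaea\n"
    "GCA_000000003.1,d__Bacteria\n"
)

result = diff_idents(old_csv, new_csv)
print(result["added"])
```

With real files, the two handles would come from `open(path, newline="")`, and the `added` list would be the set of accessions handed to the sketching step.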


2 participants