upgrade sourmash_databases soon (for sourmash 4.0) #10

Closed
ctb opened this issue Jul 18, 2020 · 8 comments

Comments

@ctb
Contributor

ctb commented Jul 18, 2020

ref sourmash-bio/sourmash#970

@luizirber
Member

luizirber commented Jul 18, 2020

I'm working on #7 in #11, would also solve this one.

@luizirber
Member

From sourmash-bio/sourmash#778: what bfsize should we use?

for refseq-archaea:
x1e6 = 135MB
x1e5 = 56MB
x1e4 = 31MB

(and I'm always using d=2, I'm not even putting that in the updated Snakefile)

@luizirber
Member

Starting to think we should use 1e6, it is way faster...

@luizirber
Member

luizirber commented Jul 20, 2020

ok, so 1e6 might be faster, but the bacteria databases are... very big. This is for refseq:
1e6: 26.5 GB
1e5: 11.7 GB
1e4: 7 GB
(and genbank will be even larger)

One alternative would be distributing 1e4 and instructing users on how to "upgrade" it to 1e6, but I don't think it would be upgraded very frequently (people would just stick with 1e4 because it's ready to use).

Related discussion: sourmash-bio/sourmash#985 (comment), about distributing leaf-only databases and having users run prepare before using them.

@luizirber
Member

for refseq-archaea:
x1e6 = 135MB
x1e5 = 56MB
x1e4 = 31MB

I ran some gather queries with these ones:
1e6: 4m52s
1e5: 8m
1e4: 16m46s

so maybe sticking with 1e5 as a compromise between size and speed?
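
To put the size/speed compromise in numbers, here is a minimal sketch that takes the refseq-archaea sizes and gather timings quoted above and expresses each bfsize relative to 1e4. The helper names (`to_seconds`, `tradeoff`) are made up for this illustration and are not part of sourmash; the only inputs are the figures from this thread.

```python
import re

# Figures copied from the comments above (refseq-archaea only).
SIZES_MB = {"1e6": 135, "1e5": 56, "1e4": 31}
GATHER_TIMES = {"1e6": "4m52s", "1e5": "8m", "1e4": "16m46s"}


def to_seconds(duration):
    """Convert a string such as '4m52s' or '8m' to seconds (hypothetical helper)."""
    match = re.fullmatch(r"(?:(\d+)m)?(?:(\d+)s)?", duration)
    if not duration or match is None:
        raise ValueError(f"unrecognized duration: {duration!r}")
    minutes, seconds = (int(g) if g else 0 for g in match.groups())
    return minutes * 60 + seconds


def tradeoff(baseline="1e4"):
    """Return {bfsize: (size ratio, speedup)} relative to the baseline bfsize."""
    base_size = SIZES_MB[baseline]
    base_time = to_seconds(GATHER_TIMES[baseline])
    return {
        bfsize: (
            SIZES_MB[bfsize] / base_size,
            base_time / to_seconds(GATHER_TIMES[bfsize]),
        )
        for bfsize in SIZES_MB
    }


if __name__ == "__main__":
    for bfsize, (size_ratio, speedup) in tradeoff().items():
        print(f"{bfsize}: {size_ratio:.2f}x the size, {speedup:.2f}x the gather speed of 1e4")
```

On these numbers, 1e5 is roughly 1.8x the size of 1e4 for roughly 2.1x the gather speed, while 1e6 is roughly 4.4x the size for roughly 3.4x the speed. This only covers the archaea timings; no gather timings for the larger bacteria databases were reported here.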

@ctb
Contributor Author

ctb commented Oct 23, 2020

twitter feedback: https://twitter.com/ctitusbrown/status/1285262175204806658

hah! no feedback. sad.

@ctb
Contributor Author

ctb commented May 1, 2022

closing in favor of sourmash-bio/sourmash#2015.

@ctb ctb closed this as completed May 1, 2022