document new SBT zip file output. #973

Closed
ctb opened this issue Apr 29, 2020 · 3 comments · Fixed by #1283
Labels
4.0 issues to address for a 4.0 release

Comments

Contributor

ctb commented Apr 29, 2020

per #648, if you ask sourmash index to produce a file with suffix .sbt.zip, it will put the SBT in a zip file. until 4.0, this won't be the default output, but it is extremely useful :)
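A minimal sketch of that invocation, assuming the usual `sourmash index <output> <signatures...>` command-line form with `-k` for the k-mer size. The helper name `build_index_command`, the file names, and the k-mer size are hypothetical. The block only builds the argument list, so it does not need sourmash installed.

```python
import shlex

ZIP_SUFFIX = ".sbt.zip"


def build_index_command(db_name, sig_files, ksize=31):
    """Build an argv list that asks `sourmash index` for a zipped SBT.

    Hypothetical helper: the only thing taken from the thread is that an
    output name ending in .sbt.zip selects the zip file output.
    """
    if not db_name.endswith(ZIP_SUFFIX):
        # Append the suffix so the zip output is what gets requested.
        db_name += ZIP_SUFFIX
    return ["sourmash", "index", "-k", str(ksize), db_name, *sig_files]


# Placeholder names; substitute real signature files.
cmd = build_index_command("genomes", ["a.sig", "b.sig"])
print(shlex.join(cmd))
```

The resulting list can be handed to `subprocess.run(cmd, check=True)` on a machine where sourmash is available.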

A few details that might be worth mentioning -

  • you can speed up access by unzipping the SBT
  • it's not clear if the unzipped version takes up a lot more space, b/c the nodes (internal and signatures both) should all be compressed too, I think?
  • we should try to summarize the performance characteristics (detailed in #648) in terms of space, time, and memory tradeoffs, vs the old SBT.
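One way to check the space question is to compare the stored and unpacked sizes inside the archive, then unzip it and measure what lands on disk. The sketch below uses only Python's standard `zipfile` module. The demo archive, its member names, and the choice to store members uncompressed are all made up for illustration; they are not taken from a real `.sbt.zip`.

```python
import tempfile
import zipfile
from pathlib import Path


def summarize_zip(path):
    """Return (stored_bytes, unpacked_bytes, n_deflated) for a zip archive."""
    stored = unpacked = n_deflated = 0
    with zipfile.ZipFile(path) as zf:
        for info in zf.infolist():
            stored += info.compress_size   # bytes the member occupies in the zip
            unpacked += info.file_size     # bytes after extraction
            if info.compress_type == zipfile.ZIP_DEFLATED:
                n_deflated += 1
    return stored, unpacked, n_deflated


def unpack(path, dest):
    """Extract the archive and return the total size of the extracted files."""
    with zipfile.ZipFile(path) as zf:
        zf.extractall(dest)
    return sum(p.stat().st_size for p in Path(dest).rglob("*") if p.is_file())


with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical stand-in archive; point `archive` at a real file instead.
    archive = Path(tmp) / "demo.sbt.zip"
    with zipfile.ZipFile(archive, "w", compression=zipfile.ZIP_STORED) as zf:
        zf.writestr("demo.sbt.json", '{"placeholder": true}')
        zf.writestr(".sbt.demo/internal.0", b"\x00" * 1024)

    stored, unpacked, n_deflated = summarize_zip(archive)
    on_disk = unpack(archive, Path(tmp) / "unzipped")
    print(f"in zip: {stored} B, unpacked: {unpacked} B, "
          f"on disk: {on_disk} B, deflated members: {n_deflated}")
```

If `stored` and `unpacked` come out close for a real archive, unzipping should cost little extra space, which is the behaviour guessed at in the bullet above.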

Anything else?

Member

luizirber commented Apr 29, 2020

> you can speed up access by unzipping the SBT

Need to measure it better, but doesn't seem to make much difference for now.

> it's not clear if the unzipped version takes up a lot more space, b/c the nodes (internal and signatures both) should all be compressed too, I think?

Yup, should take about the same space.

> we should try to summarize the performance characteristics (detailed in #648) in terms of space, time, and memory tradeoffs, vs the old SBT.

from #648 (comment)

| Level | Size   | Time  |
|-------|--------|-------|
| 0     | 407 MB | 16s   |
| 1     | 252 MB | 21s   |
| 5     | 250 MB | 39s   |
| 9     | 246 MB | 1m48s |
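To make the tradeoff in these numbers easier to read, here is a small sketch that expresses each row relative to level 0. It assumes "Level" is the zip compression level and treats level 0 as the baseline; the thread does not say exactly what operation was timed, so the ratios are only a restatement of the figures above.

```python
# Figures copied from the table: (level, size in MB, time in seconds).
# 1m48s is written as 108 seconds.
rows = [(0, 407, 16), (1, 252, 21), (5, 250, 39), (9, 246, 108)]

_, base_size, base_time = rows[0]


def relative(table):
    """Return (level, size ratio, time ratio) against the first row."""
    return [(level, size / base_size, seconds / base_time)
            for level, size, seconds in table]


ratios = relative(rows)
for level, size_ratio, time_ratio in ratios:
    print(f"level {level}: {size_ratio:.0%} of the level-0 size, "
          f"{time_ratio:.2f}x the level-0 time")
```

Read this way, levels 1 through 9 all land at roughly 60% of the level-0 size, while the time grows from about 1.3x at level 1 to 6.75x at level 9.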

> Anything else?

luizirber added the 4.0 (issues to address for a 4.0 release) label on Apr 29, 2020
Contributor Author

ctb commented Apr 30, 2020

see #975.

turns out our index documentation is... terrible. We should have a whole section on database types, I guess.

Contributor Author

ctb commented Apr 30, 2020

(by terrible I mean nonexistent :)
