[MRG] add ZipStorage, support loading tree from storage #648
Needs more testing; for now it is slower than uncompressing the file and loading from the hidden dir.
From #799 (comment):
Loading from a zipfile already works, and I implemented the same logic for tar files but it was waaay too slow (I tried with the current prepared DBs and... it just never finishes), so I removed it. Following what we did with previous SBT updates, I propose:
Still missing:
It is a bit weird to use the zip file just as a container and do the compression/decompression outside it, but the main benefit is that even if the file is unzipped the
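The "zip as a container" trade-off can be illustrated with Python's standard `zipfile` and `gzip` modules. This is a minimal sketch, assuming (not confirmed by the thread) that each entry is gzip-compressed before it is written and then stored in the zip with `ZIP_STORED`; `save_entry` and `load_entry` are hypothetical helpers, not sourmash's `ZipStorage` API.

```python
import gzip
import io
import zipfile

# Each payload is gzip-compressed *before* it goes into the archive, and the
# zip only acts as a container (ZIP_STORED, no zip-level compression). The
# stored bytes are the same whether read from the archive or from an unzipped
# copy on disk.

def save_entry(zf: zipfile.ZipFile, name: str, data: bytes) -> None:
    """Compress `data` with gzip and store it uncompressed inside the zip."""
    zf.writestr(name, gzip.compress(data), compress_type=zipfile.ZIP_STORED)

def load_entry(zf: zipfile.ZipFile, name: str) -> bytes:
    """Read the stored bytes and undo the gzip compression."""
    return gzip.decompress(zf.read(name))

buf = io.BytesIO()
with zipfile.ZipFile(buf, mode="w") as zf:
    save_entry(zf, "internal.0", b"node payload")

with zipfile.ZipFile(buf) as zf:
    payload = load_entry(zf, "internal.0")
    info = zf.getinfo("internal.0")
```

Because the zip layer adds no compression of its own, unzipping the archive leaves entries that the same decompression code can read directly.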
👍
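OK, time to bikeshed: I'm checking for `.zip` extensions for loading/saving indices, but should it be `.sbt.zip` instead? I think I prefer the latter because it has `.sbt` in the name.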
sure :)
On Mon, Apr 27, 2020 at 11:46:14AM -0700, Luiz Irber wrote:
OK, time to bikeshed: I'm checking for `.zip` extensions for loading/saving indices, but should it be `.sbt.zip` instead? I think I prefer the latter because it has `.sbt` in the name.
--
C. Titus Brown
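The `.zip` vs `.sbt.zip` question comes down to which suffixes trigger zip-backed loading. A minimal sketch; `is_zipped_sbt`, `STRICT_SUFFIXES`, and `LOOSE_SUFFIXES` are hypothetical names used only for illustration.

```python
# Hypothetical helper illustrating the naming question; not sourmash's API.
STRICT_SUFFIXES = (".sbt.zip",)  # the `.sbt.zip` proposal: keeps ".sbt" in the name
LOOSE_SUFFIXES = (".zip",)       # the original check: any `.zip` file

def is_zipped_sbt(path: str, suffixes=STRICT_SUFFIXES) -> bool:
    """Return True if `path` should be opened with zip-backed storage."""
    # str.endswith accepts a tuple and matches if any of the suffixes match.
    return path.endswith(suffixes)
```

One practical effect of the stricter suffix is that an unrelated `.zip` file is not treated as an SBT.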
I got this to work manually by taking an already-existing SBT and zipping it up. Nice! Misc questions in initial conceptual review:
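Zipping up an existing SBT by hand can be sketched with Python's `zipfile`. This assumes an on-disk layout of an index JSON such as `ecoli.sbt.json` next to a hidden data directory (assumed here to be `.sbt.ecoli/`); `zip_existing_sbt` is a hypothetical helper, not part of sourmash.

```python
import os
import zipfile

def zip_existing_sbt(index_json: str, data_dir: str, out_path: str) -> None:
    """Pack an on-disk SBT (index JSON + hidden data directory) into one zip.

    Entries keep their paths relative to the directory holding the index
    JSON, so the archive mirrors the unzipped layout.
    """
    base = os.path.dirname(os.path.abspath(index_json))
    # ZIP_STORED: assumes the node/leaf files are already compressed.
    with zipfile.ZipFile(out_path, mode="w", compression=zipfile.ZIP_STORED) as zf:
        zf.write(index_json, arcname=os.path.relpath(index_json, base))
        for root, _dirs, files in os.walk(data_dir):
            for fname in files:
                full = os.path.join(root, fname)
                zf.write(full, arcname=os.path.relpath(full, base))
```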
This is a nice side benefit, but it is slower. If you zip it up this way it's up to the Python
It finds the ONLY
Was this before 667c95d? The name was just
Up for discussion: default the index creation to zipped SBTs, if the name doesn't specify
Also important: SBTs created with whatever this version of sourmash ends up being (
It's in the TODO list, together with rewriting the docs to use the new trees.
Maybe add zipped SBT creation to this PR, but punt tutorials/docs changes to another PR (but pre
yup =]
They are still saved uncompressed by default, but the
I still want to go through some of the bits of code and tests and understand them, tho.
(have you run this on a really big SBT? How well does it perform?)
On Wed, Apr 29, 2020 at 08:36:14AM -0700, Luiz Irber wrote:
Better solution: create a new exception type, raise it and catch where appropriate (search/gather CLI, for example). For interactive uses it is horrible to raise `SystemExit(1)`...
yes. => new issue?
I would like to avoid the situation in `load_signatures`, tho: the `try` wrapping around the function makes it hard to debug what problems are happening during development...
100% agreed.
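The proposal above (a dedicated exception raised by library code and caught only in CLI commands such as search/gather) could look roughly like this. `IndexLoadError`, `load_index`, and `cli_main` are hypothetical names; the `try` wraps only the call expected to fail, which keeps tracebacks useful during development.

```python
import sys

class IndexLoadError(Exception):
    """Hypothetical exception for indices that cannot be loaded."""

def load_index(path: str) -> str:
    # Library code raises instead of calling sys.exit(1), so interactive
    # users (e.g. in a REPL or notebook) get a normal, catchable exception.
    if not path.endswith((".sbt.json", ".sbt.zip")):
        raise IndexLoadError(f"don't know how to load index {path!r}")
    return path  # placeholder for the real loading logic

def cli_main(argv: list) -> int:
    """CLI layer: turn the exception into an error message and exit status."""
    try:
        # Keep the try narrow: only the call that is expected to fail.
        index = load_index(argv[0])
    except IndexLoadError as exc:
        print(f"ERROR: {exc}", file=sys.stderr)
        return 1
    print(f"loaded {index}")
    return 0
```

A console entry point would then call `sys.exit(cli_main(sys.argv[1:]))`, so only the CLI ever exits the process.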
> I'm on board with bumping to v6, since the error message for v5 format change is bad :). In the release or commit notes let's make it clear somewhere that no v5 databases should exist in the wild...! We could wait until #925 is merged to do the bump, since that will probably be another incompatibility in created databases, right?
Done. I kept the `_load_v5`, but now defaulting to `v6`. I think the bump can be in this PR: it will be in master but not in any released version (and if you're using `master`, you pay the price of using `master`).
yep!
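Keeping `_load_v5` while defaulting to `v6` amounts to version-based dispatch. A hypothetical sketch: only the name `_load_v5` comes from the thread; `LOADERS`, `load_sbt_json`, `dump_sbt_json`, and the JSON fields are illustrative and do not reflect the real v5/v6 format differences.

```python
import json

# Placeholder loaders: the actual v5/v6 differences are not described here,
# so both just pull out a "nodes" mapping.
def _load_v5(info: dict) -> dict:
    return {"version": 5, "nodes": info.get("nodes", {})}

def _load_v6(info: dict) -> dict:
    return {"version": 6, "nodes": info.get("nodes", {})}

# Older loaders stay registered so existing indices remain readable,
# while newly written indices always use CURRENT_VERSION.
LOADERS = {5: _load_v5, 6: _load_v6}
CURRENT_VERSION = 6

def load_sbt_json(text: str) -> dict:
    info = json.loads(text)
    version = info["version"]
    loader = LOADERS.get(version)
    if loader is None:
        raise ValueError(f"unsupported SBT format version: {version!r}")
    return loader(info)

def dump_sbt_json(nodes: dict) -> str:
    return json.dumps({"version": CURRENT_VERSION, "nodes": nodes})
```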
> when creating a zip file with `sourmash index`, the index file is named `ecoli.sbt.sbt.json` instead of `ecoli.sbt.json`, which is inconsistent.
Fixed, I saw it yesterday but had not updated the PR yet.
maybe we could add an explicit test for the name? that way we're aware of
when it changes and have that included in a test for diff/blame/changelog
purposes.
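An explicit test for the index name, as suggested, might look like this. `index_filename` is a hypothetical helper: it strips a trailing `.sbt`, `.sbt.json`, or `.sbt.zip` before appending `.sbt.json`, which avoids the doubled `ecoli.sbt.sbt.json`.

```python
def index_filename(name: str) -> str:
    """Return the index JSON name for a user-supplied SBT name."""
    # Check the longer suffixes first so ".sbt.zip" isn't left as "ecoli.sbt".
    for suffix in (".sbt.zip", ".sbt.json", ".sbt"):
        if name.endswith(suffix):
            name = name[: -len(suffix)]
            break
    return name + ".sbt.json"

def test_index_filename():
    # Pinning the name in a test makes any naming change show up in test
    # failures and in diff/blame history.
    assert index_filename("ecoli") == "ecoli.sbt.json"
    assert index_filename("ecoli.sbt") == "ecoli.sbt.json"
    assert index_filename("ecoli.sbt.zip") == "ecoli.sbt.json"
```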
this is now raising
Added test, and a
Looking pretty fully baked - I have two more files to dig into, but the functionality all seems there! Let me know if you want me to expedite the review.
Almost, it was missing a test for
Ready for final review and merge
sourmash/sbt_storage.py
```python
# TODO: leave it open, or close/open every time?

if path is None:
    # TODO: Open a temporary file?
    ...
```
do we want to flag things here (raise an exception, or some such)? or just require path in the constructor?
I think it would be interesting to do what is happening with the buffer now, where it is kept in memory, and throw a warning if the file is not open? But just making it required is probably easier.
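Both options from this exchange can be sketched with the standard library. `ZipStorageSketch` and its `save`/`load`/`close` methods are hypothetical, not sourmash's `ZipStorage` API; the sketch keeps the archive open until `close()` and falls back to an in-memory buffer plus a warning when no path is given.

```python
import io
import warnings
import zipfile

class ZipStorageSketch:
    """Hypothetical zip-backed storage; not sourmash's actual ZipStorage.

    With a path, entries are written to that zip file. Without one, they are
    kept in an in-memory buffer and a warning is emitted, because nothing
    reaches disk unless the caller copies the buffer somewhere.
    """

    def __init__(self, path=None):
        if path is None:
            warnings.warn("no path given: zip contents are kept in memory only")
            self.target = io.BytesIO()
        else:
            self.target = path
        # Keep the archive open for the object's lifetime; mode "a" creates
        # the archive if it does not exist yet.
        self.zipfile = zipfile.ZipFile(self.target, mode="a")

    def save(self, name: str, content: bytes) -> str:
        self.zipfile.writestr(name, content)
        return name

    def load(self, name: str) -> bytes:
        return self.zipfile.read(name)

    def close(self) -> None:
        self.zipfile.close()
```

If the simpler route wins, the constructor becomes `def __init__(self, path)` and the `None` branch disappears.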
Looks excellent to me! On or before merge, pls create issue to update docs and do other leftovers (e.g. for v4, make sbt.zip the default).
added the issues - now just gotta get the tests passing, I guess :)
very nice work, @luizirber!
Fixes #490, closes #60
Checklist
- `make test` Did it pass the tests?
- `make coverage` Is the new code covered?
- without a major version increment. Changing file formats also requires a major version number increment.
- changes were made?