MRG: provide --internal-storage and --no-internal-storage for index #390
(this comment is now outdated, but I'm leaving it in for posterity and reference ;) With the latest trace statements, I find the problem (as expected ;) here, in the
Conveniently 😭 this is not a new bug or problem; it is the same problem that I discussed ad nauseam over in sourmash-bio/sourmash#3008, where we found it to be a problem for manifests, triggered initially by sourmash-bio/sourmash#3053.

In brief, the problem is this: when we refer to an external storage, how do we interpret non-absolute paths? As I wrote in #3008, "I am slowly coming around to the idea that loading things relative to the manifest path is correct." So I think the Right Fix would be to rejigger the path to the zip file so that it is interpreted relative to the RocksDB location.

However, there is another fun component to this: I'm not sure it's documented anywhere that RocksDB indices created by this plugin store the sketches externally, which is needed for gather (but not for manysearch)... Per @luizirber on slack, me:

So anyway, I think maybe the default for the branchwater plugin should be to build a self-contained index. I'm going to dig into the code a bit more to see if that's possible.
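To make the relative-path question concrete, here is a minimal Python sketch of the "interpret relative to the index location" idea. It is not the plugin's or sourmash's actual code: the helper name `resolve_storage_path` and the example paths are hypothetical, and it assumes "relative to the RocksDB location" means relative to the directory that contains the index.

```python
from pathlib import PurePosixPath


def resolve_storage_path(index_location: str, stored_path: str) -> PurePosixPath:
    """Hypothetical helper, not part of sourmash or the branchwater plugin.

    Absolute storage paths are returned unchanged. Relative storage paths
    are interpreted relative to the directory containing the index
    (an assumption), not relative to the current working directory.
    """
    stored = PurePosixPath(stored_path)
    if stored.is_absolute():
        return stored
    return PurePosixPath(index_location).parent / stored


# Example paths are made up for illustration.
relative_case = resolve_storage_path("/data/indexes/db.rocksdb", "sketches.zip")
absolute_case = resolve_storage_path("/data/indexes/db.rocksdb", "/mnt/sigs/sketches.zip")
print(relative_case)
print(absolute_case)
```

With this rule the index and its zip file can be moved together, or opened from any working directory, without breaking the stored reference. A self-contained index sidesteps the question entirely.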
There's another wrinkle that I'm having trouble sorting out: it's not clear to me that fastmultigather on the RocksDB index is actually using the external sketches. For the FS storage, I can remove the sketches and it still works 🤔.
OK, I was wrong. It looks like The real difference is that ZipStorage errors out on load, while FSStorage errors out as soon as it tries to do anything. So
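The load-time versus first-use difference can be illustrated with a toy Python sketch. The classes `EagerZipStore` and `LazyDirStore` are hypothetical stand-ins written for this example, not the real ZipStorage or FSStorage implementations. They only assume that one backend opens its archive when constructed while the other touches the filesystem only when asked for data.

```python
import os
import tempfile
import zipfile


class EagerZipStore:
    """Toy stand-in: opens the zip archive at construction time."""

    def __init__(self, path):
        # zipfile.ZipFile raises FileNotFoundError here if the file is missing.
        self.archive = zipfile.ZipFile(path)


class LazyDirStore:
    """Toy stand-in: records the directory and reads only on demand."""

    def __init__(self, path):
        self.path = path  # no filesystem access yet

    def load(self, name):
        with open(os.path.join(self.path, name), "rb") as fh:
            return fh.read()


def failure_stage(make_store, use_store):
    """Report whether a missing backing store fails at load or at first use."""
    try:
        store = make_store()
    except OSError:
        return "load"
    try:
        use_store(store)
    except OSError:
        return "first use"
    return "ok"


# A path that is guaranteed not to exist, inside a fresh temporary directory.
missing = os.path.join(tempfile.mkdtemp(), "does-not-exist")

zip_stage = failure_stage(
    lambda: EagerZipStore(missing + ".zip"),
    lambda store: store.archive.namelist(),
)
dir_stage = failure_stage(
    lambda: LazyDirStore(missing),
    lambda store: store.load("sig"),
)
print(zip_stage, dir_stage)
```

Under these assumptions, an operation that never reads from storage would succeed with the lazy backend even when the sketches are gone, which is one way the two backends can appear to behave differently.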
Even more fun -
See commit 310df3d for an example test.
from slack - luiz says:
@bluegenes could you take a look at this and see if it all makes sense to you? It's ready for merge.
makes sense to me!
Provides command-line access to the sourmash v0.15 internalize_storage() function added in sourmash-bio/sourmash#3250, via index --internal-storage (defaults to True).

Documentation is separately updated in #416.
This PR also:
.rdb to .rocksdb internally.

Punted to other issues:
Behavior that is not tested:
--internal-storage is used, and will it still work?