provide a generic enumerator that works on signatures, not input files #1082

Closed
ctb opened this issue Jul 5, 2020 · 5 comments

Comments

@ctb
Contributor

ctb commented Jul 5, 2020

e.g. when running sourmash index on a database with 100,000 signatures, it would be nice to get a progress report 😂

can probably be added to load_file_as_signatures

@ctb
Contributor Author

ctb commented Jul 5, 2020

it would be nice to be able to report the % of total signatures processed, where that information is known - for SBTs and LCA DBs, it should be possible to know the total as soon as the file is loaded.
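
A minimal sketch of what such an enumerator might look like, assuming it only needs an iterable of signature-like objects and, optionally, a total known up front. `progress_enumerate` and its parameters are hypothetical names for illustration, not part of sourmash:

```python
import sys


def progress_enumerate(items, total=None, every=1000, out=sys.stderr):
    """Yield (index, item) pairs, reporting progress every `every` items.

    Hypothetical helper, not a sourmash API. `items` can be any iterable;
    `total` is the item count if it is known in advance, else None.
    """
    n = 0
    for n, item in enumerate(items, start=1):
        if n % every == 0:
            if total:
                # The total is known, so a percentage can be reported.
                pct = 100.0 * n / total
                print(f"... {n} of {total} ({pct:.1f}%)", file=out)
            else:
                # Unknown total: fall back to a running count.
                print(f"... {n}", file=out)
        yield n - 1, item
    print(f"done: {n} total", file=out)
```

Because it wraps the iterable rather than the input files, the same generator could sit around any loader's output; when no total is available it degrades to a plain counter.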

@ctb
Contributor Author

ctb commented Jul 18, 2020

the bit of this that wasn't fixed in #1083 is the actual index behavior - need to test it on challenging sourmash index and sourmash lca index data sets, with errors and stuff...

@ctb
Contributor Author

ctb commented Jun 26, 2021

with manifests, this can be made much nicer now, too - we can indicate how many total signatures there are, even after (especially after?) selection criteria!
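
As a rough illustration of why a manifest helps here, the sketch below treats a manifest as a plain list of metadata rows and counts what survives selection, without loading any signature. The row fields, the sample data, and `select_rows` are all assumptions made for this example; this is not the sourmash manifest API:

```python
def select_rows(rows, **criteria):
    """Return the rows whose fields match every keyword criterion.

    Hypothetical helper operating on plain dicts, not a sourmash class.
    """
    return [
        row for row in rows
        if all(row.get(key) == value for key, value in criteria.items())
    ]


# Made-up manifest rows, for illustration only.
manifest = [
    {"name": "genome_a", "ksize": 21, "moltype": "DNA"},
    {"name": "genome_a", "ksize": 31, "moltype": "DNA"},
    {"name": "genome_b", "ksize": 31, "moltype": "DNA"},
    {"name": "genome_b", "ksize": 10, "moltype": "protein"},
]

picked = select_rows(manifest, ksize=31, moltype="DNA")
total = len(picked)  # known before any signature is actually loaded
print(f"{total} of {len(manifest)} signatures match")
```

The count after selection is cheap to compute because only metadata is touched, and it could then serve as the denominator for a percentage report.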

also I think this is quite relevant to UX principles around extremely large collections of signatures #1350

@ctb
Contributor Author

ctb commented Mar 12, 2022

still need to big-brain this but in recent months we've moved towards just using manifests for everything possible. one outcome of this, along with the pattern matching code in #1871, is that we tend not to iterate slowly over many signatures, but rather iterate quickly over manifests.

In addition, there are reasons that I can't articulate at the moment 😆 where a lot of our search functionality doesn't lend itself to progress bars any more - like, how would that work on LCA and SBT databases anyway? will revisit next time I'm thinking clearly about it.

relevant - #1877 - where I'm thinking that the signature-focused loading/iteration code will actually go away.

@ctb
Contributor Author

ctb commented Mar 26, 2022

> In addition, there are reasons that I can't articulate at the moment 😆 where a lot of our search functionality doesn't lend itself to progress bars any more - like, how would that work on LCA and SBT databases anyway? will revisit next time I'm thinking clearly about it.

ok, yes, the problem is that SBTs and LCAs don't really have "progress" in searching as such. There's no simple way to estimate what fraction of the database has been searched.

I'm going to close this now, in favor of other issues - #1426 and #1877.

@ctb ctb closed this as completed Mar 26, 2022
@ctb ctb mentioned this issue May 10, 2022