UX principles around extremely large collections of signatures #1350

Closed
ctb opened this issue Feb 25, 2021 · 6 comments

@ctb
Contributor

ctb commented Feb 25, 2021

Building on the zip file collections work specifically, and the new load_file_as_signatures code more generally, we should consider how to improve the user experience around large collections of signatures. Off the top of my head:

  • it is not always possible to know in advance how big the collection of sequences to be searched is, or how much memory the search will require. This prevents progress bars and/or estimates. One solution is to enable "hints" and pre-caching in various ways (not just prefetch, but lighter-weight things where caching some information after the first pass might be valuable; see e.g. "Would a "Directory" Index be useful?", #810).
    • could maybe add a collection-hint-generating command?
  • clearer indications around "we are still working!" and "hey this is a big file... might be a lot of time spent loading it..." could be useful
  • indications of what subset of signatures was selected and how (ksize, moltype, downsampling) would be helpful

I think the proper place to put stuff like this is mostly in sourmash_args, which is the generic place for "utility functions for CLI stuff".
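To make the "we are still working!" idea concrete, here is a minimal sketch of a notification wrapper around a lazy iterator. The name `iter_with_notify` and its parameters (`total_hint`, `every`, `out`) are hypothetical and are not part of sourmash or sourmash_args; `total_hint` stands in for whatever a cached collection "hint" might provide.

```python
import sys
from typing import Iterable, Iterator, Optional, TextIO, TypeVar

T = TypeVar("T")


def iter_with_notify(
    items: Iterable[T],
    total_hint: Optional[int] = None,
    every: int = 1000,
    out: TextIO = sys.stderr,
) -> Iterator[T]:
    """Yield items unchanged, periodically reporting how many have been seen.

    Hypothetical helper (an assumption for illustration, not a sourmash API).
    With a total_hint we can show "n of ~total"; without one, only a count.
    """
    n = 0
    for n, item in enumerate(items, start=1):
        if n % every == 0:
            if total_hint:
                out.write(f"\r... still working: {n} of ~{total_hint} signatures")
            else:
                out.write(f"\r... still working: {n} signatures so far")
            out.flush()
        yield item
    # Only reached if the caller consumes the iterator to the end.
    out.write(f"\rdone: {n} signatures processed\n")
```

Because the wrapper is itself a generator, it never forces the underlying collection into memory, and it degrades gracefully when no hint is available.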

@ctb
Contributor Author

ctb commented Feb 25, 2021

From an internal API perspective, we should be careful not to resolve generators and iterators into concrete lists unless we absolutely have to. Right now we do that in some places, for example in the LinearIndex class.

Selectors (#1072) can also be introduced consistently throughout.
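As a rough illustration of keeping things lazy, the sketch below chains a loader and a selector as generators, so asking for the first few matches only touches the first few items. `Sig`, `load_many`, and `select` are hypothetical stand-ins invented for this example; they are not sourmash's classes or the selector API proposed in #1072.

```python
from dataclasses import dataclass
from itertools import islice
from typing import Iterable, Iterator, Optional


@dataclass(frozen=True)
class Sig:
    """Stand-in for a signature (hypothetical, not sourmash's class)."""
    name: str
    ksize: int
    moltype: str


def load_many(n: int) -> Iterator[Sig]:
    """Pretend loader: yields signatures one at a time, never as a list."""
    for i in range(n):
        yield Sig(name=f"sig{i}", ksize=21 if i % 2 else 31, moltype="DNA")


def select(
    sigs: Iterable[Sig],
    ksize: Optional[int] = None,
    moltype: Optional[str] = None,
) -> Iterator[Sig]:
    """Lazily filter signatures; a criterion of None means 'do not filter'."""
    for sig in sigs:
        if ksize is not None and sig.ksize != ksize:
            continue
        if moltype is not None and sig.moltype != moltype:
            continue
        yield sig


# Only the first handful of the billion candidates are ever generated.
first_three = list(islice(select(load_many(10**9), ksize=31), 3))
```

Calling `list(...)` on the loader before selecting would defeat the purpose; here the only concrete list is the small one the caller explicitly asked for.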

@ctb
Contributor Author

ctb commented Feb 28, 2021

Finally found the conversations on some of the underlying implementation issues around "lazy" and operating at scale --

Lazy loading LinearIndex?
Supporting --from-file in search and gather

@ctb
Contributor Author

ctb commented Jun 19, 2021

picklists (now merged; #1587 and #1588) and especially manifests #1590 are really useful ways of interacting with large collections of signatures.
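For readers unfamiliar with the idea, a manifest can be thought of as a small table of metadata describing each signature, which can be filtered without loading any signature data. The sketch below uses a made-up CSV layout; the column names (`internal_location`, `md5`, `ksize`, `moltype`, `name`) and the helpers `read_manifest` and `select_rows` are assumptions for illustration and may not match sourmash's actual manifest format or API. The `picklist` argument here is simply a set of names to keep.

```python
import csv
import io
from typing import Dict, Iterable, List, Optional, Set, TextIO

# Hypothetical manifest contents; the real format may differ.
MANIFEST_CSV = """\
internal_location,md5,ksize,moltype,name
sigs/a.sig,md5-a,31,DNA,genome A
sigs/a.sig,md5-b,21,DNA,genome A
sigs/b.sig,md5-c,31,DNA,genome B
sigs/c.sig,md5-d,10,protein,proteome C
"""


def read_manifest(fp: TextIO) -> List[Dict[str, str]]:
    """Read manifest rows; this is metadata only, so a list is cheap."""
    return list(csv.DictReader(fp))


def select_rows(
    rows: Iterable[Dict[str, str]],
    ksize: Optional[int] = None,
    moltype: Optional[str] = None,
    picklist: Optional[Set[str]] = None,
) -> List[Dict[str, str]]:
    """Keep rows matching every criterion that is not None."""
    selected = []
    for row in rows:
        if ksize is not None and int(row["ksize"]) != ksize:
            continue
        if moltype is not None and row["moltype"] != moltype:
            continue
        if picklist is not None and row["name"] not in picklist:
            continue
        selected.append(row)
    return selected


rows = read_manifest(io.StringIO(MANIFEST_CSV))
wanted = select_rows(rows, ksize=31, moltype="DNA")
# Which files would actually need to be opened for this selection.
locations = sorted({row["internal_location"] for row in wanted})
```

In this sketch the number of matching entries (`len(wanted)`) and the files that hold them are known before a single signature is loaded, which is what makes this style of interaction attractive for very large collections.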

@ctb
Contributor Author

ctb commented Jun 20, 2021

oh, interesting thought, manifest-containing Index objects could support progress bars... #1082

@ctb
Contributor Author

ctb commented Mar 12, 2022

> oh, interesting thought, manifest-containing Index objects could support progress bars... #1082

I keep on revisiting this in my head and realizing it's not a good idea, but then never writing it down.

Linking all of this to #1877, which is moving away from progress bars and iteration and towards generic interfaces that are also fast.

@ctb
Contributor Author

ctb commented Mar 26, 2022

interestingly, the only remaining bit here is covered by #1426. Closing! 🎉

ctb closed this as completed Mar 26, 2022