The greyhound experiment pre-screens database signatures for matches that have "interesting" containment overlaps with the query. This is a major optimization not just for downstream containment reporting but ALSO for gather and search, because upper bounds on containment also constrain Jaccard similarity and gather matches.
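As a rough illustration (this is a minimal sketch, not the greyhound/sourmash implementation — `prescreen`, the dict-based database, and the raw hash sets are all hypothetical), pre-screening just keeps signatures whose hash overlap with the query exceeds some threshold; everything else can be skipped by gather and search entirely:

```python
# Hypothetical sketch of containment pre-screening; names and data
# layout are invented for illustration, not the sourmash API.

def prescreen(query_hashes, db, min_overlap=1):
    """Return {name: overlap_size} for signatures sharing at least
    min_overlap hashes with the query.

    Note: overlap/|sig| upper-bounds both containment and Jaccard,
    since |A ∩ B| / |A ∪ B| <= |A ∩ B| / |B|.
    """
    matches = {}
    for name, sig_hashes in db.items():
        overlap = len(query_hashes & sig_hashes)
        if overlap >= min_overlap:
            matches[name] = overlap
    return matches

query = {1, 2, 3, 4, 5}
db = {"sigA": {1, 2, 3, 9}, "sigB": {100, 101}, "sigC": {4, 5, 6}}
print(prescreen(query, db))  # sigB shares no hashes and is screened out
```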
We could provide further options for optimization and parallelization by performing some kind of clustering, wherein we detect disjoint subsets of overlapped hashes and then run gather independently on each subset.
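One way to find those disjoint subsets is to union signatures that share any query hash, e.g. with a small union-find. The sketch below is hypothetical (the function name, the `{name: overlapped_hashes}` input shape, and the union-find details are assumptions, not anything in sourmash); each resulting cluster could then be handed to a separate gather run:

```python
# Hypothetical sketch: group pre-screened signatures into disjoint
# clusters by the query hashes they overlap, using union-find.

def cluster_by_overlap(overlaps):
    """overlaps: {name: set of query hashes that signature overlaps}.
    Signatures sharing any query hash end up in the same cluster."""
    parent = {}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[ra] = rb

    hash_owner = {}  # first signature seen for each query hash
    for name, hashes in overlaps.items():
        parent[name] = name
        for h in hashes:
            if h in hash_owner:
                union(name, hash_owner[h])
            else:
                hash_owner[h] = name

    clusters = {}
    for name in overlaps:
        clusters.setdefault(find(name), set()).add(name)
    return list(clusters.values())

overlaps = {"sigA": {1, 2}, "sigB": {2, 3}, "sigC": {7, 8}}
# sigA and sigB share hash 2, so they form one cluster; sigC stands alone
print(cluster_by_overlap(overlaps))
```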
A simpler version of this idea that would speed up gather (and is already implemented in greyhound, I suspect) would be to take the pre-screened matches and discard every query hash that overlaps no matching signature, since those hashes cannot affect any gather output.
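That simpler trimming step amounts to intersecting the query with the union of its overlaps. A minimal sketch, again with invented names (`trim_query` and the list-of-hash-sets input are assumptions, not the greyhound code):

```python
# Hypothetical sketch: shrink the query to only those hashes that
# overlap at least one pre-screened match; anything outside this
# union cannot contribute to any gather result.

def trim_query(query_hashes, prescreened):
    """prescreened: iterable of hash sets from matching signatures."""
    covered = set()
    for sig_hashes in prescreened:
        covered |= query_hashes & sig_hashes
    return covered

query = {1, 2, 3, 4, 5, 99}
matches = [{1, 2, 10}, {4, 11}]
print(trim_query(query, matches))  # {1, 2, 4}
```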
> A simpler version of this idea that would speed up gather (and is already implemented in greyhound, I suspect) would be to take the pre-screened matches and discard every query hash that overlaps no matching signature, since those hashes cannot affect any gather output.
This is basically done in #1493 with the CounterGather functionality, just in a query-dependent way. I'm closing for now, since it's not clear we need to speed up gather any more at this point 😆