per @luizirber, on running gather on a gigantic signature --
Right now it is spending all its time inside `sum_abunds = sum(( orig_query_abunds[k] for k in orig_query_hashes))`, because it is pulling each hash of the original query individually (all, err, billions of them?).
Digging into this code: it recomputes this sum on each iteration, and the only reason is that the original query may have been downsampled. Most of the time it won't need to be calculated more than once.
We should easily be able to refactor this code in one of two ways:

1. cache this by scaled value, maybe? that would be easy to do (see the sketch right after this list).
2. refactor this code out into a method on `MinHash` and (going further) then oxidize it (sketched further below).
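
For option 1, here's a minimal sketch of what caching by scaled value could look like. The variable names follow the snippet quoted above; the closure and cache themselves are hypothetical, not existing sourmash code:

```python
def make_sum_abunds(orig_query_abunds):
    """Build a cached abundance-summing helper (hypothetical sketch)."""
    cache = {}

    def sum_abunds(scaled, orig_query_hashes):
        # recompute only the first time we see a given scaled value;
        # subsequent gather iterations at the same scaled hit the cache.
        if scaled not in cache:
            cache[scaled] = sum(orig_query_abunds[k]
                                for k in orig_query_hashes)
        return cache[scaled]

    return sum_abunds
```
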
This is the trend we're heading towards in #1512 and previous work, too: move stuff away from Python and into `MinHash`.
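
For option 2, a rough sketch of what that method might look like on the Python side before oxidizing. The name `sum_abundances` is hypothetical, and this assumes the hash-to-abundance mapping is reachable via the existing `hashes` property:

```python
class MinHash:
    # ...existing sourmash MinHash code...

    def sum_abundances(self, subset=None):
        """Hypothetical method: sum abundances, optionally restricted
        to a subset of hashes."""
        abunds = self.hashes  # mapping of hash -> abundance
        if subset is None:
            return sum(abunds.values())
        return sum(abunds[h] for h in subset if h in abunds)
```

Once this is a single method call, the gather loop stops reaching into the query's internals, and a later Rust port only has to replace the method body.
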
Naively, I wonder if this is (or could be) solved by a function similar to the one needed for #1463.