-
Notifications
You must be signed in to change notification settings - Fork 17
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
improvements in column aggregation #314
Conversation
@@ -432,8 +436,8 @@ Config::Config(int argc, char *argv[]) { | |||
print_usage_and_exit = true; | |||
} | |||
|
|||
if (intersect_ratio < 0 || intersect_ratio > 1) { | |||
std::cerr << "Error: intersection ratio must be in range [0, 1]" | |||
if (min_fraction < 0 || min_fraction > 1 || max_fraction < 0 || max_fraction > 1) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
there should also be a check here to make sure that min_count > 0
, otherwise the returned mask will be all ones
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that should be possible. Why not.
By default, min_count = 1
, so it will be 0 only if you specifically set it so.
for (uint64_t i = 0; i < sum.size(); ++i) { | ||
if (sum[i] >= min_cols) { | ||
intersection[i] = true; | ||
if (sum[i] >= min_cols && sum[i] <= max_cols) { | ||
mask[i] = true; | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you can use call_nonzeros
from vector_algorithm.hpp
to speed this up.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hm, not sure that makes sense. It's just a single pass, relative to summing up all columns, this should be cheap.
Also, this won't work for zeros, so I'll have to write a special case for min_cols = 0
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In this case, if the user sets min_cols = 0
and max_cols = inf
, then there's no need to even read the annotators and compute sum
, we can just initialize mask
to be all ones
for (uint64_t i = 0; i < sum.size(); ++i) { | ||
if (sum[i] >= min_cols) { | ||
intersection[i] = true; | ||
if (sum[i] >= min_cols && sum[i] <= max_cols) { | ||
mask[i] = true; | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In this case, if the user sets min_cols = 0
and max_cols = inf
, then there's no need to even read the annotators and compute sum
, we can just initialize mask
to be all ones
No description provided.