Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

improvements in column aggregation #314

Merged
merged 6 commits into from
Apr 28, 2021
Merged

improvements in column aggregation #314

merged 6 commits into from
Apr 28, 2021

Conversation

karasikov
Copy link
Member

No description provided.

@karasikov karasikov requested a review from hmusta April 16, 2021 10:03
@@ -432,8 +436,8 @@ Config::Config(int argc, char *argv[]) {
print_usage_and_exit = true;
}

if (intersect_ratio < 0 || intersect_ratio > 1) {
std::cerr << "Error: intersection ratio must be in range [0, 1]"
if (min_fraction < 0 || min_fraction > 1 || max_fraction < 0 || max_fraction > 1) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

there should also be a check here to make sure that min_count > 0, otherwise the returned mask will be all ones

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that should be possible. Why not.

By default, min_count = 1, so it will be 0 only if you specifically set it so.

Comment on lines 490 to 494
for (uint64_t i = 0; i < sum.size(); ++i) {
if (sum[i] >= min_cols) {
intersection[i] = true;
if (sum[i] >= min_cols && sum[i] <= max_cols) {
mask[i] = true;
}
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you can use call_nonzeros from vector_algorithm.hpp to speed this up.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hm, not sure that makes sense. It's just a single pass, relative to summing up all columns, this should be cheap.
Also, this won't work for zeros, so I'll have to write a special case for min_cols = 0

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In this case, if the user sets min_cols = 0 and max_cols = inf, then there's no need to even read the annotators and compute sum, we can just initialize mask to be all ones

metagraph/src/cli/transform_annotation.cpp Show resolved Hide resolved
metagraph/src/cli/config/config.cpp Outdated Show resolved Hide resolved
@karasikov karasikov requested a review from hmusta April 21, 2021 14:40
Comment on lines 490 to 494
for (uint64_t i = 0; i < sum.size(); ++i) {
if (sum[i] >= min_cols) {
intersection[i] = true;
if (sum[i] >= min_cols && sum[i] <= max_cols) {
mask[i] = true;
}
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In this case, if the user sets min_cols = 0 and max_cols = inf, then there's no need to even read the annotators and compute sum, we can just initialize mask to be all ones

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants