taxonomic classification skeleton with the RMQ preprocessing #346
Please let me know whether the structure of the classes and files looks OK.
Signed-off-by: Radu Muntean <[email protected]>
Thank you very much for the reviews!
Please close resolved comments and also do a pass to fix all similar cases.
Sounds good. Please let me know if anything isn't right. I usually try to go through the entire PR and fix all similar cases, but for some of them I may not be generalizing well enough yet.
bool TaxonomyBase::get_taxid_from_label(const std::string &label, TaxId *taxid) const {
    if (label_type == TAXID) {
        *taxid = std::stoul(utils::split_string(label, "|")[1]);
        return true;
    } else if (TaxonomyBase::label_type == GEN_BANK) {
        std::string acc_version = get_accession_version_from_label(label);
        if (not accversion_to_taxid_map.count(acc_version)) {
            return false;
        }
        *taxid = accversion_to_taxid_map.at(acc_version);
        return true;
    }

    logger->error("Error: Could not get the taxid for label {}", label);
    exit(1);
}
I'm confused by these helpers. You set label_type somewhere else and then use it here, so these calls have to happen in a specific order, right? What is the point of separating these functions, then?
I need to use label_type from different entry points of the CLI command. Currently, it is computed in the TaxonomyClsAnno constructor and will later be used inside the tax_class(sequence) method, where we need to iterate over the column labels and look up the associated taxids. If you think it would be clearer, I can move it into tax_class and run it on every call (it should be fast, I guess).
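Regarding the ordering concern: one way to make the required call sequence impossible to get wrong is to compute the label type once in the constructor and keep it in a const member, so no method can ever observe it unset. The following is a minimal sketch, not the PR's code. The class name TaxonomyClassifierSketch and the detection rule (labels starting with "taxid|" are taxid labels, everything else is treated as GenBank) are hypothetical assumptions made for illustration.

```cpp
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical stand-in for the PR's LabelType enum.
enum class LabelType { TAXID, GEN_BANK };

// Hypothetical detection rule, for illustration only: look at the first label;
// a "taxid|" prefix means taxid labels, anything else is treated as GenBank.
LabelType detect_label_type(const std::vector<std::string> &labels) {
    if (labels.empty())
        throw std::invalid_argument("no labels to detect the label type from");
    const std::string prefix = "taxid|";
    return labels.front().compare(0, prefix.size(), prefix) == 0
            ? LabelType::TAXID
            : LabelType::GEN_BANK;
}

class TaxonomyClassifierSketch {
  public:
    // The label type is fixed during construction, so every later method
    // can rely on it without any required call order.
    explicit TaxonomyClassifierSketch(const std::vector<std::string> &labels)
          : label_type_(detect_label_type(labels)) {}

    LabelType label_type() const { return label_type_; }

  private:
    const LabelType label_type_;
};
```

With this shape, the "computed in the constructor" approach already used for TaxonomyClsAnno becomes an invariant of the type rather than a convention callers have to remember.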
Force-pushed from 2df6ca8 to 18c0fe0.
    }
}
if (num_taxid_failed) {
    logger->warn("During the tax_tree_filepath {} parsing, {} taxids were not found out of {} total evaluations.",
Suggested change:
- logger->warn("During the tax_tree_filepath {} parsing, {} taxids were not found out of {} total evaluations.",
+ logger->warn("During the tax_tree_filepath {} parsing, {} taxids were not found out of {} total evaluations",
Regarding logger messages, I'm not sure when to end them with a period and when not to. Could you list a few small examples so I know how to write them in the future?
bool TaxonomyBase::get_taxid_from_label(const std::string &label, TaxId *taxid) const {
    if (label_type_ == TAXID) {
        *taxid = std::stoul(utils::split_string(label, "|")[1]);
        return true;
    } else if (label_type_ == GEN_BANK) {
        std::string acc_version = get_accession_version_from_label(label);
        auto it = accversion_to_taxid_map_.find(acc_version);
        if (it == accversion_to_taxid_map_.end()) {
            return false;
        }
        *taxid = it->second;
        return true;
    }

    logger->error("Error: Could not get the taxid for label {}", label);
    exit(1);
}
Where is this called?
get_taxid_from_label will be used in the next PR. I implemented the entire .hpp file in this PR and wanted to include the reasoning for why and how the LabelType enum is used.
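As an alternative to logging and calling exit(1) inside the helper, the lookup can take the label type as an explicit parameter and return std::optional, which removes the hidden dependency on a previously set member and lets the caller decide how to handle a miss. This is a minimal sketch under stated assumptions: TaxId is assumed to be a 64-bit unsigned integer, split() stands in for utils::split_string, and for GenBank labels the accession.version is assumed (hypothetically) to be the last '|'-separated field, in place of the PR's get_accession_version_from_label.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <stdexcept>
#include <string>
#include <unordered_map>
#include <vector>

using TaxId = std::uint64_t;  // assumption: the real TaxId type may differ
enum class LabelType { TAXID, GEN_BANK };

// Split `s` on `delim`, keeping empty fields (stand-in for utils::split_string).
std::vector<std::string> split(const std::string &s, char delim) {
    std::vector<std::string> out;
    std::string::size_type start = 0;
    while (true) {
        const auto pos = s.find(delim, start);
        out.push_back(s.substr(start, pos - start));  // npos: substr clamps to the end
        if (pos == std::string::npos)
            return out;
        start = pos + 1;
    }
}

// Returns the taxid for `label`, or std::nullopt if it cannot be resolved.
std::optional<TaxId> get_taxid_from_label(
        const std::string &label, LabelType label_type,
        const std::unordered_map<std::string, TaxId> &accversion_to_taxid) {
    const std::vector<std::string> fields = split(label, '|');
    if (label_type == LabelType::TAXID) {
        // Same convention as the PR: the taxid is the second '|'-separated field.
        if (fields.size() < 2)
            return std::nullopt;
        try {
            std::size_t parsed = 0;
            const unsigned long long value = std::stoull(fields[1], &parsed);
            if (parsed != fields[1].size())
                return std::nullopt;  // trailing characters, e.g. "562abc"
            return static_cast<TaxId>(value);
        } catch (const std::exception &) {  // std::invalid_argument or std::out_of_range
            return std::nullopt;
        }
    }
    // GEN_BANK (hypothetical format): the accession.version is the last field.
    const auto it = accversion_to_taxid.find(fields.back());
    if (it == accversion_to_taxid.end())
        return std::nullopt;
    return it->second;
}
```

A caller can then treat an unresolved label as fatal or simply count it, as the num_taxid_failed warning does during tree parsing.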
Force-pushed from 8922b47 to 6bd82bf.
Force-pushed from 6bd82bf to 249129e.
Force-pushed from 73ddf85 to ff33e52.
Force-pushed from 584dd59 to bc22391.
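The PR title refers to RMQ preprocessing. For background: on a taxonomy tree, range-minimum queries are the standard tool for constant-time lowest-common-ancestor (LCA) queries. An Euler tour records a node every time the DFS is at it, and the LCA of u and v is the shallowest node in the tour between their first occurrences; a sparse table answers that in O(1) after O(n log n) preprocessing. The sketch below is the textbook construction, not the PR's implementation; the class name LcaRmq and the input format (a parent array over nodes 0..n-1, with the single root being its own parent) are assumptions.

```cpp
#include <cstddef>
#include <stdexcept>
#include <utility>
#include <vector>

// LCA via Euler tour + sparse-table RMQ.
// Assumed input: parent[v] for nodes 0..n-1; the single root has parent[root] == root.
class LcaRmq {
  public:
    explicit LcaRmq(const std::vector<int> &parent) {
        const int n = static_cast<int>(parent.size());
        std::vector<std::vector<int>> children(n);
        int root = -1;
        for (int v = 0; v < n; ++v) {
            if (parent[v] == v)
                root = v;
            else
                children[parent[v]].push_back(v);
        }
        if (root < 0)
            throw std::invalid_argument("tree has no root");

        // Iterative DFS: emit a node when entered and again after each child returns.
        first_.assign(n, -1);
        depth_.assign(n, 0);
        std::vector<std::pair<int, std::size_t>> stack;  // (node, index of next child)
        stack.emplace_back(root, 0);
        first_[root] = 0;
        tour_.push_back(root);
        while (!stack.empty()) {
            const int v = stack.back().first;
            std::size_t &next = stack.back().second;
            if (next < children[v].size()) {
                const int c = children[v][next++];
                depth_[c] = depth_[v] + 1;
                first_[c] = static_cast<int>(tour_.size());
                tour_.push_back(c);
                stack.emplace_back(c, 0);  // may reallocate; `next` is not used afterwards
            } else {
                stack.pop_back();
                if (!stack.empty())
                    tour_.push_back(stack.back().first);  // back in the parent
            }
        }

        // Sparse table: table_[k][i] is the shallowest node in tour_[i, i + 2^k).
        const std::size_t m = tour_.size();
        log2_.assign(m + 1, 0);
        for (std::size_t i = 2; i <= m; ++i)
            log2_[i] = log2_[i / 2] + 1;
        table_.push_back(tour_);
        for (std::size_t k = 1; (std::size_t{1} << k) <= m; ++k) {
            const std::size_t half = std::size_t{1} << (k - 1);
            std::vector<int> level(m - 2 * half + 1);
            for (std::size_t i = 0; i < level.size(); ++i)
                level[i] = shallower(table_[k - 1][i], table_[k - 1][i + half]);
            table_.push_back(std::move(level));
        }
    }

    // O(1) query: cover [l, r] with two overlapping power-of-two windows.
    int lca(int u, int v) const {
        std::size_t l = first_[u], r = first_[v];
        if (l > r)
            std::swap(l, r);
        const int k = log2_[r - l + 1];
        return shallower(table_[k][l], table_[k][r + 1 - (std::size_t{1} << k)]);
    }

  private:
    int shallower(int a, int b) const { return depth_[a] <= depth_[b] ? a : b; }

    std::vector<int> tour_, first_, depth_, log2_;
    std::vector<std::vector<int>> table_;
};
```

After preprocessing, each LCA query costs two table lookups, which is why this construction is a common choice when many LCA queries are needed.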
This PR lays out the entire structure of the taxonomic classification approach (without CLI commands).
The header file tax_classifier.hpp contains the declarations of all the taxonomic classification classes:
- TaxonomyBase -> implements the methods and fields shared by both classification options (via the anno matrix or via taxDB)
- TaxonomyClsAnno -> implements the methods for the "via anno matrix" option
- TaxonomyClsImportDB -> implements the methods for the "via taxDB" option
In this PR, the following methods, which are used for the preprocessing steps, are fully implemented: