Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

taxonomic classification skeleton with the RMQ preprocessing #346

Merged
merged 14 commits into from
Sep 6, 2021

Conversation

heracle
Copy link
Collaborator

@heracle heracle commented Jul 30, 2021

This PR draws the entire structure of the taxonomic classification approach (without CLI cmds):

The header script tax_classifier.hpp contains the description of all the taxonomic classification classes:

  • TaxonomyBase -> implements the methods and fields that will be used by both classification options (via anno matrix or via taxDB)
  • TaxonomyClsAnno -> implements the methods for via anno matrix option
  • TaxonomyClsImportDB -> implements the methods for via taxDB option

In this PR there are fully implemented the following methods that are used for the preprocessing steps:

  • label to taxid parsing (which is different for GenBank or Kraken2 sequence labels)
  • parsing the taxonomic tree file
  • parsing the accession version to taxid lookup table file
  • DFS and RMQ precomputation

@heracle heracle requested review from hmusta and karasikov July 30, 2021 19:57
@heracle
Copy link
Collaborator Author

heracle commented Jul 30, 2021

Please let me know if the structure of the classes and the files looks ok:

  • At this moment there is one class TaxonomyBase with the methods that will be used by both classification options. Then TaxonomyClsImportDB and TaxonomyClsAnno do a public inheritance.
  • There is a file label_to_taxid.cpp where there are implemented all the methods regarding the label parsing. I created a separate file just for the reason to be able to extend the enum LabelType by changing one file only

metagraph/src/annotation/taxonomy/tax_classifier.hpp Outdated Show resolved Hide resolved
metagraph/src/annotation/taxonomy/tax_classifier.hpp Outdated Show resolved Hide resolved
metagraph/src/annotation/taxonomy/tax_classifier.hpp Outdated Show resolved Hide resolved
metagraph/src/annotation/taxonomy/tax_classifier.cpp Outdated Show resolved Hide resolved
metagraph/src/annotation/taxonomy/tax_classifier.hpp Outdated Show resolved Hide resolved
metagraph/src/annotation/taxonomy/tax_classifier.hpp Outdated Show resolved Hide resolved
@heracle heracle changed the base branch from mtg_taxclass_2 to master August 3, 2021 20:43
metagraph/src/annotation/taxonomy/label_to_taxid.cpp Outdated Show resolved Hide resolved
metagraph/src/annotation/taxonomy/label_to_taxid.cpp Outdated Show resolved Hide resolved
metagraph/src/annotation/taxonomy/label_to_taxid.cpp Outdated Show resolved Hide resolved
metagraph/src/annotation/taxonomy/label_to_taxid.cpp Outdated Show resolved Hide resolved
metagraph/src/annotation/taxonomy/label_to_taxid.cpp Outdated Show resolved Hide resolved
metagraph/src/annotation/taxonomy/tax_classifier.hpp Outdated Show resolved Hide resolved
metagraph/src/annotation/taxonomy/tax_classifier.cpp Outdated Show resolved Hide resolved
metagraph/src/annotation/taxonomy/tax_classifier.cpp Outdated Show resolved Hide resolved
metagraph/src/annotation/taxonomy/tax_classifier.cpp Outdated Show resolved Hide resolved
metagraph/src/annotation/taxonomy/tax_classifier.cpp Outdated Show resolved Hide resolved
@heracle
Copy link
Collaborator Author

heracle commented Aug 5, 2021

Thank you very much for the reviews!

Copy link
Member

@karasikov karasikov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please close resolved comments and also do a pass to fix all similar cases.

Signed-off-by: Radu Muntean <[email protected]>
@heracle
Copy link
Collaborator Author

heracle commented Aug 5, 2021

Cool. Please let me know about anything not proper. Usually, I am trying my best to pass through the entire PR and fix all the similar cases, but for some of them, maybe I am not generalizing too well yet

metagraph/tests/annotation/taxonomy/test_taxonomy.cpp Outdated Show resolved Hide resolved
metagraph/tests/annotation/taxonomy/test_taxonomy.cpp Outdated Show resolved Hide resolved
metagraph/tests/annotation/taxonomy/test_taxonomy.cpp Outdated Show resolved Hide resolved
metagraph/tests/annotation/taxonomy/test_taxonomy.cpp Outdated Show resolved Hide resolved
metagraph/tests/annotation/taxonomy/test_taxonomy.cpp Outdated Show resolved Hide resolved
metagraph/src/annotation/taxonomy/tax_classifier.hpp Outdated Show resolved Hide resolved
metagraph/src/annotation/taxonomy/tax_classifier.hpp Outdated Show resolved Hide resolved
metagraph/src/annotation/taxonomy/tax_classifier.hpp Outdated Show resolved Hide resolved
metagraph/src/annotation/taxonomy/tax_classifier.cpp Outdated Show resolved Hide resolved
metagraph/src/annotation/taxonomy/tax_classifier.cpp Outdated Show resolved Hide resolved
Signed-off-by: Radu Muntean <[email protected]>
metagraph/src/annotation/taxonomy/tax_classifier.cpp Outdated Show resolved Hide resolved
metagraph/src/annotation/taxonomy/tax_classifier.cpp Outdated Show resolved Hide resolved
metagraph/src/annotation/taxonomy/tax_classifier.hpp Outdated Show resolved Hide resolved
metagraph/src/annotation/taxonomy/tax_classifier.hpp Outdated Show resolved Hide resolved
metagraph/src/annotation/taxonomy/tax_classifier.hpp Outdated Show resolved Hide resolved
metagraph/src/annotation/taxonomy/tax_classifier.hpp Outdated Show resolved Hide resolved
Comment on lines 34 to 49
bool TaxonomyBase::get_taxid_from_label(const std::string &label, TaxId *taxid) const {
if (label_type == TAXID) {
*taxid = std::stoul(utils::split_string(label, "|")[1]);
return true;
} else if (TaxonomyBase::label_type == GEN_BANK) {
std::string acc_version = get_accession_version_from_label(label);
if (not accversion_to_taxid_map.count(acc_version)) {
return false;
}
*taxid = accversion_to_taxid_map.at(acc_version);
return true;
}

logger->error("Error: Could not get the taxid for label {}", label);
exit(1);
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm confused by these helpers.

You are setting label_type somewhere else, then it's used here. So it requires a certain sequence of these calls right?
What is the point to separate these functions then?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will need to use this label_type from different enterpoints from the CLI cmd. Currently, it is computed in the TaxonomyClsAnno constructor and will be further used inside tax_class(sequence) method where we will need to iterate the column labels and take the associated taxids. If you think that it would be more clear, I can move it as part of the tax_class and run it multiple times (should be fast, I guess).

metagraph/src/annotation/taxonomy/tax_classifier.cpp Outdated Show resolved Hide resolved
metagraph/src/annotation/taxonomy/tax_classifier.cpp Outdated Show resolved Hide resolved
Signed-off-by: Radu Muntean <[email protected]>
metagraph/src/annotation/taxonomy/tax_classifier.hpp Outdated Show resolved Hide resolved
metagraph/src/annotation/taxonomy/tax_classifier.hpp Outdated Show resolved Hide resolved
metagraph/src/annotation/taxonomy/tax_classifier.hpp Outdated Show resolved Hide resolved
metagraph/src/annotation/taxonomy/tax_classifier.hpp Outdated Show resolved Hide resolved
}
}
if (num_taxid_failed) {
logger->warn("During the tax_tree_filepath {} parsing, {} taxids were not found out of {} total evaluations.",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
logger->warn("During the tax_tree_filepath {} parsing, {} taxids were not found out of {} total evaluations.",
logger->warn("During the tax_tree_filepath {} parsing, {} taxids were not found out of {} total evaluations",

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Regarding logger messages, I am not very sure in what context to add the period and when to remove it. Can you please list some small examples to understand how to write these in the future?

metagraph/src/annotation/taxonomy/tax_classifier.cpp Outdated Show resolved Hide resolved
metagraph/src/annotation/taxonomy/tax_classifier.cpp Outdated Show resolved Hide resolved
metagraph/src/annotation/taxonomy/tax_classifier.cpp Outdated Show resolved Hide resolved
Comment on lines 16 to 32
bool TaxonomyBase::get_taxid_from_label(const std::string &label, TaxId *taxid) const {
if (label_type_ == TAXID) {
*taxid = std::stoul(utils::split_string(label, "|")[1]);
return true;
} else if (label_type_ == GEN_BANK) {
std::string acc_version = get_accession_version_from_label(label);
auto it = accversion_to_taxid_map_.find(acc_version);
if (it == accversion_to_taxid_map_.end()) {
return false;
}
*taxid = it->second;
return true;
}

logger->error("Error: Could not get the taxid for label {}", label);
exit(1);
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

where is this called?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

get_taxid_from_label will be used in the next PR. I implemented the entire .hpp file in this PR and wanted to add a reasoning about why/how to use the enum LabelType

Signed-off-by: Radu Muntean <[email protected]>
metagraph/src/annotation/taxonomy/tax_classifier.cpp Outdated Show resolved Hide resolved
metagraph/tests/annotation/taxonomy/test_taxonomy.cpp Outdated Show resolved Hide resolved
metagraph/src/annotation/taxonomy/tax_classifier.hpp Outdated Show resolved Hide resolved
metagraph/src/annotation/taxonomy/tax_classifier.hpp Outdated Show resolved Hide resolved
metagraph/src/annotation/taxonomy/tax_classifier.hpp Outdated Show resolved Hide resolved
metagraph/src/annotation/taxonomy/tax_classifier.hpp Outdated Show resolved Hide resolved
Signed-off-by: Radu Muntean <[email protected]>
@heracle heracle mentioned this pull request Sep 1, 2021
@karasikov karasikov merged commit 1609e94 into master Sep 6, 2021
@karasikov karasikov deleted the mtg_taxclass_3 branch September 6, 2021 08:26
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants