Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[MRG] Enable moltypes other than DNA in LCA databases #1013

Merged
merged 22 commits into from
Jun 12, 2020
Merged

Conversation

ctb
Copy link
Contributor

@ctb ctb commented Jun 6, 2020

This PR adds support for protein, dayhoff, and hp signatures to LCA_Database and sourmash lca index. Things now work as expected on the command line, although the default scaled and ksize for sourmash lca index don't really make sense and we might want to change them in some future version.

Fixes #947

Also:

Tests to write:

  • sig describe moltype stuff
  • LCA_Database insert with wrong moltype
  • LCA_Database insert with return val

  • Is it mergeable?
  • make test Did it pass the tests?
  • make coverage Is the new code covered?
  • Did it change the command-line interface? Only additions are allowed
    without a major version increment. Changing file formats also requires a
    major version number increment.
  • Was a spellchecker run on the source code and documentation after
    changes were made?

@ctb ctb changed the title Enable moltypes other than DNA in LCA databases [WIP] Enable moltypes other than DNA in LCA databases Jun 6, 2020
@codecov
Copy link

codecov bot commented Jun 6, 2020

Codecov Report

Merging #1013 into master will increase coverage by 0.07%.
The diff coverage is 98.11%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #1013      +/-   ##
==========================================
+ Coverage   92.23%   92.31%   +0.07%     
==========================================
  Files          72       72              
  Lines        5397     5413      +16     
==========================================
+ Hits         4978     4997      +19     
+ Misses        419      416       -3     
Impacted Files Coverage Δ
sourmash/lca/lca_db.py 94.62% <95.45%> (+0.28%) ⬆️
sourmash/cli/lca/index.py 100.00% <100.00%> (ø)
sourmash/lca/command_index.py 90.37% <100.00%> (+0.21%) ⬆️
sourmash/sig/__main__.py 93.37% <100.00%> (+0.66%) ⬆️
sourmash/sourmash_args.py 95.00% <100.00%> (+0.06%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 94e3244...59a6fb8. Read the comment docs.

@ctb
Copy link
Contributor Author

ctb commented Jun 6, 2020

@bluegenes could you review this, please? thanks!

@ctb ctb changed the title [WIP] Enable moltypes other than DNA in LCA databases [MRG] Enable moltypes other than DNA in LCA databases Jun 6, 2020
@ctb
Copy link
Contributor Author

ctb commented Jun 6, 2020

Yay! Tests passed!

Copy link
Contributor

@bluegenes bluegenes left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Other than the ksize, looks good to me!

k=19 gives 19/3=6.33, which rounds down to a k=6. It's fine for testing, but might lead to a bit of confusion if folks look to the tests to figure out how to run something (or that might just be me).

sig1 = sourmash.load_one_signature(sigfile1)
sig2 = sourmash.load_one_signature(sigfile2)

db = sourmash.lca.LCA_Database(ksize=19, scaled=100, moltype='protein')
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

for soumash 3.x, protein kmers should be divisible by 3. Here, 19/3 = 6.3, so the actual ksize being used is k= 6.

We should maybe reimplement the check that #575/ #576 removed?

@ctb
Copy link
Contributor Author

ctb commented Jun 12, 2020

thanks @bluegenes! I changed the ksize=19 to ksize=57 in 20d5cf5 and 59a6fb8 - will merge if / when the tests pass!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
2 participants