Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sourmash 3.x protein ksizes must be divisible by 3 #1019

Closed
wants to merge 24 commits into from
Closed

Conversation

bluegenes
Copy link
Contributor

@bluegenes bluegenes commented Jun 9, 2020

When I wrote #575 / #576 , I thought that protein sigs were calculated at their exact ksize, except when translating (I forgot about this PR until recently). So #576 eliminated a check for the protein ksize being divisible by 3, allowing you to calculate signatures with k=19, which would evaluate to 19/3=6.33, giving a protein ksize of 6.

While this is not terrible, I think disabling these non-divisible (to integers) ksizes is informative for what is happening under the hood, and is thus clearer.

This PR simply reverses what was changed in #576, and removes the unnecessary (and now failing) test. If we change protein ksize calculation in sourmash 4.x, we can add these right back in.

  • Is it mergeable?
  • make test Did it pass the tests?
  • make coverage Is the new code covered?
  • Did it change the command-line interface? Only additions are allowed
    without a major version increment. Changing file formats also requires a
    major version number increment.
  • Was a spellchecker run on the source code and documentation after
    changes were made?

@bluegenes bluegenes changed the title 3.x protein ksizes must be divisible by 3 sourmash 3.x protein ksizes must be divisible by 3 Jun 9, 2020
@codecov
Copy link

codecov bot commented Jun 9, 2020

Codecov Report

Merging #1019 into master will decrease coverage by 0.37%.
The diff coverage is 100.00%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #1019      +/-   ##
==========================================
- Coverage   92.31%   91.94%   -0.38%     
==========================================
  Files          72       72              
  Lines        5413     5412       -1     
==========================================
- Hits         4997     4976      -21     
- Misses        416      436      +20     
Impacted Files Coverage Δ
sourmash/command_compute.py 97.63% <100.00%> (ø)
sourmash/_compat.py 52.17% <0.00%> (-47.83%) ⬇️
sourmash/signature.py 88.42% <0.00%> (-2.32%) ⬇️
sourmash/lca/lca_utils.py 94.25% <0.00%> (-2.30%) ⬇️
sourmash/lca/command_index.py 89.83% <0.00%> (-0.54%) ⬇️
sourmash/sbt.py 87.34% <0.00%> (-0.28%) ⬇️
sourmash/sbt_storage.py 92.57% <0.00%> (+0.45%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 8503b10...9c46e9c. Read the comment docs.

@bluegenes bluegenes requested a review from ctb June 9, 2020 13:51
@ctb
Copy link
Contributor

ctb commented Jun 9, 2020

hi @bluegenes could you explain the rationale for this a little bit more in the main PR text?

@bluegenes
Copy link
Contributor Author

@ctb done!

Base automatically changed from protein_lca_db to master June 12, 2020 14:22
@ctb
Copy link
Contributor

ctb commented Jun 14, 2020

hi @bluegenes as described I merged #1013 after switching ksizes over to 57. I then merged that master back into this PR, and wanted to double check with you that this all looks right.

I actually just double checked myself, and it looks like commit 3ec04b5 is now all that's left on this PR, which seems appropriate. Yay!

Instead of removing the test that now fails, test_do_sourmash_compute_multik_protein_input_non_div3_ksize, what do you think about changing it so that it works correctly? I think this would involve asserting that the result of runscript is failure (a nonzero exit status).

@bluegenes
Copy link
Contributor Author

@ctb - done! test mirrors the "bad ksize" test for nucleotide input.

tests/test_sourmash_compute.py Outdated Show resolved Hide resolved
@ctb
Copy link
Contributor

ctb commented Jun 15, 2020

great thx! one suggested change.

@ctb
Copy link
Contributor

ctb commented Jun 15, 2020

Before merging, I need to think about whether this violates our command-line interface semantic versioning guarantee :)

@ctb
Copy link
Contributor

ctb commented Jun 16, 2020

I think this may be a specific example of an API-breaking change that is ready to merge but would break 3.x! How exciting :)

Briefly, this check was removed several versions ago (December 2019), which doesn't violate semantic versioning requirements, since making command line checks more liberal is fine. To add it back in could, in theory, break people's workflows. Ugh!

So I'll put this in cold storage for a little bit while we settle into our 4.0-targeted groove viz #1016.

@ctb
Copy link
Contributor

ctb commented Jun 16, 2020

(i.e. please don't merge :)

@ctb
Copy link
Contributor

ctb commented Feb 7, 2021

merged in #1277.

@ctb ctb closed this Feb 7, 2021
@ctb ctb deleted the div-3 branch August 20, 2022 14:58
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants