Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

deprecate sequence_alignment() and replace deprecated Bio.pairwise2 with Bio.Align.PairwiseAligner #3951

Merged
merged 5 commits into from
Dec 8, 2022

Conversation

orbeckst
Copy link
Member

@orbeckst orbeckst commented Dec 7, 2022

Fixes #3950

Changes made in this Pull Request:

  • deprecate sequence_alignment() and schedule for removal in 3.0
  • engineered return tuple as namedtuple to functionally match the originally returned pairwise2.Alignment; we only documented a tuple as return type anyway
  • increase biopython required version to >= 1.80
  • update CHANGELOG

PR Checklist

  • Tests?
  • Docs?
  • CHANGELOG updated?
  • Issue raised/referenced?

- fix #3950
- engineered return tuple as namedtuple to functionally match
  the originally returned pairwise2.Alignment; we only documented
  a tuple as return type anyway
- update CHANGELOG
Copy link
Member

@IAlibay IAlibay left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Assuming the existing tests are sufficient, lgtm! (one missing empty line though :P )

package/MDAnalysis/analysis/align.py Show resolved Hide resolved
@orbeckst
Copy link
Member Author

orbeckst commented Dec 7, 2022

Since biopython 1.80 the output format changed so my hacky way of getting the aligned sequences with gaps is not working anymore, hence the tests fail.

@IAlibay
Copy link
Member

IAlibay commented Dec 7, 2022

Since biopython 1.80 the output format changed so my hacky way of getting the aligned sequences with gaps is not working anymore, hence the tests fail.

:/ just 1.80? I.e. do we need to support both return types?

Copy link
Member Author

@orbeckst orbeckst left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We do not use sequence_alignment() anywhere. Possibly deprecate and remove for 3.0. Might be better as an example in the User Guide anyway. Also, @IAlibay is on a crusade to remove biopython (#3820).

If we ever implement an "automatic" RMSD alignment then we can bring something like it back.

# extract sequences (there's no obvious way to get the character
# representation with gaps by other means from the new
# Bio.Align.PairwiseAlignment instance)
seqA, _, seqB, _ = topalignment.format().split("\n")
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This only worked until Bio 1.79. Then they changed the format.

I knew that parsing str was iffy but luckily our tests showed me right away how iffy that was.

I am now trying to find another way to get our old output.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Turns out, it's trivial

seqA = topalignment[0]
seqB = topalignment[1]

... testing now.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Needs biopython >= 1.80.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess we'll need to offer both options and just do a version check?

@orbeckst
Copy link
Member Author

orbeckst commented Dec 7, 2022

@IAlibay please review again.

Sorry for the extra work. When I tested it locally it worked because I had biopython 1.79 locally installed. Only when I updated to 1.80 (as in our CI) it broke. The changes should make it more robust because I think I am using the proper biopython API to get the sequence but it requires the most recent one >= 1.80 (which should be available for Python >= 3.8).

@IAlibay
Copy link
Member

IAlibay commented Dec 7, 2022

@orbeckst do we have any idea what the minimum dependency versions for Biopython are? I'm apprehensive about having a core dependency that's forced to use the latest release since that can clash with a lot of older deps versions.

@IAlibay
Copy link
Member

IAlibay commented Dec 7, 2022

The conda-forge feedstock uses pin_compatible numpy so that might be ok? (that's currently set to nep29-ish)

(The docs show how one should write code if/when we deprecated this function.)
@orbeckst
Copy link
Member Author

orbeckst commented Dec 8, 2022

do we have any idea what the minimum dependency versions for Biopython are

I tried my code with biopython 1.79 and it failed. For the above to work, it has to be >= 1.80.

@orbeckst
Copy link
Member Author

orbeckst commented Dec 8, 2022

Alternatively, we outright deprecate sequence_alignment() and hope that we remove it before biopython removes pairwise2.

@orbeckst
Copy link
Member Author

orbeckst commented Dec 8, 2022

Looking at biopython's setup.py https://github.com/biopython/biopython/blob/0d763824ab5ab5fd447fe2589b209e7c64a54b50/setup.py#L112 , they only depend on numpy, any version. Their ci-dependencies.txt is also non-specific.

@codecov
Copy link

codecov bot commented Dec 8, 2022

Codecov Report

Base: 94.38% // Head: 94.38% // Increases project coverage by +0.00% 🎉

Coverage data is based on head (cd55591) compared to base (3712439).
Patch coverage: 100.00% of modified lines in pull request are covered.

Additional details and impacted files
@@           Coverage Diff            @@
##           develop    #3951   +/-   ##
========================================
  Coverage    94.38%   94.38%           
========================================
  Files          194      194           
  Lines        25242    25248    +6     
  Branches      3493     3493           
========================================
+ Hits         23825    23831    +6     
  Misses        1368     1368           
  Partials        49       49           
Impacted Files Coverage Δ
package/MDAnalysis/analysis/align.py 98.35% <100.00%> (+0.03%) ⬆️

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

☔ View full report at Codecov.
📢 Do you have feedback about the report comment? Let us know in this issue.

@orbeckst
Copy link
Member Author

orbeckst commented Dec 8, 2022

Not sure where these failures in window come from

mdanalysis-CI (Azure_Tests Win-Python38-32bit-full) fails TestH5MDWriterBaseAPI.test_write_trajectory_universe

____________ TestH5MDWriterBaseAPI.test_write_trajectory_universe _____________
[gw0] win32 -- Python 3.8.10 C:\hostedtoolcache\windows\Python\3.8.10\x86\python.exe

self = <MDAnalysisTests.coordinates.test_h5md.TestH5MDWriterBaseAPI object at 0x10D5B148>
                            velocities=True, forces=True) as w:
>               for ts in universe.trajectory:

D:\a\1\s\testsuite\MDAnalysisTests\coordinates\test_h5md.py:135: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
C:\hostedtoolcache\windows\Python\3.8.10\x86\lib\site-packages\mdanalysis-2.4.0.dev0-py3.8-win32.egg\MDAnalysis\coordinates\base.py:795: in __iter__
    self._reopen()
C:\hostedtoolcache\windows\Python\3.8.10\x86\lib\site-packages\mdanalysis-2.4.0.dev0-py3.8-win32.egg\MDAnalysis\coordinates\H5MD.py:747: in _reopen
    self.close()
C:\hostedtoolcache\windows\Python\3.8.10\x86\lib\site-packages\mdanalysis-2.4.0.dev0-py3.8-win32.egg\MDAnalysis\coordinates\H5MD.py:728: in close
    self._file.close()
C:\hostedtoolcache\windows\Python\3.8.10\x86\lib\site-packages\h5py\_hl\files.py:435: in close
    file_list = [x for x in file_list if h5i.get_file_id(x).id == self.id.id]
C:\hostedtoolcache\windows\Python\3.8.10\x86\lib\site-packages\h5py\_hl\files.py:435: in <listcomp>
    file_list = [x for x in file_list if h5i.get_file_id(x).id == self.id.id]
h5py\_objects.pyx:54: in h5py._objects.with_phil.wrapper
    ???
h5py\_objects.pyx:55: in h5py._objects.with_phil.wrapper
    ???
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

>   ???
E   ValueError: Not an id of a file object (not an ID of a file object)

h5py\h5i.pyx:114: ValueError

I am letting it re-run, maybe that's a transient issue.

@IAlibay
Copy link
Member

IAlibay commented Dec 8, 2022

Not sure where these failures in window come from

That's a new one, but I've seen similar transient h5py win32 issues lately. It's been a long while since wheels for win32 got released, honestly we should just kill it soon (definitely for v3.0).

@orbeckst
Copy link
Member Author

orbeckst commented Dec 8, 2022

Do you want to deprecate sequence_alignment() and remove in 3.0? I can add the deprecation in this PR.

@IAlibay
Copy link
Member

IAlibay commented Dec 8, 2022

Do you want to deprecate sequence_alignment() and remove in 3.0? I can add the deprecation in this PR.

If it's likely not being used much, then I'd be happy for this to happen.

@orbeckst
Copy link
Member Author

orbeckst commented Dec 8, 2022

I am pretty sure I added it a long time ago, in the times of "batteries included is better". It would be fitting if I removed it again ;-). I doubt it's used a lot and in any case, it's 3 lines of code. I'll add deprecations and raise an issue for the UG.

- add deprecation
- test for deprecation warning
- combined tests for sequence_alignment in a class for easier removal
- update CHANGELOG
@orbeckst orbeckst changed the title replaced deprecated Bio.pairwise2 with Bio.Align.PairwiseAligner deprecate sequence_alignment() and replace deprecated Bio.pairwise2 with Bio.Align.PairwiseAligner Dec 8, 2022
@orbeckst orbeckst requested a review from IAlibay December 8, 2022 06:41
testsuite/MDAnalysisTests/analysis/test_align.py Outdated Show resolved Hide resolved
package/MDAnalysis/analysis/align.py Show resolved Hide resolved
package/MDAnalysis/analysis/align.py Outdated Show resolved Hide resolved
match_score, mismatch_penalty, gap_penalty, gapextension_penalty)
# choose top alignment
return aln[0]
aligner = Bio.Align.PairwiseAligner(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The main argument for pairwisealigner is scoring do we want/need to pass that in somehow?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we are deprecating it then no, let's not add features. At this stage I'd just want to avoid the code suddenly breaking because pairwise2 is gone (and getting rid of DeprecationWarning).

@orbeckst
Copy link
Member Author

orbeckst commented Dec 8, 2022

Same error on Windows 32bit. Not sure what to do — no idea which version of hdf5 is installed or if it even has anything to do with bumping biopython to 1.80; at least I wouldn't understand how, given the modest requirements of biopython.

@IAlibay
Copy link
Member

IAlibay commented Dec 8, 2022

@orbeckst I think I might know where to start looking, do you mind if I push directly to the branch?

@orbeckst
Copy link
Member Author

orbeckst commented Dec 8, 2022

do you mind if I push directly to the branch?

Please go ahead!

(I can see how reviewing this PR motivated you to get PEP8 back in PR #3954 , sorry.)

@IAlibay
Copy link
Member

IAlibay commented Dec 8, 2022

Did it... just fix itself?

Copy link
Member

@IAlibay IAlibay left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's just merge this whilst it's green @orbeckst 🤣 (if azure fails elsewhere we'll do a separate PR)

@orbeckst
Copy link
Member Author

orbeckst commented Dec 8, 2022

haha, ok, will do

@orbeckst orbeckst merged commit a05a524 into develop Dec 8, 2022
@orbeckst orbeckst deleted the replace-bio-pairwise branch December 8, 2022 21:45
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[deprecation] Bio.pairwise2 is deprecated in favor of Bio.Align.PairwiseAligner
2 participants