Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

deprecate sequence_alignment() and replace deprecated Bio.pairwise2 with Bio.Align.PairwiseAligner #3951

Merged
merged 5 commits into from
Dec 8, 2022
Merged
Show file tree
Hide file tree
Changes from 1 commit
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
4 changes: 3 additions & 1 deletion package/CHANGELOG
Original file line number Diff line number Diff line change
Expand Up @@ -14,7 +14,7 @@ The rules for this file:

------------------------------------------------------------------------------
??/??/?? IAlibay, Luthaf, hmacdope, rafaelpap, jbarnoud, BFedder, aya9aladdin,
jaclark5, jfennick, lilyminium
jaclark5, jfennick, lilyminium, orbeckst

* 2.4.0

Expand Down Expand Up @@ -70,6 +70,8 @@ Enhancements
* Added isolayer selection method (Issue #3845)

Changes
* Replaced deprecated Bio.pairwise2 with Bio.align.PairwiseAligner in
MDAnalysis.analysis.align.sequence_alignment (Issue #3950)
* Moved `libmdanalysis` cython header to `lib`(Issue #3912, PR #3913)
* Auxiliary; determination of representative frames: The default value for
cutoff is now None, not -1. Negative values are now turned
Expand Down
47 changes: 36 additions & 11 deletions package/MDAnalysis/analysis/align.py
Original file line number Diff line number Diff line change
Expand Up @@ -187,13 +187,14 @@
import os.path
import warnings
import logging
import collections

import numpy as np

import Bio.SeqIO
import Bio.AlignIO
import Bio.Align
import Bio.Align.Applications
import Bio.pairwise2

import MDAnalysis as mda
import MDAnalysis.lib.qcprot as qcp
Expand Down Expand Up @@ -973,7 +974,7 @@ def sequence_alignment(mobile, reference, match_score=2, mismatch_penalty=-1,

The residues in `reference` and `mobile` will be globally aligned.
The global alignment uses the Needleman-Wunsch algorithm as
implemented in :mod:`Bio.pairwise2`. The parameters of the dynamic
implemented in :mod:`Bio.Align.PairwiseAligner`. The parameters of the dynamic
programming algorithm can be tuned with the keywords. The defaults
should be suitable for two similar sequences. For sequences with
low sequence identity, more specialized tools such as clustalw,
Expand Down Expand Up @@ -1004,22 +1005,46 @@ def sequence_alignment(mobile, reference, match_score=2, mismatch_penalty=-1,

See Also
--------
BioPython documentation for `pairwise2`_. Alternatively, use
BioPython documentation for `PairwiseAligner`_. Alternatively, use
:func:`fasta2select` with :program:`clustalw2` and the option
``is_aligned=False``.

.. _`pairwise2`: http://biopython.org/DIST/docs/api/Bio.pairwise2-module.html

.. _`PairwiseAligner`:
https://biopython.org/docs/latest/api/Bio.Align.html#Bio.Align.PairwiseAligner


.. versionadded:: 0.10.0

"""
aln = Bio.pairwise2.align.globalms(
reference.residues.sequence(format="string"),
mobile.residues.sequence(format="string"),
match_score, mismatch_penalty, gap_penalty, gapextension_penalty)
# choose top alignment
return aln[0]
.. versionchanged:: 2.4.0
Replace use of deprecated :func:`Bio.pairwise2.align.globalms` with
:class:`Bio.Align.PairwiseAligner`.

"""
aligner = Bio.Align.PairwiseAligner(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The main argument for pairwisealigner is scoring do we want/need to pass that in somehow?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we are deprecating it then no, let's not add features. At this stage I'd just want to avoid the code suddenly breaking because pairwise2 is gone (and getting rid of DeprecationWarning).

mode="global",
match_score=match_score,
mismatch_score=mismatch_penalty,
open_gap_score=gap_penalty,
extend_gap_score=gapextension_penalty)
aln = aligner.align(reference.residues.sequence(format="Seq"),
mobile.residues.sequence(format="Seq"))
# choose top alignment with highest score
topalignment = aln[0]

# reconstruct the results tuple that used to be of type Bio.pairwise2.Alignment
AlignmentTuple = collections.namedtuple(
"Alignment",
["seqA", "seqB", "score", "start", "end"])
# extract sequences (there's no obvious way to get the character
# representation with gaps by other means from the new
# Bio.Align.PairwiseAlignment instance)
seqA, _, seqB, _ = topalignment.format().split("\n")
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This only worked until Bio 1.79. Then they changed the format.

I knew that parsing str was iffy but luckily our tests showed me right away how iffy that was.

I am now trying to find another way to get our old output.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Turns out, it's trivial

seqA = topalignment[0]
seqB = topalignment[1]

... testing now.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Needs biopython >= 1.80.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess we'll need to offer both options and just do a version check?


# start/stop are not particularly meaningful and there's no obvious way to
# get the old pairwise2 start/stop from the new PairwiseAligner output.
return AlignmentTuple(seqA, seqB, topalignment.score,
0, max(reference.n_residues, mobile.n_residues))

orbeckst marked this conversation as resolved.
Show resolved Hide resolved
def fasta2select(fastafilename, is_aligned=False,
ref_resids=None, target_resids=None,
Expand Down