Renumbering residues #51
Hi, I just checked, and it seems I haven't implemented anything like that. There are some easy ways to do it using base pandas functionality. E.g., to decrease the residue numbers by 1 you can simply subtract 1 from the residue_number column of ppdb.df['ATOM'].
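A minimal sketch of such a shift, using a toy DataFrame as a stand-in for ppdb.df['ATOM'] (the data is made up for illustration; a real BioPandas frame has many more columns):

```python
import pandas as pd

# Toy stand-in for ppdb.df['ATOM'] (assumption: only two columns shown).
atom_df = pd.DataFrame({
    "atom_name": ["N", "CA", "N", "CA"],
    "residue_number": [5, 5, 6, 6],
})

# Shift every residue number down by 1 with plain column arithmetic.
atom_df["residue_number"] = atom_df["residue_number"] - 1

print(atom_df["residue_number"].tolist())
```

The same expression should work on the real frame, since residue_number is an ordinary integer column there.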
However, if the residue numbers are not in order, or if there are gaps, like (1, 2, 10, 20), which you want to renumber to (1, 2, 3, 4), you would have to do it differently. E.g., you could first get all the unique residue numbers in the order they appear:

    from collections import OrderedDict
    ordered_unique_elements = \
        list(OrderedDict.fromkeys(ppdb.df['ATOM']['residue_number']))

and then map from the old residue numbers to the new, contiguous residue numbers:

    mapping_dict = {ordered_unique_elements[i]: i+1
                    for i in range(0, len(ordered_unique_elements))}
    ppdb.df['ATOM']['residue_number'] = \
        ppdb.df['ATOM']['residue_number'].map(mapping_dict)

I could actually add that as a method to BioPandas, or maybe just explain it in the documentation. What do you think?
Wow, that is some elegant Python code! Do you think both methods should handle renumbering the ANISOU records at the same time? Otherwise the records might go out of sync. What I'm trying to achieve is to split a PDB file by chains, reorder the chains, and combine them into a different PDB file. I've also looked at pdbtools; however, that is more command-line based, and I'd like to do this in Python code.
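A rough sketch of the split, reorder, and recombine idea, applied identically to the ATOM and ANISOU frames so they stay in sync. The helper name reorder_chains and the toy data are hypothetical; only the chain_id and atom_number column names are taken from BioPandas:

```python
import pandas as pd

def reorder_chains(df, chain_order):
    """Return a new frame whose rows are regrouped so chains follow chain_order."""
    parts = [df[df["chain_id"] == chain] for chain in chain_order]
    return pd.concat(parts, ignore_index=True)

# Toy stand-ins for ppdb.df['ATOM'] and ppdb.df['ANISOU'].
atom_df = pd.DataFrame({
    "atom_number": [1, 2, 3, 4],
    "chain_id": ["A", "A", "B", "B"],
})
anisou_df = atom_df.copy()  # pretend every atom has an ANISOU record

# Apply the same chain order to both record types.
new_order = ["B", "A"]
atom_df = reorder_chains(atom_df, new_order)
anisou_df = reorder_chains(anisou_df, new_order)

print(atom_df["chain_id"].tolist())
```

BioPandas appears to sort records by the line_idx column when writing a file, so that column would likely need regenerating too; the sketch leaves it untouched.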
@rasbt when you renumber sequentially you should also consider insertion codes (i.e., get all unique residue number + icode pairs, assign new numbers, and remove the insertion codes).
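One way this could look, as a sketch on a toy single-chain frame. The column names residue_number and insertion follow BioPandas; the data is invented, and for multi-chain structures chain_id would presumably need to be part of the key as well:

```python
import pandas as pd

# Toy stand-in for one chain of ppdb.df['ATOM']; residue 52 has an inserted 52A.
atom_df = pd.DataFrame({
    "residue_number": [52, 52, 52, 53, 53],
    "insertion": ["", "", "A", "", ""],
    "atom_name": ["N", "CA", "N", "N", "CA"],
})

# One (number, icode) key per atom row.
keys = list(zip(atom_df["residue_number"], atom_df["insertion"]))

# dict.fromkeys keeps the first occurrence of each key, in order of appearance.
mapping = {key: i + 1 for i, key in enumerate(dict.fromkeys(keys))}

# Assign the new contiguous numbers and clear the insertion codes.
atom_df["residue_number"] = [mapping[key] for key in keys]
atom_df["insertion"] = ""

print(atom_df["residue_number"].tolist())
```

Here 52, 52A, and 53 become 1, 2, and 3, so no two residues end up sharing a number once the insertion codes are gone.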
Good point. Yeah, with the renumbering, there are so many things to consider, all of which are pretty use-case specific. (Probably why I haven't made such a function/method in the past.) I am still thinking about whether a standardized renumbering method should be added vs. extending the documentation with easy-to-follow examples that give people more flexibility ...
I would second a renumbering function, especially for antibody sequences. The insertion codes make it pretty difficult.
Sounds good, I agree. I am currently caught up with a pretty long to-do list of other things (and the semester is going to start Tue), so I am not sure yet when I will get to this. If someone wants to take a crack at it, I welcome PRs.
Insertion codes were never much of an issue for me until I started working with antibodies, where they are everything (different programs use different numbering; it's a nightmare!). Anyway, with the help of Stack Overflow I was able to figure this out (https://stackoverflow.com/questions/59804249/mapping-tuple-dictionary-to-multiple-columns-of-a-dataframe). I will make a PR when I'm less embarrassed by my brute-force methods and ugly code. For now, here are my notes.

ppdb.amino3to1 will 'cut_out' duplicate residue numbers with insertions. Your sequence needs to be rid of insertion codes (unique 'residue_number') for the sequence to be returned correctly. For an antibody complex, for instance, I split off the heavy and light chain sequences from ppdb.df['ATOM'] into separate dataframes and renumbered them sequentially without insertion codes with the following function (inspired by Sebastian)
I added the renumbered heavy and light chain dataframes back into ppdb.df['ATOM']. I worked with my renumbering script (ANARCI) to output a dataframe with columns corresponding to 'residue_num', 'insertion', 'new_res', and 'new_ins', then did a left merge back into the corresponding heavy or light chain dataframe (I'm still a little unsure how this works), dropped the original residue numbers, and renamed the new ones. Then I merged the whole thing back into the PandasPdb and wrote it out. I'm sure there's a more elegant way, but please give me a break, I'm a crystallographer. I love BioPandas, by the way. I hope this helps anyone struggling with the same issue.
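The merge step described above might be sketched as follows. A left merge keeps every row of the left (chain) frame and attaches the matching columns from the right (renumbering) table. The renumbering table below is hypothetical and merely reuses the column names from the comment; it is not actual ANARCI output:

```python
import pandas as pd

# Toy stand-in for one chain's ATOM rows.
chain_df = pd.DataFrame({
    "residue_number": [52, 52, 53],
    "insertion": ["", "A", ""],
    "atom_name": ["CA", "CA", "CA"],
})

# Hypothetical renumbering table (old number + icode -> new number + icode).
renumber = pd.DataFrame({
    "residue_num": [52, 52, 53],
    "insertion": ["", "A", ""],
    "new_res": [52, 53, 54],
    "new_ins": ["", "", ""],
})

# Align the key names, then attach the new numbering to every atom row.
renumber = renumber.rename(columns={"residue_num": "residue_number"})
merged = chain_df.merge(renumber, how="left", on=["residue_number", "insertion"])

# Drop the old numbering and promote the new columns to the standard names.
merged = (
    merged.drop(columns=["residue_number", "insertion"])
          .rename(columns={"new_res": "residue_number", "new_ins": "insertion"})
)

print(merged)
```

Any atom row without a match in the renumbering table would get NaN in the new columns, so checking for missing values after the merge is probably worthwhile.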
is slightly better in my opinion.
Same request for such a built-in feature.
Hi! Nice library, has a lot of potential.
Is there a way to renumber residues?
Renumbering atoms seems trivial (just assign a range to atom_number); however, renumbering residues would probably require some heavy-duty groupby magic and could be built in.
(renumbering atoms could also be built in:)
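Both ideas can be sketched on a toy frame standing in for ppdb.df['ATOM'] (the data is invented; the column names follow BioPandas). The residue part uses pandas' groupby(...).ngroup(), which is one possible approach rather than anything BioPandas provides:

```python
import pandas as pd

# Toy stand-in for ppdb.df['ATOM'] with gaps in both numberings.
atom_df = pd.DataFrame({
    "atom_number": [7, 9, 12, 15],
    "chain_id": ["A", "A", "A", "A"],
    "residue_number": [10, 10, 20, 20],
})

# Atoms: simply assign a contiguous range starting at 1.
atom_df["atom_number"] = range(1, len(atom_df) + 1)

# Residues: number each (chain, residue) group in order of first appearance.
atom_df["residue_number"] = (
    atom_df.groupby(["chain_id", "residue_number"], sort=False).ngroup() + 1
)

print(atom_df)
```

With several chains this numbering runs continuously across chain boundaries; restarting at 1 per chain would need an extra per-chain step.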