Pynteny: a Python package to perform synteny-aware, profile HMM-based searches in sequence databases #65
Hello, I would like to submit the package described below to pyOpenSci. I'm not sure whether it fits your scope. Thanks!
Hi @Robaina and welcome to pyOpenSci! Thank you for filling out the submission in such a detailed, thoughtful way.
We should coordinate with JOSS. Our review processes are complementary. So far we've usually done the pyOS review first and then passed the package off to JOSS. For example, see the discussion on the pyGMT review here: #43, or the completed devicely review: #37. If that makes sense, please let me know, and please reference this issue on the open JOSS review issue. I will need to do more detailed editor checks, but I wanted to welcome you and give you an initial answer as soon as possible. I will follow up tomorrow. Thank you @Robaina!
Hi @NickleDave , thank you! I'm looking forward to being part of the pyOpenSci community :)
Excellent. Yes, that makes sense; I'll reference this issue in the open JOSS review issue.
Thank you. I recently learned about pyOpenSci and JOSS and feel very aligned with your values regarding open source and scientific development. I'd be happy to help.
Glad to hear it 🙌
I saw you commented there, thank you. I should have made it clearer: typically we review first so you do not need to go through two reviews. @lwasser asked me to clarify. I will comment on that review and link this issue for the editor so they're aware. Doing more thorough checks later is on my to-do list!
Hi again @Robaina, I am familiarizing myself with the project a little more before I ask you to make a full submission; I want to make sure I understand how it works. Good things:
Some feedback:
Questions:
Hi @NickleDave, thank you for your careful first review of the project, here are my thoughts:
That makes sense. I will port the wiki to a Sphinx site. I agree on including CLI snippets!
Will do. It's a good opportunity to finally get rid of the deprecated setup.py.
Yeah, I actually wasn't following any standard structure in the project. It seems reasonable to change it to what you suggest.
It started like that, yes. Pynteny also depends on Prodigal, which is available on bioconda. I wrote wrappers for both so they can be run within Pynteny, although back then providing a wrapper was more of a necessity than a designed feature. However, I have checked, and there is a pip-installable Python binding to Prodigal: Pyrodigal. All other dependencies are installable with both pip and conda, so I could make Pynteny pip installable too by switching from HMMER and Prodigal to PyHMMER and Pyrodigal. I would have to look into this more closely, since I believe PyHMMER and HMMER structure their output differently. I plan to submit Pynteny to the bioconda channel, since it is dedicated to software for the biological sciences. Bioconda would make Pynteny available on macOS as well as Linux, but not on Windows (except through WSL). Still, I think it is worth trying PyHMMER and Pyrodigal so that Pynteny becomes pip installable and potentially usable on Windows without WSL.
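Part of the pip-versus-conda question is that HMMER and Prodigal are external executables that pip cannot install, so a wrapper only works if they are already on the PATH. The minimal sketch below shows the kind of check a wrapper might run first. It assumes the executables are named `hmmsearch` and `prodigal`; the function `missing_binaries` and the `REQUIRED_BINARIES` tuple are hypothetical and not taken from Pynteny's code.

```python
import shutil

# Assumed executable names for the external tools being wrapped; illustrative
# only, not read from Pynteny's source.
REQUIRED_BINARIES = ("hmmsearch", "prodigal")


def missing_binaries(names=REQUIRED_BINARIES):
    """Return the subset of `names` that cannot be found on the PATH."""
    return [name for name in names if shutil.which(name) is None]


if __name__ == "__main__":
    missing = missing_binaries()
    if missing:
        print("Missing executables:", ", ".join(missing))
        print("Install them (e.g. from the bioconda channel) before running the wrappers.")
    else:
        print("All external dependencies found.")
```

PyHMMER and Pyrodigal bundle the underlying C code as Python extensions, so with them a pip-only install would not need a check like this.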
Currently, Pynteny provides a simple CLI wrapper for HMMER (in contrast to the sophisticated interface provided by PyHMMER). This is not Pynteny's main functionality, and it could be removed entirely by using PyHMMER instead of HMMER. Pynteny uses HMMER (or potentially PyHMMER) to run profile HMM searches on sequences, a step required before filtering the hits by the syntenic structure provided by the user. So, to clarify, these would be the main functionalities provided by Pynteny:
See my answers above. I hope my comments help clarify Pynteny's functionality. Please don't hesitate to ask any further questions.
Yes, that was clear to me. Sorry, I should have explicitly stated this in the referenced JOSS issue, as you did.
@Robaina, so so psyched to have you here. Just a note that if you need help with any of the above items, we are happy to support you. You can post questions here in this issue OR on Discourse if you wish. I just went through updating a package's docs and pyproject.toml and found the online documentation a bit confusing, so just reach out if you have questions! In the future we will have better documentation to support making changes like the ones discussed above!
Hi @lwasser, thanks! You and @NickleDave have been very supportive. Alright, I'll try Discourse, as it may help future developers.
Hi @NickleDave, I have thought about relocating the tests directory to the root as you suggested. While I see the advantages you listed of locating tests there, I'm not entirely convinced that it is the best option in this case, because the tests are currently installed (as a subpackage) alongside pynteny and can be run with a dedicated subcommand.
Hi @Robaina, thank you so much for your clear answers and for being open to feedback. Your response makes me more confident Pynteny is in scope, since it also includes data retrieval, and my understanding is that it leverages functionality from both HMMER and Prodigal to provide a unique ability to query a peptide database with HMMs.
Aside: do you have a sort of workflow diagram in your docs that captures this description, including which dependency does what? It might be helpful.
Before we make a final decision, I just want to get a little feedback from devs in bioinformatics and genomics (I have asked in our Slack), but I am 90% sure we will ask you to go ahead with a submission. (I should have asked them before; I am sorry for not doing so in the first place.) In the meantime, could I ask you to track any changes you are making as a result of this presubmission as issues, and then reference those issues in any submission? @lwasser may modify our survey as well to make sure we capture that (we are still figuring out our process as we come back online). So far we have (correct me if I'm wrong):
re: keeping tests in the package
This does not affect our decision on a submission; we don't have any hard and fast rules right now about where tests should be. As a developer, I totally understand why you want to keep them there.
The reasons I would suggest moving them out are:
Most recently updated guides I know of suggest putting tests in a separate directory:
Does that help?
Hi @NickleDave ,
Excellent! Regarding the diagram: I could make a simple one and include it on the index page of the docs.
Alright, no problem. I'll wait for the final decision.
I already made some changes, but I can do that from now on.
Yeah, so I'll try MyST, as you suggested. Ideally, I want to reuse the Markdown files I already wrote, but I don't think that should be a problem.
This has already been taken care of. Thanks for your comments about tests; they make sense. I have relocated "tests" outside the package and removed the "Pynteny tests" subcommand.
I have this in mind and will definitely test those packages. However, I do have a question: would making Pynteny pip installable be a requirement for consideration at pyOpenSci, or rather a desirable change that I could make later on? Thank you for all your suggestions; they have definitely been very helpful.
Happy to hear the suggestions are helpful! Just so we're on the same page: I am still expecting we will go forward with this as a submission, I am just waiting to hear back from subject matter experts. Thanks for your patience up until now and please expect that final decision early next week.
Short answer: we want to be understanding of the current reality of Python packaging, and we want to favor giving people access to good tools. So I would not say it's a requirement.
Longer answer: we don't have official language on this in our guides yet, and this is part of the revisions we're doing right now. So effectively, no, it's not a requirement.
I see that you did raise an issue about it. @lwasser noticed that Pynteny is on PyPI. I was able to pip install it on Linux and at least run the download command from the CLI. On Mac with Python 3.7 I get:
Given that it's kind of already on PyPI, plus our discussion above, I think it seems very possible you can make Pynteny pip installable once you figure out any data format issues. Again, we would not require you to get this done before submission, or somehow reject the package after review because you weren't able to get it done. Does that make sense?
Great to see all the progress you've already made just during this presubmission; the new docs look good! Feel free to ask if you have further questions. Otherwise I will be back in touch early next week.
Hi again :)
No problem at all; I did not read it as an "imposition". I asked because I'm not entirely sure I can easily integrate PyHMMER / Pyrodigal into Pynteny without major changes to the code. Also, using those packages would mean abandoning the original implementations, which I guess are more thoroughly tested. Anyway, I haven't tried it yet; I just wanted to get an idea of your opinion on the topic.
Right. Sorry, Pynteny on PyPI is not functional. I thought I had deleted Pynteny from PyPI; now I have. This was an early attempt (hence the old setup.py file as well), but I abandoned it after realizing HMMER and Prodigal were only on bioconda, so it made little sense to me to first create a conda environment with the dependencies and then install Pynteny via pip. Thus I continued with conda-build and made a recipe. If PyHMMER/Pyrodigal work well, I'll build a proper pip package (using pyproject.toml) and upload it to PyPI.
It does. Again, thanks for the suggestions. I'm not a software engineer / developer by training, so these suggestions are very useful. Till next week then!
Hi @Robaina! And of course we appreciate your enthusiasm! I did try to install Pynteny as described in the README last night and ran into some issues.
Does that make sense? I have some meetings ~12-5 today, but I will raise issues about what I ran into with installs by the end of the day at the latest.
Hi @NickleDave, no worries, it's all good. I have added the meta file I used to build the conda package. I have only tested it on Ubuntu 20.04 so far... I'll look into the issue when you post it (and ask around on Discourse) to figure out what happened in your case. Thanks!
Great, thank you @Robaina for your quick reply and again for your patience. We haven't run into this situation before where a package was installed through a user's private channel.
As I said, we have not faced this situation before, so we do not have a process and documentation in place for it. We are correcting that in our docs now. I totally understand if you are frustrated and you feel like we keep adding requirements for review. Please know that is not our intention. Given that:
How does that sound? I promise not to add any more requests! And I'm sorry for not doing due diligence in the first place. We do think this is in scope and want to get you a review as soon as we can!
Hi @NickleDave,
No problem. I had actually already prepared and submitted a recipe to bioconda and checked that it passed all build tests there, but then I canceled the pull request, since at the time I thought it didn't make much sense to submit to bioconda before the review (and all the potential changes to the code). However, I'm fine with submitting Pynteny to bioconda before the review. Will do.
From the user's point of view, conda/bioconda should take care of installing all dependencies, so one wouldn't need to install HMMER and Prodigal as a prerequisite before installing Pynteny. However, for developers and reviewers, I think it's best to add a section to README.md with information for developers. This section would contain a list of dependencies (also listed in environment.yml) as well as instructions on how to install Pynteny locally with setup.py / pyproject.toml, to avoid running conda-build (which is considerably slower). Would this be an acceptable option?
No problem. I'll open a PR in bioconda today or tomorrow.
I understand that these requirements are meant to improve the software and the review process. Also, I know that there are a lot of changes going on at pyOpenSci right now. No problem!
🙌 🙌 🙌 Thank you so much. We really appreciate your understanding. This has been a bit bumpy but you are really helping us improve our process.
Yes, this would be perfect. As long as you list them, that's good enough, but ideally I can clone the repo, make a conda env with those other dependencies, and then install Pynteny locally. And thanks for removing the environment.yml. I know I have some projects where I need to do that too 😇 😹. The joys of developing with multiple package managers.
Excellent!
Hi @NickleDave, I have opened another issue (#67) with a full submission. Thanks!
Closing this now that you've submitted! Thank you again!
Submitting Author: Semidán Robaina (@Robaina)
Package Name: Pynteny: a Python package to perform synteny-aware, profile HMM-based searches in sequence databases
One-Line Description of Package: Query sequence databases with HMMs arranged in a predefined synteny structure
Repository Link (if existing): https://github.com/Robaina/Pynteny
Description
Pynteny is a Python tool to search for synteny blocks in (prokaryotic) sequence data using HMMER and profile HMMs of the ORFs of interest. By leveraging genomic context information, Pynteny can reduce the uncertainty in the functional annotation of unlabelled sequence data that arises from paralogs. Pynteny can be accessed (i) through the command line, (ii) as a Python module, or (iii) as a (locally served) web application.
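To illustrate the filtering idea (keeping only HMM hits whose genes appear in a user-defined order, orientation, and spacing along a contig), here is a minimal pure-Python sketch. It is not Pynteny's implementation or its input syntax: the `Hit` record, `find_synteny_blocks`, and the encoding of a structure as (HMM name, strand) pairs plus a list of maximum gaps are all hypothetical, and the sketch only matches blocks read in increasing ORF position.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    """One profile HMM hit on a predicted ORF (hypothetical record)."""
    contig: str    # sequence (contig) the ORF lies on
    position: int  # index of the ORF along its contig
    strand: str    # "+" or "-"
    hmm: str       # name of the HMM that matched the ORF


def _extend(by_pos, block, structure, max_gaps):
    """Depth-first search for the next gene of the structure after block[-1]."""
    step = len(block)
    if step == len(structure):
        return block
    prev = block[-1].position
    # Allow up to max_gaps[step - 1] intervening ORFs before the next gene.
    for pos in range(prev + 1, prev + 2 + max_gaps[step - 1]):
        for hit in by_pos.get(pos, []):
            if (hit.hmm, hit.strand) == structure[step]:
                found = _extend(by_pos, block + [hit], structure, max_gaps)
                if found:
                    return found
    return None


def find_synteny_blocks(hits, structure, max_gaps):
    """Return one list of hits per block matching `structure`.

    structure: ordered (hmm_name, strand) pairs, e.g. [("A", "+"), ("B", "+")]
    max_gaps:  maximum number of ORFs allowed between consecutive genes,
               so len(max_gaps) == len(structure) - 1
    """
    if len(max_gaps) != len(structure) - 1:
        raise ValueError("max_gaps must have one entry per adjacent gene pair")
    # Index hits by contig, then by ORF position.
    index = defaultdict(dict)
    for hit in hits:
        index[hit.contig].setdefault(hit.position, []).append(hit)

    blocks = []
    for by_pos in index.values():
        for pos in sorted(by_pos):
            for hit in by_pos[pos]:
                if (hit.hmm, hit.strand) == structure[0]:
                    block = _extend(by_pos, [hit], structure, max_gaps)
                    if block:
                        blocks.append(block)
    return blocks


hits = [
    Hit("c1", 0, "+", "A"),
    Hit("c1", 1, "+", "X"),  # unrelated gene between A and B
    Hit("c1", 2, "+", "B"),
    Hit("c1", 3, "-", "C"),
    Hit("c2", 5, "+", "A"),
    Hit("c2", 9, "+", "B"),  # too far from A on contig c2
]
structure = [("A", "+"), ("B", "+"), ("C", "-")]
blocks = find_synteny_blocks(hits, structure, max_gaps=[1, 0])
print([[h.position for h in block] for block in blocks])  # [[0, 2, 3]]
```

Only contig `c1` yields a block here: the single-HMM hit for `B` on `c2` lies four ORFs after `A`, beyond the allowed gap of one. A real tool would also need to handle blocks on the reverse strand and circular contigs, which this sketch ignores.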
Scope
Please indicate which category or categories this package falls under:
Explain how and why the package falls under these categories (briefly, 1-2 sentences). Please note any areas you are unsure of:
Pynteny's main objective is to provide a means to query NGS (unannotated) sequence databases, such as metagenomic/metatranscriptomic datasets, using syntenic blocks (i.e., spatial arrangements of genes) rather than single target genes or protein domains. In this sense, I would classify Pynteny under Data Extraction.
In addition, Pynteny can be used in microbiology/genetics courses. To this end, it provides a graphical web interface (a Streamlit app) to facilitate interaction. We have successfully used Pynteny in some of our microbiology courses at the University of La Laguna. Hence, I think tagging Pynteny under "Education" may also be appropriate.
Pynteny was designed to be used by researchers working with large, unannotated sequence databases, such as those typically encountered in metagenomic analyses. It can be accessed through a command line interface or easily integrated into pipelines as a Python package. Pynteny can also be used through a graphical interface running locally in the browser, which is more suitable for educational purposes.
To the best of my knowledge, no other Python package provides the functionality offered by Pynteny.
I submitted this package for publication at JOSS a few days back. The submission is currently under consideration for scope.