Extending ncbi_byid #101
Comments
Thanks for the issue, @boopsboops. There are a few things going on here, so I'm separating them out:
also:
@dwinter is there anything else to mention here?
Hi @sckott
So, I had a much better look at your code, and it seems fairly trivial to insert the things I need; I have successfully done so. What threw me was that you were using the "GBSeq" XML style, which I had never seen before and did not know was available. Anyway, if you think it will be valuable to offer these extra fields in your function, I can post the code here or do a pull request. If not, I'm happy to just fork and use it locally for my own work. Cheers!
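For readers who have not met the GBSeq XML style either, here is a minimal sketch of how the extra `source` qualifiers discussed in this thread (`lat_lon`, `country`, `specimen_voucher`) could be pulled out of a GBSeq record into table rows. It is not the code from `ncbi_byid` or from the eventual pull request, and it is written in Python rather than R purely for illustration. The function name `gbseq_to_rows`, the `WANTED` tuple, and the sample record (accession, organism, and values) are all made up; only the GBSeq element names are taken from the format itself.

```python
import xml.etree.ElementTree as ET

# Hypothetical, hand-written GBSeq record for illustration only (not real GenBank data).
SAMPLE_GBSEQ = """<GBSet>
  <GBSeq>
    <GBSeq_primary-accession>XX000001</GBSeq_primary-accession>
    <GBSeq_organism>Examplus fictus</GBSeq_organism>
    <GBSeq_feature-table>
      <GBFeature>
        <GBFeature_key>source</GBFeature_key>
        <GBFeature_quals>
          <GBQualifier>
            <GBQualifier_name>country</GBQualifier_name>
            <GBQualifier_value>United Kingdom</GBQualifier_value>
          </GBQualifier>
          <GBQualifier>
            <GBQualifier_name>lat_lon</GBQualifier_name>
            <GBQualifier_value>50.37 N 4.14 W</GBQualifier_value>
          </GBQualifier>
        </GBFeature_quals>
      </GBFeature>
    </GBSeq_feature-table>
  </GBSeq>
</GBSet>"""

# Qualifier names requested in the thread; this tuple is an assumption of the sketch.
WANTED = ("lat_lon", "country", "specimen_voucher")


def gbseq_to_rows(xml_text, wanted=WANTED):
    """Return one dict per GBSeq record, with the wanted source qualifiers as columns."""
    rows = []
    root = ET.fromstring(xml_text)
    for seq in root.iter("GBSeq"):
        row = {
            "accession": seq.findtext("GBSeq_primary-accession"),
            "organism": seq.findtext("GBSeq_organism"),
        }
        # Start every wanted column as None so missing qualifiers still get a column.
        row.update({name: None for name in wanted})
        for feature in seq.iter("GBFeature"):
            # Only the "source" feature carries specimen-level metadata.
            if feature.findtext("GBFeature_key") != "source":
                continue
            for qual in feature.iter("GBQualifier"):
                name = qual.findtext("GBQualifier_name")
                if name in wanted:
                    row[name] = qual.findtext("GBQualifier_value")
        rows.append(row)
    return rows


rows = gbseq_to_rows(SAMPLE_GBSEQ)
```

Each row keeps a column for every requested qualifier even when the record lacks it (here `specimen_voucher`), which makes it easier to filter a reference library on missing metadata.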
Thanks for the follow-up. A PR with your changes sounds good.
@boopsboops does your merged PR solve all issues you had?
Sorry for the delay. Yes, it did. Thanks for the help :)
Hi @sckott et al (and @dwinter too),
I just stumbled on the function `ncbi_byid` after it cropped up on the r-sig-phylo list. I'm working on assembling reference libraries for eDNA metabarcoding, and being able to curate, update and review the quality of your reference libraries is really important. Therefore, having this data in a table rather than in fasta format is essential.
Your function seems super quick, and gives back a lovely table. However, it's lacking a lot of the fields I would require for better filtering and quality control of reference libraries, such as lat_lon, country, specimen_voucher, publication status, and so on.
All this kind of info is usually in the GenBank metadata, and I've already implemented such a function to "tabulize" it (`gb2df`), but I think you will agree, it's an absolute abomination. I used EBI rather than NCBI as it's much faster to download large numbers of sequences to a local tempfile, and then did a lot of multithreaded XML scraping (which is very inefficient).
https://github.com/boopsboops/SeaDNA/blob/master/scripts/gb2df_example.R
https://github.com/boopsboops/SeaDNA/blob/master/scripts/gb2df.R
So, if you guys think that these would be important or appropriate additions to `ncbi_byid`, I'll be more than happy to help (although I confess to not really getting my head around how the function works yet).
Cheers,
Rupert