Undesired added bonus w ncbi_searcher using id (var.) #34

Closed
mpnelsen opened this issue Jun 19, 2015 · 4 comments

@mpnelsen

Not trying to be a troublemaker, but I've encountered another minor issue with the ncbi_searcher function. I'm using the txid number, and it retrieves the accessions of interest, but it also retrieves accessions for varieties, forms, etc. of the same species (which actually have different txid numbers). I know I can filter the output and extract just the sequences I'm interested in (filter by species name, or else get the taxid number for each accession and then filter), but it seems like it would be more efficient to refine the initial search so that it only retrieves accessions matching that txid.

Example:

require(traits)
search.res<-ncbi_searcher(id=60451,getrelated=FALSE,entrez_query="internal transcribed spacer",verbose=FALSE,seqrange="350:3500")
sort(unique(search.res$taxon))
[1] "Tuber excavatum" "Tuber excavatum var. intermedium"
[3] "Tuber excavatum var. longisporum" "Tuber excavatum var. sulphureum"
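
The filter-by-species-name workaround mentioned above could look roughly like the sketch below. It uses a mock data frame rather than a live NCBI query: only the `taxon` column and its values come from the output shown above, while the `accession` column and its values are hypothetical placeholders.

```r
# Mock stand-in for the data frame returned by ncbi_searcher().
# The `taxon` values are from the example output; `accession` is a
# hypothetical placeholder column, not real data.
search.res <- data.frame(
  taxon = c("Tuber excavatum",
            "Tuber excavatum var. intermedium",
            "Tuber excavatum var. longisporum",
            "Tuber excavatum",
            "Tuber excavatum var. sulphureum"),
  accession = c("ACC0001", "ACC0002", "ACC0003", "ACC0004", "ACC0005"),
  stringsAsFactors = FALSE
)

# Keep only rows whose taxon name matches the species exactly,
# which drops the varieties.
target <- "Tuber excavatum"
exact <- search.res[search.res$taxon == target, , drop = FALSE]

sort(unique(exact$taxon))
```

This works after the fact, but every unwanted record still has to be retrieved first, which is why restricting the search term itself is preferable.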

I'm wondering whether ncbi_searcher automatically applies the [ORGN] modifier in its search, which would also grab varieties, forms, etc. (with different txid numbers)?

For instance, on NCBI, a search using:
txid60451[ORGN] AND internal transcribed spacer
retrieves the varieties as well,

but

txid60451 AND internal transcribed spacer
or
txid60451[ORGN:noexp] AND internal transcribed spacer
retrieves only the txid of interest.

Would it be possible to modify the search option in ncbi_searcher to only retrieve accessions for the txid of interest? Thanks much for your help!

@sckott sckott added this to the v0.2 milestone Jun 19, 2015
@sckott sckott self-assigned this Jun 19, 2015
@sckott
Contributor

sckott commented Jun 19, 2015

hi, not troublemaking at all - super useful in fact! there's nothing better than a user finding a potential problem

this is what we're using: https://github.com/ropensci/traits/blob/master/R/ncbi_searcher.R#L124 (the porgn field). it makes sense that, when searching by taxonomic ID, you only get records that match that ID exactly

@sckott
Contributor

sckott commented Jun 19, 2015

@mpnelsen can you reinstall from github, then try again?

see the new fuzzy parameter. the relevant change is extracted here:

query_term <- if (fuzzy) {
  sprintf("xXarbitraryXx[porgn:__txid%s] AND %s[SLEN]", id, seqrange)
} else {
  sprintf("txid%s AND %s[SLEN]", id, seqrange)
}

We needed to retain a fuzzy option since the getrelated parameter would have done nothing if we only allowed txid<ID> searches
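
To see what each branch produces, the extracted snippet can be wrapped in a small standalone function. This is only a sketch: `build_query_term` is a hypothetical name that does not exist in the package, and `fuzzy = FALSE` as the default is an assumption made for illustration. The values are taken from the example at the top of the thread.

```r
# Hypothetical wrapper around the extracted snippet; not part of traits.
# The default fuzzy = FALSE is an assumption made for this sketch.
build_query_term <- function(id, seqrange, fuzzy = FALSE) {
  if (fuzzy) {
    # porgn form: also matches taxa below the given ID (varieties, forms, ...)
    sprintf("xXarbitraryXx[porgn:__txid%s] AND %s[SLEN]", id, seqrange)
  } else {
    # bare txid form: restricted to the exact taxonomic ID
    sprintf("txid%s AND %s[SLEN]", id, seqrange)
  }
}

exact_term <- build_query_term(60451, "350:3500")
fuzzy_term <- build_query_term(60451, "350:3500", fuzzy = TRUE)

print(exact_term)  # "txid60451 AND 350:3500[SLEN]"
print(fuzzy_term)  # "xXarbitraryXx[porgn:__txid60451] AND 350:3500[SLEN]"
```

The exact form corresponds to the `txid60451 AND ...` search from the original report, which returned only the txid of interest.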

@mpnelsen
Author

works perfectly! thanks again.

@sckott
Contributor

sckott commented Jun 19, 2015

👍

@sckott sckott modified the milestones: v0.2, v0.1.2 Aug 26, 2015