Does anyone know how to install uclust now that it's no longer available on Robert Edgar's website? #21
Comments
Jesse -- What is the goal of the pipeline? If it's making NAST
alignments the results are quite likely not very good and I'll bet
there's a better way to approach the analysis. Robert.
…On 2/9/2021 11:44 AM, Jesse McNichol wrote:
I'm using pyNAST for a software pipeline, but am running into an
issue. pyNAST depends on uclust, but it's no longer available - the
links on the qiime1 install page and the pyNAST biocore page now
redirect to USEARCH on Robert Edgar's website. Any suggestions
@rcedgar <https://github.com/rcedgar> @gregcaporaso
<https://github.com/gregcaporaso> @kylebittinger
<https://github.com/kylebittinger> @jairideout
<https://github.com/jairideout> ?
Thanks,
Jesse
Hi Robert, thanks for your prompt reply. The goal is to align SSU rRNA fragments extracted from metagenomics and metatranscriptomics to a stable reference sequence so I can then subset to regions containing binding sites for oligonucleotide primers. The ultimate goal is to then compare those sequences with primers typically used for PCR amplicon generation to evaluate how well primers should work on a given environmental dataset (paper in review currently). When I was constructing the pipeline I did a lot of manual inspection of the alignments, and as far as I could tell pyNAST did an acceptable job: some sequences got shifted by a few bases relative to the reference (e.g. E. coli 16S), but they all got aligned to roughly the right area of the SSU rRNA molecule. This was fine as a way of subsetting the pile of SSU rRNA I got out into general regions of the molecule that I could then query for the oligo sequence. I would be very open to improving the pipeline in the future, so if you have suggestions on how to get a better stable alignment or improve the pipeline in general I would be all ears. For the near term though, it would be fantastic if there was a way to access the older uclust executable, as we need a quick way to make this analysis reproducible to respond to peer reviews that are due in a few weeks. I have an older uclust executable on my system that still works, but my collaborators are having trouble running the pipeline without uclust, and it's not clear how to get this executable to them without violating your license agreement since it's no longer available on your website. Any thoughts? Best, Jesse
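For readers following along, the "compare those sequences with primers" step described above can be sketched roughly as below. This is only an illustration, not the pipeline's actual code: the function name `count_mismatches` and the example sequences are assumptions made for the sketch, and it assumes the primer and the extracted binding site are already the same length and on the same strand.

```python
# Illustrative sketch only; names are hypothetical and not taken from the pipeline.
# Each IUPAC code maps to the set of plain bases it permits (U is treated as T).
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T", "U": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def count_mismatches(primer, site):
    """Count positions where the site base is not permitted by the primer code.

    Anything in the site other than A, C, G, T/U (a gap or an ambiguous base,
    for example) is counted as a mismatch.
    """
    if len(primer) != len(site):
        raise ValueError("primer and site must have the same length")
    mismatches = 0
    for p, s in zip(primer.upper(), site.upper()):
        s = "T" if s == "U" else s
        if s not in IUPAC[p]:
            mismatches += 1
    return mismatches


# A degenerate primer and two candidate binding sites, chosen for illustration.
primer = "GTGYCAGCMGCCGCGGTAA"
perfect_site = "GTGCCAGCAGCCGCGGTAA"
one_off_site = "GTGACAGCAGCCGCGGTAA"
print(count_mismatches(primer, perfect_site))
print(count_mismatches(primer, one_off_site))
```

Counting mismatches per site, rather than requiring an exact string match, is what lets degenerate positions such as Y and M be handled correctly.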
Jesse -- go ahead and share with your collaborators, this message grants
permission for an exception to the license. A much better approach than
NAST is to make individual pair-wise alignments of each metagenomic
sequence (call it Q) to the top hit (call it T) in a large reference
such as SILVA. The quality of the Q+T pair-wise alignment will usually
be very good because the identity will be high, vastly better than a
NAST alignment which is riddled with inadvertent errors (introduced in
making the reference) and deliberately inserted errors (to force the
query into the pre-set number of columns). The usearch_global command in
usearch will do a good job of this; I recommend using -fulldp for this
use-case -- it's slower but avoids occasional artifacts introduced by
the default alignment algorithm. The search_oligodb command in usearch
can be used to align an existing primer set to SILVA to get the
coordinates for each primer in each reference sequence. Hope this helps.
Robert.
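As a rough illustration of the approach above: once a pair-wise alignment of Q to T is available and the primer coordinates on T are known, the primer region can be projected from T onto Q through the alignment. The sketch below assumes the two aligned rows are given as equal-length strings with `-` for gaps and that coordinates are 0-based and half-open; the function name `project_target_interval` is hypothetical, and parsing usearch output into this form is not shown.

```python
# Illustrative sketch only; not part of usearch or pyNAST.
def project_target_interval(q_row, t_row, t_start, t_end, q_offset=0, t_offset=0):
    """Map the half-open interval [t_start, t_end) on T onto Q.

    q_row, t_row: aligned rows of equal length, with '-' marking gaps.
    q_offset, t_offset: 0-based position, in the full sequence, of the first
    aligned base of each row (useful for local alignments).
    Returns (q_start, q_end) as a half-open interval on Q, or None when no
    query base is aligned inside the requested target interval.
    """
    if len(q_row) != len(t_row):
        raise ValueError("aligned rows must have the same length")
    q_pos, t_pos = q_offset, t_offset
    q_hits = []
    for q_char, t_char in zip(q_row, t_row):
        in_region = t_char != "-" and t_start <= t_pos < t_end
        if in_region and q_char != "-":
            q_hits.append(q_pos)
        # Advance each coordinate only when that row holds a base.
        if q_char != "-":
            q_pos += 1
        if t_char != "-":
            t_pos += 1
    if not q_hits:
        return None
    return q_hits[0], q_hits[-1] + 1


# Toy alignment: Q has a deletion and an insertion relative to T.
q_row = "ACG-TACGT"
t_row = "ACGGTA-GT"
span = project_target_interval(q_row, t_row, 2, 6)
query = q_row.replace("-", "")
print(span, query[span[0]:span[1]])
```

Because the mapping walks the alignment column by column, insertions and deletions in Q shift the projected coordinates correctly instead of forcing the query into a fixed number of columns.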
…On 2/9/2021 1:07 PM, Jesse McNichol wrote:
Hi Robert, thanks for your prompt reply. The goal is to align SSU rRNA
fragments extracted from metagenomics and metatranscriptomics to a
stable reference sequence so I can then subset to regions containing
binding sites for oligonucleotide primers. The ultimate goal is to
then compare those sequences with primers typically used for PCR
amplicon generation to evaluate how well primers should work on a
given environmental dataset (paper in review currently). When I was
constructing the pipeline I did do a lot of manual inspection of the
alignments, and as far as I could tell pyNAST did an acceptable job -
some sequences got shifted by a few bases relative to the reference
(e.g. /E. coli/ 16S), but they all got aligned to the generally right
area of the SSU rRNA molecule. This was fine as a way of subsetting
the pile of SSU rRNA I got out into general regions of the molecule
that I could then query for the oligo sequence. I would be very open
to improving the pipeline in the future, so if you have suggestions on
how to get a better stable alignment or improve the pipeline in
general I would be all ears. For the near term though, it would be
fantastic if there was a way to access the older |uclust| executable
as we need a quick way to make this analysis reproducible to respond
to peer reviews that are due in a few weeks. I have an older |uclust|
executable on my system that still works, but my collaborators are
having trouble running the pipeline without |uclust| and it's not
clear how to get this executable to them without violating your
license agreement since it's no longer available on your website. Any
thoughts? Best, Jesse
Hi Robert, thanks a lot for your response. That makes sense and it sounds like usearch does exactly what I want to do in a much more robust way than pyNAST. I only wish I had asked you a year ago when I was constructing the pipeline! I will definitely keep this in mind for future iterations, which I may get to soon if there is sufficient community interest. Thank you also for agreeing that the executable can be shared with my collaborators. There may also be other people who wish to use the current iteration of my pipeline once the paper comes out (it's not pretty, but it works!). In this case, the easiest thing would be to host a copy of the usearch executable on my GitHub page, but I don't know whether you'd be comfortable with this. If you'd prefer it not to be public, then maybe I could offer to share it by email so it cannot be freely downloaded? Would either of these options be acceptable to you? Best, Jesse
email sounds fine.
Thanks Robert, will share the executable by email with anyone who needs it.