Filtering & Annotation workflow #36
@NeginValizadegan Maybe start with simple bash scripts to test the steps first, then add them to a Nextflow script.
There are some memory-related issues with the blastn step. The job was killed at 10 GB of memory and failed with a bus error at 40, 100, and even 150 GB. Still troubleshooting.
@NeginValizadegan re: the BLASTN work (and the annotation steps in general), I'm guessing you are trying to run all the annotation steps in one bash script? I'd recommend keeping it simple and testing each step in an independent bash script; these can then be moved individually into Nextflow process blocks once they are working. So, for example, you have the …
@chrisfields Yes, everything is in one script, but I set it up so that specific steps can be deactivated, so it doesn't run everything at once. At the end of the script there is a main section where I can easily comment out the steps I don't want to run.
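The "one script with a main section that switches steps on and off" setup can be sketched roughly as follows. This is a hypothetical Python illustration of the pattern, not the actual bash script: the step functions are placeholders that only return their own names. Keeping each step in its own function is also what makes it easy to lift a working step into its own Nextflow process block later.

```python
from typing import Callable, List, Tuple

# Hypothetical placeholders standing in for the real tool calls
# (BLAST database creation, read-length filtering, Kraken2, blastn).
def create_blast_dbs() -> str:
    return "create_blast_dbs"

def filter_read_length() -> str:
    return "filter_read_length"

def run_kraken() -> str:
    return "run_kraken"

def run_blastn_nt() -> str:
    return "run_blastn_nt"

# Ordered (step, enabled) pairs. Setting a flag to False skips that step,
# which plays the same role as commenting a call out of the main section.
STEPS: List[Tuple[Callable[[], str], bool]] = [
    (create_blast_dbs, False),   # e.g. databases already built
    (filter_read_length, True),
    (run_kraken, True),
    (run_blastn_nt, False),      # e.g. disabled while debugging memory use
]

def main(steps: List[Tuple[Callable[[], str], bool]]) -> List[str]:
    """Run the enabled steps in order and return the names of those that ran."""
    ran = []
    for step, enabled in steps:
        if enabled:
            ran.append(step())
    return ran

if __name__ == "__main__":
    print(main(STEPS))
```

With the flags above, only the read-length filter and Kraken steps run, so the script prints `['filter_read_length', 'run_kraken']`.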
Linking a3476ca here.
Linking 321f764.
…d sample fasta file as input
1. The order of processes has changed to:
   0. create BLAST databases
   1. filter below a read length
   2. Kraken
   3. blastn against nt
   4. BLAST against GRCh38, GRCh38.p0, and CHM13
   5. RepeatMasker
   6. QUAST
2. The read-length-filtered FASTA file is now filtered using a list of read IDs from Kraken that are not classified as any of the following:
   - Homo sapiens
   - Eukaryota
   - cellular organisms
   - unclassified
   - root
3. The file from the previous step is used as input to blastn against nt for further contamination detection.

LIST OF TOOLS USED IN THIS PIPELINE:
1. blastn --> BLAST+/2.10.1
2. seqkit --> seqkit/0.12.1
3. kraken2 --> Kraken2/2.0.8
4. repeatmasker --> RepeatMasker/4.1.2
5. quast --> quast/5.0.0
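The Kraken-based filter in point 2 can be sketched in Python as below. This is an illustration under stated assumptions, not the pipeline's own code (the pipeline lists seqkit, whose `grep` subcommand can subset a FASTA by an ID file). It assumes Kraken2's standard per-read output (tab-separated: C/U flag, sequence ID, taxid or `Name (taxid N)` when `--use-names` is set, length, LCA mapping) and excludes the five listed classifications by their NCBI taxids. The function names are hypothetical.

```python
import re
from typing import Iterable, Iterator, Set, Tuple

# NCBI taxonomy IDs for the classifications to drop; Kraken2 reports
# unclassified reads with taxid 0.
EXCLUDED_TAXIDS = {
    9606,    # Homo sapiens
    2759,    # Eukaryota
    131567,  # cellular organisms
    1,       # root
    0,       # unclassified
}

# Matches the "(taxid N)" suffix that kraken2 --use-names adds to column 3.
TAXID_RE = re.compile(r"\(taxid (\d+)\)")


def parse_taxid(field: str) -> int:
    """Return the taxid from column 3, with or without --use-names."""
    match = TAXID_RE.search(field)
    return int(match.group(1)) if match else int(field)


def select_read_ids(kraken_lines: Iterable[str]) -> Set[str]:
    """Read IDs whose Kraken2 assignment is not one of EXCLUDED_TAXIDS."""
    keep = set()
    for line in kraken_lines:
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 3:
            continue  # skip blank or malformed lines
        if parse_taxid(cols[2]) not in EXCLUDED_TAXIDS:
            keep.add(cols[1])  # column 2 is the sequence ID
    return keep


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs; multi-line sequences are joined."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None and line:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def subset_fasta(records: Iterable[Tuple[str, str]],
                 keep_ids: Set[str]) -> Iterator[Tuple[str, str]]:
    """Keep records whose ID (first word of the header) is in keep_ids."""
    for header, seq in records:
        parts = header.split()
        if parts and parts[0] in keep_ids:
            yield header, seq


if __name__ == "__main__":
    kraken_out = [
        "C\tread1\tHomo sapiens (taxid 9606)\t8\t9606:4",
        "C\tread2\tEscherichia coli (taxid 562)\t8\t562:4",
        "U\tread3\tunclassified (taxid 0)\t2\t0:1",
        "C\tread4\t10239\t4\t10239:2",
    ]
    fasta = [">read1 sample A", "ACGTACGT", ">read2", "ACGT", "ACGT",
             ">read3", "AC", ">read4", "GGGG"]
    keep = select_read_ids(kraken_out)
    for header, seq in subset_fasta(read_fasta(fasta), keep):
        print(f">{header}\n{seq}")
```

In the toy example only `read2` (E. coli) and `read4` (taxid 10239, Viruses) survive, which is the subset that would then go on to blastn against nt.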
…amination. The filtered file will be used as input to BLAST against the human reference genome.
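One way to picture the "BLAST against the human references" step (GRCh38, GRCh38.p0, CHM13) is one blastn call per reference database. The sketch below only builds the argument lists. The database names, directory layout, output naming, and thread count are hypothetical; `-query`, `-db`, `-out`, `-outfmt` and `-num_threads` are standard BLAST+ blastn options.

```python
from pathlib import Path
from typing import List

# Hypothetical BLAST database names for the three human references; the
# real names/paths depend on how step 0 (create BLAST databases) built them.
HUMAN_DBS = ["GRCh38", "GRCh38.p0", "CHM13"]


def blastn_commands(query: str, db_dir: str, out_dir: str,
                    threads: int = 4) -> List[List[str]]:
    """Build one blastn argument list per human reference database."""
    stem = Path(query).stem  # e.g. "sample1.filtered" for sample1.filtered.fasta
    commands = []
    for db in HUMAN_DBS:
        commands.append([
            "blastn",
            "-query", query,
            "-db", str(Path(db_dir) / db),
            "-out", str(Path(out_dir) / f"{stem}.vs.{db}.tsv"),
            "-outfmt", "6",  # tabular output, one hit per line
            "-num_threads", str(threads),
        ])
    return commands


if __name__ == "__main__":
    # Print the commands instead of running them; each list could be passed
    # to subprocess.run(cmd, check=True) on a machine with BLAST+ installed.
    for cmd in blastn_commands("sample1.filtered.fasta", "blastdb", "results"):
        print(" ".join(cmd))
```

Keeping each reference as a separate command (rather than one combined database) matches the per-reference wording above and makes each call easy to test on its own, or to move into its own Nextflow process.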
Linking 1983b41 here.
…nge the parameter inside the config file from false to true to skip cd-hit
For example, you can do this to see the last commit: fc187b9 |
Linking 0f29654 here.
Linking 3c7310a here.
Linking cc16dc6 here.
…script for testing the pipeline
Linking 4a9b3f2 here (pipeline testing).
The first step in the workflow (assembly) is performed per sample and is in assembly.nf. @NeginValizadegan will work on the annotation steps for each sample assembly, with the basic steps:

Any others?