Add multiqc #14
…es, fix dry run command
…nto add_kaiju_kraken
@skchronicles, example report at
@@ -42,7 +41,6 @@ rule fastq_screen:
subset = 1000000,
aligner = "bowtie2",
output_dir = lambda w: config['out_to'] + "/" + w.project + "/" + config['run_ids'] + "/" + w.sid + "/fastq_screen/",
# container: "docker://rroutsong/dmux_ngsqc:0.0.1",
containerized: "/data/OpenOmics/SIFs/dmux_ngsqc_0.0.1.sif"
We need to add an option to point to a SIF cache and dynamically resolve one of the following: a local SIF on the filesystem, or a URI for pulling the image from Docker Hub.
I have a solution for this in the upcoming PR. I serialized the server-specific SIF directories and dynamically add the matching server configuration at initialization time.
It ends up looking like:
containerized: server_config["sif"] + "dmux_ngsqc_0.0.1.sif"
The SIF cache is always specified at execution time through environment variables and a subprocess call.
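A minimal sketch of the resolution requested above, assuming the cache directory comes either from an explicit argument or from a hypothetical `SIF_CACHE` environment variable: use the local SIF when it exists in the cache, otherwise fall back to the Docker Hub URI. The helper name `resolve_container` is illustrative and not part of the pipeline.

```python
import os
from pathlib import Path
from typing import Optional


def resolve_container(image_file: str, docker_uri: str,
                      sif_cache: Optional[str] = None) -> str:
    """Return a local SIF path if it exists in the cache, else a Docker URI.

    `SIF_CACHE` is a hypothetical environment variable used for illustration.
    """
    cache = sif_cache if sif_cache is not None else os.environ.get("SIF_CACHE")
    if cache:
        candidate = Path(cache) / image_file
        if candidate.is_file():
            # A pre-built image exists on this server; no pull is needed.
            return str(candidate)
    # No cache configured, or the image is missing: pull from Docker Hub instead.
    return docker_uri
```

In a rule this might read `resolve_container("dmux_ngsqc_0.0.1.sif", "docker://rroutsong/dmux_ngsqc:0.0.1")`, so servers without a populated cache still get a working image.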
@@ -98,7 +100,6 @@ rule kraken_annotation:
kraken_log = config['out_to'] + "/{project}/" + config['run_ids'] + "/{sid}/kraken/{sid}.log",
params:
kraken_db = "/data/OpenOmics/references/Dmux/kraken2/k2_pluspfp_20230605"
We need a method to dynamically resolve the reference files.
Also addressed this in the next PR. I held off on all of the server-resolution methods until I moved on to bigsky.
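One way to drop the hard-coded reference path, sketched under the assumption that each server's configuration (like the `server_config` dictionary mentioned earlier in this thread) carries a reference root under a hypothetical `refs` key. The relative layout is taken from the current hard-coded Kraken2 path; only the root changes per server.

```python
from pathlib import Path
from typing import Mapping

# Relative location of the Kraken2 database under a server's reference root,
# derived from the path currently hard-coded in rule kraken_annotation.
KRAKEN_DB = Path("Dmux", "kraken2", "k2_pluspfp_20230605")


def resolve_reference(server_config: Mapping[str, str], relpath: Path) -> str:
    """Join a server-specific reference root with a relative reference path.

    "refs" is a hypothetical key; the real server config may name it differently.
    """
    return str(Path(server_config["refs"]) / relpath)
```

The rule's parameter would then become `kraken_db = resolve_reference(server_config, KRAKEN_DB)`, so moving to another server only means changing its reference root.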
log: config['out_to'] + "/.logs/" + config['projects'] + "/" + config['run_ids'] + "/multiqc/multiqc.log"
shell:
"""
multiqc -q -ip \
At some point, we may want to point to a MultiQC config file to clean up the general statistics table, create two sections for FastQC, and set a preferred module order in the final report.
This is outlined in #15
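A hedged sketch of what such a config could contain: MultiQC's `module_order` option can list the `fastqc` module twice, each entry with its own `name`, `anchor`, and path filters, which produces separate raw and trimmed sections. The `*trimmed*` file-name pattern is an assumption about how trimmed-read FastQC outputs are named, and `multiqc_command` is a hypothetical helper that only builds the command line loading the file with `-c`. Cleaning up the general statistics table would need further options not shown here.

```python
from typing import List

# Hypothetical MultiQC config (YAML text). The "*trimmed*" pattern assumes that
# FastQC outputs for trimmed reads carry "trimmed" in their file names.
MULTIQC_CONFIG = """\
module_order:
  - fastqc:
      name: "FastQC (raw)"
      anchor: "fastqc_raw"
      path_filters_exclude:
        - "*trimmed*"
  - fastp
  - fastqc:
      name: "FastQC (trimmed)"
      anchor: "fastqc_trimmed"
      path_filters:
        - "*trimmed*"
"""


def multiqc_command(config_path: str, input_dir: str, output_dir: str) -> List[str]:
    """Build a quiet multiqc invocation that loads the given config file."""
    return ["multiqc", "-q", "-c", config_path, "-o", output_dir, input_dir]
```

Writing `MULTIQC_CONFIG` to a file shipped with the pipeline and passing that path keeps report layout decisions out of the rule's shell block.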
We need to change how adapter sequences are removed. Currently, there is a bug where the i7/i5 barcode sequences from Illumina's sample sheet are passed to FastQC and fastp as adapters. These barcode sequences should already be removed by the bcl2fastq step, and they are not the traditional library-prep-kit-specific adapter sequences that need to be trimmed. Instead, let's use fastp's adapter auto-detection to remove adapters. We can also use FastQC's built-in contaminants/adapters list to identify sequencing adapters.
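A sketch of the intended fastp behaviour, assuming commands are assembled in Python (the helper `fastp_args` is hypothetical). No `--adapter_sequence` is passed, so the sample-sheet i7/i5 barcodes never reach fastp. For single-end data fastp detects adapters automatically by default. For paired-end data, `--detect_adapter_for_pe` enables the same sequence-based detection on top of fastp's overlap-based trimming.

```python
from typing import List, Optional


def fastp_args(read1: str, out1: str, html: str, json_report: str,
               read2: Optional[str] = None, out2: Optional[str] = None) -> List[str]:
    """Build a fastp command that relies on adapter auto-detection."""
    # Deliberately no --adapter_sequence: barcodes must not be treated as adapters.
    args = ["fastp", "--in1", read1, "--out1", out1,
            "--html", html, "--json", json_report]
    if read2 is not None:
        if out2 is None:
            raise ValueError("paired-end input requires out2")
        # Paired-end: enable sequence-based adapter detection explicitly.
        args += ["--in2", read2, "--out2", out2, "--detect_adapter_for_pe"]
    # Single-end: fastp auto-detects adapters by default, so no extra flag.
    return args
```

Keeping barcode handling entirely in the demultiplexing step means the QC rules never need to read the sample sheet.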
Here's the fastp rule in the new branch master_job_and_bigsky:
shell:
"""
fastp \
--detect_adapter_for_pe \
--in1 {input.in_read1} --in2 {input.in_read2} \
--out1 {output.out_read1} \
--out2 {output.out_read2} \
--html {output.html} \
--json {output.json}
"""
FastQC:
shell:
"""
mkdir -p {params.output_dir}
fastqc -o {params.output_dir} -t {threads} {input.samples}
"""
FastQC before trimming depends on the demultiplexed reads; FastQC after trimming depends on the trimmed read files.
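The two FastQC passes differ only in which reads they consume. A minimal sketch with hypothetical file-name templates (the real paths are built from `config['out_to']` and related keys; `fastqc_inputs` is an illustrative helper): choosing the stage chooses the upstream step, so the pre-trim pass depends on demultiplexing and the post-trim pass depends on fastp.

```python
from typing import Dict, List

# Hypothetical path templates for illustration only.
READ_TEMPLATES: Dict[str, str] = {
    "raw": "{sid}_R{read}.fastq.gz",              # demultiplexed reads (bcl2fastq output)
    "trimmed": "{sid}_trimmed_R{read}.fastq.gz",  # adapter-trimmed reads (fastp output)
}


def fastqc_inputs(stage: str, sid: str) -> List[str]:
    """Return the paired-end read files FastQC should run on for a given stage."""
    if stage not in READ_TEMPLATES:
        raise ValueError(f"unknown stage {stage!r}; expected one of {sorted(READ_TEMPLATES)}")
    template = READ_TEMPLATES[stage]
    return [template.format(sid=sid, read=r) for r in (1, 2)]
```

In a Snakemake rule, `samples = lambda w: fastqc_inputs("trimmed", w.sid)` under `input:` would tie the post-trim FastQC rule to fastp, provided the names match the fastp rule's outputs.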
Will address some of these comments/issues in the next PR.
Add MultiQC to the ngsqc pipeline.
Addresses #12 #7