genomic-medicine-sweden/nallo is a bioinformatics analysis pipeline for identifying SVs and SNVs in rare disease using long-read data from both PacBio and (targeted) ONT sequencing. It is heavily influenced by best-practice pipelines such as nf-core/nanoseq, nf-core/sarek, nf-core/raredisease, PacBio Human WGS Workflow, epi2me-labs/wf-human-variation and brentp/rare-disease-wf.
- Short variant calling & joint genotyping of SNVs (deepvariant + GLNexus)
- SV calling and joint genotyping (sniffles2)
- Tandem repeats (HiFi only) (TRGT)
- Assembly based variant calls (HiFi only) (dipcall)
- CNV-calling (HiFiCNV)
- Call paralogous genes (Paraphase)
- Annotate SNVs and INDELs with database(s) of choice, e.g. gnomAD or CADD (echtvar and VEP)
- Annotate repeat expansions with stranger
- Rank variants (GENMOD)
Note
If you are new to Nextflow and nf-core, please refer to this page on how to set up Nextflow. Make sure to test your setup with -profile test before running the workflow on actual data.
Prepare a samplesheet with input data:
samplesheet.csv
project,sample,file,family_id,paternal_id,maternal_id,sex,phenotype
testrun,HG002,/path/to/HG002.fastq.gz,FAM1,HG003,HG004,1,2
testrun,HG005,/path/to/HG005.bam,FAM1,HG003,HG004,2,1
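Before launching a run, it can help to confirm that every samplesheet row has the same number of fields as the header. The sketch below writes the example samplesheet shown above and runs a quick field-count check. The `check_samplesheet` function is a hypothetical helper, not part of the pipeline, and it assumes no field contains a quoted comma. It is a pre-flight convenience, not a substitute for the pipeline's own input handling.

```shell
# Write the example samplesheet from above (file paths are placeholders).
cat > samplesheet.csv <<'EOF'
project,sample,file,family_id,paternal_id,maternal_id,sex,phenotype
testrun,HG002,/path/to/HG002.fastq.gz,FAM1,HG003,HG004,1,2
testrun,HG005,/path/to/HG005.bam,FAM1,HG003,HG004,2,1
EOF

# Hypothetical helper (not provided by the pipeline): report every row whose
# number of comma-separated fields differs from the header, and return a
# non-zero status if any such row exists.
check_samplesheet() {
    awk -F',' '
        NR == 1 { n = NF; next }
        NF != n { printf "line %d: expected %d fields, found %d\n", NR, n, NF; bad = 1 }
        END     { exit bad + 0 }
    ' "$1"
}

if check_samplesheet samplesheet.csv; then
    echo "samplesheet.csv: field counts look consistent"
fi
```

A row with a missing or extra comma is reported with its line number, which is usually faster to act on than an error raised partway through a workflow launch.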
Now, you can run the pipeline using:
nextflow run genomic-medicine-sweden/nallo -profile YOURPROFILE \
--input samplesheet.csv \
--preset <revio/pacbio/ONT_R10> \
--fasta <reference.fasta> \
--outdir <OUTDIR>
For more details and further functionality, please refer to the usage documentation.
Warning
Please provide pipeline parameters via the CLI or the Nextflow -params-file option. Custom config files, including those provided by the -c Nextflow option, can be used to provide any configuration except for parameters; see docs.
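As an illustration of the -params-file route, the sketch below writes a YAML parameters file whose keys mirror the CLI flags used in the run command above (`--input`, `--preset`, `--fasta`, `--outdir`). The file name `params.yaml` and all values are placeholders chosen for this example; `revio` is simply one of the presets listed above.

```shell
# Assumption: values are placeholders; replace them with your own paths.
# Keys correspond to the CLI flags without the leading "--".
cat > params.yaml <<'EOF'
input: samplesheet.csv
preset: revio
fasta: /path/to/reference.fasta
outdir: results
EOF

# Print the resulting command instead of executing it, so this sketch runs
# without Nextflow installed. Remove the leading "echo" to launch the pipeline.
echo nextflow run genomic-medicine-sweden/nallo -profile YOURPROFILE -params-file params.yaml
```

Keeping parameters in a file like this makes a run easier to reproduce than a long command line, while -c remains available for non-parameter configuration such as resource settings.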
To run in an offline environment, download the pipeline and Singularity images using nf-core download:
nf-core download genomic-medicine-sweden/nallo
genomic-medicine-sweden/nallo was originally written by Felix Lenner.
We thank the following people for their extensive assistance in the development of this pipeline:
If you would like to contribute to this pipeline, please see the contributing guidelines.
This pipeline uses code and infrastructure developed and maintained by the nf-core community, reused here under the MIT license.
The nf-core framework for community-curated bioinformatics pipelines.
Philip Ewels, Alexander Peltzer, Sven Fillinger, Harshil Patel, Johannes Alneberg, Andreas Wilm, Maxime Ulysse Garcia, Paolo Di Tommaso & Sven Nahnsen.
Nat Biotechnol. 2020 Feb 13. doi: 10.1038/s41587-020-0439-x.
An extensive list of references for the tools used by the pipeline can be found in the CITATIONS.md file.