- The main program is developed in Perl v 5.8.8.
- Parallel::ForkManager module from CPAN
(http://search.cpan.org/~dlux/Parallel-ForkManager-0.7.9/lib/Parallel/ForkManager.pm) - String::Approx module from CPAN
(http://search.cpan.org/~jhi/String-Approx-3.27/Approx.pm) - R for ploting
(http://www.r-project.org/) - Jellyfish for kmer counting (Optional) (http://www.cbcb.umd.edu/software/jellyfish/)
Note: The two Perl modules can be installed by INSTALL.sh script in the lib directory.
cd lib
./INSTALL.sh
-
Trimming by quality 5 and filtering reads with any ambiguous base or low complexity.
$ perl FaQCs.pl -p reads1.fastq reads2.fastq -d out_directory
-
Quailty check only on subsamples of input, no trimming and filtering.
$ perl FaQCs.pl -p reads1.fastq reads2.fastq -d out_directory -qc_only
Usage: perl FaQCs.pl [options] [-u unpaired.fastq] -p reads1.fastq reads2.fastq -d out_directory
Version 1.34
Input File: (can use more than once)
-u <Files> Unpaired reads
-p <Files> Paired reads in two files and separate by space
Trim:
-mode "HARD" or "BWA" or "BWA_plus" (default BWA_plus)
BWA trim is NOT A HARD cutoff! (see bwa's bwa_trim_read() function in bwaseqio.c)
-q <INT> Targets # as quality level (default 5) for trimming
-5end <INT> Cut # bp from 5 end before quality trimming/filtering
-3end <INT> Cut # bp from 3 end before quality trimming/filtering
-adapter <bool> Trim reads with illumina adapter/primers (default: no)
-rate <FLOAT> Mismatch ratio of adapters' length (default: 0.2, allow 20% mismatches)
-artifactFile <File> additional artifact (adapters/primers/contaminations) reference file in fasta format
Filters:
-min_L <INT> Trimmed read should have to be at least this minimum length (default:50)
-avg_q <NUM> Average quality cutoff (default:0, no filtering)
-n <INT> Trimmed read has more than this number of continuous base "N" will be discarded.
(default: 2, "NN")
-lc <FLOAT> Low complexity filter ratio, Maximum fraction of mono-/di-nucleotide sequence (default: 0.85)
-phiX <bool> Filter phiX reads (slow)
Q_Format:
-ascii Encoding type: 33 or 64 or autoCheck (default)
Type of ASCII encoding: 33 (standard) or 64 (illumina 1.3+)
-out_ascii Output encoding. (default: 33)
Output:
-prefix <TEXT> Output file prefix. (default: QC)
-stats <File> Statistical numbers output file (default: prefix.stats.txt)
-d <PATH> Output directory.
Options:
-t <INT > # of CPUs to run the script (default:2 )
-split_size <INT> Split the input file into several sub files by sequence number (default: 1000000)
-qc_only <bool> no Filters, no Trimming, report numbers.
-kmer_rarefaction <bool>
Turn on the kmer calculation. Turn on will slow down ~10 times. (default:Calculation is off.)
(meaningless if -subset is too small)
-m <INT> kmer for rarefaction curve (range:[2,31], default 31)
-subset <INT> Use this nubmer x split_size for qc_only and kmer_rarefaction
(default: 10, 10x1000000 SE reads, 20x1000000 PE reads)
-discard <bool> Output discarded reads to prefix.discard.fastq (default: 0, not output)
-substitute <bool> Replace "N" in the trimmed reads with random base A,T,C ,or G (default: 0, off)
-trim_only <bool> No quality report. Output trimmed reads only.
-5trim_off <bool> Turn off trimming from 5'end.
-debug <bool> keep intermediate files
======== Version 1.34
- add option "-5trim_off Turn off trimming from 5'end."
- add INSTALL.sh script for two requried perl modules installations.
======== Version 1.33
- input paired no need quote for exploit the autocomplete feature
- add trim_only option
- mode with "HARD" or "BWA" or "BWA_plus" (default BWA_plus)
======== Version 1.32
- add -5end and -3end to cut x number base from 5' end or 3' end before quality trimming/filtering
- fix bug on phiX filtering with reverse complementary strand hit
- fix error when all reads in subsample are filtered/trimmed.
======== Version 1.31
- report raw subsample graphic results side-by-side with qc results for comparison.
======== Version 1.3
- add -phiX to filter phiX reads
- add -substitute to replace "N" in the trimmed reads with random base A,T,C ,or G
- change -adapter behavior from filtering to trimming
- change -n behavior from # of tolerance to number of continuous base "N" filtering
======== Version 1.2
- add -adapter and -artifactFile for filtering reads with Adapters/Primers and other contaminations
- require String::Approx module from CPAN for above function
======== Version 1.1 New features and changes in illumina_fastq_qc version 1.1 with respect to version 1.0:
- add -qc_only option for quick quality check without trimming and filtering
- add -discard option to output discarded reads
======== Version 1.0 Stable function release. Features:
- trim bidirection
- minimium length filtering after trim
- "N" base filter
- low complexity filter
- average read quality filter
- autocheck quality encoding and quality encoding coversion
- multi-threads (required Parallel::ForkManager)
- input paired end reads aware
Chienchi Lo, PatrickS.G. Chain (2014) Rapid evaluation and Quality Control of Next Generation Sequencing Data with FaQCs. BMC Bioinformatics. 2014 Nov 19;15