Skip to content
/ halc Public

High Throughput Algorithm for Long Read Error Correction

Notifications You must be signed in to change notification settings

lanl001/halc

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

LATEST NEWS

The HALC paper is accepted for publication in BMC Bioinformatics!

Overview

HALC is software that makes error correction for long reads with high throughput.

Copy right

HALC is under the Artistic License 2.0.

Short manual

  1. System requirements

    HALC is suitable for 32-bit or 64-bit machines with Linux operating systems. At least 4GB of system memory is recommended for correcting larger data sets.

  2. Installation

    Aligner BLASR and error correction software LoRDEC (only for -ordinary mode) are required to run HALC.

    • The source files in 'src' and 'thirdparty' folders can be compiled to generate a 'bin' folder by running Makefile: make all.
    • Put BLASR, LoRDEC and the 'bin' folder to your $PATH: export PATH=PATH2BLASR:$PATH , export PATH=PATH2LoRDEC:$PATH and export PATH=PATH2bin:$PATH, respectively.
  3. Inputs

    • Long reads in FASTA format.
    • Contigs assembled from the corresponding short reads in FASTA format.
    • The initial short reads in FASTA format (only for -ordinary mode; obtained with cat left_reads.fa >short_reads.fa and then cat right_reads.fa >>short_reads.fa).
  4. Using AlignGraph

    runHALC.py long_reads.fa contigs.fa [-options|-options]
    

    Options (default value):
    -o/-ordinary short_reads.fa (yes)
    Ordinary mode utilizing repeats to make correction. The error correction software LoRDEC and the initial short reads are required to refine the repeat corrected regions. It is exclusive with the -repeat-free option.
    -r/-repeat-free (no)
    Repeat-free mode without utilizing repeats to make correction. It is exclusive with the -ordinary option.
    -b/-boundary n (4)
    Maximum boundary difference to split the subcontigs.
    -a/-accurate (yes)
    Accurate construction of the contig graph.
    -c/-coverage n (auto)
    Expected coverage on contigs. If not specified, it can be automatically calculated.
    -w/-width n (4)
    Maximum width of the dynamic programming table.
    -k/-kmer n (25)
    Kmer length for LoRDEC refinement.
    -t/-threads n (auto)
    Number of threads for one process to create. It is automatically set to the number of computing cores.
    -l/-log (no)
    System log to print.

  5. Outputs

    • Error corrected full long reads.
    • Error corrected trimmed long reads.
    • Error corrected split long reads.

Chinese name

HALC's Chinese name is 浩克.

About

High Throughput Algorithm for Long Read Error Correction

Resources

Stars

Watchers

Forks

Packages

No packages published