LOGAN: High-Performance Multi-GPU X-Drop Long-Read Alignment (albertozeni/LOGAN)

Introduction

Pairwise sequence alignment is one of the most computationally intensive kernels in genomic data analysis, accounting for more than 90% of the run time of key bioinformatics applications. It is particularly expensive for third-generation sequencing data because of the long read lengths involved (1 Kb to 1 Mb). Because exact pairwise algorithms such as Smith-Waterman have quadratic cost, for long alignments the community primarily relies on approximate algorithms that search only for high-quality alignments and stop early when one is not found. In this work, we present LOGAN, the first GPU optimization of the popular X-drop alignment algorithm.
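The early-termination idea can be illustrated with a minimal sketch. This is not LOGAN's GPU kernel: it is a simplified, ungapped extension written for illustration only, and the function name `xdrop_extend` and the scoring values (+1 for a match, -1 for a mismatch) are assumptions. The extension stops as soon as the running score falls more than X below the best score seen so far.

```python
def xdrop_extend(query, target, x_drop, match=1, mismatch=-1):
    """Ungapped X-drop extension starting at position 0 of both sequences.

    Returns (best_score, best_length): the highest score reached and the
    number of positions extended when that score was reached.
    Scoring values are illustrative assumptions, not LOGAN's defaults.
    """
    best_score = 0
    best_len = 0
    score = 0
    for i, (q, t) in enumerate(zip(query, target), start=1):
        score += match if q == t else mismatch
        if score > best_score:
            # New best: remember where the extension peaked.
            best_score, best_len = score, i
        elif best_score - score > x_drop:
            # X-drop condition: the score fell too far below the best, stop.
            break
    return best_score, best_len


if __name__ == "__main__":
    # Two mismatches in the middle: a small X gives up, a larger X recovers.
    print(xdrop_extend("AAAACCAAAAAA", "AAAATTAAAAAA", x_drop=1))
    print(xdrop_extend("AAAACCAAAAAA", "AAAATTAAAAAA", x_drop=2))
```

A larger X lets the extension survive longer low-scoring stretches at the cost of more work, which is consistent with the run times growing with X-drop in the Performance Analysis section.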

Usage

Compilation

LOGAN requires CUDA 10 and C++14. To compile LOGAN simply type:

make demo v100=true

LOGAN has been optimized to run on the NVIDIA Tesla V100 (GB), but can run on any NVIDIA GPU. To compile for other GPUs, simply type:

make demo

This command disables the Tesla V100 GPU optimization flags. LOGAN has also been optimized to run with a specific number of GPU threads, but it can adapt the number of GPU threads to the X-drop value. To enable this, simply type:

make demo adapt=true

Demo

LOGAN generates an executable called demo. To check that everything compiled properly, type:

./demo inputs_demo/example.txt 17 21 1

This command executes LOGAN on our example dataset with a k-mer length of 17 and an X-drop value of 21, using a single GPU. If everything executes correctly, you can start using LOGAN with any input, any X-drop, and any number of GPUs.

The command line inputs are:

./demo [input] [k-mer-length] [X-drop] [#GPUS]
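When launching the demo from a script, the four positional arguments can be assembled programmatically. The sketch below is a hypothetical helper (`build_demo_command` is not part of LOGAN); it assumes that the k-mer length, X-drop value, and GPU count must all be positive integers, which the README does not state explicitly.

```python
import shlex


def build_demo_command(input_path, kmer_length, x_drop, num_gpus, exe="./demo"):
    """Build the argument list for: ./demo [input] [k-mer-length] [X-drop] [#GPUS].

    The positive-integer check is an assumption made for this sketch.
    """
    numeric = (("kmer_length", kmer_length), ("x_drop", x_drop), ("num_gpus", num_gpus))
    for name, value in numeric:
        if not isinstance(value, int) or value <= 0:
            raise ValueError(f"{name} must be a positive integer, got {value!r}")
    return [exe, input_path, str(kmer_length), str(x_drop), str(num_gpus)]


if __name__ == "__main__":
    cmd = build_demo_command("inputs_demo/example.txt", 17, 21, 1)
    print(shlex.join(cmd))  # ./demo inputs_demo/example.txt 17 21 1
```

The returned list can be passed directly to `subprocess.run(cmd, check=True)` once the demo executable has been built.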

The input format for this demo is:

[seqV] [posV] [seqH] [posH] [strand]

Each line of the input contains a pair of sequences to align: the query sequence (seqV), the starting position of the seed on the query sequence (posV), the target sequence (seqH), the starting position of the seed on the target sequence (posH), and the relative strand ("c" if on opposite strands, "n" otherwise). Fields are tab-separated.
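As a sketch of how one line of this format could be read and checked before running the demo, the following hypothetical parser splits the five tab-separated fields. The names `AlignmentPair` and `parse_pair` are illustrative and not part of LOGAN, and treating the positions as plain integers is an assumption; the README does not say whether they are 0-based or 1-based.

```python
from typing import NamedTuple


class AlignmentPair(NamedTuple):
    seq_v: str   # query sequence
    pos_v: int   # seed start on the query
    seq_h: str   # target sequence
    pos_h: int   # seed start on the target
    strand: str  # "c" (opposite strands) or "n" (same strand)


def parse_pair(line):
    """Parse one '[seqV] [posV] [seqH] [posH] [strand]' tab-separated line."""
    fields = line.rstrip("\r\n").split("\t")
    if len(fields) != 5:
        raise ValueError(f"expected 5 tab-separated fields, got {len(fields)}")
    seq_v, pos_v, seq_h, pos_h, strand = fields
    if strand not in ("c", "n"):
        raise ValueError(f"strand must be 'c' or 'n', got {strand!r}")
    return AlignmentPair(seq_v, int(pos_v), seq_h, int(pos_h), strand)


if __name__ == "__main__":
    print(parse_pair("ACGTACGT\t3\tTTACGTAC\t5\tn\n"))
```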

LOGAN has been incorporated as part of the BELLA genomic pipeline. BELLA is a computationally efficient and highly accurate long-read to long-read aligner and overlapper. Users can optionally use LOGAN as the pairwise alignment kernel.

Performance Analysis

LOGAN and SeqAn execution times in seconds for 100K alignments (Power9 platform with 6 NVIDIA Tesla V100S). The dataset used for these comparisons is available here: https://drive.google.com/drive/folders/1IUpxTPeD4oIOtS6jhp1N9G2N2BNKgrCI?usp=sharing

X-Drop	SeqAn	LOGAN (1 GPU)	LOGAN (6 GPUs)
10	5.1	2.2	1.9
20	12.7	3.1	2.0
50	29.6	5.0	2.2
100	45.7	7.2	2.7
500	102.6	14.9	4.0
1000	133.3	20.2	4.9
2500	168.0	25.3	5.5
5000	176.6	26.7	5.8

LOGAN and ksw2 execution times in seconds for 100K alignments ("Skylake" platform with 8 NVIDIA Tesla V100S).

X-Drop	ksw2	LOGAN (1 GPU)	LOGAN (8 GPUs)
10	6.9	2.5	1.7
20	7.0	3.8	1.8
50	7.7	5.8	2.1
100	10.3	7.3	2.4
500	113.0	15.2	3.4
1000	209.5	20.4	4.3
2500	1235.8	25.9	5.2
5000	3213.1	27.2	5.2
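The speedups implied by the two tables above can be worked out by dividing the CPU baseline time by the LOGAN time at the same X-drop value. The short script below does this for a few rows copied from the tables; the helper name `speedup` and the choice of rows are illustrative.

```python
# Rows copied from the tables above:
# (x_drop, baseline_seconds, logan_1_gpu_seconds, logan_multi_gpu_seconds)
SEQAN_POWER9 = [(10, 5.1, 2.2, 1.9), (100, 45.7, 7.2, 2.7), (5000, 176.6, 26.7, 5.8)]
KSW2_SKYLAKE = [(10, 6.9, 2.5, 1.7), (100, 10.3, 7.3, 2.4), (5000, 3213.1, 27.2, 5.2)]


def speedup(baseline_seconds, logan_seconds):
    """Speedup factor of LOGAN over the baseline (values above 1 mean faster)."""
    return baseline_seconds / logan_seconds


def report(name, rows):
    for x_drop, base, one_gpu, multi_gpu in rows:
        print(f"{name} X={x_drop}: "
              f"{speedup(base, one_gpu):.1f}x (1 GPU), "
              f"{speedup(base, multi_gpu):.1f}x (all GPUs)")


if __name__ == "__main__":
    report("SeqAn", SEQAN_POWER9)
    report("ksw2", KSW2_SKYLAKE)
```

At an X-drop of 5000, this gives roughly 6.6x (1 GPU) and 30.4x (6 GPUs) over SeqAn, and roughly 118.1x (1 GPU) and 617.9x (8 GPUs) over ksw2.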

Citation

To cite our work or to know more about our methods, please refer to:

Zeni A, Guidi G, Ellis M, Ding N, Santambrogio MD, Hofmeyr S, Buluç A, Oliker L, Yelick K. LOGAN: High-Performance GPU-Based X-Drop Long-Read Alignment. 2020 IEEE International Parallel and Distributed Processing Symposium (IPDPS), pp. 462-471. IEEE, 2020.

Copyright Notice

LOGAN: High-Performance X-Drop Pairwise Alignment on GPU (LOGAN) Copyright (c) 2019, The Regents of the University of California, through Lawrence Berkeley National Laboratory (subject to receipt of any required approvals from the U.S. Dept. of Energy). All rights reserved.

If you have questions about your rights to use or distribute this software, please contact Berkeley Lab's Intellectual Property Office at [email protected].

NOTICE. This Software was developed under funding from the U.S. Department of Energy and the U.S. Government consequently retains certain rights. As such, the U.S. Government has been granted for itself and others acting on its behalf a paid-up, nonexclusive, irrevocable, worldwide license in the Software to reproduce, distribute copies to the public, prepare derivative works, and perform publicly and display publicly, and to permit other to do so.

Acknowledgments

Funding provided in part by DOE ASCR through the Exascale Computing Project, and computing provided by NERSC and the Oak Ridge Leadership Computing Facility. Thanks to Francesco Peverelli and Muaaz Awan for useful suggestions and valuable discussions.
