Refactor SIMD code and add support for SSE2 and NEON #32

althonos · 2022-09-28T10:49:24Z

Hi from a Martin to another!

I came to work on Opal since it's used in STAR, and I'm trying to improve the portability of some bioinformatics tools to new platforms (namely Aarch64 which is becoming more prevalent with the new Mac M1 chip). Even though it looks you're not in the field anymore, I hope you can find some time to review and merge this PR!

The current way SIMD was handled was not so compatible with NEON because some SSE/AVX intrinsics do not have strict equivalents in NEON, so I removed the macro-based aliasing and made all SIMD implementations in the Simd<T> templates. Doing so also let me add an emulation layer for SSE2 in the event SSE4.1 is not available. In theory, it should even be possible to add a "fake" SIMD implementation with just one lane to be used as a fallback on some other machines that way, but getting NEON support should already cover the vast majority of modern Arm machines.

Since the code was common between Simd<T> and SimdSw<T> (except for the signedness of min/max operations on char vectors), I merged the 6 different template implementations into just 4, and added the remaining SIMD operations like load/store/bitwise-and to the template as well.

Test

The NEON code was tested on the Raspberry Pi 4:

$ uname -a
Linux chloroplast 5.10.0-17-arm64 #1 SMP Debian 5.10.136-1 (2022-08-13) aarch64 GNU/Linux

$ ./test
Starting Opal!
Using ARM NEON!
Time: 7.257904
Maximum: 573

Starting normal!
Time: 10.553968
Maximum: 573

Times faster: 1.454134

$ ./opal_aligner -x1 ../test_data/query/O74807.fasta ../test_data/db/uniprot_sprot15.fasta
Using SW alignment mode.
Reading query fasta file...
Read query sequence, 110 residues.

Reading database fasta file...
Read 15 database sequences, 4491 residues total.
Whole database read: 15 database sequences, 4491 residues in total.

Comparing query to database...
Finished!

#<i>: <score> (<query start>, <target start>) (<query end>, <target end>)
#0: 155 (?, ?) (106, 161)
#1: 178 (?, ?) (109, 229)
#2: 169 (?, ?) (107, 314)
#3: 119 (?, ?) (108, 145)
#4: 152 (?, ?) (109, 437)
#5: 68 (?, ?) (102, 58)
#6: 143 (?, ?) (106, 214)
#7: 151 (?, ?) (105, 181)
#8: 182 (?, ?) (109, 304)
#9: 94 (?, ?) (84, 74)
#10: 126 (?, ?) (87, 126)
#11: 169 (?, ?) (108, 398)
#12: 138 (?, ?) (99, 239)
#13: 181 (?, ?) (106, 927)
#14: 108 (?, ?) (97, 84)

Cpu time of searching: 0.01
GCUPS (giga cell updates per second): 0.07

althonos · 2023-05-30T10:04:15Z

I also fixed a segmentation fault in NW with full alignment mode, where the substitution matrix may have been negatively indexed after reaching the sequence edges, see althonos/pyopal#1

…Database_`

althonos added 7 commits September 27, 2022 23:53

Merge the Simd and SimdSW templates

f2504ec

Refactor all SIMD-specific code into templates

7fcf5a6

Make SIMD vector types specific to each templated implementation

fa55686

Add Arm implementation using NEON intrinsics

60ee9ce

Fix implementation of SIMD::allzeroes for NEON templates

61c4221

Add implementation for SSE2 emulating _mm_min_epi8 and _mm_max_epi8

133d2d3

Update README.md to mention NEON and SSE2 support

3698706

pettyalex mentioned this pull request Apr 18, 2023

Plans for ARM Support? alexdobin/STAR#1833

Open

Fix crash in findAlignment on NW/full alignment mode

07fb127

althonos added 3 commits August 28, 2023 23:19

Re-enable overflow detection for non-saturating arithmetic in `search…

c719ebb

…Database_`

Fix out-of-bounds access in findAlignment

adf83aa

Patch opal.cpp to allow compiling with MSVC

ae5be38

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Refactor SIMD code and add support for SSE2 and NEON #32

Refactor SIMD code and add support for SSE2 and NEON #32

althonos commented Sep 28, 2022

althonos commented May 30, 2023 •

edited

Loading

Refactor SIMD code and add support for SSE2 and NEON #32

Are you sure you want to change the base?

Refactor SIMD code and add support for SSE2 and NEON #32

Conversation

althonos commented Sep 28, 2022

Test

althonos commented May 30, 2023 • edited Loading

althonos commented May 30, 2023 •

edited

Loading