Replace home-brew string end searching with memchr.
With long aux tags the trivial while loop can be surprisingly slow.

"while (s < end && *s) ++s;" isn't well vectorised or turned into
word-by-word processing by either gcc or clang, but these tricks are
used by the system memchr implementation.

An alternative could be this (used in my WIP VCF parser), which is
more optimised for relatively short strings.  Included here just for
potential future reference on systems with noddy memchr
implementations.

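    #define haszero(x) (((x)-0x0101010101010101UL)&~(x)&0x8080808080808080UL)
    static inline char *memchr8(char *s, char sym, size_t len) {
        // Replicate sym into all 8 bytes; the cast avoids sign extension
        // when char is signed and sym >= 0x80.
        const uint64_t sym8 = (uint8_t)sym * 0x0101010101010101UL;
        // NB: reads s through a uint64_t pointer, so this relies on the
        // platform tolerating unaligned 64-bit loads.
        uint64_t *s8 = (uint64_t *)s;
        uint64_t *s8_end = (uint64_t *)(s+(len&~7));

        // Skip 8 bytes at a time until a word contains sym
        while (s8 < s8_end && !haszero(*s8 ^ sym8))
            s8++;

        // Precise identification
        char *s_end = s + len;
        s = (char *)s8;
        while (s < s_end && *s != sym) {
            s++;
        }

        return s < s_end ? s : NULL;
    }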
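
For reference, the word-at-a-time test that memchr8 relies on can be tried in isolation. This is only a sketch: block_has_byte and haszero_demo are hypothetical names that are not part of htslib, and the load goes through memcpy so that it does not depend on alignment.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Same bit trick as haszero() above: non-zero iff some byte of x is 0x00. */
#define HASZERO(x) \
    (((x) - UINT64_C(0x0101010101010101)) & ~(x) & UINT64_C(0x8080808080808080))

/* Hypothetical helper: does the 8-byte block at p contain the byte sym? */
static int block_has_byte(const void *p, unsigned char sym) {
    uint64_t w;
    memcpy(&w, p, sizeof w);                              /* alignment-safe load */
    uint64_t sym8 = sym * UINT64_C(0x0101010101010101);   /* sym in every byte */
    return HASZERO(w ^ sym8) != 0;       /* XOR turns matching bytes into 0x00 */
}

/* Hypothetical self-check on one 8-byte block. */
static void haszero_demo(void) {
    assert(block_has_byte("ABCDEFGH", 'E'));
    assert(!block_has_byte("ABCDEFGH", 'Z'));
}
```

The test only says whether a block contains the byte, not where, which is why memchr8 finishes with a byte-by-byte loop.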
jkbonfield committed Sep 24, 2024
1 parent 5d8a186 commit 3b71dbc
Showing 1 changed file with 3 additions and 3 deletions.
6 changes: 3 additions & 3 deletions sam.c
@@ -4856,9 +4856,9 @@ static inline uint8_t *skip_aux(uint8_t *s, uint8_t *end)
     switch (size) {
     case 'Z':
     case 'H':
-        while (s < end && *s) ++s;
-        return s < end ? s + 1 : end;
-    case 'B':
+        s = memchr(s, 0, end-s);
+        return s ? s+1 : end;
+    case 'B':
         if (end - s < 5) return NULL;
         size = aux_type2size(*s); ++s;
         n = le_to_u32(s);
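
A minimal sketch of why the two forms should be interchangeable, with the 'Z'/'H' branch pulled out into two standalone functions. The names skip_str_loop, skip_str_memchr and check_same are hypothetical and exist only for this illustration. Both return the byte after the NUL terminator, or end if no terminator is found before end.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Old form: byte-at-a-time scan for the NUL terminator. */
static uint8_t *skip_str_loop(uint8_t *s, uint8_t *end) {
    while (s < end && *s) ++s;
    return s < end ? s + 1 : end;
}

/* New form: let the C library's memchr locate the NUL. */
static uint8_t *skip_str_memchr(uint8_t *s, uint8_t *end) {
    s = memchr(s, 0, end - s);
    return s ? s + 1 : end;
}

/* Hypothetical self-check: both forms agree for a given range. */
static void check_same(uint8_t *s, uint8_t *end) {
    assert(skip_str_loop(s, end) == skip_str_memchr(s, end));
}
```

Calling check_same over a buffer with a terminator, without one, and with s == end exercises the three return paths.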