riscv64: Implement optimised crc using zbc and zbb extensions #299

Open
wants to merge 3 commits into
base: master

Commits on Aug 8, 2024

  1. build: Add riscv64 support

    Use the base implementations for every function.
    
    Signed-off-by: Daniel Gregory <[email protected]>
    daniel-gregory committed Aug 8, 2024
    8a4c891

Commits on Aug 30, 2024

  1. riscv64: Implement optimised crc using zbc and zbb

    The Zbc extension defines instructions for carryless multiplication that
    can be used to accelerate the calculation of CRC checksums. This
    technique is described in Intel's whitepaper, "Fast CRC Computation for
    Generic Polynomials Using PCLMULQDQ Instruction".
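
As a rough illustration (not code from this PR), the semantics of the two Zbc instructions the commit relies on can be modelled in portable C. The function names `clmul_model` and `clmulh_model` are hypothetical; the loops follow the usual definition of carryless multiplication, where partial products are combined with XOR instead of addition.

```c
#include <assert.h>
#include <stdint.h>

/* Software model of Zbc `clmul`: low 64 bits of the 128-bit carryless
 * product of a and b. Partial products are XORed, so no carries occur. */
static inline uint64_t clmul_model(uint64_t a, uint64_t b)
{
    uint64_t r = 0;
    for (int i = 0; i < 64; i++)
        if ((b >> i) & 1)
            r ^= a << i;
    return r;
}

/* Software model of Zbc `clmulh`: high 64 bits of the same product.
 * Bit i of b contributes the bits of a that are shifted out past bit 63. */
static inline uint64_t clmulh_model(uint64_t a, uint64_t b)
{
    uint64_t r = 0;
    for (int i = 1; i < 64; i++)
        if ((b >> i) & 1)
            r ^= a >> (64 - i);
    return r;
}
```

For instance, `clmul_model(3, 3)` is 5, because (x + 1)^2 = x^2 + 1 over GF(2); an ordinary multiply would give 9.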
    
    The Zbb extension defines, among other bit manipulation operations, an
    instruction for byte-reversing a register (rev8). This is used when
    doing endianness swaps.
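
A minimal C sketch of what the byte reversal achieves, assuming a 64-bit register. `rev8_model`, `load_le64` and `load_be64` are hypothetical helper names used only for illustration; the two loaders are written byte by byte so the example does not depend on host endianness.

```c
#include <assert.h>
#include <stdint.h>

/* Software model of Zbb `rev8` on RV64: reverse the order of the 8 bytes. */
static inline uint64_t rev8_model(uint64_t x)
{
    x = ((x & 0x00FF00FF00FF00FFULL) << 8) | ((x >> 8) & 0x00FF00FF00FF00FFULL);
    x = ((x & 0x0000FFFF0000FFFFULL) << 16) | ((x >> 16) & 0x0000FFFF0000FFFFULL);
    return (x << 32) | (x >> 32);
}

/* What a little-endian double-word load produces. */
static inline uint64_t load_le64(const unsigned char *p)
{
    uint64_t v = 0;
    for (int i = 7; i >= 0; i--)
        v = (v << 8) | p[i];
    return v;
}

/* The same bytes with the first byte as the most significant one, which is
 * the order a non-reflected CRC consumes them in. */
static inline uint64_t load_be64(const unsigned char *p)
{
    uint64_t v = 0;
    for (int i = 0; i < 8; i++)
        v = (v << 8) | p[i];
    return v;
}
```

On a little-endian machine a plain load corresponds to `load_le64`, so `rev8_model(load_le64(p))` equals `load_be64(p)`.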
    
    crc_fold_common_clmul.h defines a macro that reduces a double-word
    aligned buffer to 128 bits by folding four 128-bit chunks in parallel,
    then folding a single 128-bit chunk at a time until fewer than two
    remain. This macro can be reused for all the CRC algorithms with some
    parametrisation controlling:
    
    - where the seed is xor-ed into the first fold
    - whether an endianness swap is needed on double-words read in
    - whether the algorithm is reflected, which affects whether clmulh gives
      back the high double word of a result or the low double word
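
The identity behind a single fold step can be sketched in portable C. This is not the PR's macro: it assumes a non-reflected algorithm, uses the CRC-32 generator as an example polynomial, and computes the fold constants at run time instead of using precomputed ones. All names (`u128_sketch`, `clmul128`, `polymod_words`, `xpow_mod`, `fold_once`) are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define POLY33 0x104C11DB7ULL /* CRC-32 generator, x^32 term included */

typedef struct { uint64_t hi, lo; } u128_sketch;

/* Full 128-bit carryless product (what a clmulh/clmul pair yields). */
static inline u128_sketch clmul128(uint64_t a, uint64_t b)
{
    u128_sketch r = {0, 0};
    for (int i = 0; i < 64; i++) {
        if ((b >> i) & 1) {
            r.lo ^= a << i;
            if (i)
                r.hi ^= a >> (64 - i);
        }
    }
    return r;
}

/* Remainder modulo POLY33 of the polynomial whose coefficients are the
 * bits of w[0..n-1], most significant word and bit first. */
static inline uint32_t polymod_words(const uint64_t *w, size_t n)
{
    uint64_t r = 0;
    for (size_t i = 0; i < n; i++)
        for (int b = 63; b >= 0; b--) {
            r = (r << 1) | ((w[i] >> b) & 1);
            if (r >> 32)
                r ^= POLY33;
        }
    return (uint32_t)r;
}

/* x^n mod POLY33, computed by repeated multiplication by x. */
static inline uint64_t xpow_mod(unsigned n)
{
    uint64_t r = 1;
    while (n--) {
        r <<= 1;
        if (r >> 32)
            r ^= POLY33;
    }
    return r;
}

/* Fold chunk a into the chunk b that follows it in the buffer:
 * a * x^128 + b is congruent to
 * a.hi * (x^192 mod P) + a.lo * (x^128 mod P) + b   (mod P). */
static inline u128_sketch fold_once(u128_sketch a, u128_sketch b)
{
    u128_sketch p = clmul128(a.hi, xpow_mod(192));
    u128_sketch q = clmul128(a.lo, xpow_mod(128));
    u128_sketch r = { p.hi ^ q.hi ^ b.hi, p.lo ^ q.lo ^ b.lo };
    return r;
}
```

Folding replaces 256 bits of input with 128 bits that leave the same remainder, so `polymod_words` over the four original words matches `polymod_words` over the two folded words. Folding four chunks in parallel uses the same identity with larger exponents.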
    
    Where the algorithms differ more is in how the final 128 bits are
    reduced to a 32/64-bit result (which also changes if the algorithm is
    reflected) and how the buffer is made to be double-word aligned.
    
    32-bit CRCs use Barrett reduction to process the leading bytes until
    the buffer is double-word aligned and to reduce any excess left over
    after folding. As the different CRC32 algorithms isa-l supports differ
    in whether the seed is inverted and in their function signatures, the
    alignment, excess and 128-bit reduction are defined as macros in
    crc32_*_common_clmul.h that the implementations (crc32_*.S) include
    and surround with algorithm-specific assembly and precomputed
    constants. This also makes it straightforward to reuse the macros to
    calculate crc16_t10dif.
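
Barrett reduction for a 32-bit polynomial can be sketched as follows. This is a simplified, non-reflected model that reduces a single 64-bit value, not the alignment or excess macros from the PR; the example polynomial is the CRC-32 generator, and `barrett_mu`, `barrett_reduce64` and `polymod64` are hypothetical names. The constant mu = floor(x^64 / P) is computed here by long division, whereas an assembly implementation would normally embed it as a precomputed constant.

```c
#include <assert.h>
#include <stdint.h>

#define POLY33 0x104C11DB7ULL /* CRC-32 generator, x^32 term included */

/* Low 64 bits of a carryless product (models Zbc clmul). */
static inline uint64_t clmul_lo(uint64_t a, uint64_t b)
{
    uint64_t r = 0;
    for (int i = 0; i < 64; i++)
        if ((b >> i) & 1)
            r ^= a << i;
    return r;
}

/* mu = floor(x^64 / P), a 33-bit value, by polynomial long division.
 * The dividend x^64 is a single 1 bit followed by 64 zero bits. */
static inline uint64_t barrett_mu(void)
{
    uint64_t q = 0, r = 0;
    for (int i = 0; i < 65; i++) {
        r = (r << 1) | (i == 0);
        q <<= 1;
        if (r >> 32) {
            r ^= POLY33;
            q |= 1;
        }
    }
    return q;
}

/* Reduce a polynomial of degree < 64 modulo P with two carryless
 * multiplications: q = floor(floor(v / x^32) * mu / x^32) is the exact
 * quotient floor(v / P), and v + q * P is the remainder. */
static inline uint32_t barrett_reduce64(uint64_t v, uint64_t mu)
{
    uint64_t q = clmul_lo(v >> 32, mu) >> 32;
    return (uint32_t)(v ^ clmul_lo(q, POLY33));
}

/* Bit-at-a-time reference remainder, for comparison. */
static inline uint32_t polymod64(uint64_t v)
{
    uint64_t r = 0;
    for (int b = 63; b >= 0; b--) {
        r = (r << 1) | ((v >> b) & 1);
        if (r >> 32)
            r ^= POLY33;
    }
    return (uint32_t)r;
}
```

Both products fit in 64 bits because one operand has degree at most 31 and the other has degree 32, so only the low half of each carryless product is needed in this sketch.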
    
    64-bit CRCs use a table-based reduction to align the buffer and handle
    excess. All of isa-l's CRC64 algorithms pass arguments in the same
    order and invert the seed before and after folding, so each
    crc64_*_common_clmul.h header contains a macro for defining a CRC64
    function with a particular name. Each crc64_*.S file then invokes
    that macro and supplies the precomputed constants and lookup table.
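
A byte-at-a-time table reduction of the kind described can be sketched in C. This assumes a non-reflected CRC64 with the ECMA-182 polynomial and builds the table at run time, whereas the PR ships precomputed tables in the .S files. The function names are hypothetical; the seed is inverted on entry and the result inverted on exit, matching the convention described above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CRC64_ECMA_POLY 0x42F0E1EBA9EA3693ULL /* x^64 term implicit */

/* table[i] is the remainder left after pushing byte i through the CRC. */
static inline void crc64_norm_table_init(uint64_t table[256])
{
    for (int i = 0; i < 256; i++) {
        uint64_t crc = (uint64_t)i << 56;
        for (int b = 0; b < 8; b++)
            crc = (crc & (1ULL << 63)) ? (crc << 1) ^ CRC64_ECMA_POLY
                                       : crc << 1;
        table[i] = crc;
    }
}

/* Table-driven, non-reflected CRC64: one lookup per input byte. */
static inline uint64_t crc64_norm_table(const uint64_t table[256],
                                        uint64_t seed,
                                        const unsigned char *buf, size_t len)
{
    uint64_t crc = ~seed;
    for (size_t i = 0; i < len; i++)
        crc = table[(crc >> 56) ^ buf[i]] ^ (crc << 8);
    return ~crc;
}

/* Bit-at-a-time reference with the same seed convention. */
static inline uint64_t crc64_norm_bitwise(uint64_t seed,
                                          const unsigned char *buf, size_t len)
{
    uint64_t crc = ~seed;
    for (size_t i = 0; i < len; i++) {
        crc ^= (uint64_t)buf[i] << 56;
        for (int b = 0; b < 8; b++)
            crc = (crc & (1ULL << 63)) ? (crc << 1) ^ CRC64_ECMA_POLY
                                       : crc << 1;
    }
    return ~crc;
}
```

A loop like `crc64_norm_table` is cheap to set up, which makes it a reasonable fit for the few bytes before the first aligned double-word and the few left over after folding.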
    
    The .h header files added don't contain C code and so are excluded from
    Clang formatting, similarly to the header files defined for aarch64.
    
    Signed-off-by: Daniel Gregory <[email protected]>
    daniel-gregory committed Aug 30, 2024
    b9e6022
  2. riscv64: Implement crc16_t10dif_copy

    Rather than duplicating all the crc32 4-folding and modifying it to
    copy the bytes it reads to the destination, write a very simple
    memcpy that then tail calls crc16_t10dif. This makes
    crc16_t10dif_copy much slower than crc16_t10dif, but still about
    twice as fast as crc16_t10dif_copy_base.
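
The copy-then-checksum structure can be sketched in C. This is an illustration of the approach, not the PR's assembly: the `_sketch` functions are hypothetical names, the CRC here is a plain bit-at-a-time T10-DIF (polynomial 0x8BB7, non-reflected, seed used as-is) standing in for the accelerated routine, and the argument order (seed, destination, source, length) is an assumption modelled on isa-l's copy variant.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Bit-at-a-time CRC16 T10-DIF; a stand-in for the optimised routine. */
static inline uint16_t crc16_t10dif_sketch(uint16_t seed,
                                           const unsigned char *buf,
                                           size_t len)
{
    uint16_t crc = seed;
    for (size_t i = 0; i < len; i++) {
        crc ^= (uint16_t)(buf[i] << 8);
        for (int b = 0; b < 8; b++)
            crc = (crc & 0x8000) ? (uint16_t)((crc << 1) ^ 0x8BB7)
                                 : (uint16_t)(crc << 1);
    }
    return crc;
}

/* Copy first, then hand the source buffer to the checksum routine.
 * The final call is in tail position, mirroring the tail call the commit
 * message describes. */
static inline uint16_t crc16_t10dif_copy_sketch(uint16_t seed,
                                                unsigned char *dst,
                                                const unsigned char *src,
                                                size_t len)
{
    memcpy(dst, src, len);
    return crc16_t10dif_sketch(seed, src, len);
}
```

The buffer is walked twice, once for the copy and once for the checksum, which is consistent with the copy variant being slower than the plain checksum.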
    
    Signed-off-by: Daniel Gregory <[email protected]>
    daniel-gregory committed Aug 30, 2024
    a62dd04