Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Increase the input block size for bgzip. #1768

Open
wants to merge 2 commits into
base: develop
Choose a base branch
from

Commits on Oct 31, 2024

  1. Increase the input block size for bgzip.

    Commit e495718 changed bgzip from unix raw POSIX read() calls to
    hread().  Unfortunately hread gets its buffer size from stat of the
    input file descriptor, which can be 4kb for a pipe.  We're reading
    0xff00 bytes, so this ends up being split over two reads mostly, with
    one or both involving additional memcpys.  This makes the buffered I/O
    worse performing than non-buffered.  In the most extreme cases (cat
    data | bgzip -l0 > /dev/null) this is a two fold slow down.
    
    The easy solution is just to increase the buffer size to something
    sensible.  Currently we play it cautiously and only do this on pipes
    and fifos.
    
    Fixes samtools#1767
    jkbonfield committed Oct 31, 2024
    Configuration menu
    Copy the full SHA
    186d21b View commit details
    Browse the repository at this point in the history

Commits on Nov 14, 2024

  1. Make use of posix_memalign for hfile buffer.

    On AMD EPYC 7713 aligning to cache size boundaries makes a very
    significant difference to fp->backend->read performance in the
    kernel.  A modern Intel CPU did not demonstrate this difference.
    
    x86 often have cache line size of 64 bytes, and apple Arm chips 128
    bytes.  I haven't tested if Arm benefits from alignment during read
    calls, but we can check size with sysconf(_SC_LEVEL1_DCACHE_LINESIZE).
    However to avoid additional autoconfery I just picked 256 as it gives
    us headroom and is simple.
    
    Speed ups on the AMD EPYC:
    
    time bash -c 'for i in `seq 1 30`;do cat < ~/lustre/enwik9| ./bgzip -l5 -@32 > /dev/null;done'
    
    Unaligned
    real    0m45.012s
    user    10m7.661s
    sys     0m58.770s
    
    Aligned
    real    0m30.717s
    user    11m14.004s
    sys     0m32.921s
    
    It is likely this could improve other bits of code too.
    jkbonfield committed Nov 14, 2024
    Configuration menu
    Copy the full SHA
    63e1a18 View commit details
    Browse the repository at this point in the history