[Index] Convert hashing to HashBuilder with BLAKE3 #9140

Open
wants to merge 2 commits into
base: stable/20240723

Commits on Oct 29, 2024

  1. [Index] Convert hashing to HashBuilder with BLAKE3

    The hashes produced by `Hashing.h` are not deterministic between runs.
    Update the index hashing to use xxhash for the unit hash and BLAKE3 for
    the record hash. Ideally we'd use xxhash for the record hash as well,
    but there's no easy `HashBuilder` option for it today.
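    
    To illustrate what "deterministic between runs" means for a builder-style
    hasher, here is a minimal sketch. It is not the code in this change: it
    uses FNV-1a as a stand-in for xxhash/BLAKE3, and `StableHashBuilder` and
    `hashRecord` are hypothetical names, not LLVM APIs. The point is only
    that the result depends on nothing but the bytes fed in, in order.
    
    ```cpp
    #include <cstdint>
    #include <string_view>
    
    // Hypothetical stand-in for a HashBuilder-style incremental hasher.
    // Uses 64-bit FNV-1a (NOT xxhash or BLAKE3): fixed constants, no
    // per-process seed, so the same input always gives the same hash.
    class StableHashBuilder {
    public:
      // Feed raw bytes into the hash state.
      constexpr StableHashBuilder &add(std::string_view Bytes) {
        for (char C : Bytes) {
          State ^= static_cast<unsigned char>(C);
          State *= Prime;
        }
        return *this;
      }
    
      // Feed an integer as 8 little-endian bytes, so the result does not
      // depend on the host's byte order.
      constexpr StableHashBuilder &add(std::uint64_t Value) {
        for (int I = 0; I < 8; ++I) {
          State ^= (Value >> (8 * I)) & 0xff;
          State *= Prime;
        }
        return *this;
      }
    
      constexpr std::uint64_t result() const { return State; }
    
    private:
      static constexpr std::uint64_t OffsetBasis = 0xcbf29ce484222325ULL;
      static constexpr std::uint64_t Prime = 0x100000001b3ULL;
      std::uint64_t State = OffsetBasis;
    };
    
    // Hypothetical record: a symbol name plus a role bitmask.
    constexpr std::uint64_t hashRecord(std::string_view Name,
                                       std::uint64_t Roles) {
      return StableHashBuilder().add(Name).add(Roles).result();
    }
    ```
    
    Because every input is reduced to a fixed byte sequence before hashing,
    `hashRecord("main", 1)` yields the same value in every process, which is
    the property hash-derived index file names need.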
    
    This also removes the hash caching logic from the record hasher: with
    BLAKE3, caching turns out to be slower than just hashing everything,
    and removing it greatly simplifies the implementation.
    
    Numbers for indexing `Foundation` and `Cocoa` textual includes on an M2
    Pro over 10 runs with 3 warmup runs are as follows.
    
    Build with full re-index (i.e. index removed between runs):
    ```
          Current: 688ms +- 8ms
           BLAKE3: 691ms +- 4ms
    BLAKE3 cached: 711ms +- 8ms
       No-op hash: 620ms +- 4ms
    ```
    
    The same build with an existing index (which hashes but then does not
    write any output):
    ```
          Current: 396ms +- 4ms
           BLAKE3: 394ms +- 4ms
    BLAKE3 cached: 419ms +- 3ms
       No-op hash: 382ms +- 5ms
    ```
    
    The no-op hash is a little misleading in the full re-index since it will
    be writing out fewer records. But the existing index case is
    interesting, showing that hashing is only a small part of the entire
    build and index.
    
    Also worth noting that there was some fairly significant run-to-run
    variance of around 30ms, but the above was the typical pattern (i.e.
    current about the same as BLAKE3, which is faster than BLAKE3 cached,
    with no-op the fastest). The main takeaway is that this isn't a
    noticeable performance regression.
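    
    For anyone wanting to reproduce this kind of comparison, the protocol
    above (3 untimed warmup runs, then 10 timed runs reported as
    "mean +- deviation") can be sketched as follows. The tool actually used
    for the numbers above is not stated; `measure` and `summarize` are
    hypothetical helpers, and treating "+-" as the sample standard deviation
    is an assumption.
    
    ```cpp
    #include <cassert>
    #include <chrono>
    #include <cmath>
    #include <functional>
    #include <vector>
    
    struct Summary {
      double Mean;   // arithmetic mean of the samples
      double StdDev; // sample standard deviation (n - 1 denominator)
    };
    
    // Reduce a set of timings to "mean +- deviation".
    Summary summarize(const std::vector<double> &Samples) {
      assert(!Samples.empty() && "need at least one sample");
      double Sum = 0;
      for (double S : Samples)
        Sum += S;
      const double Mean = Sum / Samples.size();
    
      double SquaredError = 0;
      for (double S : Samples)
        SquaredError += (S - Mean) * (S - Mean);
      const double StdDev =
          Samples.size() > 1 ? std::sqrt(SquaredError / (Samples.size() - 1))
                             : 0.0;
      return {Mean, StdDev};
    }
    
    // Run `Work` `Warmup` times without timing it, then `Runs` times with
    // timing. Returns one wall-clock sample per timed run, in milliseconds.
    std::vector<double> measure(const std::function<void()> &Work,
                                int Warmup = 3, int Runs = 10) {
      for (int I = 0; I < Warmup; ++I)
        Work();
    
      std::vector<double> Samples;
      Samples.reserve(Runs);
      for (int I = 0; I < Runs; ++I) {
        const auto Start = std::chrono::steady_clock::now();
        Work();
        const auto End = std::chrono::steady_clock::now();
        Samples.push_back(
            std::chrono::duration<double, std::milli>(End - Start).count());
      }
      return Samples;
    }
    ```
    
    With a run-to-run variance of around 30ms, differences of a few
    milliseconds between two means (such as 688ms vs 691ms) are best read as
    "about the same" rather than as a ranking.
    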
    bnbarham committed Oct 29, 2024
    a5298e5

Commits on Oct 31, 2024

  1. 4daaa11