[Index] Convert hashing to HashBuilder with BLAKE3
`Hashing.h` is non-deterministic between runs. Update the index hashing to use xxhash for the unit hash and BLAKE3 for the record hash. Ideally we'd use xxhash for the record hash as well, but there's no easy `HashBuilder` option for it today.

This also removes the hash caching logic from the record hasher. The cache turns out to be slower than simply hashing everything with BLAKE3, and removing it greatly simplifies the implementation.

Numbers for indexing `Foundation` and `Cocoa` textual includes on an M2 Pro, over 10 runs with 3 warmup runs, are as follows.

Build with full re-index (i.e. index removed between each run):

```
Current:       688ms +- 8ms
BLAKE3:        691ms +- 4ms
BLAKE3 cached: 711ms +- 8ms
No-op hash:    620ms +- 4ms
```

Same, but with an existing index (which would hash but then not write any output):

```
Current:       396ms +- 4ms
BLAKE3:        394ms +- 4ms
BLAKE3 cached: 419ms +- 3ms
No-op hash:    382ms +- 5ms
```

The no-op hash is a little misleading in the full re-index case, since it writes out fewer records. The existing-index case is more interesting: the gap between the real hashers and the no-op hash is only about 12 to 14ms out of roughly 395ms, showing that hashing is a small part of the entire build and index.

Also worth noting: there was fairly significant run-to-run variance of around 30ms, but the above was a generally typical pattern (i.e. current is about the same as BLAKE3, which is faster than BLAKE3 cached, and no-op is the fastest).

The main takeaway is that this isn't a noticeable performance regression.
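To illustrate why feeding record fields through a `HashBuilder`-style interface into a seedless hasher gives the same digest on every run, here is a minimal, self-contained C++ sketch. It does not use LLVM: `SketchHashBuilder`, `Fnv1a64`, `Occurrence`, and `hashRecord` are hypothetical names, and 64-bit FNV-1a stands in for xxhash/BLAKE3 purely to keep the example short (it is not what the commit uses and is not cryptographic). Integers are serialized in a fixed byte order and strings are length-prefixed, so identical input always yields an identical digest, unlike a hash whose seed can change from one process to the next.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string_view>
#include <type_traits>
#include <vector>

// Stand-in hasher: 64-bit FNV-1a (an assumption for this sketch; the commit
// uses xxhash for the unit hash and BLAKE3 for the record hash). It has no
// per-process seed, so the same bytes always give the same result.
class Fnv1a64 {
public:
  void update(const uint8_t *data, std::size_t size) {
    for (std::size_t i = 0; i < size; ++i) {
      state_ ^= data[i];
      state_ *= 0x100000001b3ULL; // FNV 64-bit prime
    }
  }
  uint64_t final() const { return state_; }

private:
  uint64_t state_ = 0xcbf29ce484222325ULL; // FNV 64-bit offset basis
};

// Hypothetical HashBuilder-like wrapper (not LLVM's actual API): converts
// typed values into a fixed byte sequence and forwards it to the hasher.
template <typename HasherT> class SketchHashBuilder {
public:
  // Integers are written little-endian so the digest does not depend on the
  // host's byte order. bool is excluded because make_unsigned rejects it.
  template <typename T>
  std::enable_if_t<std::is_integral_v<T> && !std::is_same_v<T, bool>,
                   SketchHashBuilder &>
  add(T value) {
    using U = std::make_unsigned_t<T>;
    U bits = static_cast<U>(value);
    uint8_t bytes[sizeof(T)];
    for (std::size_t i = 0; i < sizeof(T); ++i)
      bytes[i] = static_cast<uint8_t>(bits >> (8 * i));
    hasher_.update(bytes, sizeof(T));
    return *this;
  }

  // Strings are length-prefixed so ("ab", "c") and ("a", "bc") hash
  // differently.
  SketchHashBuilder &add(std::string_view s) {
    add(static_cast<uint64_t>(s.size()));
    hasher_.update(reinterpret_cast<const uint8_t *>(s.data()), s.size());
    return *this;
  }

  uint64_t final() const { return hasher_.final(); }

private:
  HasherT hasher_;
};

// Hypothetical record contents: a list of symbol occurrences.
struct Occurrence {
  std::string_view usr;
  uint32_t kind;
  uint32_t line;
  uint32_t column;
};

// No per-occurrence cache: every field is hashed on every call, mirroring the
// commit's decision to drop the record hasher's caching.
uint64_t hashRecord(const std::vector<Occurrence> &occs) {
  SketchHashBuilder<Fnv1a64> builder;
  builder.add(static_cast<uint64_t>(occs.size()));
  for (const Occurrence &o : occs)
    builder.add(o.usr).add(o.kind).add(o.line).add(o.column);
  return builder.final();
}
```

Because nothing is memoized, each call rehashes every occurrence. Per the measurements in the commit message, that straightforward approach was faster with BLAKE3 than the cached variant, so the simpler design costs nothing in practice.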