Skip to content

Commit

Permalink
[Index] Convert to hashing to HashBuilder with BLAKE3
Browse files Browse the repository at this point in the history
`Hashing.h` is non-deterministic between runs. Update the index hashing
to use xxhash for the unit hash and BLAKE3 for the record hash. Ideally
we'd use xxhash for the record hash as well, but there's no easy
`HashBuilder` option for it today.

This also removes the hash caching logic from the record hasher, as it
turns out to be slower than just hashing everything for BLAKE3 and
greatly simplifies the implementation with its removal.

Numbers for indexing `Foundation` and `Cocoa` textual includes on an M2
Pro over 10 runs with 3 warmup are as follows.

Build with full re-index (ie. index removed between each run) -
```
      Current: 688ms +- 8ms
       BLAKE3: 691ms +- 4ms
BLAKE3 cached: 711ms +- 8ms
   No-op hash: 620ms +- 4ms
```

Same but with an existing index (which would hash but then not write any
output) -
```
      Current: 396ms +- 4ms
       BLAKE3: 394ms +- 4ms
BLAKE3 cached: 419ms +- 3ms
   No-op hash: 382ms +- 5ms
```

The no-op hash is a little misleading in the full re-index since it will
be writing out fewer records. But the existing index case is
interesting, showing that hashing is only a small part of the entire
build and index.

Also worth noting that there was some fairly significant run-to-run
variance of around 30ms, but the above was a generally typical pattern
(ie. current about the same as BLAKE3, which is faster than BLAKE3
cached, and no-op is the fastest). The main take away is that this isn't
a noticable performance regression.
  • Loading branch information
bnbarham committed Aug 17, 2024
1 parent 8f9b972 commit f8030c1
Show file tree
Hide file tree
Showing 8 changed files with 336 additions and 397 deletions.
2 changes: 1 addition & 1 deletion clang/include/clang/Index/IndexRecordWriter.h
Original file line number Diff line number Diff line change
Expand Up @@ -75,7 +75,7 @@ class IndexRecordWriter {
/// \returns Success if we should continue writing this record, AlreadyExists
/// if the record file has already been written, or Failure if there was an
/// error, in which case \p Error will be set.
Result beginRecord(StringRef Filename, llvm::hash_code RecordHash,
Result beginRecord(StringRef Filename, uint64_t RecordHash,
std::string &Error, std::string *RecordFile = nullptr);

/// Finish writing the record file.
Expand Down
7 changes: 4 additions & 3 deletions clang/lib/Index/ClangIndexRecordWriter.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -63,8 +63,7 @@ StringRef ClangIndexRecordWriter::getUSRNonCached(const IdentifierInfo *Name,

ClangIndexRecordWriter::ClangIndexRecordWriter(ASTContext &Ctx,
RecordingOptions Opts)
: Impl(Opts.DataDirPath), Ctx(Ctx), RecordOpts(std::move(Opts)),
Hasher(Ctx) {
: Impl(Opts.DataDirPath), Ctx(Ctx), RecordOpts(std::move(Opts)) {
if (Opts.RecordSymbolCodeGenName)
ASTNameGen.reset(new ASTNameGenerator(Ctx));
}
Expand All @@ -76,7 +75,9 @@ bool ClangIndexRecordWriter::writeRecord(StringRef Filename,
std::string &Error,
std::string *OutRecordFile) {

auto RecordHash = Hasher.hashRecord(IdxRecord);
std::array<uint8_t, 8> RecordHashArr = index::hashRecord(IdxRecord, Ctx);
uint64_t RecordHash = 0;
std::memcpy(&RecordHash, RecordHashArr.data(), RecordHashArr.size());

switch (Impl.beginRecord(Filename, RecordHash, Error, OutRecordFile)) {
case IndexRecordWriter::Result::Success:
Expand Down
1 change: 0 additions & 1 deletion clang/lib/Index/ClangIndexRecordWriter.h
Original file line number Diff line number Diff line change
Expand Up @@ -33,7 +33,6 @@ class ClangIndexRecordWriter {
std::unique_ptr<ASTNameGenerator> ASTNameGen;
llvm::BumpPtrAllocator Allocator;
llvm::DenseMap<const void *, StringRef> USRByDecl;
IndexRecordHasher Hasher;

public:
ClangIndexRecordWriter(ASTContext &Ctx, RecordingOptions Opts);
Expand Down
Loading

0 comments on commit f8030c1

Please sign in to comment.