-
Notifications
You must be signed in to change notification settings - Fork 330
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Index] Convert to hashing to HashBuilder
with BLAKE3
#9140
base: stable/20240723
Are you sure you want to change the base?
Conversation
@swift-ci please test llvm |
@swift-ci please test |
@@ -76,7 +75,9 @@ bool ClangIndexRecordWriter::writeRecord(StringRef Filename, | |||
std::string &Error, | |||
std::string *OutRecordFile) { | |||
|
|||
auto RecordHash = Hasher.hashRecord(IdxRecord); | |||
std::array<uint8_t, 8> RecordHashArr = index::hashRecord(IdxRecord, Ctx); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wouldn’t it be easier if index::hashRecord
returned uint64_t
instead of std::array<uint8_t, 8>
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't have a strong opinion here. This will have to be done either way, it's just a matter of where. I actually implemented it that way first, but ended up changing it since std::array
is what the HashBuilder
API returns. Happy to change if you'd prefer it.
IndexRecordHasher &Hasher; | ||
|
||
public: | ||
DeclHashVisitor(IndexRecordHasher &Hasher) : Hasher(Hasher) {} | ||
|
||
hash_code VisitDecl(const Decl *D) { | ||
return VisitDeclContext(D->getDeclContext()); | ||
void VisitDecl(const Decl *decl) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What’s the current LLVM naming convention regarding uppercase variables names? I know that we’re trying to move to lowercase variable names in the Swift compiler but I’m not sure about LLVM.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure honestly. https://discourse.llvm.org/t/top-level-clang-tidy-options-and-variablename-suggestion-on-codingstandards/58783 was the latest discussion I could find and anecdotally I see a bunch of camelCase in clang
now.
@swift-ci please test llvm |
@swift-ci please test |
`Hashing.h` is non-deterministic between runs. Update the index hashing to use xxhash for the unit hash and BLAKE3 for the record hash. Ideally we'd use xxhash for the record hash as well, but there's no easy `HashBuilder` option for it today. This also removes the hash caching logic from the record hasher, as it turns out to be slower than just hashing everything for BLAKE3 and greatly simplifies the implementation with its removal. Numbers for indexing `Foundation` and `Cocoa` textual includes on an M2 Pro over 10 runs with 3 warmup are as follows. Build with full re-index (ie. index removed between each run) - ``` Current: 688ms +- 8ms BLAKE3: 691ms +- 4ms BLAKE3 cached: 711ms +- 8ms No-op hash: 620ms +- 4ms ``` Same but with an existing index (which would hash but then not write any output) - ``` Current: 396ms +- 4ms BLAKE3: 394ms +- 4ms BLAKE3 cached: 419ms +- 3ms No-op hash: 382ms +- 5ms ``` The no-op hash is a little misleading in the full re-index since it will be writing out fewer records. But the existing index case is interesting, showing that hashing is only a small part of the entire build and index. Also worth noting that there was some fairly significant run-to-run variance of around 30ms, but the above was a generally typical pattern (ie. current about the same as BLAKE3, which is faster than BLAKE3 cached, and no-op is the fastest). The main take away is that this isn't a noticable performance regression.
@swift-ci please test llvm |
Hashing.h
is non-deterministic between runs. Update the index hashing to use xxhash for the unit hash and BLAKE3 for the record hash. Ideally we'd use xxhash for the record hash as well, but there's no easyHashBuilder
option for it today.This also removes the hash caching logic from the record hasher, as it turns out to be slower than just hashing everything for BLAKE3 and greatly simplifies the implementation with its removal.
Numbers for indexing
Foundation
andCocoa
textual includes on an M2 Pro over 10 runs with 3 warmup are as follows.Build with full re-index (ie. index removed between each run) -
Same but with an existing index (which would hash but then not write any output) -
The no-op hash is a little misleading in the full re-index since it will be writing out fewer records. But the existing index case is interesting, showing that hashing is only a small part of the entire build and index.
Also worth noting that there was some fairly significant run-to-run variance of around 30ms, but the above was a generally typical pattern (ie. current about the same as BLAKE3, which is faster than BLAKE3 cached, and no-op is the fastest). The main take away is that this isn't a noticable performance regression.