Fast hash/equality for Model objects #129

Open
jakemac53 opened this issue Oct 31, 2024 · 15 comments

@jakemac53
Contributor

As part of the performance work, I have been looking into generating hash functions for Model objects (see my WIP branch). It isn't too bad to make a basic implementation, but it is very slow, taking almost an entire second, cumulatively, computing hashes for each edit in the large JSON benchmark.

My first approach here is to generate "identityHash" functions which do lookups on the node object for each known property, recursively calling "identityHash" on all the nested objects. For example, Interface.identityHash looks like this:

  /// Hash code for comparing instances of this extension type.
  int get identityHash => Object.hash(
        Object.hashAll((node['members'] as Map?)
                ?.cast<String, Member>()
                .entries
                .map((entry) =>
                    Object.hash(entry.key, entry.value.identityHash)) ??
            const []),
        (node['thisType'] as NamedTypeDesc?)?.identityHash ?? 0,
        Object.hashAll((node['metadataAnnotations'] as List?)
                ?.cast<MetadataAnnotation>()
                .map((entry) => entry.identityHash) ??
            const []),
        (node['properties'] as Properties?)?.identityHash ?? 0,
      );

Ultimately the result of this is that even cached macro phases take an unacceptable amount of time (multiple milliseconds), so we will need to come up with something faster and evaluate exactly what is making this so slow.

@jakemac53
Contributor Author

jakemac53 commented Oct 31, 2024

Some ideas:

  • It is probably faster to iterate the entries, although we would have to switch on the keys to cast things to the correct extension types (see the sketch after this list).
  • We could also potentially try just using the deep hashing functionality from package:collection and work at the raw map level.
  • We could potentially build up a hash in the JsonBufferBuilder as we build objects, but unless we also serialize that hash on the wire it would require reading the entire payload whenever we first receive a buffer. We also only want the hash of the Model field, not the entire buffer, which complicates things.
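
To illustrate the first idea, here is a minimal sketch (not the actual generated code) of an entry-iterating hash for Interface, reusing the extension types from the generated identityHash above; unknown keys just fall back to the raw value's hashCode, and null handling is elided for brevity:

// Sketch only: walk the raw entries once and switch on the known keys,
// instead of doing a separate map lookup per property. Member, NamedTypeDesc,
// MetadataAnnotation and Properties are the extension types used by the
// generated identityHash above.
int interfaceIdentityHash(Map<String, Object?> node) {
  var hash = 0;
  for (final MapEntry(:key, :value) in node.entries) {
    final valueHash = switch (key) {
      'members' => Object.hashAll([
          for (final member in (value as Map).cast<String, Member>().entries)
            Object.hash(member.key, member.value.identityHash),
        ]),
      'thisType' => (value as NamedTypeDesc).identityHash,
      'metadataAnnotations' => Object.hashAll([
          for (final annotation in (value as List).cast<MetadataAnnotation>())
            annotation.identityHash,
        ]),
      'properties' => (value as Properties).identityHash,
      _ => value.hashCode,
    };
    hash = Object.hash(hash, key, valueHash);
  }
  return hash;
}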

@davidmorgan
Contributor

Hmmmm when the query finishes on the host, the Model should be the only thing in the buffer, e.g. after the initial query here:

model: await _queryTarget(target.qualifiedName))));

the host could then hash the whole buffer and send the result. It'd be nice to have it as a field on Model; we can only do that today by having a Map field, since those can be added to after creation.

That's assuming that hashing the buffer as raw bytes turns out to be fast; I hope so :)

Hashing raw bytes will only do what we want if the bytes are sufficiently deterministic; I don't know if that will be the case, but it's a lot easier than hashing objects, so I guess it's worth a try :)
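
A minimal sketch of what that could look like on the host, assuming the finished buffer's serialized form is available as a Uint8List (the accessor and function names here are hypothetical, not the real JsonBufferBuilder API):

import 'dart:typed_data';

import 'package:crypto/crypto.dart' show Digest, sha256;

// Sketch only: hash the finished buffer's raw bytes once on the host, after
// the query result has been written, and send the digest along with the
// model. This only works if the serialized bytes are deterministic.
Digest hashModelBuffer(Uint8List bufferBytes) => sha256.convert(bufferBytes);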

@jakemac53
Contributor Author

  • We could also potentially try just using the deep hashing functionality from package:collection and work at the raw map level.

I did give this a shot and it was much worse, fwiw (about 3x worse).

@jakemac53
Contributor Author

  • It is probably faster to iterate the entries, although we would have to switch on the keys to cast things to the correct extension types.

I tried this as well and it seems a bit faster but not enough to meaningfully change the result.

@jakemac53
Contributor Author

jakemac53 commented Oct 31, 2024

It looks like just hacking something in to grab the raw bytes and hash those is definitely faster: 150ms or so per edit for the large JSON example, cumulatively spent doing hash computations. I also edited my branch to save the hash instead of the Model objects for the cached responses, and compare against that, so we do about half as many total hashes. I haven't played around with the different hash algorithms much to see which is fastest. I tried sha1, md5, and sha256; sha1 and md5 were close, but sha256 was about twice as slow.

Still a lot of time to spend hashing, and I don't love that it is dependent on the specific order in which the buffer is built up (technically, my other approach was too, but it would have been relatively trivial to make it order-independent).

Given that this now only hashes a given Model object once, I don't believe it is worth trying to compute the hash on the fly and store it in the buffer.
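
For reference, a minimal sketch of the caching scheme described above, with hypothetical names (the real branch stores the hash with the cached responses, keyed however the macro applications are tracked):

import 'dart:typed_data';

import 'package:crypto/crypto.dart' show Digest, sha1;

// Sketch only: keep just the digest of the last query result per macro
// application, instead of the Model itself, and reuse the cached macro
// output when the digest is unchanged.
final _lastResultDigest = <String, Digest>{};

bool canReuseCachedOutput(String applicationId, Uint8List queryResultBytes) {
  final digest = sha1.convert(queryResultBytes);
  final previous = _lastResultDigest[applicationId];
  _lastResultDigest[applicationId] = digest;
  return previous == digest;
}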

@davidmorgan
Contributor

I read a bit about what the google3 build does; there is a public doc about a specialized hash, PSHA2. It's based on SHA256 with tweaks to add parallelization so it can run using SIMD instructions, i.e. it parallelizes within a single CPU. The parallel part of psha256 kicks in at message sizes of 1024 bytes, so it looks like it won't be too hard to hit it.

Comparing the command line versions of md5sum, sha256sum, sha1sum and psha2sum on my machine using 1GB of input: md5 hits 0.54GB/s, sha256 is 1.08GB/s, sha1 is 1.23GB/s and psha256 is 1.56GB/s.

Using the package:crypto example script, AOT-compiled, the numbers are: md5 0.17GB/s, sha1 0.10GB/s, sha256 0.08GB/s.

Based on these numbers, if you were using the package:crypto implementations (the fastest of which is md5 at 0.17GB/s), it looks like there would be about a 9x speedup with native-performance PSHA2, 7x with native-performance SHA1, and 3x with native-performance md5sum.

I wonder how much of that native-performance boost we could get with ffi? But actually these are so important, I could see an argument for directly supporting them in the platform, e.g. put them in dart:io.
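
For anyone who wants to reproduce the pure-Dart numbers, a rough throughput micro-benchmark along these lines should do (a sketch, not the exact script used above):

import 'dart:convert' show ChunkedConversionSink;
import 'dart:typed_data';

import 'package:crypto/crypto.dart' show Digest, Hash, md5, sha1, sha256;

// Sketch only: push 1 GiB of zero bytes through each pure-Dart hash in 1 MiB
// chunks and print throughput, for comparison with the native command-line
// tools mentioned above.
void main() {
  const chunkSize = 1024 * 1024; // 1 MiB
  const chunkCount = 1024; // 1 GiB total
  final chunk = Uint8List(chunkSize);
  final algorithms = <String, Hash>{'md5': md5, 'sha1': sha1, 'sha256': sha256};
  algorithms.forEach((name, algorithm) {
    final stopwatch = Stopwatch()..start();
    final input = algorithm
        .startChunkedConversion(ChunkedConversionSink<Digest>.withCallback((_) {}));
    for (var i = 0; i < chunkCount; i++) {
      input.add(chunk);
    }
    input.close();
    final seconds = stopwatch.elapsedMicroseconds / 1e6;
    print('$name: ${(chunkCount * chunkSize / (1 << 30) / seconds).toStringAsFixed(2)} GB/s');
  });
}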

@davidmorgan
Contributor

Some bazel discussion (bazelbuild/bazel#22011) mentions blake3, which looks even faster: b3sum hits 7.87GB/s, mostly because it also parallelizes across multiple cores. With 1 thread (b3sum --num-threads 1) it's 1.83GB/s, just a bit faster than PSHA2, and it gets considerably faster as I add 1-3 more threads.

@mraleph
Member

mraleph commented Nov 1, 2024

Do you really need to compute a hash to compare things? Can't you just (for example) do a simple byte-by-byte comparison of the incoming data? It seems you are just using it as a proxy for equality anyway.
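
(For context, a byte-by-byte comparison would look roughly like this sketch, assuming the previous and incoming query results are both kept around as byte buffers; keeping the previous bytes around is the cost that hashing avoids.)

import 'dart:typed_data';

// Sketch only: direct equality check of the previous and incoming query
// result bytes, as an alternative to comparing digests.
bool sameBytes(Uint8List previous, Uint8List incoming) {
  if (previous.length != incoming.length) return false;
  for (var i = 0; i < previous.length; i++) {
    if (previous[i] != incoming[i]) return false;
  }
  return true;
}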

That being said: I'm puzzled as to why we are trying to solve all these problems externally to the tools, which are supposed to have all the information necessary to shortcut the what-has-changed computation. The CFE is supposed to have fine-grained incremental recompilation of dependencies; the analyzer does not, but should eventually implement it anyway. It seems like we are reinventing the same calculation with macro-specific twists.

@davidmorgan
Contributor

Do you really need to compute a hash to compare things? Can't you just (for example) do a simple byte-by-byte comparison of the incoming data? It seems you are just using it as a proxy for equality anyway.

If a match means there is no more work to do, then it's nice to get a match based on hashes, because then you can get a match without having to store all possible matches as full data.

That being said: I'm puzzled as to why we are trying to solve all these problems externally to the tools, which are supposed to have all the information necessary to shortcut the what-has-changed computation. The CFE is supposed to have fine-grained incremental recompilation of dependencies; the analyzer does not, but should eventually implement it anyway. It seems like we are reinventing the same calculation with macro-specific twists.

Neither the analyzer nor the CFE has a data model that's immediately useful to macros: their models are private to the analyzer and CFE, which means you can't code against them, and they change. So macros have their own data model that is public and stable (the JSON representation and corresponding binary format and extension types).

Macros describe the data they need as a query; the host (analyzer or CFE) converts its own data model to the macro model and sends it in response.

Macros usually only care about a part of the code, for example fields in classes with a particular annotation and their types, so what each macro receives is significantly cut down from the full host model. This also means that it should be very common that when a file changes the macro does not have to rerun: something changed but it wasn't what the macro cares about.

This investigation is about noticing that the data being sent to a macro is the same as last time, and so the output from last time can be reused. "The same as last time" is easy to check by keeping a hash from last time and comparing.

It's true that we could perhaps optimize further by pushing some part of the "same as last time" check before the conversion to the macro data model; for example, the CFE could compare what changed against the macro query before it even starts the conversion. But this would be a lot more work to do, and it's possible that convert-then-compare gets us most of the performance, so we obviously check that first.

@jakemac53
Contributor Author

That being said: I'm puzzled as to why we are trying to solve all these problems externally to the tools, which are supposed to have all the information necessary to shortcut the what-has-changed computation.

  • We have to implement it twice that way, which is extra work as well as a larger potential for bugs. The question of "what does this macro depend on" is fairly challenging to answer, whereas the query-result hash model is quite trivial and reliable.
  • It is probably a lot more expensive, especially in terms of memory, to maintain back-links across the program to the macro applications which depend on those nodes. The current solution we are exploring just needs to store a single hash per query that the macro runs. We absolutely do trade CPU cost for that, since we have to re-run the queries and hash the results; we are exploring the actual cost of that now.
  • The hashing approach could also be more compatible with re-using macro results in a modular compiler, although we probably won't actually do this.

@jakemac53
Contributor Author

I hooked up the CPU profiler, from head but with #134 applied.

Here are some noteworthy things, from a profile spanning a single incremental edit:

[profiler screenshot]

  • about 27% of the total time is still spent computing hashes. We definitely need to improve that.

[profiler screenshot]

  • I don't know exactly what all of this is doing, but it seems pretty expensive, and if we were more deeply integrated we could possibly avoid re-doing this work when it is cached (cc @scheglov).

[profiler screenshots]

  • We are spending >10% of the time just re-running the queries to see what has changed. It looks like almost half of that time is spent just creating typed maps (almost all the objects created here are a result of queries), which doesn't even account for all the work spent building up these objects.
  • This indicates we need to optimize the creation of these objects.

[profiler screenshot]

  • This isn't too unexpected; we do expect a lot of the time to go to parsing, since every library file gets re-parsed, and all augmentation code is parsed twice, I believe.

@jakemac53
Contributor Author

Fwiw, this is my launch_config.json, which assumes you have already generated a benchmark to run (dart tool/benchmark_generator/bin/main.dart large macro 64 for example):

{
    "version": "0.2.0",
    "configurations": [
        {
            "name": "benchmark_debug",
            "request": "launch",
            "type": "dart",
            "program": "pkgs/_macro_tool/bin/main.dart",
            "args": [
                "--workspace=goldens/foo",
                "--packageConfig=.dart_tool/package_config.json",
                "--script=goldens/foo/lib/generated/large/a0.dart",
                "--host=analyzer",
                "--watch"
            ],
            "cwd": "${workspaceFolder}"
        }
    ]
}

@scheglov

scheglov commented Nov 6, 2024

One of the previously recorded performance operation trees in dart-lang/sdk#55784 (comment) provides details about what we do in Linker._mergeMacroAugmentations:

  1. Merge the separate macro results into a single piece of code.
  2. Parse.
  3. Build unlinked data. Mostly hashing.

We see similar data internally.

@scheglov

scheglov commented Nov 6, 2024

See also my previous benchmarks for hashing, Dart vs. Rust.
Or, maybe more fairly, different algorithms, some of which can be optimized for modern CPUs.

@jakemac53
Contributor Author

See also my previous benchmarks for hashing, Dart vs. Rust.

Fwiw, the actual hashing is not the problem in this particular case; it is the work to pull out the interesting bits of the objects that we want to hash that is expensive.

In my PR I am just using Object.hash and Object.hashAll, at least initially, and combined they consume only about 300ms, which isn't nothing, to be sure, but it's not the primary issue by any means.

[profiler screenshot]
