Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Faster block sync #7101

Merged
merged 10 commits into from
Mar 18, 2024
Merged

Conversation

jgriffiths
Copy link
Contributor

@jgriffiths jgriffiths commented Feb 21, 2024

Per discussion on #6984, adding another introspection wrapper to avoid non-interesting matches is non-optimal. It's a duplicate of the is_xxx script call already exposed, and it doesn't fix the many other places where allocations are made unnecessarily.

It seems the original rationale for the copies is behaviour that was since fixed (see commit descriptions for details). But also, the script introspection functions are inconsistent w.r.t taking lengths. Removing this inconsistency by adding length args exposes a lot of places that were allocating redundantly which this PR addresses. In particular, we avoid allocations for interesting scripts which #6984 does not. We are then able to remove the allocating wrapper calls completely, ending up with less code overall.

There are 2 other drive-by fixes to improve performance: Avoid DB queries for non-interesting script types, and avoid computing unused key fingerprints when identifying wallet addresses. The latter in particular avoids a hash160 for every key tested, which is all used wallet keys plus the gap limit in the case of negative lookups, for every potentially interesting tx output.

The end result is less code, more code consistency and much less memory/cpu usage, especially for large wallets/after many txs have been made.

Reverts/replaces #6984.

cc: @cdecker

See the next commit for context on this revert.

This reverts commit d185b0f.

Signed-off-by: Jon Griffiths <[email protected]>
…ating"

The is_xxx script functions already perform these checks for us without
allocating. They currently expect their memory to be tal-allocated,
which is why a copy is made. However:

- The memory already *is* tal-allocated since wally is configured to do so.
- The tal-dependency is artifical and is removed in a future commit in
  this PR.

This reverts commit c329756.

Signed-off-by: Jon Griffiths <[email protected]>
Standardizes the is_xxx script function all take a script length, and changes
their first-level callers to pass it. This has several knock on benefits:

- We remove the repeated tal_count/tal_bytelen calls on the script, in
  particular the redundant calls that result when we must check for multiple
  types of script - which is almost all cases.
- We remove the dependency on the memory being tal-allocated (It is, in
  all cases, but theres no reason we need to require that).
- We remove all cases where we create a copy of the script just to id it.
- We remove all allocations for non-interesting scripts while iterating block
  txs in process_getfilteredblock_step1().
- We remove all allocations *including for potentially interesting scripts* in
  topo_add_utxos().

Signed-off-by: Jon Griffiths <[email protected]>
Only one caller needs it, and they can trivially and cheaply compute it
themselves.

Signed-off-by: Jon Griffiths <[email protected]>
Avoids a hash160 for every key we derive to test. Callers that need the
key re-derive it without the skip flag, so there are no side effects
from this optimization.

Changelog-Changed: core: Processing blocks should now be faster

Signed-off-by: Jon Griffiths <[email protected]>
The pattern of making tal-allocated copies of wally data to pass around
was made redundant after these calls were added by the use of
tal_wally_start/tal_wally_end to parent wally allocations. We can thus
just pass the data directly and avoid the allocations.

Removes redundant allocations when checking tx filters and computing fees.

Signed-off-by: Jon Griffiths <[email protected]>
@jgriffiths jgriffiths changed the title WIP: Faster block sync Faster block sync Feb 21, 2024
@rustyrussell
Copy link
Contributor

This, like the previous patch it replaces, ARE MISSING BENCHMARKS.

Without that, I have no way of evaluating it. Is processing blocks really a bottleneck? Under what circumstances? Should we stop re-evaluating block on restart, instead? Should we query bitcoind via the bitcoin protocol instead of this hexdump?

@cdecker
Copy link
Member

cdecker commented Feb 22, 2024

I still have the benchmarking setup from #6984 and will run this a couple of times.

@jgriffiths
Copy link
Contributor Author

@rustyrussell despite the PR name, the primary motivation for this change was clean up the code, and remove the non-general optimization from the reverted commit. In particular, removing the extra tx wrapper calls, two of which are based on an outdated premise and are now an anti-pattern (*), and the other which duplicate already available logic. It also makes the script functions more general (they can operate on memory from anywhere) and internally consistent vs the previous mix of some taking lengths and other not. If you look at the complete diff you should be able to see that a bunch of code has been removed and what remains is now simpler/more consistent. Its a happy coincidence that this also removes a bunch of memory allocations and other logic on a previously identified hotspot.

The 2 drive-by optimizations are trivial, especially the key fingerprint one. I probably should have called this PR something else, that's my bad. IMO even if there were no performance improvement, this PR should still be merged. The code will be faster since we are avoiding doing a bunch of stuff we didn't need to, but the e.g. wall-time difference is not the primary metric by which the changes should be judged.

@cdecker Thanks for benchmarking, much appreciated! One thing to note is that as the wallet has more addresses generated, the effect of e.g. not computing the fingerprint grows larger. A simple test of syncing with a new/empty wallet will not reflect this - the longer a wallet has been used, the more difference this will make. If you generate a few hundred/thousand addresses before testing, your result will be more indicative of the effects in the real world.

(*) i.e. force the user to create redundant copies, when the need to do this has been long fixed.

@cdecker
Copy link
Member

cdecker commented Feb 26, 2024

Took me a while to benchmark, since I redid all 3 comparison runs as well, but the results are in:


|-----------------------+------------------------------------------+--------+---------+-----------------+-----------|
| Name                  | Commit                                   | Blocks |    Time | Blocks / second |   Speedup |
|-----------------------+------------------------------------------+--------+---------+-----------------+-----------|
| Baseline              | 3cbe35e7da71c918434510b8051cc9d33fc8a23f |   1000 |  216.57 |       4.6174447 |        0. |
| #6583                 | 5650dfdc24cc19e8534885c5ec6a89eb8f38a262 |   1000 | 183.233 |       5.4575322 | 18.193775 |
| #6583 & #6984         | 0a18732b8b8af7b039835ea5c6c276e0b4f8095e |   1000 |  176.38 |       5.6695770 | 22.786030 |
| #6953 & #6984 & #7101 | afe613cc1a4a2aff6975d288848777c09eae5fa1 |   1000 |  157.20 |       6.3613232 | 37.767176 |

This is a different machine than the one I used to test in #6984, hence the different results. So there is quite a visible performance improvement with this patch added 👍

@rustyrussell rustyrussell merged commit f39d2ee into ElementsProject:master Mar 18, 2024
36 checks passed
@jgriffiths jgriffiths deleted the faster_block_sync branch March 18, 2024 02:45
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants