-
Notifications
You must be signed in to change notification settings - Fork 836
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Bonsai cleaning and robustness #5123
Conversation
Signed-off-by: Karim TAAM <[email protected]>
Signed-off-by: Karim TAAM <[email protected]>
Signed-off-by: Karim TAAM <[email protected]>
Signed-off-by: Karim TAAM <[email protected]>
Signed-off-by: Karim TAAM <[email protected]>
Signed-off-by: Karim TAAM <[email protected]>
ethereum/core/src/main/java/org/hyperledger/besu/ethereum/bonsai/AbstractTrieLogManager.java
Fixed
Show fixed
Hide fixed
ethereum/core/src/main/java/org/hyperledger/besu/ethereum/bonsai/AbstractTrieLogManager.java
Fixed
Show fixed
Hide fixed
ethereum/core/src/main/java/org/hyperledger/besu/ethereum/bonsai/TrieLogManager.java
Fixed
Show fixed
Hide fixed
Signed-off-by: Karim TAAM <[email protected]>
Signed-off-by: Karim TAAM <[email protected]>
Signed-off-by: Karim TAAM <[email protected]>
Signed-off-by: garyschulte <[email protected]>
Signed-off-by: garyschulte <[email protected]>
…kito stuff Signed-off-by: garyschulte <[email protected]>
Signed-off-by: garyschulte <[email protected]>
e56a055
to
56a0a1f
Compare
...core/src/main/java/org/hyperledger/besu/ethereum/bonsai/CachedSnapshotWorldstateManager.java
Outdated
Show resolved
Hide resolved
…ogmanager, cache accumulator AND snapshot Bonsai*UpdateAccumulator is broken currently in that it does not correctly compose parent accumulator Signed-off-by: garyschulte <[email protected]>
Signed-off-by: garyschulte <[email protected]>
Signed-off-by: Karim TAAM <[email protected]>
2b94a9a
to
44cc113
Compare
Signed-off-by: Karim TAAM <[email protected]>
Signed-off-by: Karim TAAM <[email protected]>
Signed-off-by: garyschulte <[email protected]>
…ring to accumulator from BonsaiWorldState Signed-off-by: garyschulte <[email protected]>
Signed-off-by: Karim TAAM <[email protected]>
…ateLayeredStorage Signed-off-by: garyschulte <[email protected]>
…a getMutable() Signed-off-by: garyschulte <[email protected]>
Signed-off-by: Karim TAAM <[email protected]>
ethereum/core/src/main/java/org/hyperledger/besu/ethereum/bonsai/trielog/TrieLogManager.java
Fixed
Show fixed
Hide fixed
Signed-off-by: garyschulte <[email protected]>
…d state storage" This reverts commit 55aa031. Signed-off-by: garyschulte <[email protected]>
...va/org/hyperledger/besu/ethereum/bonsai/storage/BonsaiSnapshotWorldStateKeyValueStorage.java
Show resolved
Hide resolved
.../main/java/org/hyperledger/besu/ethereum/bonsai/storage/BonsaiWorldStateKeyValueStorage.java
Show resolved
Hide resolved
} | ||
|
||
@Override | ||
public boolean isWorldStateAvailable(final Hash rootHash, final Hash blockHash) { | ||
return trieLogManager.getBonsaiCachedWorldState(blockHash).isPresent() | ||
return trieLogManager.containWorldStateStorage(blockHash) | ||
|| persistedState.blockHash().equals(blockHash) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The difference between persistedState and worldStateStorage is still cryptic. would persistedState be better called "workingState" since it is the one "main" state being worked on and updated?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
persistedState
here is the one and only persisted BonsaiWorldState and worldStateStorage
is its storage. Since the worldstate storage is accessible via BonsaiWorldState, it is a bit redundant to have both as member variables. it would be a longer conditional, but this could be rewritten as:
return trieLogManager.containWorldStateStorage(blockHash)
|| persistedState.blockHash().equals(blockHash)
|| persistedState.getWorldStateStorage().isWorldStateAvailable(rootHash, blockHash);
one could also argue that the second and third conditions are functionally equivalent if we want simplicity/clarity.
Signed-off-by: garyschulte <[email protected]>
e1cf242
to
4d05693
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
review still in progress
.flatMap( | ||
bonsaiWorldState -> | ||
rollMutableStateToBlockHash(bonsaiWorldState, blockHeader.getHash())) | ||
.map(MutableWorldState::freeze); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Suggest separating this else case into its own method, since returning a "frozen" MutableWorldState doesn't seem very mutable anymore.
} | ||
} | ||
|
||
public synchronized void updateWorldStateStorage( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
suggest renaming this "viewNewStorage" or "looksAt" so as not to imply that we are updating it, rather we are updating the storage being viewed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a reason we don't just replace this whole object in the cache rather than updating the main piece of it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reason is/was because there is an implied chain of "parent" storages via worldViewSubscriberId. If we replaced the object in the cache we would have a "ripple" effect where we have to update all cache entries that represent descendents. We wanted to minimize the need for locking and potential for concurrency issues.
If we just replace the storage itself, there is no ripple and it can be a fairly cheap synchronized operation.
This is a good docs topic since it isn't obvious that the subscription implies that chain and why that chain exists (for layered worldstates and for handling the releasing of snapshots, etc)
protected synchronized void tryClose() throws Exception { | ||
if (shouldClose.get() && subscribers.getSubscriberCount() < 1) { | ||
doClose(); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
else case might warrant some logging so we can figure out where failures to close come from.
final BonsaiWorldStateUpdater localUpdater, | ||
final BonsaiWorldStateArchive worldStateArchive, | ||
final BonsaiPersistedWorldState forWorldState) { | ||
final BlockHeader blockHeader, final BonsaiWorldStateUpdateAccumulator localUpdater) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
way simpler!
return trieLog; | ||
} | ||
|
||
synchronized void scrubCachedLayers(final long newMaxHeight) { | ||
public synchronized void scrubCachedLayers(final long newMaxHeight) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
might warrant a javadoc comment explaining how this fits into the big picture.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
will add a docs topic for evicting worldstates from cache 👍
.map(CachedWorldState::getMutableWorldState); | ||
} | ||
return Optional.empty(); | ||
public boolean containWorldStateStorage(final Hash blockHash) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
public boolean containWorldStateStorage(final Hash blockHash) { | |
public boolean containsWorldStateStorage(final Hash blockHash) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Was this moved? it is showing up as a completely new addition, can't see any diff.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IDK the git history for it, but it is an amalgam of the now deleted Persisted, Snapshot and InMemory worldstate behaviors.
@@ -65,35 +70,36 @@ public class BonsaiWorldStateUpdater extends AbstractWorldUpdater<BonsaiWorldVie | |||
private final Map<Address, StorageConsumingMap<BonsaiValue<UInt256>>> storageToUpdate = | |||
new ConcurrentHashMap<>(); | |||
|
|||
BonsaiWorldStateUpdater(final BonsaiWorldView world) { | |||
this(world, (__, ___) -> {}, (__, ___) -> {}); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
THANK YOU
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I enjoyed reading that, reminds me of my scala days 😆
.sorted( | ||
Comparator.comparingLong( | ||
view -> Math.abs(blockHeader.getNumber() - view.getBlockNumber()))) | ||
.map(CachedBonsaiWorldView::getWorldStateStorage) | ||
.findFirst(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: equivalent to using .max() I think...
.sorted( | |
Comparator.comparingLong( | |
view -> Math.abs(blockHeader.getNumber() - view.getBlockNumber()))) | |
.map(CachedBonsaiWorldView::getWorldStateStorage) | |
.findFirst(); | |
.max( | |
Comparator.comparingLong( | |
view -> Math.abs(blockHeader.getNumber() - view.getBlockNumber()))) | |
.map(CachedBonsaiWorldView::getWorldStateStorage); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
max(abs(...)) will get the block with the greatest difference, whereas we want a block with the least difference. But to your point we could do min(abs(...)) and should end up with O(1) instead of O(n log n). 👍
this, blockchain, worldStateStorage, maxLayersToLoad.orElse(RETAINED_LAYERS)); | ||
this.blockchain = blockchain; | ||
this.worldStateStorage = worldStateStorage; | ||
this.persistedState = new BonsaiWorldState(this, worldStateStorage); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Having a Java brain-fart moment...how can we reference this
inside BonsaiWorldStateProvider constructor before BonsaiWorldStateProvider instance is fully initialised?
e.g. what if BonsaiWorldState constructor called archive.getCachedMerkleTrieLoader()
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is a theoretical risk, but not currently one because the archive is the only user of that object until the archive is fully constructed.
However, I have never been fond of having an archive reference in worldstate. In a future PR IMO we could/should create functional interface and/or function references for the bits of the archive functionality that a worldstate needs so we can break the circular archive<->worldstate dependency.
@@ -65,35 +70,36 @@ public class BonsaiWorldStateUpdater extends AbstractWorldUpdater<BonsaiWorldVie | |||
private final Map<Address, StorageConsumingMap<BonsaiValue<UInt256>>> storageToUpdate = | |||
new ConcurrentHashMap<>(); | |||
|
|||
BonsaiWorldStateUpdater(final BonsaiWorldView world) { | |||
this(world, (__, ___) -> {}, (__, ___) -> {}); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I enjoyed reading that, reminds me of my scala days 😆
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Love the fact this deletes more code than it adds: +1,951 −3,009
I see that quite a few unit tests are modified and redundant test code deleted which is great. I may have missed it but I was expecting to see some new tests that covers the new functionality.
I ran coverage for bonsai
package and it seems quite low, which concerns me, particularly for the potential of future regressions.
return Optional.empty(); | ||
} | ||
} catch (final RuntimeException re) { | ||
LOG.trace("Archive rolling failed for block hash " + blockHash, re); | ||
LOG.info("Archive rolling failed for block hash " + blockHash, re); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: should this be error or warn?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Indeed I think that Error would be more appropriate
try { | ||
worldStateStorage.close(); | ||
} catch (Exception e) { | ||
// no op |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why are we swallowing this exception?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As it stands presently, this close() won't throw unless there is a system level failure which we wouldn't be able to handle (such as an exception from rocksdb)
Karim and I have discussed a follow-on PR that implements a more robust shutdown behavior on exit. The shutdown PR will modify this behavior and in the case of an exception on closing, we can/will at least log the cause of the failure.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not worth a log at least?
} | ||
} | ||
|
||
public synchronized void updateWorldStateStorage( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a reason we don't just replace this whole object in the cache rather than updating the main piece of it?
BonsaiWorldStateUpdater(final BonsaiWorldView world) { | ||
this(world, (__, ___) -> {}, (__, ___) -> {}); | ||
} | ||
private boolean isAccumulatorStateChanged; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this thread-safe?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is not modified in a concurrent way so it's good
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Might be nice to use the @ThreadSafe
annotation as documentation to show it's been considered (on the whole class if it's thread safe too)
@Override | ||
public KeyValueStorageTransaction getSnapshotTransaction() { | ||
return snapTx; | ||
private void throwIfClosed() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Erroring early, nice
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm surprised this class hasn't been removed or renamed...is it tied to FOREST as well?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is still used for bonsai layeredKeyValueStorage and in tests for in-memory worldstates. Forest doesn't directly use it in production (but could/might in integration tests)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why doesn't this live in the bonsai package?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because this class is generic and could even be theoretically used for Forest
For this it is planned to make a pass to think about testing bonsai correctly. We have planned a next step to build tests that would be good quality tests and not just for a good coverage. We chatted with @garyschulte and @non-fungible-nelson about that. I don't think that just unit tests will be enough to avoid regressions, more complex tests are needed |
Bonsai-safe refactor Signed-off-by: Karim TAAM <[email protected]> Co-authored-by: garyschulte <[email protected]>
Bonsai-safe refactor Signed-off-by: Karim TAAM <[email protected]> Co-authored-by: garyschulte <[email protected]>
PR description
The idea of this refactor is just to make the code more readable and less dangerous to maintain and update.
Current status
Problems :
its number of Worldstate (InMemory, Layered, Snapshot, Persisted). This made the code very unreadable and complicated to update.
the worldstate copy mechanism which meant that we had to use this code everywhere and we had to make copies of copies which could get us lost and prevent us from knowing what we had at the end (InMemory, Snapshot, Persisted? Layered )?
the multiple layers of subscribers. Because of these copies, we could have a copy that subscribes to another copy that itself subscribes to a snapshot in the cache etc. It became complicated to find the logic to properly unsubscribe paved the way for many regressions when modifying codes
the fact of creating a snapshot on a persistent state that may be in a transient state
Class naming
Proposition
The proposal is to delete all the different worldstates to have only one worldstate. The difference will reside in the type of storage that we have
Snapshot storage
Layered storage
Persisted storage
a worldstate can be frozen or not, which means that it will not try to modify its storage. If it is frozen it will be faster to calculate the root hash because it will have fewer steps
We will also have the same logic for all the worldstates and we will not need to duplicate the code or try to understand what a worldstate does or what a worldstate does not do. they all do the same thing. What changes is the storage we use and is it frozen or not.
Another improvement is the simplification of the BonsaiWorldStateArchive which no longer needs to be complex. It will also be renamed to BonsaiWorldStateProvider to better understand its role.
The provider will just do some simple step :
Do we want to persist our modifications ?
The snapshot of the persisted state is always added before it is needed, at the time of the fcu or at the start of Besu in order to never create a snapshot of a transient state
We no longer have an InMemory. The layered will be used for all cases where we do not want to change the persisted state. The trielogs will just be used for rollback or rollfoward but no more to find the status of an account or a storage
We will no longer need to copy etc. We store snapshot storage in the cache and when we need it we wrap it in a layered storage it and create a worldstate from that. This makes it possible to greatly simplify the subscribe mechanism to its minimum and no longer need to copy worldtstates.
Finally, the bonsai updater is renamed to accumulator in order to better explain its role
Performance metrics
Block Processing and FCU times
Block performance and FCU times are at least similar to 23.1.0 release or even better on a 32 GiB VM.
On a 16 GiB/4 vCPU AWS VM (m6a.xlarge)
On a 32 GiB/8 vCPU AWS VM (m6a.2xlarge)
CPU usage
This PR fixes an issue we had on versions 23.1.0 and 23.1.1, which is the short CPU spikes. We can see on both 4vCPU and 8 vCPU, Besu doesn't have anymore the short CPU spikes we can see on version 23.1.0.
On a 16 GiB/4 vCPU AWS VM (m6a.xlarge)
On a 32 GiB/8 vCPU AWS VM (m6a.2xlarge)
Memory management
Memory management is better with this PR than 23.1.0. We can see in the screenshots below that Besu is able to release more memory with PR during Garbage Collection and thus consume less memory at the process level (RSS memory). GC activity is pretty similar on this PR compared to 23.1.0.
I didn't notice any OutOfMemory error or OOM killer event both configurations (16 and 32 GiB VMs).
On a 16 GiB/4 vCPU AWS VM (m6a.xlarge) (-Xmx5g)
GC activity (PR #5123)
GC activity (version 23.1.0)
On a 32 GiB/8 vCPU AWS VM (m6a.2xlarge) (-Xmx8g)
GC activity (PR #5123)
GC activity (version 23.1.0)
Disk and network IO
Disk and Network IO is very close on both configurations
On a 16 GiB/4 vCPU AWS VM (m6a.xlarge) (-Xmx5g)
On a 32 GiB/8 vCPU AWS VM (m6a.2xlarge) (-Xmx8g)
Performance of some ETH RPC Calls (2 active users, during 5 minutes), the time displayed below is in ms
Fixed Issue(s)
Documentation
doc-change-required
label to this PR ifupdates are required.
Acceptance Tests (Non Mainnet)
./gradlew acceptanceTestNonMainnet
locally if my PR affects non-mainnet modules.Changelog