Import export trie log #6363

Merged
merged 51 commits into from
Jan 22, 2024
Changes from 36 commits
16c0a49
Add x-trie-log subcommand for one-off backlog prune
siladu Nov 20, 2023
7dd4928
long -> int
siladu Nov 20, 2023
bf2b098
Removed banned method
gfukushima Dec 12, 2023
e67ae51
Preload process stream in parallel
gfukushima Dec 12, 2023
9b4e0c9
Drop unwanted trielogs and keep retained layers only
gfukushima Dec 14, 2023
0b9fe83
Add output to user and cleanup refactor
gfukushima Dec 15, 2023
426848e
small tweak to display cf that had reference dropped by RocksDbSegmen…
gfukushima Dec 15, 2023
7401b59
spotless
gfukushima Dec 15, 2023
1b7fb72
Fix classes that changed package
gfukushima Dec 15, 2023
11e6b05
spotless
gfukushima Dec 15, 2023
f2d01e2
Code review
gfukushima Dec 15, 2023
04f1aaa
Only clear DB when we have the exact amount of trie logs we want in m…
gfukushima Dec 15, 2023
2f01c5a
Trielogs stream to and from file to avoid possibly OOM
gfukushima Dec 18, 2023
56e4c8e
Process trie logs in chunks to avoid OOM
gfukushima Dec 18, 2023
78561b0
save and read in batches to handle edge cases
gfukushima Dec 19, 2023
42c72cf
save and read files to/from database dir
gfukushima Dec 20, 2023
9961fc2
Merge branch 'main' into x-trie-log-subcommand-2
gfukushima Dec 20, 2023
9389540
add unit tests and PR review fixes
gfukushima Dec 21, 2023
e3d4fbc
Merge branch 'main' into x-trie-log-subcommand-2
gfukushima Dec 21, 2023
c7144fe
spdx
gfukushima Dec 21, 2023
20b0ba5
Fix unit tests directory creation and deletion
gfukushima Dec 21, 2023
586ab25
rename Xbonsai-trie-log-pruning-enabled to Xbonsai-limit-trie-logs-en…
gfukushima Jan 4, 2024
67e6f3d
Import and export trie log subcommands
gfukushima Jan 4, 2024
b9640e5
PR review
gfukushima Jan 4, 2024
3bc1878
spotless
gfukushima Jan 4, 2024
d47ddf5
fix path resolver and added unit tests
gfukushima Jan 7, 2024
999edb6
Merge branch 'main' into import-export-trie-log
gfukushima Jan 8, 2024
1699fe4
fix unit test
gfukushima Jan 8, 2024
e679cb3
Merge remote-tracking branch 'origin/import-export-trie-log' into imp…
gfukushima Jan 8, 2024
5d3b4f2
fix unit test
gfukushima Jan 8, 2024
f839b75
Merge branch 'main' into import-export-trie-log
gfukushima Jan 8, 2024
0caa4cf
Add import and export to list of subcommands under --x-trie-log
gfukushima Jan 8, 2024
2d5d31d
Merge remote-tracking branch 'origin/import-export-trie-log' into imp…
gfukushima Jan 8, 2024
37df23e
Remove static from setup method
gfukushima Jan 8, 2024
5ce1800
change option name and fix descriptions
gfukushima Jan 8, 2024
98423dc
Merge branch 'main' into import-export-trie-log
gfukushima Jan 8, 2024
cf3a5e6
Fix subcommands descriptions
gfukushima Jan 9, 2024
c759bba
Merge branch 'main' into import-export-trie-log
gfukushima Jan 9, 2024
5fb9413
Remove old flag and move commands const into Unstable
gfukushima Jan 12, 2024
087c54b
Allow list of block hashes to passed as well as a file to be generate…
gfukushima Jan 12, 2024
4b033a3
Merge remote-tracking branch 'origin/import-export-trie-log' into imp…
gfukushima Jan 12, 2024
3a89ac3
Allow list of block hashes to passed as well as a file to be generate…
gfukushima Jan 12, 2024
9be7d13
Fix broken test when replaced the old option
gfukushima Jan 12, 2024
75d1c3b
import and export using rlp
jframe Jan 19, 2024
5eb4cda
Merge branch 'main' into import-export-trie-log
jframe Jan 19, 2024
5425075
tests for exporting and importing multiple trielogs
jframe Jan 19, 2024
2de0b19
Merge branch 'import-export-trie-log' of github.com:gfukushima/besu i…
jframe Jan 19, 2024
248a776
Merge branch 'main' into import-export-trie-log
jframe Jan 19, 2024
55d653e
fix build
jframe Jan 19, 2024
c9d38f0
Merge branch 'main' into import-export-trie-log
jframe Jan 19, 2024
b9d7620
Merge branch 'main' into import-export-trie-log
jframe Jan 22, 2024
@@ -42,6 +42,11 @@ public class DataStorageOptions implements CLIOptions<DataStorageConfiguration>
private static final String BONSAI_STORAGE_FORMAT_MAX_LAYERS_TO_LOAD =
"--bonsai-historical-block-limit";

private static final String BONSAI_TRIE_LOG_PRUNING_ENABLED =
"--Xbonsai-trie-log-pruning-enabled";

private static final String BONSAI_LIMIT_TRIE_LOGS_ENABLED = "--Xbonsai-limit-trie-logs-enabled";

// Use Bonsai DB
@Option(
names = {DATA_STORAGE_FORMAT},
@@ -65,7 +70,7 @@ static class Unstable {

@CommandLine.Option(
hidden = true,
names = {"--Xbonsai-trie-log-pruning-enabled"},
names = {BONSAI_LIMIT_TRIE_LOGS_ENABLED, BONSAI_TRIE_LOG_PRUNING_ENABLED},
description = "Enable trie log pruning. (default: ${DEFAULT-VALUE})")
private boolean bonsaiTrieLogPruningEnabled = DEFAULT_BONSAI_TRIE_LOG_PRUNING_ENABLED;

@@ -16,6 +16,7 @@
package org.hyperledger.besu.cli.subcommands.storage;

import static com.google.common.base.Preconditions.checkArgument;
import static java.util.Collections.singletonList;
import static org.hyperledger.besu.controller.BesuController.DATABASE_PATH;

import org.hyperledger.besu.datatypes.Hash;
@@ -97,16 +98,15 @@ private static void processTrieLogBatches(
final String batchFileNameBase) {

for (long batchNumber = 1; batchNumber <= numberOfBatches; batchNumber++) {

final String batchFileName = batchFileNameBase + "-" + batchNumber;
Review comment (Contributor): It might make more sense to include the first and last block numbers in the filename. Otherwise it won't be clear what the files actually contain after an export.

Reply (Contributor): The batch filenames aren't used by the import/export subcommands; those take the filename from the command-line arguments instead. Batch filenames are only used by the prune subcommand.

final long firstBlockOfBatch = chainHeight - ((batchNumber - 1) * BATCH_SIZE);

final long lastBlockOfBatch =
Math.max(chainHeight - (batchNumber * BATCH_SIZE), lastBlockNumberToRetainTrieLogsFor);

final List<Hash> trieLogKeys =
getTrieLogKeysForBlocks(blockchain, firstBlockOfBatch, lastBlockOfBatch);

saveTrieLogBatches(batchFileNameBase, rootWorldStateStorage, batchNumber, trieLogKeys);
LOG.info("Saving trie logs to retain in file (batch {})...", batchNumber);
saveTrieLogBatches(batchFileName, rootWorldStateStorage, trieLogKeys);
}
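
Note on the batch bounds computed in the loop above: the following is a minimal standalone sketch of the same arithmetic, so the clamping against the retention threshold is easier to see. The class name, the `Range` record, and the sample numbers are illustrative assumptions; the real `BATCH_SIZE` value and how `numberOfBatches` is derived are not shown in this diff, so both are passed in as parameters here.

```java
import java.util.ArrayList;
import java.util.List;

public class BatchRanges {

  // Illustrative holder for one batch's bounds; not a type from the PR.
  record Range(long batchNumber, long firstBlock, long lastBlock) {}

  // Mirrors the firstBlockOfBatch / lastBlockOfBatch expressions from the loop,
  // walking down from the chain head and clamping at the retention threshold.
  static List<Range> batchRanges(
      long chainHeight, long lastBlockToRetain, long numberOfBatches, long batchSize) {
    List<Range> ranges = new ArrayList<>();
    for (long batchNumber = 1; batchNumber <= numberOfBatches; batchNumber++) {
      long first = chainHeight - ((batchNumber - 1) * batchSize);
      long last = Math.max(chainHeight - (batchNumber * batchSize), lastBlockToRetain);
      ranges.add(new Range(batchNumber, first, last));
    }
    return ranges;
  }

  public static void main(String[] args) {
    // Assumed sample values: head at 1000, retain down to block 750, batches of 100.
    for (Range r : batchRanges(1_000, 750, 3, 100)) {
      System.out.println(r);
    }
  }
}
```

With these sample values the third batch would run past block 750, so its lower bound is clamped to 750. Whether the bounds are treated as inclusive depends on `getTrieLogKeysForBlocks`, which is not part of this hunk.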

LOG.info("Clear trie logs...");
@@ -118,15 +118,12 @@ private static void processTrieLogBatches(
}

private static void saveTrieLogBatches(
final String batchFileNameBase,
final String batchFileName,
final BonsaiWorldStateKeyValueStorage rootWorldStateStorage,
final long batchNumber,
final List<Hash> trieLogKeys) {

LOG.info("Saving trie logs to retain in file (batch {})...", batchNumber);

try {
saveTrieLogsInFile(trieLogKeys, rootWorldStateStorage, batchNumber, batchFileNameBase);
saveTrieLogsInFile(trieLogKeys, rootWorldStateStorage, batchFileName);
} catch (IOException e) {
LOG.error("Error saving trie logs to file: {}", e.getMessage());
throw new RuntimeException(e);
@@ -210,9 +207,8 @@ private static void recreateTrieLogs(
final String batchFileNameBase)
throws IOException {
// process in chunk to avoid OOM

IdentityHashMap<byte[], byte[]> trieLogsToRetain =
readTrieLogsFromFile(batchFileNameBase, batchNumber);
final String batchFileName = batchFileNameBase + "-" + batchNumber;
IdentityHashMap<byte[], byte[]> trieLogsToRetain = readTrieLogsFromFile(batchFileName);
final int chunkSize = ROCKSDB_MAX_INSERTS_PER_TRANSACTION;
List<byte[]> keys = new ArrayList<>(trieLogsToRetain.keySet());

@@ -265,11 +261,10 @@ private static void validatePruneConfiguration(final DataStorageConfiguration co
private static void saveTrieLogsInFile(
final List<Hash> trieLogsKeys,
final BonsaiWorldStateKeyValueStorage rootWorldStateStorage,
final long batchNumber,
final String batchFileNameBase)
final String batchFileName)
throws IOException {

File file = new File(batchFileNameBase + "-" + batchNumber);
File file = new File(batchFileName);
if (file.exists()) {
LOG.error("File already exists, skipping file creation");
return;
@@ -285,11 +280,10 @@ private static void saveTrieLogsInFile(
}

@SuppressWarnings("unchecked")
private static IdentityHashMap<byte[], byte[]> readTrieLogsFromFile(
final String batchFileNameBase, final long batchNumber) {
static IdentityHashMap<byte[], byte[]> readTrieLogsFromFile(final String batchFileName) {
Review comment (Contributor): Non-blocking feedback.

I am all for code reuse, but if we are going to allow arbitrary import and export, the import files should be more readable and "createable".

ObjectOutputStream seems fine for backup and recovery in the pruning process, but as part of a general import/export process this file format is too inscrutable, IMO.

At least for import/export, we should serialize and deserialize these as JSON maps, with the hash string as the key and the trie log itself as hex (or as a rich JSON object if we wanted to be super transparent). Besides being more introspectable, this would let us create and import our own handcrafted trie logs when debugging.
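
To make the JSON suggestion concrete, here is a minimal sketch of what such an export could look like: a flat JSON object mapping the hex block hash to the hex-encoded trie log. This is not code from the PR. The class and method names are hypothetical, the sample hash is shortened for readability (a real block hash is 32 bytes), and it uses only `java.util.HexFormat` (Java 17+) rather than any Besu or JSON library.

```java
import java.util.HexFormat;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

public class TrieLogJsonSketch {

  private static final HexFormat HEX = HexFormat.of();

  // Renders each (block hash, trie log) pair as "0x<hash>": "0x<log>".
  // Hex strings contain no characters that need JSON escaping.
  static String toJson(Map<byte[], byte[]> trieLogs) {
    StringJoiner json = new StringJoiner(",\n", "{\n", "\n}");
    trieLogs.forEach(
        (hash, log) ->
            json.add("  \"0x" + HEX.formatHex(hash) + "\": \"0x" + HEX.formatHex(log) + "\""));
    return json.toString();
  }

  // Parses a hex string with an optional 0x prefix back into bytes.
  static byte[] fromHex(String hex) {
    return HEX.parseHex(hex.startsWith("0x") ? hex.substring(2) : hex);
  }

  public static void main(String[] args) {
    Map<byte[], byte[]> trieLogs = new LinkedHashMap<>();
    trieLogs.put(fromHex("0xaa01"), fromHex("0x01")); // shortened, made-up hash
    System.out.println(toJson(trieLogs));
  }
}
```

A file in this shape could be written by hand for debugging, which is the main benefit the comment is after; the trade-off is larger files than a binary encoding.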


IdentityHashMap<byte[], byte[]> trieLogs;
try (FileInputStream fis = new FileInputStream(batchFileNameBase + "-" + batchNumber);
try (FileInputStream fis = new FileInputStream(batchFileName);
ObjectInputStream ois = new ObjectInputStream(fis)) {

trieLogs = (IdentityHashMap<byte[], byte[]>) ois.readObject();
@@ -357,5 +351,30 @@ static void printCount(final PrintWriter out, final TrieLogCount count) {
count.total, count.canonicalCount, count.forkCount, count.orphanCount);
}

static void importTrieLog(
final BonsaiWorldStateKeyValueStorage rootWorldStateStorage,
final Path dataDirectoryPath,
final Hash trieLogHash) {
final String trieLogFile =
dataDirectoryPath.resolve(DATABASE_PATH).resolve(trieLogHash.toString()).toString();

var trieLog = readTrieLogsFromFile(trieLogFile);
Review comment (Contributor): nit: Not sure how important this is, so feel free to ignore it if it doesn't make sense. Using RLP for the file, matching how trie logs are stored in the database's trielog column family, might make comparison easier by avoiding the extra encoding of the IdentityHashMap. It depends on the purpose of this tool.

Reply (Contributor Author): I added these subcommands to allow extraction and insertion of trie logs, mostly for debugging or troubleshooting, but a version that imports and exports RLP could be very useful too.

Reply (Contributor): I'd lean towards a more readable format rather than binary, since binary is hard to work with. I agree RLP is a more useful middle ground than custom binary (if that's what this is).
For example, I wonder if this could be used to import test data; compare with RlpBlockImporter.java and JsonBlockImporter.

Reply (Contributor Author): I'd rather extend the current feature later to support RLP export, reusing this code if necessary. It isn't simple to do right now, and it would also affect the prune subcommand's logic.


var updater = rootWorldStateStorage.updater();
trieLog.forEach((key, value) -> updater.getTrieLogStorageTransaction().put(key, value));
updater.getTrieLogStorageTransaction().commit();
}

static void exportTrieLog(
final BonsaiWorldStateKeyValueStorage rootWorldStateStorage,
final Path dataDirectoryPath,
final Hash trieLogHash)
throws IOException {
final String trieLogFile =
dataDirectoryPath.resolve(DATABASE_PATH).resolve(trieLogHash.toString()).toString();

saveTrieLogsInFile(singletonList(trieLogHash), rootWorldStateStorage, trieLogFile);
}

record TrieLogCount(int total, int canonicalCount, int forkCount, int orphanCount) {}
}
@@ -17,15 +17,18 @@
import static com.google.common.base.Preconditions.checkArgument;
import static com.google.common.base.Preconditions.checkNotNull;

import org.hyperledger.besu.cli.DefaultCommandValues;
import org.hyperledger.besu.cli.util.VersionProvider;
import org.hyperledger.besu.controller.BesuController;
import org.hyperledger.besu.datatypes.Hash;
import org.hyperledger.besu.ethereum.chain.MutableBlockchain;
import org.hyperledger.besu.ethereum.storage.StorageProvider;
import org.hyperledger.besu.ethereum.trie.bonsai.storage.BonsaiWorldStateKeyValueStorage;
import org.hyperledger.besu.ethereum.trie.bonsai.trielog.TrieLogPruner;
import org.hyperledger.besu.ethereum.worldstate.DataStorageConfiguration;
import org.hyperledger.besu.ethereum.worldstate.DataStorageFormat;

import java.io.IOException;
import java.io.PrintWriter;
import java.nio.file.Path;
import java.nio.file.Paths;
@@ -43,7 +46,12 @@
description = "Manipulate trie logs",
mixinStandardHelpOptions = true,
versionProvider = VersionProvider.class,
subcommands = {TrieLogSubCommand.CountTrieLog.class, TrieLogSubCommand.PruneTrieLog.class})
subcommands = {
TrieLogSubCommand.CountTrieLog.class,
TrieLogSubCommand.PruneTrieLog.class,
TrieLogSubCommand.ExportTrieLog.class,
TrieLogSubCommand.ImportTrieLog.class
})
public class TrieLogSubCommand implements Runnable {

@SuppressWarnings("UnusedVariable")
@@ -123,6 +131,78 @@ public void run() {
}
}

@Command(
name = "export",
description =
"This command prunes all trie log layers below the retention threshold, including orphaned trie logs.",
Review comment (Contributor): This is the pruning description, not the export description.

Reply (Contributor Author): Thanks for catching this.

mixinStandardHelpOptions = true,
versionProvider = VersionProvider.class)
static class ExportTrieLog implements Runnable {

@SuppressWarnings("unused")
@ParentCommand
private TrieLogSubCommand parentCommand;

@SuppressWarnings("unused")
@CommandLine.Spec
private CommandLine.Model.CommandSpec spec; // Picocli injects reference to command spec

@CommandLine.Option(
names = "--trie-log-hash",
paramLabel = DefaultCommandValues.MANDATORY_LONG_FORMAT_HELP,
description = "The hash of the block you want to export the trie log.",
arity = "1..1")
private String trieLogHash;

@Override
public void run() {
TrieLogContext context = getTrieLogContext();
final Path dataDirectoryPath =
Paths.get(
TrieLogSubCommand.parentCommand.parentCommand.dataDir().toAbsolutePath().toString());
try {
TrieLogHelper.exportTrieLog(
context.rootWorldStateStorage(), dataDirectoryPath, Hash.fromHexString(trieLogHash));
} catch (IOException e) {
throw new RuntimeException(e);
}
}
}

@Command(
name = "import",
description =
"This command prunes all trie log layers below the retention threshold, including orphaned trie logs.",
Review comment (Contributor): This description looks like the pruning description.

Reply (Contributor Author): Thanks for catching this!

mixinStandardHelpOptions = true,
versionProvider = VersionProvider.class)
static class ImportTrieLog implements Runnable {

@SuppressWarnings("unused")
@ParentCommand
private TrieLogSubCommand parentCommand;

@SuppressWarnings("unused")
@CommandLine.Spec
private CommandLine.Model.CommandSpec spec; // Picocli injects reference to command spec

@CommandLine.Option(
names = "--trie-log-hash",
Review comment (Contributor): Can the exported file contain more than one trie log? I would have expected the import to take a filename and import all trie logs in that file.

Reply (Contributor Author): Not at the moment; this could be extended in the future if needed.

paramLabel = DefaultCommandValues.MANDATORY_LONG_FORMAT_HELP,
description = "The hash of the block you want to import the trie log.",
arity = "1..1")
private String trieLogHash;
Review comment (Contributor): Rename to blockHash? The variable name suggests this is the hash of the trie log.

Review comment (Contributor): It could be useful to allow specifying a block number instead of a block hash. It's only a suggestion, though; I'm not sure which one we'll use more in practice.

Reply (Contributor Author): I think the block hash is probably better in this case: if you request by number and your node forked at that block, you could end up with the wrong block hash.
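
The fork concern can be illustrated with a small sketch. All data below is made up, and the class and method names are hypothetical rather than Besu APIs: at a forked height, one block number maps to more than one block hash, so a lookup by number is ambiguous while a lookup by hash is not.

```java
import java.util.List;
import java.util.Map;

public class HashVsNumberSketch {

  // Made-up fork: two competing blocks at height 100, with shortened hashes.
  static final Map<Long, List<String>> HASHES_BY_NUMBER =
      Map.of(100L, List.of("0xaaa1", "0xbbb2"));

  // One trie log per block hash (values are placeholders).
  static final Map<String, String> TRIE_LOG_BY_HASH =
      Map.of("0xaaa1", "0x01", "0xbbb2", "0x02");

  // Looking up by number returns every trie log at that height.
  static List<String> trieLogsAtHeight(long number) {
    return HASHES_BY_NUMBER.getOrDefault(number, List.of()).stream()
        .map(TRIE_LOG_BY_HASH::get)
        .toList();
  }

  public static void main(String[] args) {
    // Two candidates at the forked height: the caller cannot tell which one is wanted.
    System.out.println("by number 100: " + trieLogsAtHeight(100));
    // A hash identifies exactly one block, so the result is unambiguous.
    System.out.println("by hash 0xbbb2: " + TRIE_LOG_BY_HASH.get("0xbbb2"));
  }
}
```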


@Override
public void run() {
TrieLogContext context = getTrieLogContext();
final Path dataDirectoryPath =
Paths.get(
TrieLogSubCommand.parentCommand.parentCommand.dataDir().toAbsolutePath().toString());
TrieLogHelper.importTrieLog(
context.rootWorldStateStorage(), dataDirectoryPath, Hash.fromHexString(trieLogHash));
}
}

record TrieLogContext(
DataStorageConfiguration config,
BonsaiWorldStateKeyValueStorage rootWorldStateStorage,
@@ -40,7 +40,6 @@

import org.apache.tuweni.bytes.Bytes;
import org.junit.jupiter.api.AfterEach;
import org.junit.jupiter.api.BeforeAll;
import org.junit.jupiter.api.BeforeEach;
import org.junit.jupiter.api.Test;
import org.junit.jupiter.api.extension.ExtendWith;
@@ -65,8 +64,8 @@ class TrieLogHelperTest {
static BlockHeader blockHeader4;
static BlockHeader blockHeader5;

@BeforeAll
public static void setup() throws IOException {
@BeforeEach
public void setup() throws IOException {

blockHeader1 = new BlockHeaderTestFixture().number(1).buildHeader();
blockHeader2 = new BlockHeaderTestFixture().number(2).buildHeader();
@@ -94,10 +93,11 @@ public static void setup() throws IOException {
.getTrieLogStorageTransaction()
.put(blockHeader5.getHash().toArrayUnsafe(), Bytes.fromHexString("0x05").toArrayUnsafe());
updater.getTrieLogStorageTransaction().commit();

createDirectory();
}

@BeforeEach
void createDirectory() throws IOException {
static void createDirectory() throws IOException {
Files.createDirectories(dataDir.resolve("database"));
}

@@ -227,7 +227,6 @@ public void cantPruneIfUserRequiredFurtherThanFinalized() {

@Test
public void exceptionWhileSavingFileStopsPruneProcess() throws IOException {
Files.delete(dataDir.resolve("database"));

DataStorageConfiguration dataStorageConfiguration =
ImmutableDataStorageConfiguration.builder()
@@ -243,7 +242,11 @@ public void exceptionWhileSavingFileStopsPruneProcess() throws IOException {
assertThrows(
RuntimeException.class,
() ->
TrieLogHelper.prune(dataStorageConfiguration, inMemoryWorldState, blockchain, dataDir));
TrieLogHelper.prune(
dataStorageConfiguration,
inMemoryWorldState,
blockchain,
dataDir.resolve("unknownPath")));

// assert all trie logs are still in the DB
assertArrayEquals(
@@ -262,4 +265,43 @@ public void exceptionWhileSavingFileStopsPruneProcess() throws IOException {
inMemoryWorldState.getTrieLog(blockHeader5.getHash()).get(),
Bytes.fromHexString("0x05").toArrayUnsafe());
}

@Test
public void exportedTrieMatchesDbTrieLog() throws IOException {
TrieLogHelper.exportTrieLog(inMemoryWorldState, dataDir, blockHeader1.getHash());
Path trieLogFile = dataDir.resolve("database").resolve(blockHeader1.getHash().toString());

var trieLog =
TrieLogHelper.readTrieLogsFromFile(trieLogFile.toString()).entrySet().stream()
.findFirst()
.get();

assertArrayEquals(trieLog.getKey(), blockHeader1.getHash().toArrayUnsafe());
assertArrayEquals(trieLog.getValue(), Bytes.fromHexString("0x01").toArrayUnsafe());

Files.delete(trieLogFile);
}
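
Note on the test above: it takes the first entry of the map instead of calling `get` with the block hash. One likely reason is that `IdentityHashMap` compares keys by reference, so a `byte[]` with equal content but a different identity never matches. The standalone sketch below demonstrates that behaviour and one way to search by content; the class and helper names are illustrative and not part of the PR.

```java
import java.util.Arrays;
import java.util.IdentityHashMap;
import java.util.Map;
import java.util.Optional;

public class IdentityMapLookup {

  // Scans the entries and compares key contents, since get() would compare references.
  static Optional<byte[]> findByContent(Map<byte[], byte[]> map, byte[] key) {
    return map.entrySet().stream()
        .filter(e -> Arrays.equals(e.getKey(), key))
        .map(Map.Entry::getValue)
        .findFirst();
  }

  public static void main(String[] args) {
    Map<byte[], byte[]> trieLogs = new IdentityHashMap<>();
    trieLogs.put(new byte[] {0x0a}, new byte[] {0x01});

    byte[] sameContent = {0x0a}; // equal bytes, different array instance

    // Reference comparison: the lookup misses even though the bytes match.
    System.out.println("get() misses: " + (trieLogs.get(sameContent) == null));
    // Content comparison: the entry is found.
    System.out.println("content scan finds it: " + findByContent(trieLogs, sameContent).isPresent());
  }
}
```

A plain `HashMap<byte[], byte[]>` behaves the same way for lookups, because Java arrays do not override `equals` or `hashCode`.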

@Test
public void importedTrieLogMatchesDbTrieLog() throws IOException {
StorageProvider tempStorageProvider = new InMemoryKeyValueStorageProvider();
BonsaiWorldStateKeyValueStorage inMemoryWorldState2 =
new BonsaiWorldStateKeyValueStorage(tempStorageProvider, new NoOpMetricsSystem());

TrieLogHelper.exportTrieLog(inMemoryWorldState, dataDir, blockHeader1.getHash());
Path trieLogFile = dataDir.resolve("database").resolve(blockHeader1.getHash().toString());

var trieLog = TrieLogHelper.readTrieLogsFromFile(trieLogFile.toString());
var updater = inMemoryWorldState2.updater();

trieLog.forEach((k, v) -> updater.getTrieLogStorageTransaction().put(k, v));

updater.getTrieLogStorageTransaction().commit();

assertArrayEquals(
inMemoryWorldState2.getTrieLog(blockHeader1.getHash()).get(),
Bytes.fromHexString("0x01").toArrayUnsafe());

Files.delete(trieLogFile);
}
}