Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: improvements to big block tests #3493

Conversation

smuu
Copy link
Member

@smuu smuu commented May 17, 2024

Overview

This PR is for sanaz/big-block-test to debug the issues and to speed up the testing process.

@staheri14

@smuu smuu requested a review from a team as a code owner May 17, 2024 15:09
@smuu smuu requested review from ramin and rootulp and removed request for a team May 17, 2024 15:09
@celestia-bot celestia-bot requested a review from a team May 17, 2024 15:10
Comment on lines 295 to 302
err = n.Instance.Commit()
if err != nil {
return fmt.Errorf("committing instance: %w", err)
}

if err = n.Instance.AddFolder(nodeDir, remoteRootDir, "10001:10001"); err != nil {
return fmt.Errorf("copying over node %s directory: %w", n.Name, err)
}
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Adding files after commit speeds up the set-up process

go.mod Outdated

require (
cosmossdk.io/errors v1.0.1
cosmossdk.io/math v1.3.0
github.com/celestiaorg/blobstream-contracts/v3 v3.1.0
github.com/celestiaorg/go-square v1.0.1
github.com/celestiaorg/go-square/merkle v0.0.0-20240117232118-fd78256df076
github.com/celestiaorg/knuu v0.13.2
github.com/celestiaorg/knuu v0.13.3-rc.9
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using the latest fixes and improvements

Comment on lines -145 to 173
// copy over the keyring directory to the txsim instance
err = txsim.Instance.AddFolder(txsimKeyringDir, txsimRootDir, "10001:10001")
err = txsim.Instance.Commit()
if err != nil {
log.Err(err).
Str("directory", txsimKeyringDir).
Str("name", name).
Msg("error adding keyring dir to txsim")
Msg("error committing txsim")
return err
}
err = txsim.Instance.Commit()

// copy over the keyring directory to the txsim instance
err = txsim.Instance.AddFolder(txsimKeyringDir, txsimRootDir, "10001:10001")
if err != nil {
log.Err(err).
Str("directory", txsimKeyringDir).
Str("name", name).
Msg("error committing txsim")
Msg("error adding keyring dir to txsim")
return err
}
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Adding files after commit speeds up the set-up process

test/e2e/testnet/testnet.go Outdated Show resolved Hide resolved
Comment on lines 334 to 347
// start genesis nodes asynchronously
for _, node := range genesisNodes {
err := node.Start()
err := node.Instance.StartWithoutWait()
if err != nil {
return fmt.Errorf("node %s failed to start: %w", node.Name, err)
}
}
// wait for instances to be running
for _, node := range genesisNodes {
client, err := node.Client()
err := node.Instance.WaitInstanceIsRunning()
if err != nil {
return fmt.Errorf("failed to initialize node %s: %w", node.Name, err)
return fmt.Errorf("node %s failed to start: %w", node.Name, err)
}
}
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can start all nodes in an async way and then wait for all nodes to be started.
I contrast to start and wait for them one-by-one.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Absolutely, it is very useful!

@rootulp rootulp removed request for ramin and rootulp May 17, 2024 18:08
@rootulp
Copy link
Collaborator

rootulp commented May 17, 2024

@smuu can you please provide a PR description? Did you mean to tag @staheri14 for review b/c this PR merges to her branch.

@staheri14
Copy link
Contributor

staheri14 commented May 17, 2024

Thanks @smuu for the PR!

@smuu can you please provide a PR description? Did you mean to tag @staheri14 for review b/c this PR merges to her branch.

AFAIK, The purpose of this PR is to provide fixes for some issues that occur when using the latest release candidate of knuu for the big block tests (and e2e tests in general). That is why it is based off celestiaorg:sanaz/big-block-test .

Copy link
Member

@rach-id rach-id left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🚀

Left some comments to understand more.

Also, if we want these changes, maybe it makes sense to make an official knuu release instead of depending on an RC

Comment on lines 428 to 432
if errors.Is(err, context.Canceled) || errors.Is(err, context.DeadlineExceeded) {
return 0, err
}

return 0, fmt.Errorf("error getting height: %w", err)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[nit]
What's the difference between the two returns? one is adding the string "error getting height:", but does it matter? is it being parsed somewhere?

Comment on lines +70 to +85
resultData, ok := result["result"].(map[string]interface{})
if !ok {
return 0, fmt.Errorf("error getting result from status")
}
syncInfo, ok := resultData["sync_info"].(map[string]interface{})
if !ok {
return 0, fmt.Errorf("error getting sync info from status")
}
latestBlockHeight, ok := syncInfo["latest_block_height"].(string)
if !ok {
return 0, fmt.Errorf("error getting latest block height from sync info")
}
latestBlockHeightInt, err := strconv.ParseInt(latestBlockHeight, 10, 64)
if err != nil {
return 0, fmt.Errorf("error converting latest block height to int: %w", err)
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of these gymnastics, why not parse the whole response to ResultStatus from https://github.com/celestiaorg/celestia-core/blob/a281e871c70f4c8bd4f6ab7c7e39e28c715d65b2/rpc/core/types/responses.go#L123-L128, and get the field you want from it.

Comment on lines +20 to +68
type JSONRPCError struct {
Code int
Message string
Data string
}

func (e *JSONRPCError) Error() string {
return fmt.Sprintf("JSONRPC Error - Code: %d, Message: %s, Data: %s", e.Code, e.Message, e.Data)
}

func getStatus(executor *knuu.Executor, app *knuu.Instance) (string, error) {
nodeIP, err := app.GetIP()
if err != nil {
return "", fmt.Errorf("error getting node ip: %w", err)
}

ctx, cancel := context.WithTimeout(context.Background(), 5*time.Minute)
defer cancel()
status, err := executor.ExecuteCommandWithContext(ctx, "wget", "-q", "-O", "-", fmt.Sprintf("%s:26657/status", nodeIP))
if err != nil {
return "", fmt.Errorf("error executing command: %w", err)
}
return status, nil
}

func latestBlockHeightFromStatus(status string) (int64, error) {
var result map[string]interface{}
err := json.Unmarshal([]byte(status), &result)
if err != nil {
return 0, fmt.Errorf("error unmarshalling status: %w", err)
}

if errorField, ok := result["error"]; ok {
errorData, ok := errorField.(map[string]interface{})
if !ok {
return 0, fmt.Errorf("error field exists but is not a map[string]interface{}")
}
jsonError := &JSONRPCError{}
if errorCode, ok := errorData["code"].(float64); ok {
jsonError.Code = int(errorCode)
}
if errorMessage, ok := errorData["message"].(string); ok {
jsonError.Message = errorMessage
}
if errorData, ok := errorData["data"].(string); ok {
jsonError.Data = errorData
}
return 0, jsonError
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similar, instead of defining these here and doing all of the heavy lifting, you could use these existing types: https://github.com/celestiaorg/celestia-core/blob/a281e871c70f4c8bd4f6ab7c7e39e28c715d65b2/rpc/jsonrpc/types/types.go#L140-L159
And, you could simply parse the response from the command using:

recv := new(types.RPCResponse)
err = json.Unmarshal(status, recv)

err = txsim.Instance.Commit()

// copy over the keyring directory to the txsim instance
err = txsim.Instance.AddFolder(txsimKeyringDir, txsimRootDir, "10001:10001")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[microscopic nit]
it would be nice to define some constants for UID and GID and explain in their docs that these are the permissions given in the Celestia-app docker images

@smuu smuu requested a review from staheri14 May 21, 2024 10:22
Copy link
Member

@ninabarbakadze ninabarbakadze left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we're making some of these improvements in my pr already #3487

@celestia-bot celestia-bot requested review from a team and ninabarbakadze and removed request for a team May 21, 2024 13:13
@evan-forbes evan-forbes added the WS: Big Blonks 🔭 Improving consensus critical gossiping protocols label May 21, 2024
@celestia-bot celestia-bot requested a review from a team May 22, 2024 06:34
@smuu smuu marked this pull request as draft May 22, 2024 06:34
Comment on lines -160 to -162
log.Println("Reading blockchain")
blockchain, err := testnode.ReadBlockHeights(context.Background(),
b.Node(0).AddressRPC(), 1, 5)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Before, the chain was only read until height 5. As the txsim instances start after the validators, the first blocks do not contain any txs. Now, we are reading the whole blockchain.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks Samuel, it was intentional to reduce the height to 5 as otherwise it would take a long time to read the blockchain specially for large size blocks, and would fail from one run to another. But if it works for the entire blockchain, that would be even better, thanks!

for _, node := range genesisNodes {
err := node.Start()
err := node.StartAsync()
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In order to speed up the start-up phase, we can start all validator instances async and then wait for them being started.

Comment on lines -364 to -385
for _, node := range t.nodes {
if node.Instance.IsInState(knuu.Started) {
if err := node.Instance.Stop(); err != nil {
log.Err(err).
Str("name", node.Name).
Msg("node failed to stop")
continue
}
if err := node.Instance.WaitInstanceIsStopped(); err != nil {
log.Err(err).
Str("name", node.Name).
Msg("node failed to stop")
continue
}
}
if node.Instance.IsInState(knuu.Started, knuu.Stopped) {
err := node.Instance.Destroy()
if err != nil {
log.Err(err).
Str("name", node.Name).
Msg("node failed to cleanup")
}
Copy link
Member Author

@smuu smuu May 22, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is no need to stop the instance before destroying it; we can simply destroy it.

@ninabarbakadze
Copy link
Member

we're making some of these improvements in my pr already #3487

ignore this, i reverted those improvements.

@staheri14 staheri14 merged commit 3842cf1 into celestiaorg:sanaz/big-block-test May 22, 2024
31 of 33 checks passed
@staheri14
Copy link
Contributor

@smuu I apologize for any confusion. I did not intend to unilaterally close the PR. My goal was to merge the changes in this branch into sanaz/big-block-test as you requested. I'm not sure how the PR ended up getting closed.

@smuu smuu deleted the smuu/improvements-to-big-block-tests branch June 3, 2024 07:32
staheri14 added a commit that referenced this pull request Jun 19, 2024
Closes #3480 
This PR backports many optimizations introduced in the big block tests
via #3493 to the `main`
branch.

Will be ready for review after merging
#3514

---------

Co-authored-by: Evan Forbes <[email protected]>
0xchainlover pushed a commit to celestia-org/celestia-app that referenced this pull request Aug 1, 2024
Closes #3480 
This PR backports many optimizations introduced in the big block tests
via celestiaorg/celestia-app#3493 to the `main`
branch.

Will be ready for review after merging
celestiaorg/celestia-app#3514

---------

Co-authored-by: Evan Forbes <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
WS: Big Blonks 🔭 Improving consensus critical gossiping protocols
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants