epoching: fix genesis, param and bootstrapping #41

Merged
merged 15 commits into from
Jul 1, 2022

@SebastianElvis SebastianElvis commented Jun 30, 2022

Fixes #36 and https://babylon-chain.atlassian.net/browse/BM-48

This PR implements bootstrapping for the epoching module (genesis and parameters) and fixes a bootstrapping issue caused by NewDropValidatorMsgDecorator.

Root cause of the issue: the genesis block includes a set of txs for bootstrapping the initial validator. These txs contain msg types that are related to the validator set and are therefore rejected by NewDropValidatorMsgDecorator. To fix this, we have to allow such validator-related msgs in the genesis block. Other fixes might require touching the genesis state generation in Cosmos SDK, which is undesirable for us.

@SebastianElvis SebastianElvis marked this pull request as ready for review June 30, 2022 12:05
@vitsalis
Member

Thanks for the fix @SebastianElvis ! Looking at the build it seems that it is failing due to the make test command. I replicated this behavior on my machine as well.

@SebastianElvis
Member Author

> Thanks for the fix @SebastianElvis ! Looking at the build it seems that it is failing due to the make test command. I replicated this behavior on my machine as well.

Yeah somehow I broke some other stuff in the module... Will let you know when I fix it.

@SebastianElvis
Member Author

SebastianElvis commented Jun 30, 2022

@vitsalis After the last commit, I can run the CI pipeline commands on my local machine without errors. However, the CI still seems to fail. Could you please try it locally and see if you can replicate the error? Thanks!

@vitsalis
Member

For me, both the replicated CI commands and `ignite chain serve --reset-once --verbose` fail with the same error. Maybe it has something to do with the cache?

@SebastianElvis
Member Author

SebastianElvis commented Jun 30, 2022

It's weird. On my side, `make localnet-start` starts 4 nodes that regularly produce new blocks, and the CI commands also work fine:

╭─rhan0013 at MU00152739X in ⌁/Projects/Babylon/babylon (epoching-fix-localnet-start ✚4⚑1)
╰─λ ./build/babylond start --home ./output/node0/babylond --halt-height 1                                               0 (0.209s) < 23:05:52
11:05PM INF starting node with ABCI Tendermint in-process
11:05PM INF Starting multiAppConn service impl=multiAppConn module=proxy
11:05PM INF Starting localClient service connection=query impl=localClient module=abci-client
11:05PM INF Starting localClient service connection=snapshot impl=localClient module=abci-client
11:05PM INF Starting localClient service connection=mempool impl=localClient module=abci-client
11:05PM INF Starting localClient service connection=consensus impl=localClient module=abci-client
11:05PM INF Starting EventBus service impl=EventBus module=events
11:05PM INF Starting PubSub service impl=PubSub module=pubsub
11:05PM INF Starting IndexerService service impl=IndexerService module=txindex
11:05PM INF ABCI Handshake App Info hash="�}<�\v;\x128\x00�}m��g�8��\x01���pN�\fN7�dK" height=2 module=consensus protocol-version=0 software-version=ac79e52
11:05PM INF ABCI Replay Blocks appHeight=2 module=consensus stateHeight=2 storeHeight=2
11:05PM INF Completed ABCI Handshake - Tendermint and App are synced appHash="�}<�\v;\x128\x00�}m��g�8��\x01���pN�\fN7�dK" appHeight=2 module=consensus
11:05PM INF Version info block=11 p2p=8 tendermint_version=0.34.19
11:05PM INF This node is a validator addr=BFDD4B757FF0A30471B8B82A586389EA18A24FBC module=consensus pubKey=N/o77SSVNdPignXJivxLa+xLOdxd4alFvQBY0gAG+GI=
11:05PM INF P2P Node ID ID=63f616d7f8b60cbfd9e9388eb30c329f076c60f2 file=output/node0/babylond/config/node_key.json module=p2p
11:05PM INF Adding persistent peers addrs=[] module=p2p
11:05PM INF Adding unconditional peer ids ids=[] module=p2p
11:05PM INF Add our address to book addr={"id":"63f616d7f8b60cbfd9e9388eb30c329f076c60f2","ip":"0.0.0.0","port":26656} book=output/node0/babylond/config/addrbook.json module=p2p
11:05PM INF Starting Node service impl=Node
11:05PM INF Starting pprof server laddr=localhost:6060
11:05PM INF Starting RPC HTTP server on [::]:26657 module=rpc-server
11:05PM INF Starting P2P Switch service impl="P2P Switch" module=p2p
11:05PM INF Starting Mempool service impl=Mempool module=mempool
11:05PM INF Starting BlockchainReactor service impl=BlockchainReactor module=blockchain
11:05PM INF Starting Consensus service impl=ConsensusReactor module=consensus
11:05PM INF Reactor  module=consensus waitSync=false
11:05PM INF Starting State service impl=ConsensusState module=consensus
11:05PM INF Starting baseWAL service impl=baseWAL module=consensus wal=output/node0/babylond/data/cs.wal/wal
11:05PM INF Starting Group service impl=Group module=consensus wal=output/node0/babylond/data/cs.wal/wal
11:05PM INF Starting TimeoutTicker service impl=TimeoutTicker module=consensus
11:05PM INF Searching for height height=3 max=0 min=0 module=consensus wal=output/node0/babylond/data/cs.wal/wal
11:05PM INF Searching for height height=2 max=0 min=0 module=consensus wal=output/node0/babylond/data/cs.wal/wal
11:05PM INF Found height=2 index=0 module=consensus wal=output/node0/babylond/data/cs.wal/wal
11:05PM INF Catchup by replaying consensus messages height=3 module=consensus
11:05PM INF Replay: Done module=consensus
11:05PM INF Starting Evidence service impl=Evidence module=evidence
11:05PM INF Starting StateSync service impl=StateSync module=statesync
11:05PM INF Starting PEX service impl=PEX module=pex
11:05PM INF Starting AddrBook service book=output/node0/babylond/config/addrbook.json impl=AddrBook module=p2p
11:05PM INF Saving AddrBook to file book=output/node0/babylond/config/addrbook.json module=p2p size=0
11:05PM INF Ensure peers module=pex numDialing=0 numInPeers=0 numOutPeers=0 numToDial=10
11:05PM INF No addresses to dial. Falling back to seeds module=pex
11:05PM INF starting API server... module=api-server
11:05PM INF Starting RPC HTTP server on [::]:1317 module=api-server
11:06PM INF Timed out dur=4970.919 height=3 module=consensus round=0 step=1
11:06PM INF received proposal module=consensus proposal={"Type":32,"block_id":{"hash":"5DB2B3A74CEE7774B8069A86CFBCA76A464EDD3FC8F61AAF2A99313CE491BB95","parts":{"hash":"96ABC25CA2768CC182192C66AE02B6413D0A112807989A5CCA0B84CF48282181","total":1}},"height":3,"pol_round":-1,"round":0,"signature":"S+ACHvLDEWF9yFitlLOk3RLwmLX25oJ+jwA6V9kkiOtj5Q3lpuzIW8nEhK0KIhsDEXo2QHsqaiTIoA7e8LsFAw==","timestamp":"2022-06-30T13:06:01.915338Z"}
11:06PM INF received complete proposal block hash=5DB2B3A74CEE7774B8069A86CFBCA76A464EDD3FC8F61AAF2A99313CE491BB95 height=3 module=consensus
11:06PM INF finalizing commit of block hash=5DB2B3A74CEE7774B8069A86CFBCA76A464EDD3FC8F61AAF2A99313CE491BB95 height=3 module=consensus num_txs=0 root=8F7D3CBC0B3B123800F07D6DB7E067A238E68D0196ACA3704EA30C4E37F7644B
11:06PM INF minted coins from module account amount=10stake from=mint module=x/bank
11:06PM INF executed block height=3 module=state num_invalid_txs=0 num_valid_txs=0
11:06PM INF commit synced commit=436F6D6D697449447B5B313820313438203733203139203733203131322031383020313638203233362038382031323420393120313733203235302033362031342031363320323620353520323333203234352032313620313237203835203539203138203539203830203133312030203538203231335D3A337D
11:06PM INF halting node per configuration height=1
11:06PM INF committed state app_hash=129449134970B4A8EC587C5BADFA240EA31A37E9F5D87F553B123B5083003AD5 height=3 module=state num_txs=0

The problem can also be triggered by `./build/babylond export`, which fails with the following error:

╭─rhan0013 at MU00152739X in ⌁/Projects/Babylon/babylon (epoching-fix-localnet-start ✚4⚑1)
╰─λ ./build/babylond export                                                                                           130 (5.913s) < 23:06:02
panic: UnmarshalJSON cannot decode empty bytes

goroutine 1 [running]:
github.com/cosmos/cosmos-sdk/x/params/types.Subspace.Get({{0x2ef6b7f0, 0xc0001f8230}, 0xc000022260, {0x5dd1ad0, 0xc000de4480}, {0x5dd1b20, 0xc000de4530}, {0xc0005b6e00, 0x4, 0x1a}, ...}, ...)
        github.com/cosmos/[email protected]/x/params/types/subspace.go:109 +0x307
github.com/cosmos/cosmos-sdk/x/params/types.Subspace.GetParamSet({{0x2ef6b7f0, 0xc0001f8230}, 0xc000022260, {0x5dd1ad0, 0xc000de4480}, {0x5dd1b20, 0xc000de4530}, {0xc0005b6e00, 0x4, 0x1a}, ...}, ...)
        github.com/cosmos/[email protected]/x/params/types/subspace.go:222 +0x145
github.com/cosmos/cosmos-sdk/x/auth/keeper.AccountKeeper.GetParams(...)
        github.com/cosmos/[email protected]/x/auth/keeper/params.go:15
github.com/cosmos/cosmos-sdk/x/auth.ExportGenesis({{0x5de1ce8, 0xc0001a6008}, {0x5decc40, 0xc0001e6f80}, {{0x0, 0x0}, {0x0, 0x0}, 0x0, {0x0, ...}, ...}, ...}, ...)
        github.com/cosmos/[email protected]/x/auth/genesis.go:32 +0x125
github.com/cosmos/cosmos-sdk/x/auth.AppModule.ExportGenesis({{}, {{0x5dd1ad0, 0xc000de4410}, {0x2ef6b7f0, 0xc0001f8230}, {{0x2ef6b7f0, 0xc0001f8230}, 0xc000022260, {0x5dd1ad0, 0xc000de4480}, ...}, ...}, ...}, ...)
        github.com/cosmos/[email protected]/x/auth/module.go:151 +0xb8
github.com/cosmos/cosmos-sdk/types/module.(*Manager).ExportGenesis(_, {{0x5de1ce8, 0xc0001a6008}, {0x5decc40, 0xc0001e6f80}, {{0x0, 0x0}, {0x0, 0x0}, 0x0, ...}, ...}, ...)
        github.com/cosmos/[email protected]/types/module/module.go:341 +0x125
github.com/babylonchain/babylon/app.(*BabylonApp).ExportAppStateAndValidators(0xc000244d00, 0x0, {0x6a5faa0, 0x0, 0x0})
        github.com/babylonchain/babylon/app/export.go:32 +0x20b
github.com/babylonchain/babylon/cmd/babylond/cmd.appCreator.appExport({{{0x5de5868, 0xc00048e0c0}, {0x5dee638, 0xc0001f8230}, {0x5de9998, 0xc000120040}, 0xc000022260}}, {0x5de2a40, 0xc0005b8e40}, {0x5dec4d0, ...}, ...)
        github.com/babylonchain/babylon/cmd/babylond/cmd/root.go:300 +0x36a
github.com/cosmos/cosmos-sdk/server.ExportCmd.func1(0xc000da4000?, {0x6a5faa0?, 0x0?, 0x0?})
        github.com/cosmos/[email protected]/server/export.go:71 +0x31c
github.com/spf13/cobra.(*Command).execute(0xc000da4000, {0x6a5faa0, 0x0, 0x0})
        github.com/spf13/[email protected]/command.go:856 +0x67c
github.com/spf13/cobra.(*Command).ExecuteC(0xc000f5c280)
        github.com/spf13/[email protected]/command.go:974 +0x3b4
github.com/spf13/cobra.(*Command).Execute(...)
        github.com/spf13/[email protected]/command.go:902
github.com/spf13/cobra.(*Command).ExecuteContext(...)
        github.com/spf13/[email protected]/command.go:895
github.com/cosmos/cosmos-sdk/server/cmd.Execute(0x0?, {0xc000eb8e00, 0x1b})
        github.com/cosmos/[email protected]/server/cmd/execute.go:36 +0x1eb
main.main()
        github.com/babylonchain/babylon/cmd/babylond/main.go:18 +0x31

The stack trace basically says that some genesis state is empty and cannot be deserialised. It's quite weird that the error comes from another module (here, auth) rather than from epoching.

@SebastianElvis
Member Author

SebastianElvis commented Jul 1, 2022

My last commit fixes all errors on my local machine and in Docker.

It turns out that to reproduce this error, one needs to remove the entire database (.testnets/ for make localnet-test and output/ for running CI commands locally). Otherwise, the node will start from a non-genesis block, skipping the execution of the genesis block.

The root cause is still the AnteHandler: AnteHandlers are initialised before the genesis states are exported. My initial AnteHandler implementation retrieves the epoch number and therefore queries the DB, which is not yet initialised at that point. A straightforward fix is to use the block height in ctx to check whether we are at the genesis block.
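
A minimal sketch of the height-based check, written without the Cosmos SDK so that it runs on its own. `blockCtx`, `atGenesis` and `rejectValidatorMsg` are hypothetical names, and treating height 0 as "genesis" is an assumption about how genesis txs are processed, not a quote of the merged code.

```go
package main

import "fmt"

// blockCtx is a hypothetical stand-in for sdk.Context; only the block height matters here.
type blockCtx struct{ height int64 }

// atGenesis reports whether a tx is processed as part of the genesis block.
// Assumption: genesis txs are processed with block height 0.
func atGenesis(ctx blockCtx) bool { return ctx.height == 0 }

// rejectValidatorMsg decides whether a msg should be rejected.
// It only reads the height from the context, so it never touches the DB.
func rejectValidatorMsg(ctx blockCtx, validatorRelated bool) bool {
	if atGenesis(ctx) {
		return false // genesis txs bootstrap the validator set, so let them through
	}
	return validatorRelated
}

func main() {
	fmt.Println(rejectValidatorMsg(blockCtx{height: 0}, true))  // genesis: allowed
	fmt.Println(rejectValidatorMsg(blockCtx{height: 5}, true))  // later block: rejected
	fmt.Println(rejectValidatorMsg(blockCtx{height: 5}, false)) // unrelated msg: allowed
}
```

The point of the design is that the decision depends only on data already present in the context, so it is safe even before any module state has been written.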

However, the CI still fails with the output Received "interrupt" signal. I guess this is because Cosmos SDK handles --halt-height 1 by sending an interrupt to the process. How do we get around this? @vitsalis

Log in CI:

#!/bin/bash -eo pipefail
./build/babylond testnet --v 1 --output-dir ./output --starting-ip-address 192.168.10.2 --keyring-backend test &&
./build/babylond start --home ./output/node0/babylond --halt-height 1

Successfully initialized 1 node directories
12:06AM INF starting node with ABCI Tendermint in-process
12:06AM INF Starting multiAppConn service impl=multiAppConn module=proxy
12:06AM INF Starting localClient service connection=query impl=localClient module=abci-client
12:06AM INF Starting localClient service connection=snapshot impl=localClient module=abci-client
12:06AM INF Starting localClient service connection=mempool impl=localClient module=abci-client
12:06AM INF Starting localClient service connection=consensus impl=localClient module=abci-client
12:06AM INF Starting EventBus service impl=EventBus module=events
12:06AM INF Starting PubSub service impl=PubSub module=pubsub
12:06AM INF Starting IndexerService service impl=IndexerService module=txindex
12:06AM INF ABCI Handshake App Info hash= height=0 module=consensus protocol-version=0 software-version=6761dbe
12:06AM INF ABCI Replay Blocks appHeight=0 module=consensus stateHeight=0 storeHeight=0
12:06AM INF asserting crisis invariants inv=0/11 module=x/crisis name=distribution/nonnegative-outstanding
12:06AM INF asserting crisis invariants inv=1/11 module=x/crisis name=distribution/can-withdraw
12:06AM INF asserting crisis invariants inv=2/11 module=x/crisis name=distribution/reference-count
12:06AM INF asserting crisis invariants inv=3/11 module=x/crisis name=distribution/module-account
12:06AM INF asserting crisis invariants inv=4/11 module=x/crisis name=bank/nonnegative-outstanding
12:06AM INF asserting crisis invariants inv=5/11 module=x/crisis name=bank/total-supply
12:06AM INF asserting crisis invariants inv=6/11 module=x/crisis name=gov/module-account
12:06AM INF asserting crisis invariants inv=7/11 module=x/crisis name=staking/module-accounts
12:06AM INF asserting crisis invariants inv=8/11 module=x/crisis name=staking/nonnegative-power
12:06AM INF asserting crisis invariants inv=9/11 module=x/crisis name=staking/positive-delegation
12:06AM INF asserting crisis invariants inv=10/11 module=x/crisis name=staking/delegator-shares
12:06AM INF asserted all invariants duration=0.429097 height=0 module=x/crisis
12:06AM INF Completed ABCI Handshake - Tendermint and App are synced appHash= appHeight=0 module=consensus
12:06AM INF Version info block=11 p2p=8 tendermint_version=0.34.19
12:06AM INF This node is a validator addr=1E57BFFDFA07CFBB93908F67567EF76B03A15509 module=consensus pubKey=xZYMSag+hu0ovYbE5u5gE1FwG52mZESfhZLWs0obQxw=
12:06AM INF P2P Node ID ID=7ddf4a186600be56fe38fe06bdc6af103e62347b file=output/node0/babylond/config/node_key.json module=p2p
12:06AM INF Adding persistent peers addrs=[] module=p2p
12:06AM INF Adding unconditional peer ids ids=[] module=p2p
12:06AM INF Add our address to book addr={"id":"7ddf4a186600be56fe38fe06bdc6af103e62347b","ip":"0.0.0.0","port":26656} book=output/node0/babylond/config/addrbook.json module=p2p
12:06AM INF Starting Node service impl=Node
12:06AM INF Starting pprof server laddr=localhost:6060
12:06AM INF Starting P2P Switch service impl="P2P Switch" module=p2p
12:06AM INF Starting BlockchainReactor service impl=BlockchainReactor module=blockchain
12:06AM INF Starting Consensus service impl=ConsensusReactor module=consensus
12:06AM INF Reactor  module=consensus waitSync=false
12:06AM INF Starting State service impl=ConsensusState module=consensus
12:06AM INF Starting RPC HTTP server on [::]:26657 module=rpc-server
12:06AM INF Starting baseWAL service impl=baseWAL module=consensus wal=output/node0/babylond/data/cs.wal/wal
12:06AM INF Starting Group service impl=Group module=consensus wal=output/node0/babylond/data/cs.wal/wal
12:06AM INF Starting TimeoutTicker service impl=TimeoutTicker module=consensus
12:06AM INF Searching for height height=1 max=0 min=0 module=consensus wal=output/node0/babylond/data/cs.wal/wal
12:06AM INF Searching for height height=0 max=0 min=0 module=consensus wal=output/node0/babylond/data/cs.wal/wal
12:06AM INF Found height=0 index=0 module=consensus wal=output/node0/babylond/data/cs.wal/wal
12:06AM INF Catchup by replaying consensus messages height=1 module=consensus
12:06AM INF Replay: Done module=consensus
12:06AM INF Starting Evidence service impl=Evidence module=evidence
12:06AM INF Starting StateSync service impl=StateSync module=statesync
12:06AM INF Starting PEX service impl=PEX module=pex
12:06AM INF Starting AddrBook service book=output/node0/babylond/config/addrbook.json impl=AddrBook module=p2p
12:06AM INF Starting Mempool service impl=Mempool module=mempool
12:06AM INF Saving AddrBook to file book=output/node0/babylond/config/addrbook.json module=p2p size=0
12:06AM INF Ensure peers module=pex numDialing=0 numInPeers=0 numOutPeers=0 numToDial=10
12:06AM INF No addresses to dial. Falling back to seeds module=pex
12:06AM INF starting API server... module=api-server
12:06AM INF Starting RPC HTTP server on [::]:1317 module=api-server
12:06AM INF Timed out dur=4998.621983 height=1 module=consensus round=0 step=1
12:06AM INF received proposal module=consensus proposal={"Type":32,"block_id":{"hash":"03122BC20283375D768486FAA8E27641393EE1E38913948C528BBF04F464C4BF","parts":{"hash":"AE687C8FE1BC504EBD83B7E6BA79751877416DF278A4519EBA58FB116A7E4CCF","total":1}},"height":1,"pol_round":-1,"round":0,"signature":"+dF+oi+FRT0JQqEh6LjXInOu97x3nZ6yQT0QPJYikZkqxLP+4Ns/qdyE5n484rRCqek6VSSPYfQtmKMHIjrAAQ==","timestamp":"2022-07-01T00:06:43.635206906Z"}
12:06AM INF received complete proposal block hash=03122BC20283375D768486FAA8E27641393EE1E38913948C528BBF04F464C4BF height=1 module=consensus
12:06AM INF finalizing commit of block hash=03122BC20283375D768486FAA8E27641393EE1E38913948C528BBF04F464C4BF height=1 module=consensus num_txs=0 root=E3B0C44298FC1C149AFBF4C8996FB92427AE41E4649B934CA495991B7852B855
12:06AM INF minted coins from module account amount=10stake from=mint module=x/bank
12:06AM INF executed block height=1 module=state num_invalid_txs=0 num_valid_txs=0
12:06AM INF commit synced commit=436F6D6D697449447B5B32303820313331203535203232352036332031203531203132372031383820393120383920313320313430203230312031353520323436203138342032343820313137203134342036203630203436203139302031383920353820313720383220323331203536203139372034315D3A317D
12:06AM INF halting node per configuration height=1

Received "interrupt" signal

@SebastianElvis
Member Author

> I guess this is because Cosmos SDK handles --halt-height 1 by sending an interrupt to the process.

Confirmed: the BaseApp.halt() function in https://github.com/cosmos/cosmos-sdk/blob/v0.45.5/baseapp/abci.go#L348-L368 sends SIGINT and SIGTERM signals to the process. CI treats the interrupted process as a failure, so the check fails.

@vitsalis
Member

vitsalis commented Jul 1, 2022

Indeed, the issue is with the build right now. Reverted the commit until a more stable fix for the build is found.

@SebastianElvis
Member Author

After merging the current main branch, this PR now passes CI! Feel free to review the code and provide comments.

return nil
}

func validateEpochInterval(i interface{}) error {
Contributor

I don't understand why this method takes an interface{}. It's defined as uint64.

Member Author

This seems to be a convention in Cosmos SDK (e.g., https://github.com/cosmos/cosmos-sdk/blob/v0.45.5/x/staking/types/params.go). I guess this extra type check aims to rule out issues during serialisation/deserialisation.
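
A standalone sketch of that convention: the validator accepts interface{}, asserts the concrete type, then checks the value. The body below is an assumption for illustration (in particular the "must be positive" rule and the error messages are hypothetical), not the code under review.

```go
package main

import "fmt"

// validateEpochInterval follows the SDK-style pattern: the parameter arrives
// as interface{}, so the function first asserts the expected concrete type.
func validateEpochInterval(i interface{}) error {
	v, ok := i.(uint64)
	if !ok {
		return fmt.Errorf("invalid parameter type: %T", i)
	}
	// Assumed rule for illustration only: the interval must be positive.
	if v == 0 {
		return fmt.Errorf("epoch interval must be positive: %d", v)
	}
	return nil
}

func main() {
	fmt.Println(validateEpochInterval(uint64(10))) // correct type and value
	fmt.Println(validateEpochInterval(10))         // an untyped constant becomes int, so the assertion fails
	fmt.Println(validateEpochInterval(uint64(0)))  // correct type, invalid value
}
```

The generic signature lets a single validator function type be registered for parameters of any type, at the cost of a runtime type assertion in every validator.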

Contributor

How odd. You would think the reason to have Protobuf have a schema is so we don't have to worry about stuff like this.

Contributor

It's definitely a uint64 in https://github.com/babylonchain/babylon/blob/main/x/epoching/types/params.pb.go#L29

Probably the only use case for this test is to prevent changing the type in the proto.
Hopefully there don't need to be such tests for each field of each type 😖

Contributor

@aakoshh aakoshh left a comment

LGTM!

@SebastianElvis SebastianElvis merged commit 468dc96 into main Jul 1, 2022
@SebastianElvis
Member Author

Thanks for the entire process Vitalis and Akosh! Learned a lot.

@SebastianElvis SebastianElvis deleted the epoching-fix-localnet-start branch July 1, 2022 08:05
Successfully merging this pull request may close these issues.

Epoching AnteHandler rejects genesis staking transactions
3 participants