Quorum-Raft fall into leader change loop in stress test #927

Closed
WesleyLiu0717 opened this issue Jan 15, 2020 · 19 comments
@WesleyLiu0717

Hi there,
We are doing a stress test for quorum-raft.
We send 1000 contract function transactions per second to 4 nodes for 16 hours.
(250 Tx/s to each node)
In the beginning, all the transactions are confirmed.
But after a while, one of the raft nodes broadcasts a leader-change message, all the nodes fall into a "leader change loop", and no new blocks can be mined.

System information

Quorum version: v2.4.0
OS & Version: Ubuntu 18.04

Expected behaviour

The leader-change mechanism should reliably elect a single leader.

Actual behaviour

The election organizer (node4) can't receive MsgVoteResp from some of the other nodes.
The following is part of the log; the full raft logs and block information are in this repository:
https://github.com/WesleyLiu0717/quorum-raft-experiment

node1
raft2020/01/14 18:52:39 INFO: 1 [term: 3] received a MsgVote message with higher term from 4 [term: 4]
raft2020/01/14 18:52:39 INFO: 1 became follower at term 4
raft2020/01/14 18:52:39 INFO: 1 [logterm: 3, index: 6070, vote: 0] cast MsgVote for 4 [logterm: 3, index: 6070] at term 4
raft2020/01/14 18:52:39 INFO: raft.node: 1 lost leader 2 at term 4
raft2020/01/14 18:52:39 INFO: 1 [term: 4] ignored a MsgApp message with lower term from 2 [term: 3]
raft2020/01/14 18:52:40 INFO: 1 [term: 4] received a MsgVote message with higher term from 4 [term: 5]
raft2020/01/14 18:52:40 INFO: 1 became follower at term 5
raft2020/01/14 18:52:40 INFO: 1 [logterm: 3, index: 6070, vote: 0] cast MsgVote for 4 [logterm: 3, index: 6070] at term 5
raft2020/01/14 18:52:41 INFO: 1 [term: 5] received a MsgVote message with higher term from 4 [term: 6]
node2
raft2020/01/14 18:52:39 INFO: 2 [term: 3] received a MsgVote message with higher term from 4 [term: 4]
raft2020/01/14 18:52:39 INFO: 2 became follower at term 4
raft2020/01/14 18:52:39 INFO: 2 [logterm: 3, index: 6070, vote: 0] cast MsgVote for 4 [logterm: 3, index: 6070] at term 4
raft2020/01/14 18:52:39 INFO: raft.node: 2 lost leader 2 at term 4
raft2020/01/14 18:52:40 INFO: 2 [term: 4] received a MsgVote message with higher term from 4 [term: 5]
raft2020/01/14 18:52:40 INFO: 2 became follower at term 5
raft2020/01/14 18:52:40 INFO: 2 [logterm: 3, index: 6070, vote: 0] cast MsgVote for 4 [logterm: 3, index: 6070] at term 5
raft2020/01/14 18:52:41 INFO: 2 [term: 5] received a MsgVote message with higher term from 4 [term: 6]
node3
raft2020/01/14 18:52:39 INFO: 3 [term: 3] received a MsgVote message with higher term from 4 [term: 4]
raft2020/01/14 18:52:39 INFO: 3 became follower at term 4
raft2020/01/14 18:52:39 INFO: 3 [logterm: 3, index: 4388, vote: 0] cast MsgVote for 4 [logterm: 3, index: 6070] at term 4
raft2020/01/14 18:52:39 INFO: raft.node: 3 lost leader 2 at term 4
raft2020/01/14 18:52:40 INFO: 3 [term: 4] received a MsgVote message with higher term from 4 [term: 5]
raft2020/01/14 18:52:40 INFO: 3 became follower at term 5
raft2020/01/14 18:52:40 INFO: 3 [logterm: 3, index: 4388, vote: 0] cast MsgVote for 4 [logterm: 3, index: 6070] at term 5
raft2020/01/14 18:52:41 INFO: 3 [term: 5] received a MsgVote message with higher term from 4 [term: 6]
node4
raft2020/01/14 18:52:38 INFO: 4 is starting a new election at term 3
raft2020/01/14 18:52:38 INFO: 4 became candidate at term 4
raft2020/01/14 18:52:38 INFO: 4 received MsgVoteResp from 4 at term 4
raft2020/01/14 18:52:38 INFO: 4 [logterm: 3, index: 6070] sent MsgVote request to 2 at term 4
raft2020/01/14 18:52:38 INFO: 4 [logterm: 3, index: 6070] sent MsgVote request to 3 at term 4
raft2020/01/14 18:52:38 INFO: 4 [logterm: 3, index: 6070] sent MsgVote request to 1 at term 4
raft2020/01/14 18:52:38 INFO: raft.node: 4 lost leader 2 at term 4
raft2020/01/14 18:52:39 INFO: 4 [term: 4] ignored a MsgApp message with lower term from 2 [term: 3]
raft2020/01/14 18:52:39 INFO: 4 received MsgVoteResp from 2 at term 4
raft2020/01/14 18:52:39 INFO: 4 [quorum:3] has received 2 MsgVoteResp votes and 0 vote rejections
raft2020/01/14 18:52:40 INFO: 4 is starting a new election at term 4
raft2020/01/14 18:52:40 INFO: 4 became candidate at term 5
raft2020/01/14 18:52:40 INFO: 4 received MsgVoteResp from 4 at term 5
raft2020/01/14 18:52:40 INFO: 4 [logterm: 3, index: 6070] sent MsgVote request to 1 at term 5
raft2020/01/14 18:52:40 INFO: 4 [logterm: 3, index: 6070] sent MsgVote request to 2 at term 5
raft2020/01/14 18:52:40 INFO: 4 [logterm: 3, index: 6070] sent MsgVote request to 3 at term 5
raft2020/01/14 18:52:41 INFO: 4 is starting a new election at term 5
raft2020/01/14 18:52:41 INFO: 4 became candidate at term 6

Steps to reproduce the behaviour

1. Deploy the following smart contract.
2. Use 1000 different addresses to send 250 "create" contract function transactions to each node per second, for more than 2 hours.
Note:
We have tried blockTime values of 50 ms and 250 ms, and the same problem occurs.

Smart Contract

pragma solidity ^0.4.25;

contract TestContract {

    uint SNLength = 0; // count of SNLog contracts created so far

    // Each call deploys a new SNLog child contract and returns its address.
    function create(bytes32 SN, bytes32 operator, bytes32 dataHash, uint timestamp) external returns(SNLog) {
        SNLength++;
        return new SNLog(SN, operator, dataHash, timestamp);
    }

    function retrieveSNLength() external view returns (uint) {
        return SNLength;
    }
}

contract SNLog {
    
    bytes32 public SN;
    bytes32 operator;
    bytes32 dataHash;
    uint timestamp;

    constructor(bytes32 _SN, bytes32 _operator, bytes32  _dataHash, uint _timestamp) public {
        SN = _SN;
        operator = _operator;
        dataHash = _dataHash;
        timestamp = _timestamp;
    }
}
@amalrajmani
Contributor

@WesleyLiu0717 can you share more details about how you were sending the transactions?

@WesleyLiu0717
Author

Hi @amalrajmani,

The code has been pushed in the following repository:
https://github.com/WesleyLiu0717/quorum-raft-experiment/tree/master/bin

Two files are needed:
opt.json:

{
  "coinbase": "0xtestAddress",
  "serv": {
    "hosts": [
      "ip1",
      "ip2",
      "ip3",
      "ip4"
    ],
    "rpc_port": "rpcport",
    "ws_port": "wsport"
  },
  "signature": {
    "pass_phrase": "YourPassWord"
  },
  "sol_settings": {
    "compiler_version": "v0.4.25+commit.59dbf8f1"
  },
  "smart_contract": {
    "abi": "contract abi",
    "address": "contract address",
    "bytecode": "contract bytecode",
    "document": {
      "id_beginning": 10000
    },
    "employee": {
      "id_beginning": 1012200,
      "overall_amount": 1000
    }
  }
}

privateKeys.json:

[
  {
    "address": "testAddress1",
    "private_key": "testPrivateKey1"
  },
  {
    "address": "testAddress2",
    "private_key": "testPrivateKey2"
  }
]

(You need to prepare 1000 accounts for 1000 Tx/s.)

  1. Use deploy.js to deploy the smart contract.
  2. Use the command "node index.js -a 0 -c 1000 -r 14400" to start the test.
    (-c 1000 => input rate of 1000 Tx/s; -r 14400 => duration in seconds, i.e. 4 hours)

@vsmk98
Contributor

vsmk98 commented Jan 21, 2020

@WesleyLiu0717 thanks for sharing all the information. We were doing some simulations in our stress test environment. Though there were leader elections, none of the election loops ran beyond 100 rounds. However, from the logs you shared I notice that in your case the leader election ran for several hours before the network started processing transactions again. I also notice that node4 initiates the leader election and the other nodes respond immediately; however, the response is not processed by node4, which results in another leader election loop. There is a delay of a few hours in processing the responses from the other nodes, which has caused this loop.
We are trying to replicate this with your scripts in our environment and will revert with updates.

@vsmk98
Contributor

vsmk98 commented Jan 28, 2020

@WesleyLiu0717 we have been able to reproduce the issue at our end and are checking for possible solutions. Will revert with an update. Thanks for your patience.

@WesleyLiu0717
Author

@vsmk98 Thanks! We will continue the experiments. If there are any new results, I'll update here.

@amalrajmani
Contributor

amalrajmani commented Feb 3, 2020

@WesleyLiu0717 we have executed several stress tests in the last few days, and based on the results we believe the issue lies in the way web3.js handles websockets: it waits for the response after every sendTransaction call and holds on to the resources. This leads to the machine being starved of resources, which results in the nodes going out of sync and the leader re-election loop at the Raft level. We executed the following tests with JMeter and Chainhammer:

  • With an HTTP endpoint - the network worked fine with a TPS of 1500+ and very few leader re-elections.
  • With a websocket endpoint - (please note that JMeter does not support websockets by default; we used the JMeterWebSocketSampler plugin). For this run, we renewed the websocket connection for every call. The network was stable and the TPS observed was 1000+.
  • With a websocket endpoint without renewing the websocket connection - this resulted in the nodes going out of sync and leader elections being triggered after some time.

Can you please execute your tests with the JMeter HTTP provider or websocket and let us know the results?
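For reference, the connection-renewal pattern from the second bullet can be sketched as follows. `makeProvider` and `sendFn` are hypothetical stand-ins (in the real tests these were the JMeter websocket sampler / a web3 WebsocketProvider); they are injected here so the pattern itself is visible and testable:

```javascript
// Sketch of "renew the connection every round": open a fresh provider,
// send a batch, then explicitly release the socket before the next round.
async function runRounds(makeProvider, rounds, perRound, sendFn) {
  let sent = 0;
  for (let r = 0; r < rounds; r++) {
    const provider = makeProvider();            // fresh connection per round
    for (let i = 0; i < perRound; i++) {
      await sendFn(provider, i);                // send one transaction
      sent++;
    }
    if (provider.disconnect) provider.disconnect(); // release the socket
  }
  return sent;
}
```

Without the `disconnect()` call, each round leaks a socket, which matches the behaviour described in the third bullet.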

@WesleyLiu0717
Author

@amalrajmani
We will try the following settings for the test:

  1. Use the web3 websocket provider to send the transactions and disconnect it after each round.
  2. Use the HTTP provider to send the transactions.

And we have some questions about your test:

  1. Are your transactions (1000+ TPS) ether transfers or contract transactions?
  2. We ran the test at 1000 TPS and a node failed after 1.5 hours; how long did your test run?

@amalrajmani
Contributor

@WesleyLiu0717
Please find the details below:

  1. I tested with simple storage contract creation calls only.
  2. I ran the JMeter test with the HTTP provider for over 1.5 hours, and with the websocket provider (with connection renewal) for more than 4 hours.

Please note that my network setup is as follows:
4 nodes (each running on a separate AWS EC2 instance), with the JMeter test running from another AWS EC2 instance.

@WesleyLiu0717
Author

Hi, @amalrajmani
We followed your suggestion and ran the experiments. (We are not using JMeter yet; we are still evaluating it.)
The results are:

  1. When we replace websocket with HTTP, Invalid JSON RPC response: "" errors occur.
    The leader change loop also occurs, but it is less serious than before (a leader is found within 3 minutes).
  2. When we reconnect the websocket (web3.currentProvider.connection.close()) every 10 minutes, the experiment still has the same problems as before.

Some observations from the experiments:

  1. The timestamps of "Imported new chain segment" on each node are not consistent, especially on machines with different specifications. (https://github.com/WesleyLiu0717/quorum-raft-experiment blockInfo)
    Even when the raft nodes run on the same four Google Cloud machines (n1-highmem-2), this problem still occurs.
    e.g. node1 imported block 91489 at 02-06|16:27:35.049, but node3 imported it at 02-06|16:35:47.831.
  2. When a raft node restarts, synchronization is very slow.

I'm not sure whether the above observations are related to this issue.
If logs are needed, please tell us which keywords from the logs you are interested in.
Many thanks!

@amalrajmani
Contributor

amalrajmani commented Feb 7, 2020

Hi @WesleyLiu0717,
Please note that, for both the websocket and HTTP providers, sendTransaction returns a promise event and waits for the response to come back from the node. This can cause performance bottlenecks.
Further, there is a known issue with websocket handling in web3.js [https://github.com/web3/web3.js/issues/2882]. Even though you are closing the socket connection every 10 minutes, this can still cause the performance issue you are observing in the websocket test.
Regarding the 2 observations:

  1. If leader election is happening and taking a long time, it is quite possible that one of the nodes is lagging behind and out of sync with the network, and hence the import happens much later.
  2. For the slowness in sync, can you share the geth logs for one of the nodes (with verbosity 3 or more)?

Many thanks!

@fixanoid
Contributor

fixanoid commented Feb 7, 2020

Per @amalrajmani's comment: if you are trying to get performance stats from Ethereum / Quorum, using a wrapper implementation such as web3.js is going to be very slow -- it is optimized for usability, not performance. If you want metrics of what a node / chain can actually do, you should hit the JSON-RPC endpoints directly.

@WesleyLiu0717
Author

Hi @amalrajmani @fixanoid,
Thanks for your suggestion. I replaced web3 transaction sending with direct JSON-RPC calls.
Under a 1000 Tx/s input rate, the leader change seems stable.
Although the leader change is stable, one node got stuck after 4 hours. I think 1000 Tx/s (each transaction costs 262059 gas) may be too high for the system, so I'll try a lower input rate. If problems remain at the lower rate, I'll update here.

As for the slowness in sync, it looks faster after replacing web3 with JSON-RPC.
But the syncing speed is slower than the TPS in our experiment, so a syncing node may never reach the latest state. (TPS 750/s, syncing speed 2000+ txs / 10 s)
Is the "fast" sync mode available with the raft consensus?

INFO [02-11|16:46:12.362] Imported new chain segment               blocks=7  txs=1826  mgas=478.624  elapsed=9.022s    mgasps=53.049  number=32045 hash=9da1c1…0e9eff cache=268.25mB
INFO [02-11|16:46:22.787] Imported new chain segment               blocks=10 txs=2250  mgas=589.761  elapsed=10.425s   mgasps=56.570  number=32055 hash=84eb59…a18aa8 cache=268.27mB
INFO [02-11|16:46:34.306] Imported new chain segment               blocks=9  txs=2250  mgas=589.761  elapsed=11.518s   mgasps=51.201  number=32064 hash=482a82…c2adab cache=267.72mB
INFO [02-11|16:46:42.719] Imported new chain segment               blocks=8  txs=1754  mgas=459.751  elapsed=8.413s    mgasps=54.642  number=32072 hash=605b3c…869d04 cache=267.54mB
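From the four log lines above, the effective sync throughput can be estimated (the txs and elapsed numbers are copied straight from the log):

```javascript
// txs and elapsed seconds from the "Imported new chain segment" lines above
const segments = [
  { txs: 1826, elapsed: 9.022 },
  { txs: 2250, elapsed: 10.425 },
  { txs: 2250, elapsed: 11.518 },
  { txs: 1754, elapsed: 8.413 },
];

const totalTxs = segments.reduce((sum, s) => sum + s.txs, 0);       // 8080
const totalSecs = segments.reduce((sum, s) => sum + s.elapsed, 0);  // ~39.4 s
const syncTps = totalTxs / totalSecs; // ~205 tx/s, far below the 750 Tx/s input
console.log(syncTps.toFixed(0));
```

At roughly 205 tx/s of import against a 750 Tx/s input, a restarted node indeed falls further behind rather than catching up.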

@vsmk98
Contributor

vsmk98 commented Feb 12, 2020

Hi @WesleyLiu0717 - To support private transactions, where the private state can differ across nodes, Quorum does not use fast sync. Raft uses boundedFullSync, a variation of fullSync in which the block height to sync to is passed in by the Raft consensus engine.

@WesleyLiu0717
Author

Hi,
We have run many experiments on Google Cloud Platform over the past few days.
The transactions are signed via the web3 websocket and sent with the request-promise package (calling the geth JSON-RPC).
The results are in the following table:
Screen Shot 2020-02-24 at 5 35 14 PM

I'm not sure why the experiments always get stuck after we continuously send transactions for a period of time. Even though we upgraded the hardware, the chain eventually got stuck.

The following graph shows the CPU usage during the experiments:
Screen Shot 2020-02-24 at 5 41 37 PM

We grepped the 'err', 'raft', and 'Imported' messages from the raft logs of these experiments and put them at the following link (the file is too big for GitHub: 1 GB):
https://drive.google.com/open?id=1-WfxmLyyXUtwv9Nb_19VL6i8sRx4VAat

We want to use raft in our product. Is there any suggestion for the settings?

@vsmk98
Contributor

vsmk98 commented Feb 26, 2020

Hi @WesleyLiu0717, I have looked at the logs and notice the following errors:

  • nonce too high
  • nonce too low
  • gas limit reached

Also, in all the runs I see a large number of leader re-elections.

Can you please share the geth start parameters you are using? I want to know the values you are passing for --miner.gastarget and --miner.gaslimit. Ideally, the recommendation is to set these values in line with the expected TPS.

Thanks
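One way to read "set these values in line with the expected TPS" is the following back-of-envelope sizing. This is my own assumption about the formula, not an official recommendation; the 262059-gas figure is the per-transaction cost reported elsewhere in this thread:

```javascript
// Hypothetical sizing helper: the gas a block must hold so that `tps`
// transactions of `gasPerTx` gas each fit at the given raft block time.
function gasTargetFor(tps, gasPerTx, blockTimeMs) {
  const txPerBlock = Math.ceil((tps * blockTimeMs) / 1000);
  return txPerBlock * gasPerTx;
}

// e.g. 1000 Tx/s of 262059-gas create() calls at a 250 ms block time:
// 250 tx/block, so roughly 65,514,750 gas per block is needed.
console.log(gasTargetFor(1000, 262059, 250));
```

A gas target far above this wastes headroom, while one below it causes "gas limit reached" and transaction backlog.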

@WesleyLiu0717
Author

Hi @vsmk98,
The following are the geth start parameters:

PRIVATE_CONFIG=ignore nohup geth --datadir node1/data --txpool.accountslots 1800 --txpool.globalslots  100000 --txpool.accountqueue 1800 --txpool.globalqueue 100000 --miner.gastarget 700000000 --miner.gaslimit 700000000 --nodiscover --verbosity 3 --networkid 10 --raft --raftport 50001 --raftblocktime 250 --rpc --rpcapi eth,net,personal,web3 --rpcaddr 0.0.0.0 --emitcheckpoints --port 30301 --ws --wsorigins "*" --wsaddr 0.0.0.0 >> log 2>&1 &

Each test transaction costs 262059 gas, so at most 700000000/262059 ≈ 2671 transactions fit in one block. If the minter packs more than 2671 transactions into a block, the "gas limit reached" error occurs. I'm not sure whether a transaction that fails to be packed into the block when this error occurs is thrown away or kept in the transaction pool. If it is thrown away, I think that explains the "nonce too high" errors, because we maintain the nonces of our transactions locally.
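The per-block capacity arithmetic above checks out:

```javascript
// 262059 gas per create() call against the configured 700000000 gas limit.
const GAS_LIMIT = 700000000;
const TX_GAS = 262059;

const maxTxPerBlock = Math.floor(GAS_LIMIT / TX_GAS); // 2671
// 2671 txs use 699,959,589 gas; a 2672nd call would exceed the limit.
console.log(maxTxPerBlock, maxTxPerBlock * TX_GAS);
```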
I think the "nonce too low" error occurs when the minter is doing "speculative minting": once a transaction has been packed into a "future" block, the "nonce too low" error occurs when the minter receives the same transaction broadcast from other nodes.
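The local nonce bookkeeping described above can be sketched as follows (a hypothetical helper, not the actual test script): if a node silently drops a transaction, the local counter runs ahead of the on-chain nonce and every later send fails with "nonce too high" until it is re-synced.

```javascript
// Hypothetical local nonce tracker, as used by a client that signs
// transactions itself instead of asking the node for the next nonce.
class NonceTracker {
  constructor() {
    this.next = new Map(); // address -> next nonce to use
  }

  // Reserve the next nonce for an address.
  take(addr) {
    const n = this.next.get(addr) || 0;
    this.next.set(addr, n + 1);
    return n;
  }

  // Re-sync from the chain (e.g. after eth_getTransactionCount)
  // to recover from dropped transactions.
  reset(addr, chainNonce) {
    this.next.set(addr, chainNonce);
  }
}
```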

In my observations, leader re-election finishes quickly at the beginning of the experiment, but after a period of time no leader can be found.

We are confused as to why the 300 Tx/s experiment can run successfully for one hour but fails after three hours. We can't find an input rate at which the system works continuously for a long time.

@vsmk98
Contributor

vsmk98 commented Feb 26, 2020

Hi @WesleyLiu0717 - thanks for sharing the information. It's a bit surprising to me as well, as we had run with HTTP calls for 4.5 hours and saw a TPS of 1000+, as mentioned by @amalrajmani earlier. Would it be possible for you to join the Quorum Slack channel - http://bit.ly/quorum-slack - to discuss this further? Please let me know.
We are running some more tests and will revert with our results.

@WesleyLiu0717
Author

Hi @vsmk98,
I have joined the slack group.

Thanks!

@fixanoid
Contributor

fixanoid commented May 5, 2020

Closing. Please reopen if more info is needed.

@fixanoid fixanoid closed this as completed May 5, 2020