Private chain deployment: a power outage shut down all hosts, and after restart all nodes stopped producing blocks. #152

Open
jerk188 opened this issue Oct 11, 2021 · 1 comment
Labels: bug, enhancement

jerk188 commented Oct 11, 2021

System information

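I deployed a private chain, but a power failure shut down every node host. After power was restored and the hosts booted, the node processes failed to start. Going through the logs, it looks like the data may have become corrupted. The log is shown in the screenshot below: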

image

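So I renamed the wal directory under data/platon to wal.bak and restarted the node processes. The nodes started successfully after that, but all of them are stuck at the same block height. On inspection, the peer connection count on every node is normal.
Checking every consensus node (console command: alaya attach http://localhost:6789 -exec 'debug.consensusStatus().state.view.epoch') returns 7828 on all of them.
Checking every consensus node (console command: alaya attach http://localhost:6789 -exec 'debug.consensusStatus().state.view.viewNumber') returns 3 on some nodes and 4 on others, and the values never change.
On the hosts whose nodes return 4, the number of log lines containing 'Not found lastViewChangeQC' keeps growing.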
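
The per-node check above can be summarized with a small helper once the epoch and viewNumber values have been collected by hand. This is only a sketch: the node names are hypothetical, the helper functions are not part of alaya, and the sample values simply mirror the ones reported in this issue.

```python
from collections import defaultdict


def group_by_view(statuses):
    """Group node names by their (epoch, viewNumber) pair.

    `statuses` maps a node name to the (epoch, viewNumber) values read
    manually from debug.consensusStatus() on that node.
    """
    groups = defaultdict(list)
    for node, (epoch, view_number) in statuses.items():
        groups[(epoch, view_number)].append(node)
    return {view: sorted(nodes) for view, nodes in groups.items()}


def is_view_split(statuses):
    """True when the nodes do not all agree on the same epoch and view."""
    return len(group_by_view(statuses)) > 1


# Hypothetical node names; the values mirror those reported in this issue.
statuses = {
    "node-a": (7828, 3),
    "node-b": (7828, 4),
    "node-c": (7828, 3),
    "node-d": (7828, 4),
}

for (epoch, view_number), nodes in sorted(group_by_view(statuses).items()):
    print(f"epoch={epoch} viewNumber={view_number}: {', '.join(nodes)}")
print("view split detected:", is_view_split(statuses))
```

Running the same grouping a few minutes apart makes it easy to confirm that the values are not changing, as described above.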

jerk188 added the bug label Oct 11, 2021
niuxiaojie81 (Collaborator) commented

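Solution:
1. When the on-chain data and the WAL data are inconsistent, treat the on-chain data as authoritative and roll back the WAL data. (If most nodes in the network are in this state, the chain may fork, because blocks that were already confirmed before the restart will be lost.)
2. The nodes cannot switch from view-3 to view-4 because the view-4 nodes do not have the viewChangeQC for view-3 in memory. The viewChange synchronization logic can be optimized here to read from memory first and, if nothing is found, to try once more from the WAL.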
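
The lookup order proposed in item 2 can be illustrated as follows. This is a minimal sketch and not the node's actual implementation: the class name, the dictionary-backed stores, and the (epoch, viewNumber) key are all assumptions made for illustration, and caching the WAL result back into memory is an extra choice that the comment above does not mention.

```python
class ViewChangeQCLookup:
    """Hypothetical lookup that tries memory first, then the WAL."""

    def __init__(self, memory, wal):
        # Both stores are plain dicts keyed by (epoch, view_number).
        # A real WAL would be a persistent log; a dict stands in for it here.
        self.memory = memory
        self.wal = wal

    def last_view_change_qc(self, epoch, view_number):
        key = (epoch, view_number)
        qc = self.memory.get(key)
        if qc is None:
            # Not in memory (for example after a restart): try the WAL once.
            qc = self.wal.get(key)
            if qc is not None:
                # Assumption: cache the result so later requests hit memory.
                self.memory[key] = qc
        return qc


# After a restart the in-memory store is empty but the WAL still has the QC.
lookup = ViewChangeQCLookup(memory={}, wal={(7828, 3): "qc-for-view-3"})
print(lookup.last_view_change_qc(7828, 3))  # found through the WAL fallback
print(lookup.last_view_change_qc(7828, 9))  # in neither store, so None
```

With a memory-only lookup, the first call would also return nothing, which corresponds to the 'Not found lastViewChangeQC' situation described in the report.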

benbaley added the enhancement label Oct 12, 2021
3 participants