This repository has been archived by the owner on Nov 6, 2020. It is now read-only.

Archive node loses peers (stop syncing blocks) #10724

Closed
APshenkin opened this issue Jun 5, 2019 · 100 comments
Labels
A3-stale 🍃 Pull request did not receive any updates in a long time. No review needed at this stage. Close it. F2-bug 🐞 The client fails to follow expected behavior. M4-core ⛓ Core client code / Rust.

Comments

@APshenkin

  • Parity Ethereum version: 2.4.6
  • Operating system: Linux
  • Installation: docker
  • Fully synchronized: yes
  • Network: ethereum mainnet
  • Restarted: yes

Machine spec: 8 CPUs, 64 GB RAM, Parity cache-size = 32 GB

We launched an archive node and completed its sync. Today we updated the node from 2.4.5 to 2.4.6 and ran some software that fetches block information.

It works, but the peer connections are unstable, and after some time the node stops syncing blocks (it just displays Syncing … 0.00 blk/s).

Below are examples of the peer count going up and down in the logs, and of the last blocks (in reverse order).

peers info



2019-06-05 20:32:52 UTC   12/25 peers     60 MiB chain   17 MiB db  0 bytes queue   79 KiB sync  RPC:  1 conn,    1 req/s, 34380 µs
--
2019-06-05 20:32:12 UTC   10/25 peers     54 MiB chain   17 MiB db  0 bytes queue   79 KiB sync  RPC:  1 conn,    1 req/s, 32475 µs
2019-06-05 20:31:40 UTC   10/25 peers     53 MiB chain   17 MiB db  0 bytes queue   79 KiB sync  RPC:  1 conn,    1 req/s, 32475 µs
2019-06-05 20:31:05 UTC    9/25 peers     53 MiB chain   17 MiB db  0 bytes queue   79 KiB sync  RPC:  1 conn,    1 req/s, 24494 µs
2019-06-05 20:30:34 UTC    9/25 peers     53 MiB chain   17 MiB db  0 bytes queue   79 KiB sync  RPC:  1 conn,    1 req/s, 17362 µs
2019-06-05 20:29:55 UTC    9/25 peers     53 MiB chain   17 MiB db  0 bytes queue   79 KiB sync  RPC:  1 conn,    1 req/s, 18661 µs
2019-06-05 20:29:20 UTC    9/25 peers     53 MiB chain   17 MiB db  0 bytes queue   79 KiB sync  RPC:  1 conn,    7 req/s, 17362 µs
2019-06-05 20:28:49 UTC    9/25 peers     52 MiB chain   16 MiB db  0 bytes queue   82 KiB sync  RPC:  1 conn,    1 req/s, 32642 µs
2019-06-05 20:28:15 UTC   10/25 peers     49 MiB chain   16 MiB db  0 bytes queue   79 KiB sync  RPC:  1 conn,    1 req/s, 31118 µs
2019-06-05 20:27:41 UTC    6/25 peers     48 MiB chain   16 MiB db  0 bytes queue   79 KiB sync  RPC:  1 conn,    2 req/s, 25000 µs
2019-06-05 20:27:05 UTC    5/25 peers     48 MiB chain   16 MiB db  0 bytes queue   79 KiB sync  RPC:  1 conn,    0 req/s, 25000 µs
2019-06-05 20:26:33 UTC    5/25 peers     48 MiB chain   16 MiB db  0 bytes queue   79 KiB sync  RPC:  1 conn,    1 req/s, 20223 µs
2019-06-05 20:26:00 UTC    5/25 peers     48 MiB chain   16 MiB db  0 bytes queue   80 KiB sync  RPC:  1 conn,    1 req/s, 33003 µs
2019-06-05 20:26:00 UTC    4/25 peers     48 MiB chain   16 MiB db  0 bytes queue   80 KiB sync  RPC:  1 conn,    1 req/s, 33003 µs
2019-06-05 20:25:11 UTC    5/25 peers     48 MiB chain   16 MiB db  0 bytes queue   79 KiB sync  RPC:  1 conn,    0 req/s, 33558 µs
2019-06-05 20:24:33 UTC    6/25 peers     47 MiB chain   16 MiB db  0 bytes queue   79 KiB sync  RPC:  1 conn,    0 req/s, 33558 µs
2019-06-05 20:23:53 UTC    8/25 peers     47 MiB chain   16 MiB db  0 bytes queue   79 KiB sync  RPC:  1 conn,    1 req/s, 30569 µs
2019-06-05 20:23:53 UTC    8/25 peers     47 MiB chain   16 MiB db  0 bytes queue   79 KiB sync  RPC:  1 conn,    1 req/s, 30569 µs
2019-06-05 20:23:11 UTC    8/25 peers     47 MiB chain   16 MiB db  0 bytes queue   79 KiB sync  RPC:  1 conn,    1 req/s, 26161 µs
2019-06-05 20:22:43 UTC Updated conversion rate to Ξ1 = US$248.89 (19132568 wei/gas)
2019-06-05 20:22:40 UTC    7/25 peers     47 MiB chain   16 MiB db  0 bytes queue   79 KiB sync  RPC:  1 conn,    1 req/s, 17618 µs
2019-06-05 20:22:16 UTC    8/25 peers     47 MiB chain   16 MiB db  0 bytes queue   79 KiB sync  RPC:  1 conn,    0 req/s, 15189 µs
2019-06-05 20:21:37 UTC    8/25 peers     46 MiB chain   16 MiB db  0 bytes queue   79 KiB sync  RPC:  1 conn,    4 req/s, 21464 µs
2019-06-05 20:21:02 UTC    7/25 peers     44 MiB chain   15 MiB db  0 bytes queue   79 KiB sync  RPC:  1 conn,    0 req/s, 25221 µs
2019-06-05 20:21:02 UTC    7/25 peers     44 MiB chain   15 MiB db  0 bytes queue   79 KiB sync  RPC:  1 conn,    0 req/s, 25221 µs
2019-06-05 20:20:31 UTC    7/25 peers     43 MiB chain   15 MiB db  0 bytes queue   79 KiB sync  RPC:  1 conn,    0 req/s, 25221 µs
2019-06-05 20:19:50 UTC    5/25 peers     43 MiB chain   15 MiB db  0 bytes queue   79 KiB sync  RPC:  1 conn,    3 req/s, 25221 µs



stuck on block


2019-06-05 20:36:00 UTC Syncing #7901361 0xd3df…ad5c     0.00 blk/s    0.0 tx/s    0.0 Mgas/s      0+    0 Qed  #7895893   21/25 peers     62 MiB chain   17 MiB db  0 bytes queue  443 MiB sync  RPC:  1 conn,    0 req/s, 46705 µs
2019-06-05 20:35:55 UTC Syncing #7901361 0xd3df…ad5c     0.00 blk/s    0.0 tx/s    0.0 Mgas/s      0+    0 Qed  #7893988   21/25 peers     62 MiB chain   17 MiB db  0 bytes queue  355 MiB sync  RPC:  1 conn,    1 req/s, 46705 µs
2019-06-05 20:35:50 UTC Syncing #7901361 0xd3df…ad5c     0.00 blk/s    0.0 tx/s    0.0 Mgas/s      0+    0 Qed  #7893988   21/25 peers     62 MiB chain   17 MiB db  0 bytes queue  176 MiB sync  RPC:  1 conn,    1 req/s, 46705 µs
2019-06-05 20:35:45 UTC Syncing #7901361 0xd3df…ad5c     0.00 blk/s    0.0 tx/s    0.0 Mgas/s      0+    0 Qed  #7892083   20/25 peers     62 MiB chain   17 MiB db  0 bytes queue  147 MiB sync  RPC:  1 conn,    1 req/s, 38701 µs
2019-06-05 20:35:40 UTC Syncing #7901361 0xd3df…ad5c     0.00 blk/s    0.0 tx/s    0.0 Mgas/s      0+    0 Qed  #7885225   21/25 peers     62 MiB chain   17 MiB db  0 bytes queue  551 MiB sync  RPC:  1 conn,    0 req/s, 38701 µs
2019-06-05 20:35:35 UTC Syncing #7901361 0xd3df…ad5c     0.00 blk/s    0.0 tx/s    0.0 Mgas/s      0+    0 Qed  #7885225   19/25 peers     62 MiB chain   17 MiB db  0 bytes queue  401 MiB sync  RPC:  1 conn,    1 req/s, 38701 µs
2019-06-05 20:35:30 UTC Syncing #7901361 0xd3df…ad5c     0.00 blk/s    0.0 tx/s    0.0 Mgas/s      0+    0 Qed  #7885225   19/25 peers     62 MiB chain   17 MiB db  0 bytes queue  318 MiB sync  RPC:  1 conn,    1 req/s, 38701 µs
2019-06-05 20:35:25 UTC Syncing #7901361 0xd3df…ad5c     0.00 blk/s    0.0 tx/s    0.0 Mgas/s      0+    0 Qed  #7885225   19/25 peers     62 MiB chain   17 MiB db  0 bytes queue  212 MiB sync  RPC:  1 conn,    0 req/s, 33990 µs
2019-06-05 20:35:20 UTC Syncing #7901361 0xd3df…ad5c     0.00 blk/s    0.0 tx/s    0.0 Mgas/s      0+    0 Qed  #7884971   17/25 peers     62 MiB chain   17 MiB db  0 bytes queue  140 MiB sync  RPC:  1 conn,    1 req/s, 33990 µs
2019-06-05 20:35:15 UTC Syncing #7901361 0xd3df…ad5c     0.00 blk/s    0.0 tx/s    0.0 Mgas/s      0+    0 Qed  #7884971   16/25 peers     61 MiB chain   17 MiB db  0 bytes queue   60 MiB sync  RPC:  1 conn,    2 req/s, 33990 µs
2019-06-05 20:35:10 UTC Syncing #7901361 0xd3df…ad5c     0.00 blk/s    0.0 tx/s    0.0 Mgas/s      0+    0 Qed  #7884971   18/25 peers     61 MiB chain   17 MiB db  0 bytes queue   87 KiB sync  RPC:  1 conn,    1 req/s, 33990 µs
2019-06-05 20:35:08 UTC Syncing #7901361 0xd3df…ad5c     0.60 blk/s   59.2 tx/s    4.8 Mgas/s      0+    0 Qed  #7884971   17/25 peers     61 MiB chain   17 MiB db  0 bytes queue   79 KiB sync  RPC:  1 conn,    2 req/s, 32341 µs
2019-06-05 20:35:01 UTC Reorg to #7901360 0xba42…c2a0 (0x8d58…2732 #7901358 0x9dc9…e404 0x3b25…123b)
2019-06-05 20:35:00 UTC Syncing #7901359 0x8d58…2732     0.00 blk/s    0.0 tx/s    0.0 Mgas/s      0+    1 Qed  #7901360   13/25 peers     61 MiB chain   17 MiB db   98 KiB queue   80 KiB sync  RPC:  1 conn,    0 req/s, 33990 µs
2019-06-05 20:34:55 UTC Syncing #7901359 0x8d58…2732     0.00 blk/s    0.0 tx/s    0.0 Mgas/s      0+    0 Qed  #7901358   11/25 peers     61 MiB chain   17 MiB db  0 bytes queue   81 KiB sync  RPC:  1 conn,    1 req/s, 29244 µs
2019-06-05 20:34:51 UTC Syncing #7901359 0x8d58…2732     0.00 blk/s    0.0 tx/s    0.0 Mgas/s      0+    0 Qed  #7901358   11/25 peers     61 MiB chain   17 MiB db  0 bytes queue   79 KiB sync  RPC:  1 conn,    0 req/s, 29244 µs
2019-06-05 20:34:46 UTC Syncing #7901359 0x8d58…2732     0.00 blk/s    0.0 tx/s    0.0 Mgas/s      0+    0 Qed  #7901358   11/25 peers     61 MiB chain   17 MiB db  0 bytes queue   79 KiB sync  RPC:  1 conn,    0 req/s, 29244 µs
2019-06-05 20:34:45 UTC Syncing #7901359 0x8d58…2732     0.00 blk/s    0.0 tx/s    0.0 Mgas/s      0+    0 Qed  #7901358   11/25 peers     61 MiB chain   17 MiB db  0 bytes queue   79 KiB sync  RPC:  1 conn,    0 req/s, 29244 µs
2019-06-05 20:34:45 UTC Syncing #7901359 0x8d58…2732     0.00 blk/s    0.0 tx/s    0.0 Mgas/s      0+    0 Qed  #7901358   10/25 peers     61 MiB chain   17 MiB db  0 bytes queue   79 KiB sync  RPC:  1 conn,    0 req/s, 29244 µs
2019-06-05 20:34:44 UTC Syncing #7901359 0x8d58…2732     0.06 blk/s    6.6 tx/s    0.4 Mgas/s      0+    0 Qed  #7901358   11/25 peers     61 MiB chain   17 MiB db  0 bytes queue   79 KiB sync  RPC:  1 conn,    1 req/s, 29244 µs
2019-06-05 20:34:30 UTC Imported #7901359 0x8d58…2732 (102 txs, 7.99 Mgas, 709 ms, 29.14 KiB)
2019-06-05 20:34:09 UTC Imported #7901358 0x9dc9…e404 (136 txs, 7.99 Mgas, 632 ms, 34.54 KiB)
2019-06-05 20:34:09 UTC   10/25 peers     61 MiB chain   17 MiB db  0 bytes queue   79 KiB sync  RPC:  1 conn,    0 req/s, 23838 µs
2019-06-05 20:33:43 UTC Imported #7901357 0xec21…a3fd (118 txs, 8.00 Mgas, 563 ms, 28.45 KiB)

We also have a regular full node, and it hasn't hit this issue (it just continues syncing normally).
After a restart, the archive node resumes syncing.

Is this a bug? Or is there something we can do to avoid restarting it every two hours?

Thanks in advance
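(As a side note, an external watchdog is one way to at least detect the stall automatically. Below is only a minimal sketch, assuming the node exposes HTTP-RPC locally on 8545 and that "no new block within one polling interval" is a good enough stall signal; the URL, interval, and threshold are assumptions, not something this report uses.)

```python
# Minimal watchdog sketch (assumptions: local HTTP-RPC on 8545, 5-minute polling,
# and that one interval with no new block counts as a stall).
import time
import requests

RPC_URL = "http://127.0.0.1:8545"  # assumed local Parity JSON-RPC endpoint

def block_number() -> int:
    resp = requests.post(RPC_URL, json={
        "jsonrpc": "2.0", "method": "eth_blockNumber", "params": [], "id": 1})
    return int(resp.json()["result"], 16)

last = block_number()
while True:
    time.sleep(300)
    current = block_number()
    if current <= last:
        print(f"block import appears stuck at #{current}; consider restarting the node")
    last = current
```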

@APshenkin APshenkin changed the title Archive node loses peers (stop sincyng blocks) Archive node loses peers (stop syncing blocks) Jun 5, 2019
@APshenkin
Author

[screenshot: memory/cache usage graph]

I found another strange behaviour. It looks like the archive node takes all 32 GB of cache, and when it has used all of the cache it gets stuck.

@jam10o-new
Contributor

@APshenkin if you specify a cache size, Parity will try to use all of it as cache; that is expected. Some memory will also be used beyond that during normal operation.

Things I am concerned about:
  • The node getting stuck after its cache fills
  • The large reorg that happened right before your node got stuck

@jam10o-new jam10o-new added F3-annoyance 💩 The client behaves within expectations, however this “expected behaviour” itself is at issue. M4-core ⛓ Core client code / Rust. labels Jun 5, 2019
@APshenkin
Author

Yes, agreed, but it's strange that it gets stuck every time the cache fills (this has happened several times today already).

Feel free to ask any other questions

I will post more info here if this occurs again.

@APshenkin
Author

@joshua-mir So this occurred again:


2019-06-06 02:10:45 UTC Syncing #7902878 0x4a3b…110b     0.00 blk/s    0.0 tx/s    0.0 Mgas/s      0+    0 Qed    #49400   24/25 peers    410 MiB chain   49 MiB db  0 bytes queue   50 MiB sync  RPC:  1 conn,    2 req/s, 44139 µs
--
2019-06-06 02:10:40 UTC Syncing #7902878 0x4a3b…110b     0.00 blk/s    0.0 tx/s    0.0 Mgas/s      0+    0 Qed    #49400   24/25 peers    402 MiB chain   49 MiB db  0 bytes queue   45 MiB sync  RPC:  1 conn,    1 req/s, 44139 µs
2019-06-06 02:10:35 UTC Syncing #7902878 0x4a3b…110b     0.00 blk/s    0.0 tx/s    0.0 Mgas/s      0+    0 Qed    #49400   23/25 peers    381 MiB chain   49 MiB db  0 bytes queue   28 MiB sync  RPC:  1 conn,    0 req/s, 44139 µs
2019-06-06 02:10:30 UTC Syncing #7902878 0x4a3b…110b     0.00 blk/s    0.0 tx/s    0.0 Mgas/s      0+    0 Qed    #46733   23/25 peers    366 MiB chain   49 MiB db  0 bytes queue   18 MiB sync  RPC:  1 conn,    1 req/s, 44139 µs
2019-06-06 02:10:25 UTC Syncing #7902878 0x4a3b…110b     0.00 blk/s    0.0 tx/s    0.0 Mgas/s      0+    0 Qed    #32382   20/25 peers    351 MiB chain   49 MiB db  0 bytes queue   16 MiB sync  RPC:  1 conn,    0 req/s, 44139 µs
2019-06-06 02:10:20 UTC Syncing #7902878 0x4a3b…110b     0.00 blk/s    0.0 tx/s    0.0 Mgas/s      0+    0 Qed    #57149   22/25 peers    341 MiB chain   49 MiB db  0 bytes queue   24 MiB sync  RPC:  1 conn,    2 req/s, 44139 µs
2019-06-06 02:10:15 UTC Syncing #7902878 0x4a3b…110b     0.00 blk/s    0.0 tx/s    0.0 Mgas/s      0+    0 Qed    #55119   17/25 peers    340 MiB chain   49 MiB db  0 bytes queue   18 MiB sync  RPC:  1 conn,    1 req/s, 44139 µs
2019-06-06 02:10:10 UTC Syncing #7902878 0x4a3b…110b     0.00 blk/s    0.0 tx/s    0.0 Mgas/s      0+    0 Qed    #47370   18/25 peers    337 MiB chain   49 MiB db  0 bytes queue   17 MiB sync  RPC:  1 conn,    0 req/s, 44139 µs
2019-06-06 02:10:05 UTC Syncing #7902878 0x4a3b…110b     0.00 blk/s    0.0 tx/s    0.0 Mgas/s      0+    0 Qed    #36835   18/25 peers    334 MiB chain   49 MiB db  0 bytes queue   18 MiB sync  RPC:  1 conn,    1 req/s, 44139 µs
2019-06-06 02:10:00 UTC Syncing #7902878 0x4a3b…110b     0.00 blk/s    0.0 tx/s    0.0 Mgas/s      0+    0 Qed    #54610   18/25 peers    333 MiB chain   49 MiB db  0 bytes queue   29 MiB sync  RPC:  1 conn,    0 req/s, 44139 µs
2019-06-06 02:09:55 UTC Syncing #7902878 0x4a3b…110b     0.00 blk/s    0.0 tx/s    0.0 Mgas/s      0+    0 Qed    #54610   12/25 peers    328 MiB chain   49 MiB db  0 bytes queue   27 MiB sync  RPC:  1 conn,    1 req/s, 44139 µs
2019-06-06 02:09:50 UTC Syncing #7902878 0x4a3b…110b     0.00 blk/s    0.0 tx/s    0.0 Mgas/s      0+    0 Qed    #54610   11/25 peers    328 MiB chain   49 MiB db  0 bytes queue   13 MiB sync  RPC:  1 conn,    1 req/s, 44139 µs
2019-06-06 02:09:45 UTC Syncing #7902878 0x4a3b…110b     0.00 blk/s    0.0 tx/s    0.0 Mgas/s      0+    0 Qed    #50805   10/25 peers    327 MiB chain   49 MiB db  0 bytes queue   11 MiB sync  RPC:  1 conn,    0 req/s, 44102 µs
2019-06-06 02:09:40 UTC Syncing #7902878 0x4a3b…110b     0.00 blk/s    0.0 tx/s    0.0 Mgas/s      0+    0 Qed    #45214    9/25 peers    327 MiB chain   49 MiB db  0 bytes queue   12 MiB sync  RPC:  1 conn,    1 req/s, 44102 µs
2019-06-06 02:09:35 UTC Syncing #7902878 0x4a3b…110b     0.00 blk/s    0.0 tx/s    0.0 Mgas/s      0+    0 Qed    #38751    9/25 peers    326 MiB chain   49 MiB db  0 bytes queue    9 MiB sync  RPC:  1 conn,    0 req/s, 43872 µs
2019-06-06 02:09:30 UTC Syncing #7902878 0x4a3b…110b     0.00 blk/s    0.0 tx/s    0.0 Mgas/s      0+    0 Qed    #35941    8/25 peers    326 MiB chain   49 MiB db  0 bytes queue    7 MiB sync  RPC:  1 conn,    1 req/s, 43872 µs
2019-06-06 02:09:25 UTC Syncing #7902878 0x4a3b…110b     0.00 blk/s    0.0 tx/s    0.0 Mgas/s      0+    0 Qed    #29726    6/25 peers    325 MiB chain   49 MiB db  0 bytes queue    8 MiB sync  RPC:  1 conn,    0 req/s, 43800 µs
2019-06-06 02:09:20 UTC Syncing #7902878 0x4a3b…110b     0.00 blk/s    0.0 tx/s    0.0 Mgas/s      0+    0 Qed    #25527    6/25 peers    324 MiB chain   49 MiB db  0 bytes queue    5 MiB sync  RPC:  1 conn,    2 req/s, 43800 µs
2019-06-06 02:09:15 UTC Syncing #7902878 0x4a3b…110b     0.00 blk/s    0.0 tx/s    0.0 Mgas/s      0+    0 Qed    #19685    4/25 peers    324 MiB chain   49 MiB db  0 bytes queue    4 MiB sync  RPC:  1 conn,    2 req/s, 43800 µs
2019-06-06 02:09:10 UTC Syncing #7902878 0x4a3b…110b     0.00 blk/s    0.0 tx/s    0.0 Mgas/s      0+    0 Qed    #16025    4/25 peers    324 MiB chain   49 MiB db  0 bytes queue    3 MiB sync  RPC:  1 conn,    1 req/s, 43800 µs
2019-06-06 02:09:05 UTC Syncing #7902878 0x4a3b…110b     0.00 blk/s    0.0 tx/s    0.0 Mgas/s      0+    0 Qed    #11811    4/25 peers    324 MiB chain   49 MiB db  0 bytes queue    2 MiB sync  RPC:  1 conn,    1 req/s, 43800 µs
2019-06-06 02:09:00 UTC Syncing #7902878 0x4a3b…110b     0.00 blk/s    0.0 tx/s    0.0 Mgas/s      0+    0 Qed     #9525    4/25 peers    324 MiB chain   49 MiB db  0 bytes queue    2 MiB sync  RPC:  1 conn,    1 req/s, 43328 µs
2019-06-06 02:08:55 UTC Syncing #7902878 0x4a3b…110b     0.00 blk/s    0.0 tx/s    0.0 Mgas/s      0+    0 Qed     #6731    3/25 peers    324 MiB chain   49 MiB db  0 bytes queue    1 MiB sync  RPC:  1 conn,    1 req/s, 43328 µs
2019-06-06 02:08:50 UTC Syncing #7902878 0x4a3b…110b     0.00 blk/s    0.0 tx/s    0.0 Mgas/s      0+    0 Qed     #4462    3/25 peers    324 MiB chain   49 MiB db  0 bytes queue    1 MiB sync  RPC:  1 conn,    2 req/s, 40224 µs
2019-06-06 02:08:45 UTC Syncing #7902878 0x4a3b…110b     0.00 blk/s    0.0 tx/s    0.0 Mgas/s      0+    0 Qed      #896    5/25 peers    320 MiB chain   49 MiB db  0 bytes queue    1 MiB sync  RPC:  1 conn,    1 req/s, 40224 µs
2019-06-06 02:08:40 UTC Syncing #7902878 0x4a3b…110b     0.00 blk/s    0.0 tx/s    0.0 Mgas/s      0+    0 Qed        #0    3/25 peers    320 MiB chain   49 MiB db  0 bytes queue   79 KiB sync  RPC:  1 conn,    1 req/s, 40224 µs
2019-06-06 02:08:35 UTC Syncing #7902878 0x4a3b…110b     0.00 blk/s    0.0 tx/s    0.0 Mgas/s      0+    0 Qed  #7902859    3/25 peers    319 MiB chain   49 MiB db  0 bytes queue   79 KiB sync  RPC:  1 conn,    0 req/s, 33841 µs
2019-06-06 02:08:10 UTC    0/25 peers    319 MiB chain   49 MiB db  0 bytes queue   79 KiB sync  RPC:  1 conn,    0 req/s, 33841 µs
2019-06-06 02:08:00 UTC Imported #7902878 0x4a3b…110b (184 txs, 6.98 Mgas, 688 ms, 24.98 KiB)
2019-06-06 02:07:46 UTC Imported #7902877 0xb5de…c514 (35 txs, 3.82 Mgas, 224 ms, 7.07 KiB) + another 1 block(s) containing 207 tx(s)
2019-06-06 02:07:31 UTC    1/25 peers    318 MiB chain   49 MiB db  0 bytes queue   79 KiB sync  RPC:  1 conn,    0 req/s, 33841 µs
2019-06-06 02:07:20 UTC Imported #7902874 0xc496…7548 (129 txs, 7.99 Mgas, 3102 ms, 20.17 KiB) + another 6 block(s) containing 765 tx(s)
2019-06-06 02:07:07 UTC Reorg to #7902867 0xd356…d615 (0x8082…84ab #7902866 0x794a…7153 )
2019-06-06 02:06:55 UTC    0/25 peers    318 MiB chain   48 MiB db  0 bytes queue   79 KiB sync  RPC:  1 conn,    0 req/s, 38258 µs
2019-06-06 02:06:24 UTC    1/25 peers    318 MiB chain   48 MiB db  0 bytes queue   79 KiB sync  RPC:  1 conn,    1 req/s, 38258 µs
2019-06-06 02:05:32 UTC Imported #7902867 0x8082…84ab (62 txs, 7.99 Mgas, 415 ms, 7.37 KiB)
2019-06-06 02:05:31 UTC    4/25 peers    318 MiB chain   48 MiB db  0 bytes queue   79 KiB sync  RPC:  1 conn,    3 req/s, 30119 µs
2019-06-06 02:05:31 UTC    4/25 peers    318 MiB chain   48 MiB db  0 bytes queue   79 KiB sync  RPC:  1 conn,    3 req/s, 30119 µs
2019-06-06 02:05:31 UTC Imported #7902866 0x794a…7153 (138 txs, 7.99 Mgas, 582 ms, 20.54 KiB)
2019-06-06 02:05:19 UTC Imported #7902862 0x91d9…c746 (93 txs, 8.01 Mgas, 894 ms, 15.32 KiB) + another 1 block(s) containing 139 tx(s)
2019-06-06 02:05:04 UTC Imported #7902860 0x34ce…49cc (203 txs, 8.00 Mgas, 564 ms, 34.38 KiB)
2019-06-06 02:04:40 UTC    4/25 peers    317 MiB chain   48 MiB db  0 bytes queue   79 KiB sync  RPC:  1 conn,    2 req/s, 21974 µs
2019-06-06 02:04:40 UTC    4/25 peers    317 MiB chain   48 MiB db  0 bytes queue   79 KiB sync  RPC:  1 conn,    2 req/s, 21974 µs
2019-06-06 02:04:40 UTC Imported #7902859 0x66cb…b7a0 (150 txs, 8.00 Mgas, 572 ms, 18.83 KiB)
2019-06-06 02:04:26 UTC Imported #7902858 0xbaed…6107 (242 txs, 7.92 Mgas, 504 ms, 36.30 KiB)
2019-06-06 02:04:11 UTC    4/25 peers    317 MiB chain   48 MiB db  0 bytes queue   79 KiB sync  RPC:  1 conn,    1 req/s, 18733 µs

And in this case Parity wasn't using all the memory.

[screenshot: memory usage graph]

@jam10o-new jam10o-new added F2-bug 🐞 The client fails to follow expected behavior. and removed F3-annoyance 💩 The client behaves within expectations, however this “expected behaviour” itself is at issue. labels Jun 6, 2019
@APshenkin
Author

APshenkin commented Jun 6, 2019

Also, maybe this will help somehow: we call the trace_replayBlockTransactions API for each newly generated block.
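For context, a per-block call like that looks roughly as follows. This is only a sketch: the local RPC endpoint is an assumption, and ["trace"] is the lightest trace type (vmTrace/stateDiff are much heavier).

```python
# Hedged sketch of a per-block trace_replayBlockTransactions call.
# Assumptions: local HTTP-RPC on 8545, tracing enabled on the node.
import requests

RPC_URL = "http://127.0.0.1:8545"

def replay_block(block_number: int):
    payload = {
        "jsonrpc": "2.0",
        "method": "trace_replayBlockTransactions",
        "params": [hex(block_number), ["trace"]],  # trace types: trace / vmTrace / stateDiff
        "id": 1,
    }
    return requests.post(RPC_URL, json=payload).json()["result"]

# Example: replay one of the blocks that appears in the logs above.
traces = replay_block(7901359)
```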

@jam10o-new
Contributor

That's an extremely heavy RPC method; it could very well be one of the causes of the problem here if you see it taking a while to return a response.

@APshenkin
Author

APshenkin commented Jun 6, 2019

No, it completes very fast. E.g. after the node syncs following a restart, it completes ~150 successful requests per minute.

@APshenkin
Author

It's strange, but it looks like increasing the disk space solved the issue.
There was ~120 GB of free space; after increasing it to ~500 GB, the issue went away.
Our Parity node hasn't been stuck for 4 hours already (previously it got stuck within 1-2 hours).

It's also strange that after increasing the disk space the Active memory value dropped 🤔
[screenshot: memory usage graph]

Will continue monitoring

@jam10o-new
Contributor

Just to warn you, 500 GB will not be sufficient for an archive node (once you are in sync); you will need upwards of 2 TB. You don't want to hit a full disk either, because it's a known issue that full disks can cause database corruption.

@APshenkin
Author

APshenkin commented Jun 6, 2019

Our node is already synced up 😄
Current disk size: 3.328 TB

Parity folder size: 2.6 TB

@jam10o-new
Contributor

Ah, you increased by 500 GB, not to 500 GB; I misunderstood that 😅

@APshenkin
Author

Yes, sorry for that 🤷‍♂️

@APshenkin
Author

APshenkin commented Jun 6, 2019

So, 9 hours without block import stopping!
I still think it's strange behaviour (there was ~128 GB of free disk space), but Parity only started working well after increasing it to ~500 GB.

@lacasian

lacasian commented Jun 7, 2019

So, 9 hours without block import stopping!
I still think it's strange behaviour (there was ~128 GB of free disk space), but Parity only started working well after increasing it to ~500 GB.

We've got more than 1 TB free and I saw Parity getting stuck this morning. It doesn't seem like the free space has any influence on the issue.

@APshenkin
Author

Agreed. Today we rebooted our node, and after that we noticed that once it had synced the blocks it lost all peers and got stuck again :(

@dafky2000

Related to #10626. I am having the exact same issue; 800 GB of free disk on 2 x 2 TB SSDs in RAID 0.

@iFA88

iFA88 commented Jun 17, 2019

Hi! I have much the same issue with my archive node, but after some time the peers come back and Parity prints some empty newlines instead of the status message.

Here is the log, the issue starts around 2019-06-17 04:21:49, please take a look:
https://www.fusionsolutions.io/doc/ethlog.tar.gz

This happens 1-2 times per week.

@iFA88

iFA88 commented Jun 17, 2019

Today the node stopped about 10 times. I have added a reserved peer which points to my other Parity node on the same server, so it has "survived" many peer-drop periods. But now it has stopped syncing:

2019-06-17 21:19:57  Syncing #7977726 0x4195…29fe     0.00 blk/s    0.0 tx/s    0.0 Mgas/s      0+   20 Qed  #7977745   42/256 peers    482 MiB chain   80 MiB db    3 MiB queue  155 KiB sync  RPC:  0 conn,   97 req/s,
2019-06-17 21:19:57  Syncing #7977726 0x4195…29fe     0.00 blk/s    0.0 tx/s    0.0 Mgas/s      0+   20 Qed  #7977745   42/256 peers    482 MiB chain   80 MiB db    3 MiB queue  155 KiB sync  RPC:  0 conn,   97 req/s,

htop says that the node uses a lot of disk r/w:

NI   PID USER      PRI  VIRT   RES CPU% MEM%   TIME+     DISK R/W NLWP Command
  0  5744 nodes      20 15.5G 12.1G 31.0 19.2  3h23:18  136.54 M/s   66 parity --reserved-p

Only a restart helps. If the devs need any trace, I can help.

@grbIzl
Collaborator

grbIzl commented Jun 20, 2019

@iFA88 @APshenkin can you please reproduce the issue with this trace enabled: -l sync=trace (add this to your Parity's startup parameters)? Thanks!

@iFA88

iFA88 commented Jun 20, 2019

@iFA88 @APshenkin can you please reproduce the issue with this trace enabled: -l sync=trace (add this to your Parity's startup parameters)? Thanks!

Yes, I am running with that parameter now.

When I use the parameters --interface local --no-discovery --reserved-only and point a reserved peer at my other full node, it works fine.

UPDATE
Maybe this flag is not an option.. it generates too many logs (2 MB/min); no one will ever read that, and I can't even see whether it hangs or not.
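A possible workaround for the log volume (just a sketch, not something I have actually run) is to pre-filter the trace log down to the lines that matter for this issue, i.e. zero-speed status lines, reorgs, and very low peer counts:

```python
# Hypothetical pre-filter for a large `-l sync=trace` log.
# The low-peer threshold of 3 is an arbitrary assumption.
import re
import sys

PEERS = re.compile(r"(\d+)/\d+ peers")

for line in sys.stdin:
    if "0.00 blk/s" in line or "Reorg" in line:
        sys.stdout.write(line)
        continue
    m = PEERS.search(line)
    if m and int(m.group(1)) <= 3:
        sys.stdout.write(line)
```

Run it as, for example, python filter_sync_log.py < parity.log (the file name is arbitrary) to get a much smaller view of the window around the stall.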

@grbIzl
Collaborator

grbIzl commented Jun 21, 2019

Maybe this flag is not an option.. it generates too many logs (2 MB/min); no one will ever read that, and I can't even see whether it hangs or not.

I will try to manage it.

@iFA88

iFA88 commented Jun 21, 2019

Here are about 24 hours of logs: https://www.fusionsolutions.io/doc/paritylog.tar.gz (1.2 GB)
The node is currently in sync; I don't see it losing sync at any point during that time.

@grbIzl
Collaborator

grbIzl commented Jun 21, 2019

@iFA88 Thanks for the log! As far as I can see, the sync in it proceeds without getting stuck:

  • The logging ends at 2019-06-21 17:21:40
  • The last imported block before it was at:
    2019-06-21 17:21:33 Verifier ChainConfig #11 INFO import Imported #8002255 0x63ba…08ed (181 txs, 7.99 Mgas, 1412 ms, 33.55 KiB)
    Your node seems to be fully synced (the last block on mainnet right now is #8,002,908)

@iFA88

iFA88 commented Jun 21, 2019

I have now written a script that gets the current block from my nodes every minute, for example:
[2019-06-21 20:05:36.065820] OK [archive: 8002961] [full: 8002961] [peers: 30/50]
When it's stuck, we can see it easily.
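For reference, a minimal sketch of such a per-minute poller, assuming both nodes expose HTTP-RPC locally (the two ports are assumptions, and the peer-count part is omitted):

```python
# Sketch of a per-minute block-height poller for two local nodes.
# Assumptions: archive node RPC on 8545, full node RPC on 8546.
import datetime
import time
import requests

ARCHIVE_RPC = "http://127.0.0.1:8545"
FULL_RPC = "http://127.0.0.1:8546"

def height(url: str) -> int:
    resp = requests.post(url, json={"jsonrpc": "2.0", "method": "eth_blockNumber",
                                    "params": [], "id": 1})
    return int(resp.json()["result"], 16)

while True:
    archive, full = height(ARCHIVE_RPC), height(FULL_RPC)
    status = "OK" if abs(archive - full) <= 2 else "STUCK?"
    print(f"[{datetime.datetime.now()}] {status} [archive: {archive}] [full: {full}]")
    time.sleep(60)
```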

@iFA88

iFA88 commented Jun 24, 2019

Hey, I think I have reproduced a little sister of the issue:
https://www.fusionsolutions.io/doc/archivelog.tar.gz
A little sister, because after 5 minutes the sync continued, but the peers were dropped/disconnected. I hope it helps. Previously, when the sync stopped and all peers were dropped, for hours I only got peers that were not fully synced.

@grbIzl
Collaborator

grbIzl commented Jul 16, 2019

@iFA88 can you please help with testing the next patch? This time I'd like to apply the patch to your full node; the binary is here: https://gitlab.parity.io/parity/parity-ethereum/-/jobs/185203/artifacts/download Please keep the previously patched version on the archive node (if possible) and run both nodes with the same trace parameters as before (sync trace for the archive node, usual logging for the full node). In this build, besides the patch itself, one more log message was added, which can spam your full node's log a little bit. I hope that's OK for you.

@iFA88

iFA88 commented Jul 17, 2019

So, something went very wrong..
What I have now:
archiveNode => specParity (Parity-Ethereum/v2.7.0-nightly-b82b0ad-20190709/x86_64-linux-gnu/rustc1.36.0)
fullNode => specialFullParity (Parity-Ethereum/v2.7.0-nightly-5277eb3-20190716/x86_64-linux-gnu/rustc1.36.0)

Logs:
https://www.fusionsolutions.io/doc/speclog.tar.gz

Result:
The fullNode works, with a few new log outputs. The archive node runs for a few minutes and then quits; I have tried restarting it again and again, but nothing helps.

@grbIzl
Collaborator

grbIzl commented Jul 17, 2019

@iFA88 Thanks again! Your help is invaluable! Is it possible to run a short log session with both nodes patched and sync trace enabled on both as well? Just run them both and wait 1-2 minutes.

@iFA88

iFA88 commented Jul 17, 2019

Here it is:
https://www.fusionsolutions.io/doc/specissue2.tar.gz

@grbIzl
Collaborator

grbIzl commented Jul 17, 2019

@iFA88 you're saying that the archive node quits? Can you please explain in more detail? Does it panic? What does it look like? From the logs it seems that the node was simply shut down after several seconds :-(

@iFA88

iFA88 commented Jul 17, 2019

@iFA88

iFA88 commented Jul 17, 2019

I will try to recover my Parity database. Maybe that happened because I could only shut Parity down with SIGKILL..

UPDATE:
I need to resync from scratch; it will take about 10 days if the sync issue doesn't appear.

@grbIzl
Collaborator

grbIzl commented Jul 17, 2019

I need to resync from scratch; it will take about 10 days if the sync issue doesn't appear.

:-( I would also recommend checking your disk and memory, to make sure that they were not the cause of the corruption (fsck and memtest).

@iFA88

iFA88 commented Aug 2, 2019

Still syncing.. It is at block #5805037 now.. At least it's going without issues :)

@grbIzl
Collaborator

grbIzl commented Aug 13, 2019

Hey @iFA88 ! Has the sync completed?

@iFA88

iFA88 commented Aug 13, 2019

@grbIzl Still syncing.. best block is 6743002 now.

@iFA88

iFA88 commented Aug 13, 2019

It's good that the node has cached many blocks, because the archive node drops my full node a lot of the time. My database is now 1.7 TB, so about 15% is missing from the full size.. about 6-10 days.

@mnaboka

mnaboka commented Aug 20, 2019

Hey @APshenkin, I am facing the exact same problem with Parity.
I noticed you have a nice graph of cache usage; would you mind sharing what tools you used for that graph? Thanks!

@iFA88

iFA88 commented Aug 24, 2019

I got a critical error during syncing that I cannot handle.
2019-08-24 16:50:16 Block import failed for #7418712 (0xd32f…fab9): Parent not found (0xeef9…5f22)
I have put all the logs into the archive file, and I have also included the RocksDB LOG file:
https://www.fusionsolutions.io/doc/archiveEth.tar.gz

Update:
The database has somehow become corrupted. Now trying to repair it. The disks and RAM are okay.

Update 2:
There is no chance to repair the database; I need to sync from the beginning.. I tried several times, but got only this:

2019-08-24 20:11:05  DB has been previously marked as corrupted, attempting repair
2019-08-24 23:47:35  DB corrupted: Invalid argument: You have to open all column families. Column families not opened: col6, col5, col1, col0, col3, col4, col2, attempting repair

@grbIzl
Collaborator

grbIzl commented Aug 26, 2019

I got a critical error during syncing that I cannot handle.
2019-08-24 16:50:16 Block import failed for #7418712 (0xd32f…fab9): Parent not found (0xeef9…5f22)
I have put all the logs into the archive file, and I have also included the RocksDB LOG file:
https://www.fusionsolutions.io/doc/archiveEth.tar.gz

That's very sad to hear :-( It seems that you hit the bug in the RocksDB implementation, facebook/rocksdb#3509. We see it sometimes, but are still not able to address it. I'm thinking about what I can do to help with a resolution, but frankly there are not many options available right now (and none of them are quick).

@iFA88

iFA88 commented Sep 14, 2019

Greetings! My archive node is still syncing, but now the same issue has happened on my full node:

https://www.fusionsolutions.io/doc/ethlog.tar.gz

The issue started at 2019-09-14 00:12:22. The node dropped all peers and then probably requested ancient blocks from the new peers..

This is the first time that this happened with my full node.
Parity-Ethereum/v2.5.7-stable-6bd7db9-20190829/x86_64-linux-gnu/rustc1.36.0

@grbIzl
Collaborator

grbIzl commented Sep 16, 2019

The issue started at 2019-09-14 00:12:22. The node dropped all peers and then probably requested ancient blocks from the new peers..

Yes, I see this stall in the logs, but I cannot say what caused it without the corresponding log targets enabled. For the sake of simplicity, I suggest tracking this problem with the full node separately and creating another issue if it repeats.

@iFA88

iFA88 commented Sep 16, 2019

Sadly, it is very hard to reproduce with the full node. I think the problem is the same for both. Once my archive node finishes syncing, I can enable trace output. I need at least 30 days for that.. :(

@iFA88

iFA88 commented Oct 23, 2019

My archive node has finished syncing (Parity-Ethereum/v2.5.9-stable-06c7096-20190926/x86_64-linux-gnu/rustc1.36.0). About 20 minutes after the sync was done, the issue appeared. We can now talk about how to debug this bug.

@grbIzl
Collaborator

grbIzl commented Oct 23, 2019

My archive node has finished syncing (Parity-Ethereum/v2.5.9-stable-06c7096-20190926/x86_64-linux-gnu/rustc1.36.0). About 20 minutes after the sync was done, the issue appeared. We can now talk about how to debug this bug.

OK. For a start, can you please confirm which exact versions run on both of your nodes (full and archive)? Some time ago I made patches for both of them; did you apply those patches? If not, please don't for now. I'll first try to figure out whether it's possible to help you with a backup.

@iFA88

iFA88 commented Oct 23, 2019

I run Parity-Ethereum/v2.5.9-stable-06c7096-20190926/x86_64-linux-gnu/rustc1.36.0 on all nodes. I can back up my full node, but I cannot make any backup of my archive node because it is 3.2 TB. Yes, I can apply the patched clients and run them.

@grbIzl
Collaborator

grbIzl commented Oct 24, 2019

@iFA88 would it be OK for you to use our backup (made and backed up on a Parity archive node)? We use 2.5.8 there, but the diff doesn't affect the DB.

@iFA88

iFA88 commented Oct 24, 2019

If I understand correctly, you want to download my archive node DB and use it locally?

@grbIzl
Collaborator

grbIzl commented Oct 24, 2019

Actually, vice versa. If anything happens to your archive node, you can download the backup from us and apply it to your node.

@iFA88

iFA88 commented Oct 24, 2019

Alright, please, let's talk on Gitter.

@adria0 adria0 added the A3-stale 🍃 Pull request did not receive any updates in a long time. No review needed at this stage. Close it. label Jul 27, 2020
@adria0 adria0 closed this as completed Jul 27, 2020