2.7.2 sync blocks extremely slow #11494
I turned on debug logging for the sync module and I can't find any problem in there, other than that it is deactivating peers all the time.
I started another node to re-sync the entire chain with the exact same pod specs, Docker image, and configuration file, and weirdly it is able to utilize more CPUs and sync initially. However, it ran into the same issue after a while and now it looks the same as my original nodes.

Not sure if it is relevant at all, but I use a custom Dockerfile which just downloads the executable.
@dvdplm Thanks for replying! Unfortunately, due to the nature of the project I am working on, we need to keep 3 days' worth of state data to be able to replay transactions. We have been running previous versions of Parity with the same pruning history for almost 2 years and have never had any problem with it. With that said, I will try reducing the pruning history and get back to you here. I am just wondering: if I reduce the pruning history right now and increase it later, will the node be able to get back those state data retroactively? Or will it just continue storing more from the point when I turn it back?
@dvdplm I reduced my configuration to the following as you suggested:

```toml
# This config should be placed in the following path:
# ~/.local/share/io.parity.ethereum/config.toml

[parity]
# Ethereum Mainnet Network
chain = "mainnet"
# Parity continuously syncs the chain
mode = "active"
# No updates will be auto-installed
auto_update = "none"
# Disables auto downloading of new releases. Not recommended.
no_download = true
# Blockchain and settings will be stored in /data.
base_path = "/data"

[rpc]
# Allows Cross-Origin Requests from domain '*'.
cors = ["*"]
# JSON-RPC will be listening for connections on all interfaces.
interface = "all"
# Only selected APIs will be exposed over this interface.
apis = ["web3", "eth", "net", "traces"]
# Allow connections only using specified addresses.
hosts = ["all"]
# JSON-RPC over HTTP will be accessible on port 9091.
port = 9091

[websockets]
# JSON-RPC will be listening for connections on all interfaces.
interface = "all"
# Allows connecting from Origin 'all'.
origins = ["all"]
# Only selected APIs will be exposed over this interface.
apis = ["pubsub"]
# JSON-RPC over WebSockets will be accessible on port 9092.
port = 9092

[ipc]
# You won't be able to use IPC to interact with Parity.
disable = true

[footprint]
# Compute and store tracing data. (Enables trace_* APIs.)
tracing = "on"
# Prune old state data. Maintains journal overlay - fast but extra 50MB of memory used.
pruning = "fast"
# Will keep up to 100 old state entries.
pruning_history = 100
```

The node is now not syncing at all.
Is it possible that my database is corrupted as a result of upgrading from 2.5.13 to 2.7.2 in a way which does not crash Parity but only significantly slows it down? If that is the case, this might be reproducible, as both of my nodes are having the same issue.
I have the same problem when I use the Docker image "btccom/parity:v2.5.13-btcpool". Here are my config.toml and logs. The sync speed cannot even catch up with mainnet's. What can I do about that?
No, once pruned there is no way to get the pruned state back; it will, as you say, continue storing more from the point you increase it again. We have upgraded all of our bootnodes from 2.5 to 2.7 and have not seen these issues, so whatever this is, I don't think it affects everyone (making it trickier to debug, of course). @sammy1991106 Do you think you can share your rocksdb…
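To make the trade-off concrete, here is a rough back-of-the-envelope sketch (not from the thread; it assumes mainnet's ~13 s average block time, and the function names are illustrative) of how `pruning_history` maps to a replayable window:

```python
def oldest_replayable_block(latest_block: int, pruning_history: int) -> int:
    """With pruning_history = N, roughly the states of the most recent
    N blocks are retained; anything older is pruned and cannot be replayed."""
    return max(0, latest_block - pruning_history)

def blocks_for_window(seconds: int, avg_block_time: float = 13.0) -> int:
    """Approximate number of mainnet blocks produced in a time window."""
    return int(seconds / avg_block_time)

# Keeping ~3 days of replayable state needs on the order of
# 3 * 86400 / 13, i.e. roughly 20,000 entries.
three_days = blocks_for_window(3 * 86400)
```

Under those assumptions, dropping `pruning_history` to 100 shrinks the replay window to roughly 20 minutes of blocks, which is why the reduction is only a diagnostic step here.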
@dvdplm In the meanwhile, is it doable to warp sync to a specific block X, and then turn on tracing and full sync from there?
I run another parity node like the following:

```
ExecStart=/usr/bin/parity --author "*******************" --mode active --pruning fast --db-compaction ssd --cache-size 1024 --config /etc/parity/config.toml --log-file /root/parity.log
```

Here are my config.toml, the log, and the parity version. Same result: it syncs very, very slowly. Hoping for a response, thank you.
I'm afraid that is not currently possible. I don't know of a deep technical reason for that, though, so I think it can be done. But it's not a solution to your current problem.
@EthanHee so your node is not a tracing node then? And using 2.5 it syncs significantly quicker?
@dvdplm I have re-synced two nodes. Let me know if you find anything. We would love to provide logs, debugging, and PRs to help solve the issue.
Do I understand you correctly that here the two nodes are not archive nodes and not tracing, just plain old vanilla full nodes? I have benchmarked an archive+tracing node at both 2.5 and 2.7 today and I have not found any significant performance difference, but my node is not fully synced; I have ~2.5 million blocks in the DB. Here are some numbers from my runs.

Do you think you could run the same benchmark? EDIT: I ran my benchmark with…
Hey @dvdplm, I'm working with @sammy1991106 on this. While we wait for him to get you those numbers, I will say that one of the main concerns is performance under load. In other words, I suspect that a single request on a single CPU while in offline mode will not yield a significant difference. @sammy1991106, I'll let you prove or disprove my hypothesis.
Lovely, this definitely could use more eyes. One place to look is the switch to…
I agree; what I was after is a confirmation that what I'm seeing on my system matches up with what you guys are seeing. If, on the other hand, we have a severe regression on a single query on a single thread, I'd be more inclined to look into the database layer.
Had to re-sync to get the blocks. Block `0x9146a2`:

```
$ time curl -H 'Content-type: application/json' -X POST -d '{"method":"trace_replayBlockTransactions","params":["'$BLOCK'", ["stateDiff", "vmTrace","trace"] ],"id":1,"jsonrpc":"2.0"}' http://localhost:9091 -o $VERSION.json
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 27.2M  100 27.2M  100   122  13.3M     59  0:00:02  0:00:02 --:--:-- 13.3M

real    0m2.130s
user    0m0.005s
sys     0m0.035s
$ echo $BLOCK $IMAGE
0x9146a2 parity-2.5.13

$ time curl -H 'Content-type: application/json' -X POST -d '{"method":"trace_replayBlockTransactions","params":["'$BLOCK'", ["stateDiff", "vmTrace","trace"] ],"id":1,"jsonrpc":"2.0"}' http://localhost:9091 -o $VERSION.json
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 27.2M  100 27.2M  100   122  4716k     20  0:00:06  0:00:05  0:00:01 7521k

real    0m5.994s
user    0m0.008s
sys     0m0.037s
$ echo $BLOCK $IMAGE
0x9146a2 parity-2.6.8

$ time curl -H 'Content-type: application/json' -X POST -d '{"method":"trace_replayBlockTransactions","params":["'$BLOCK'", ["stateDiff", "vmTrace","trace"] ],"id":1,"jsonrpc":"2.0"}' http://localhost:9091 -o $VERSION.json
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 27.2M  100 27.2M  100   122  11.0M     49  0:00:02  0:00:02 --:--:-- 11.0M

real    0m2.557s
user    0m0.004s
sys     0m0.039s
$ echo $BLOCK $IMAGE
0x9146a2 parity-2.7.2
```

and block `0x914699`:

```
$ time curl -H 'Content-type: application/json' -X POST -d '{"method":"trace_replayBlockTransactions","params":["'$BLOCK'", ["stateDiff", "vmTrace","trace"] ],"id":1,"jsonrpc":"2.0"}' http://localhost:9091 -o $VERSION.json
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 17.9M  100 17.9M  100   122  4248k     28  0:00:04  0:00:04 --:--:-- 4248k

real    0m4.396s
user    0m0.006s
sys     0m0.026s
$ echo $BLOCK $IMAGE
0x914699 parity-2.5.13

$ time curl -H 'Content-type: application/json' -X POST -d '{"method":"trace_replayBlockTransactions","params":["'$BLOCK'", ["stateDiff", "vmTrace","trace"] ],"id":1,"jsonrpc":"2.0"}' http://localhost:9091 -o $VERSION.json
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 17.9M  100 17.9M  100   122  4360k     28  0:00:04  0:00:04 --:--:-- 4360k

real    0m4.274s
user    0m0.006s
sys     0m0.023s
$ echo $BLOCK $IMAGE
0x914699 parity-2.6.8

$ time curl -H 'Content-type: application/json' -X POST -d '{"method":"trace_replayBlockTransactions","params":["'$BLOCK'", ["stateDiff", "vmTrace","trace"] ],"id":1,"jsonrpc":"2.0"}' http://localhost:9091 -o $VERSION.json
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 17.9M  100 17.9M  100   122  5255k     34  0:00:03  0:00:03 --:--:-- 5255k

real    0m3.576s
user    0m0.010s
sys     0m0.023s
$ echo $BLOCK $IMAGE
0x914699 parity-2.7.2
```

I noticed that if I run the query twice, I get a significant speed-up, so obviously there's some caching going on, and I'm not sure that running a proper load test really gives useful results beyond the 99th percentile. @dvdplm, I'll do the same thing for the block you were using too.
Here are the "benchmarks" with block `0x1a3d52`:

```
$ time curl -H 'Content-type: application/json' -X POST -d '{"method":"trace_replayBlockTransactions","params":["'$BLOCK'", ["stateDiff", "vmTrace","trace"] ],"id":1,"jsonrpc":"2.0"}' http://localhost:9091 -o $VERSION-$BLOCK.json
Warning: The file name argument '-0x1a3d52.json' looks like a flag.
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 50.6M  100 50.6M  100   122  15.4M     37  0:00:03  0:00:03 --:--:-- 15.4M

real    0m3.279s
user    0m0.008s
sys     0m0.066s
$ echo $BLOCK $IMAGE
0x1a3d52 parity-2.5.13

$ time curl -H 'Content-type: application/json' -X POST -d '{"method":"trace_replayBlockTransactions","params":["'$BLOCK'", ["stateDiff", "vmTrace","trace"] ],"id":1,"jsonrpc":"2.0"}' http://localhost:9091 -o $VERSION-$BLOCK.json
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 50.6M  100 50.6M  100   122  9.9M      24  0:00:05  0:00:05 --:--:-- 13.0M

real    0m5.090s
user    0m0.008s
sys     0m0.066s
$ echo $BLOCK $VERSION
0x1a3d52 parity-2.6.8

$ time curl -H 'Content-type: application/json' -X POST -d '{"method":"trace_replayBlockTransactions","params":["'$BLOCK'", ["stateDiff", "vmTrace","trace"] ],"id":1,"jsonrpc":"2.0"}' http://localhost:9091 -o $VERSION-$BLOCK.json
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 50.6M  100 50.6M  100   122  16.8M     40  0:00:03  0:00:02  0:00:01 16.8M

real    0m3.004s
user    0m0.008s
sys     0m0.068s
$ echo $BLOCK $VERSION
0x1a3d52 parity-2.7.2
```

Interestingly enough, I decided to run that last query a few more times to see if there was any speed-up, and it didn't improve by much. Below is the fastest attempt:

```
$ time curl -H 'Content-type: application/json' -X POST -d '{"method":"trace_replayBlockTransactions","params":["'$BLOCK'", ["stateDiff", "vmTrace","trace"] ],"id":1,"jsonrpc":"2.0"}' http://localhost:9091 -o $VERSION-$BLOCK.json
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 50.6M  100 50.6M  100   122  20.1M     48  0:00:02  0:00:02 --:--:-- 20.1M

real    0m2.662s
user    0m0.012s
sys     0m0.071s
$ echo $BLOCK $VERSION
0x1a3d52 parity-2.7.2
```

Since the original posting of this issue, we have found optimizations related to our…
While it's possible you're seeing a deadlock, I would be more inclined to suspect thread starvation, where too small a pool of threads is continuously busy answering RPC requests rather than making progress on sync. That's a guess at this stage, though.
We actually saw the syncing issue while the RPC server was off.
Same issue here. The RPC server is on, but not used. The initial download of the blockchain took more than 2 weeks...
Also have this issue. Is there any way to revert the DB change and go back to 2.5.13 with the data? Early testing did not show this bug, but now, due to a stuck sync and falling behind, it is a major issue, as 2.7.2 cannot catch back up. @dvdplm or anyone? Any help or suggestions are greatly appreciated.
Any update on this one?
I got the same issue. I have two nodes running on AWS; one is an m5.large VM and the other runs in a Kubernetes cluster.
I got the same issue with Parity 2.7.2.
Update from my side:
I have the same issue with Parity 2.7.2. It started misbehaving all of a sudden; it syncs extremely slowly and has "slow fetch" messages as well. I'm a bit afraid of downgrading, though, because I don't want to make the problem worse, and I need this node to be working as soon as I can. What is the general advice in this case? Is the downgrade "safe"?
2.7.2 version:

```
2020-04-30 08:25:49 UTC IO Worker #1 WARN sync  9 -> GetNodeData: item 69/384 – slow state fetch for hash 0x60ffe6e056a92b05df4a3de978857e5bdf80c8110165a260df9d092aadeec00d; took Instant…
```
Just an update: we are still having the same issue with OpenEthereum 3.0.0 with the following configuration.
@sammy1991106
@jiangxjcn I would agree if that…
Redacted comment since it was rather large and later understood as off topic. See https://github.com/openethereum/openethereum/issues/11494#issuecomment-636523424.
@sammy1991106 Didn't you notice the opposite (it synced no problem) on Ethereum Classic?
For Ethereum Classic, probably related: https://github.com/openethereum/openethereum/issues/11104
@richardpringle No, I didn't. Both 2.7.2 and 3.0.0 work fine for Ethereum Classic's Mainnet and Kotti, as well as Ethereum's Ropsten.
Same issue here.
FYI, I just upgraded from 2.6.8 to 3.0.1, and from 1 miserable peer it is currently syncing with 17/25 peers.
For me, 2.7.2 just completely stops syncing every few days. The server still runs and RPC calls are still answered, but it just doesn't get any new blocks, even though it shows 10+ connections. I have full debug logs of a few instances, but they don't seem to say anything interesting. I think it got better when I enabled…

Edit: probably more relevant to #11737
I saw these errors on OpenEthereum 3.0.1. Any ideas how to resolve this?
Will try, thanks!
@avolkov-dev Yes, the script from @phiresky works well for us. In production we ensure the script is running with monit, and we have the systemd service file for parity set to always restart.
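The watchdog approach described above can be sketched roughly as follows. This is an illustration, not @phiresky's actual script: the RPC port, the service name in the restart command, and all function names are assumptions.

```python
import json
import subprocess
import time
import urllib.request

RPC_URL = "http://localhost:9091"                 # assumed RPC port
RESTART_CMD = ["systemctl", "restart", "parity"]  # assumed service name

def block_number(url: str = RPC_URL) -> int:
    """Fetch the node's current block height via eth_blockNumber."""
    payload = json.dumps({"jsonrpc": "2.0", "id": 1,
                          "method": "eth_blockNumber", "params": []}).encode()
    req = urllib.request.Request(
        url, data=payload, headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req, timeout=10) as resp:
        return int(json.load(resp)["result"], 16)

def is_stalled(samples: list) -> bool:
    """Consider the node stalled if the height did not advance across samples."""
    return len(samples) >= 2 and samples[-1] <= samples[0]

def watch(interval_s: int = 300, window: int = 3) -> None:
    """Sample the block height periodically and restart the node on a stall."""
    samples = []
    while True:
        samples.append(block_number())
        samples = samples[-window:]
        if len(samples) == window and is_stalled(samples):
            subprocess.run(RESTART_CMD, check=False)  # kick the stuck node
            samples = []
        time.sleep(interval_s)
```

Pairing a check like this with `Restart=always` in the systemd unit matches the setup described in the comment: systemd handles crashes, and the watchdog handles the silent-stall case where the process stays up but stops importing blocks.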
I have been running 2 Parity nodes for Ethereum mainnet in a Kubernetes cluster, and just upgraded them from 2.5.13 to 2.7.2. Now syncing blocks has become extremely slow (~0.04 blk/s) and they cannot even keep up with the chain.

My TOML configuration file for both nodes is as follows. Each node is running in a Kubernetes pod with SSD, 16 CPUs, and 64 GB of memory. What I notice is that despite `scale_verifiers` being enabled, both nodes only consume 1 CPU all the time, as shown below. The logs are much like the following.

Can someone please point out what could have gone wrong with my settings?
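To put the ~0.04 blk/s figure in perspective, a quick back-of-the-envelope check (my own arithmetic, assuming mainnet produces a block roughly every 13 s, i.e. about 0.077 blk/s):

```python
def net_catchup_rate(sync_rate: float, chain_rate: float = 1 / 13.0) -> float:
    """Blocks per second gained (positive) or lost (negative) against the head.

    sync_rate: how fast the node imports blocks (blk/s).
    chain_rate: how fast the chain grows; ~1 block per 13 s is an assumption.
    """
    return sync_rate - chain_rate

# At 0.04 blk/s the node imports slower than the chain grows (~0.077 blk/s),
# so the gap to the head widens instead of shrinking.
rate = net_catchup_rate(0.04)
```

Under that assumption, a node syncing at 0.04 blk/s can never catch up, which matches the reported behaviour.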