
Syncing an Archive node, memory leak #5804

Closed

rvalle opened this issue Jul 21, 2022 · 8 comments

rvalle commented Jul 21, 2022

Hi,

I am setting up a new Polkadot Archive node to feed https://polkawatch.app
We have been running safely with 8 GB of RAM for almost a year. Our production environment was rolled out around version 0.9.16.

With the same configuration on v0.9.26, memory seems to grow exponentially during sync, and the node does not make it to 1M blocks before crashing out of memory.

I captured the memory usage curve and wondered whether it might be due to a memory leak that affects syncing only.

[Screenshot: Polkadot node sync memory usage graph]

In green is the resident memory of the new node being added to the cluster.
In yellow is another node, already in sync.

Our typical running flags are --out-peers 2 --in-peers 2 --max-parallel-downloads 1, with archive pruning and the polkadot chain.
We have also tried each of the following to see if there was any improvement in memory usage (see the sketch after this list):

--sync fast
--db-cache 128
--max-runtime-instances 4 
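
For illustration, a combined invocation with our typical flags plus these tweaks might look roughly like the following (the binary and base path are placeholders, not our exact setup, and in practice we tried the extra flags individually):

    # base invocation; paths are placeholders
    ./polkadot \
      --base-path /data/polkadot-archive \
      --chain polkadot --pruning archive \
      --out-peers 2 --in-peers 2 --max-parallel-downloads 1 \
      --db-cache 128
    # other runs swapped in --sync fast or --max-runtime-instances 4 instead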

What could be going on? Are there new memory requirements? Since which version?

ggwpez (Member) commented Jul 21, 2022

Could you please still post the exact flags?
Are you using RocksDB or ParityDB? Does this happen only with 0.9.26, or already with earlier versions?

rvalle commented Jul 21, 2022

@ggwpez I went down version by version to .23 and the behavior was the same.

Now I went back to 0.9.16 and this happens:

[Screenshot: Polkadot node sync memory usage graph on 0.9.16]

The new .16 node is the green line and it behaves like a champ! It could even run on 4 GB of memory.
I was out for a few hours and it has already synced 9.4M blocks.

This is my command line:

            "Cmd": [
                "--name",
                "privazio-substrate-node",
                "--chain",
                "polkadot",
                "--pruning",
                "archive",
                "--rpc-cors",
                "all",
                "--rpc-external",
                "--ws-external",
                "--prometheus-external",
                "--out-peers",
                "2",
                "--in-peers",
                "2",
                "--max-parallel-downloads",
                "1"
            ],
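
(For context, this Cmd array comes from the container configuration; an approximately equivalent direct invocation is sketched below. The image tag is an assumption, and the data volume and published ports are omitted for brevity.)

    # rough docker run equivalent of the Cmd array above (illustrative only)
    docker run -d parity/polkadot:v0.9.26 \
      --name privazio-substrate-node \
      --chain polkadot --pruning archive \
      --rpc-cors all --rpc-external --ws-external --prometheus-external \
      --out-peers 2 --in-peers 2 --max-parallel-downloads 1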

There is also a persistent log message in recent versions that I don't see in .16, which could perhaps be involved:

[Screenshot: Polkadot node sync failed log message]

Perhaps it has something to do...

rvalle commented Jul 21, 2022

@ggwpez we use the standard RocksDB.
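
(As far as I know the backend can also be selected explicitly with the --database flag; the lines below are only an illustrative sketch, not our actual invocation.)

    # explicit backend selection, assuming the --database flag in these versions
    ./polkadot --chain polkadot --pruning archive --database rocksdb
    ./polkadot --chain polkadot --pruning archive --database paritydb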

@koute koute self-assigned this Jul 22, 2022
@rvalle rvalle changed the title Synching an Archive, memory usage Syncing an Archive node, memory leak Jul 22, 2022
koute (Contributor) commented Jul 23, 2022

@rvalle Unfortunately I can't reproduce your issue. I synchronized over 5M blocks (while you said you can't even hit 1M) and the memory usage never hit more than 2GB.

Here's what I did:

  1. Download the v0.9.26 binary: wget "https://github.com/paritytech/polkadot/releases/download/v0.9.26/polkadot"
  2. Run the node with the following: ./polkadot --base-path ~/polkadot-archive-test --name polka-test-node --chain polkadot --pruning archive --rpc-cors all --rpc-external --ws-external --prometheus-external --out-peers 2 --in-peers 2 --max-parallel-downloads 1

So I'm afraid we need more details and/or a better step-by-step from you on how to reproduce it. (A Dockerfile which reproduces the issue would be ideal.)
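
For instance, a minimal sketch of such a Dockerfile, assuming the prebuilt v0.9.26 release binary and the flags used in this thread (base image and paths are illustrative, not a confirmed reproduction):

    FROM ubuntu:20.04
    # fetch the prebuilt release binary
    RUN apt-get update && apt-get install -y wget ca-certificates \
        && wget -O /usr/local/bin/polkadot \
           "https://github.com/paritytech/polkadot/releases/download/v0.9.26/polkadot" \
        && chmod +x /usr/local/bin/polkadot
    VOLUME /data
    # archive sync with the low-peer settings from this issue
    ENTRYPOINT ["/usr/local/bin/polkadot", "--base-path", "/data", "--chain", "polkadot", "--pruning", "archive", "--out-peers", "2", "--in-peers", "2", "--max-parallel-downloads", "1"]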

rvalle commented Jul 23, 2022

@koute, now I am getting conflicting results too.

In my case the memory overflowed with v0.9.26 as well.

I believe the issue may be caused by --pruning 2500000; using --sync fast may have fixed it, however #5807 happened instead.

I still don't know what the root cause of the overflow is, but it looks like it is not the polkadot node version. Sorry about that.

What I am trying to achieve is a node that can safely keep the blocks in history_depth + some margin.
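
Roughly, the kind of invocation I have in mind looks like the sketch below; the 2500000-block window is just the value mentioned above and the base path is a placeholder, so treat it as an illustration rather than a known-good configuration:

    ./polkadot \
      --base-path /data/polkadot-pruned \
      --chain polkadot \
      --pruning 2500000 \
      --out-peers 2 --in-peers 2 --max-parallel-downloads 1
    # adding --sync fast was also tried, which is what led to #5807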

rvalle commented Jul 25, 2022

@koute it looks like we got the answer to what is happening in #5807.
It seems unintuitive that a full archive is feasible but keeping 2.5M blocks is not.

bkchr (Member) commented Jul 26, 2022

So, we can close this @rvalle?

rvalle commented Jul 26, 2022

@bkchr yes, sure, I thought I already had.
For reference, I am following up with paritytech/substrate#11911.
