
Syncing an Archive node, memory leak #5804

Closed

rvalle opened this issue Jul 21, 2022 · 8 comments

rvalle commented Jul 21, 2022

Hi,

I am setting up a new Polkadot Archive node to feed https://polkawatch.app
We have been running safely with 8 GB of RAM for almost a year. Our production environment was rolled out around version 0.9.16.

With the same configuration on v0.9.26, memory seems to grow exponentially during sync, and the node does not make it to 1M blocks before crashing out of memory.

I captured the memory usage curve and wondered whether it might be due to a memory leak that affects syncing only.

[Screenshot: Polkadot node sync memory usage graph]

In green is the resident memory of the new node being added to the cluster.
In yellow is another node, already in sync.

Our typical running flags are --out-peers 2 --in-peers 2 --max-parallel-downloads 1, with archive pruning and the polkadot chain.
We have also tried each of the following to see if there was any improvement in memory usage (see the sketch after this list):

--sync fast
--db-cache 128
--max-runtime-instances 4 
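
For illustration, a combined invocation with our typical flags plus these tweaks might look roughly like the following (the binary and base path are placeholders, not our exact setup, and in practice we tried the extra flags individually):

    # base invocation; paths are placeholders
    ./polkadot \
      --base-path /data/polkadot-archive \
      --chain polkadot --pruning archive \
      --out-peers 2 --in-peers 2 --max-parallel-downloads 1 \
      --db-cache 128
    # other runs swapped in --sync fast or --max-runtime-instances 4 instead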

What could be going on? Are there new memory requirements? Since which version?

ggwpez (Member) commented Jul 21, 2022

Could you please still post the exact flags?
Are you using RocksDB or ParityDB? Does this happen only with 0.9.26, or already with earlier versions?

rvalle commented Jul 21, 2022

@ggwpez I went down version by version to .23 and the behavior was the same.

Now I went back to 0.9.16 and this happens:

[Screenshot: Polkadot node sync memory usage graph on 0.9.16]

The new .16 node is the green line and it behaves like a champ! It could even run on 4 GB of memory.
I was out for a few hours and it has already synced 9.4M blocks.

This is my command line:

            "Cmd": [
                "--name",
                "privazio-substrate-node",
                "--chain",
                "polkadot",
                "--pruning",
                "archive",
                "--rpc-cors",
                "all",
                "--rpc-external",
                "--ws-external",
                "--prometheus-external",
                "--out-peers",
                "2",
                "--in-peers",
                "2",
                "--max-parallel-downloads",
                "1"
            ],
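
(For context, this Cmd array comes from the container configuration; an approximately equivalent direct invocation is sketched below. The image tag is an assumption, and the data volume and published ports are omitted for brevity.)

    # rough docker run equivalent of the Cmd array above (illustrative only)
    docker run -d parity/polkadot:v0.9.26 \
      --name privazio-substrate-node \
      --chain polkadot --pruning archive \
      --rpc-cors all --rpc-external --ws-external --prometheus-external \
      --out-peers 2 --in-peers 2 --max-parallel-downloads 1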

There is also a persistent log message in recent versions that I don't see in .16, which could perhaps be involved:

[Screenshot: Polkadot node sync failed log message]

Perhaps it has something to do...

rvalle commented Jul 21, 2022

@ggwpez we use the standard RocksDB.
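
(As far as I know the backend can also be selected explicitly with the --database flag; the lines below are only an illustrative sketch, not our actual invocation.)

    # explicit backend selection, assuming the --database flag in these versions
    ./polkadot --chain polkadot --pruning archive --database rocksdb
    ./polkadot --chain polkadot --pruning archive --database paritydb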

@koute koute self-assigned this Jul 22, 2022
@rvalle rvalle changed the title Synching an Archive, memory usage Syncing an Archive node, memory leak Jul 22, 2022
koute (Contributor) commented Jul 23, 2022

@rvalle Unfortunately I can't reproduce your issue. I synchronized over 5M blocks (while you said you can't even hit 1M) and the memory usage never hit more than 2GB.

Here's what I did:

  1. Download the v0.9.26 binary: wget "https://github.com/paritytech/polkadot/releases/download/v0.9.26/polkadot"
  2. Run the node with the following: ./polkadot --base-path ~/polkadot-archive-test --name polka-test-node --chain polkadot --pruning archive --rpc-cors all --rpc-external --ws-external --prometheus-external --out-peers 2 --in-peers 2 --max-parallel-downloads 1

So I'm afraid we need more details and/or a better step-by-step from you on how to reproduce it. (A Dockerfile which reproduces the issue would be ideal.)
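
For instance, a minimal sketch of such a Dockerfile, assuming the prebuilt v0.9.26 release binary and the flags used in this thread (base image and paths are illustrative, not a confirmed reproduction):

    FROM ubuntu:20.04
    # fetch the prebuilt release binary
    RUN apt-get update && apt-get install -y wget ca-certificates \
        && wget -O /usr/local/bin/polkadot \
           "https://github.com/paritytech/polkadot/releases/download/v0.9.26/polkadot" \
        && chmod +x /usr/local/bin/polkadot
    VOLUME /data
    # archive sync with the low-peer settings from this issue
    ENTRYPOINT ["/usr/local/bin/polkadot", "--base-path", "/data", "--chain", "polkadot", "--pruning", "archive", "--out-peers", "2", "--in-peers", "2", "--max-parallel-downloads", "1"]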

rvalle commented Jul 23, 2022

@koute, now I am getting conflicting results too.

In my case the memory overflowed with v0.9.26 as well.

I believe the issue may be caused by --pruning 2500000; using --sync fast may have fixed it, however #5807 happened instead.

I still don't know what the root cause of the overflow is, but it looks like it is not the polkadot node version. Sorry about that.

What I am trying to achieve is a node that can safely keep the blocks in history_depth + some margin.
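
Roughly, the kind of invocation I have in mind looks like the sketch below; the 2500000-block window is just the value mentioned above and the base path is a placeholder, so treat it as an illustration rather than a known-good configuration:

    ./polkadot \
      --base-path /data/polkadot-pruned \
      --chain polkadot \
      --pruning 2500000 \
      --out-peers 2 --in-peers 2 --max-parallel-downloads 1
    # adding --sync fast was also tried, which is what led to #5807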

rvalle commented Jul 25, 2022

@koute it looks like we got the answer to what is happening in #5807.
It seems unintuitive that a full archive is feasible but keeping 2.5M blocks is not.

bkchr (Member) commented Jul 26, 2022

So, we can close this @rvalle?

rvalle commented Jul 26, 2022

@bkchr yes, sure, I thought I already had.
For reference, I am following up with paritytech/substrate#11911.
