Lotus Node consumes all available memory and won't release it #4714

Closed
peterVG opened this issue Nov 3, 2020 · 5 comments

@peterVG

peterVG commented Nov 3, 2020

Describe the bug
Lotus has been causing fatal Out Of Memory crashes for many Powergate users.

As an example: between 28 October and 31 October I uploaded 55 CIDs of 1 GiB each, 5 or 10 at a time. My Lotus node runs as a Docker container as part of a Powergate deployment on an AWS instance with 8 vCPU / 60 GB RAM / 120 GB swap.

As of Nov 2, 19:00 UTC, I had 192 final deals and 195 pending deals for these 55 CIDs. From October 28 onward, RAM and swap usage spiked and never went down. RAM stayed maxed out at 59 GB. Swap hovered at 60 GB for 3 days and then went up to 80 GB on the fourth day, all without adding any new CIDs.

[Screenshot: 3 days after last CID upload]

To see if a restart would help, I restarted all Powergate containers, including the Lotus node. At 6 hours and at 19 hours after the restart, memory usage was half of what it had been and no swap was in use, so the restart helped.

[Screenshot: 6 hours after container restart]

[Screenshot: 19 hours after container restart]

However, I then uploaded 10 new 1 GiB files, and three hours later memory is maxed out again:

[Screenshot: 3 hours after new uploads]

From previous experience I know that uploading more files at this point will lead to an OOM crash of the Lotus node, with a likelihood of permanent database corruption.

To Reproduce

  1. Deploy a Lotus Node with minimum requirements: https://docs.filecoin.io/get-started/lotus/installation/
  2. Upload 100 GiB of data
  3. Watch memory rise and run out.
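
A minimal sketch for recording the memory trend in step 3 as numbers rather than screenshots. It assumes a Linux host where `/proc/meminfo` is readable and exposes `MemAvailable`; the function names and the 60-second interval are illustrative and not part of Lotus or Powergate. It reports host-wide RAM and swap usage, not the Lotus process alone.

```python
import time


def parse_meminfo(text):
    """Parse /proc/meminfo-style text into {field name: numeric value}.

    Sized fields such as MemTotal are reported by the kernel in kB.
    """
    fields = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        parts = rest.split()
        if sep and parts and parts[0].isdigit():
            fields[name.strip()] = int(parts[0])
    return fields


def usage_gib(fields):
    """Return (ram_used_gib, swap_used_gib) from parsed meminfo fields."""
    kib_per_gib = 1024 * 1024
    ram_used = fields["MemTotal"] - fields["MemAvailable"]
    swap_used = fields["SwapTotal"] - fields["SwapFree"]
    return ram_used / kib_per_gib, swap_used / kib_per_gib


def sample(path="/proc/meminfo"):
    """Read the given meminfo file and return current RAM and swap usage in GiB."""
    with open(path) as f:
        return usage_gib(parse_meminfo(f.read()))


if __name__ == "__main__":
    # Log one line per minute (interval is an arbitrary choice); stop with Ctrl-C.
    while True:
        ram, swap = sample()
        stamp = time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime())
        print(f"{stamp} ram_used={ram:.1f}GiB swap_used={swap:.1f}GiB", flush=True)
        time.sleep(60)
```

Redirecting the output to a file while uploading gives a timeline that can be compared before and after a restart.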

Expected behavior
Lotus releases memory as it finishes tasks and waits for new jobs.

Screenshots
See htop graphs above.

Version (run lotus version):
lotus version 1.1.2*git.d4cdc6d (shipped with Powergate as a Docker container)

Additional context
I have only been able to upload 65 GiB of data to Filecoin in just under a week. A serious scalability problem IMHO.

@jennijuju
Member

Have you tried #4619?

jennijuju added the need/author-input label Nov 3, 2020
@peterVG
Author

peterVG commented Nov 3, 2020

@jennijuju No, because it was only merged to master 11 hours ago and hasn't hit a release yet, but this sounds very encouraging! Thanks for the heads-up.

@jennijuju
Member

jennijuju commented Nov 3, 2020

> @jennijuju No, because it was only merged to master 11 hours ago and hasn't hit a release yet, but this sounds very encouraging! Thanks for the heads-up.

Ah, of course. If you get a chance to try it out and let us know how it goes, that'd be great! Feel free to wait for it to be tagged, too.

jennijuju self-assigned this Nov 6, 2020
@jennijuju
Member

@peterVG Some people have tested the PR and confirmed that it works, so I will close this ticket for now. It will be included in an upcoming release; if you still face the issue, please reopen this ticket.

@peterVG
Author

peterVG commented Dec 9, 2020

Better late than never? I forgot to come back to this issue and report that yes, this issue is resolved on my server as well. Congratulations! This was a major production blocker that is now resolved. Thanks.

TippyFlitsUK removed the need/author-input label Mar 30, 2022