OOM killed when using JetStream WorkQueue [v2.10.18] #5739
Comments
Perhaps related to #5673.
How much memory are you giving each NATS node?
I asked specifically because sometimes the reporting doesn't always match cgroup restrictions. Do you OOM as quickly if you set the […]? If you can capture some memory profiles while the memory usage is up, but before the OOM, that would be useful, either […]
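For reference, here is a minimal sketch of one way to grab a heap profile before the OOM hits. It assumes the server was started with profiling enabled (for example `prof_port: 65432` in the server config); the port and endpoint here are assumptions, not something stated in this issue.

```go
// Minimal sketch: fetch a heap profile from a nats-server that has its
// pprof/profiling port enabled (assumed to be 65432 here) and save it to disk.
package main

import (
	"io"
	"net/http"
	"os"
)

func main() {
	// Grab a heap profile while memory usage is high but before the OOM kill.
	resp, err := http.Get("http://localhost:65432/debug/pprof/heap")
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()

	out, err := os.Create("heap.pprof")
	if err != nil {
		panic(err)
	}
	defer out.Close()

	// Save the profile so it can be inspected later, e.g. with `go tool pprof heap.pprof`.
	if _, err := io.Copy(out, resp.Body); err != nil {
		panic(err)
	}
}
```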
What are you giving the process from your OS perspective?
I'm running the cluster on K8s, with each JS server as a pod given a memory limit of […]
Huge payload messages will consume more RAM. In my experience I wouldn't run NATS with less than 3GB of memory when using JetStream, and your big messages will make matters worse.
But on the flip side, how would we estimate how much memory to allocate in such a scenario? Is there some guideline?
It varies a lot by use case, message rates, etc., and changes from version to version; usually the needs go down. A workload that today uses about 1GB of memory for me used to use 6GB some time ago. You should set up monitoring and use […]
@ripienaar I agree with trying to simulate the scenarios, but under high load this memory was spiking up a lot. For high workloads giving unbounded memory isn't possible; I want to understand where and why the server would be using the excess memory, and thus how to decide the upper bound in that case.
A good practice when limiting memory via cgroups and containers is to set the env var GOMEMLIMIT to ~75% of the actual limit. Sometimes the Go runtime only sees the host's memory and doesn't feel pressure to GC and clean things up, but the container and the Linux kernel will OOM it.
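As a concrete illustration (not from this thread): in Kubernetes you would normally export GOMEMLIMIT in the pod spec, but the same soft limit can also be set from inside a Go process. A minimal sketch, assuming a 4GiB container limit purely as an example value:

```go
// Minimal sketch of the "~75% of the container limit" rule, expressed as the
// in-process equivalent of exporting GOMEMLIMIT. The 4GiB figure is assumed.
package main

import (
	"fmt"
	"runtime/debug"
)

func main() {
	containerLimit := int64(4 << 30) // assumed 4GiB cgroup limit on the pod

	// Give the GC headroom: target ~75% of the hard limit so the runtime
	// starts collecting before the kernel OOM killer steps in.
	soft := containerLimit * 75 / 100
	debug.SetMemoryLimit(soft)

	fmt.Printf("soft memory limit set to %d bytes\n", soft)
}
```

For nats-server itself the environment variable is the right mechanism; the snippet only shows what the limit does from the Go runtime's point of view.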
That could help in terms of the Go GC, but in terms of NATS, what would a good upper limit be in the first place? How should we calculate that?
For any program written in Go that is run in a container, it's best practice these days to set GOMEMLIMIT. We do plan on introducing an upper bound on how much buffer memory all filestore-backed streams in a server can use. This will probably come in 2.12.
I agree on the GOMEMLIMIT; will definitely go ahead and change that.
Memory usage is quite dynamic: for core NATS it depends on connections and the number of subscriptions, and for JetStream on the number of streams and consumers, access patterns, etc. The best way is to model the upper bounds of what you expect on a larger system and monitor RSS peaks.
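A minimal sketch of the "monitor RSS peaks" suggestion, assuming the server's HTTP monitoring port is enabled (e.g. `http: 8222` in the config) and polling the `mem` field of the `/varz` endpoint; the hostname and interval are placeholders:

```go
// Minimal sketch: poll the NATS monitoring endpoint and log process memory,
// so peaks can be observed over the course of a load test.
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
	"time"
)

type varz struct {
	Mem int64 `json:"mem"` // process memory in bytes, as reported by /varz
}

func main() {
	for {
		resp, err := http.Get("http://localhost:8222/varz")
		if err != nil {
			fmt.Println("varz fetch failed:", err)
			time.Sleep(10 * time.Second)
			continue
		}

		var v varz
		if err := json.NewDecoder(resp.Body).Decode(&v); err == nil {
			fmt.Printf("server memory: %d MiB\n", v.Mem/(1<<20))
		}
		resp.Body.Close()

		time.Sleep(10 * time.Second)
	}
}
```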
On a recent exploration with […]
In this case I have two streams with a size of 6.2GiB running in filestore mode, but the memory usage on the servers was considerably high; more than it would take to hold both streams completely in memory, even though we use filestore.
Here are the memory profiles for the same:
This looks like a build-up of Raft append entries. How are you publishing messages? Are you using core NATS publishes or JetStream publishes?
@neilalexander
https://github.com/nats-io/nats.go/blob/main/js.go#L45
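For context, a minimal sketch of the distinction being asked about: a core NATS publish is fire-and-forget, while a JetStream publish waits for the stream's ack and so applies backpressure. The subject and URL are placeholders, not taken from the issue.

```go
// Minimal sketch contrasting core NATS publishes with JetStream publishes.
package main

import (
	"log"

	"github.com/nats-io/nats.go"
)

func main() {
	nc, err := nats.Connect(nats.DefaultURL)
	if err != nil {
		log.Fatal(err)
	}
	defer nc.Close()

	payload := make([]byte, 5<<20) // ~5MB payload, as in the reported test

	// Core publish: no ack, the publisher never slows down, so the server can
	// accumulate work (e.g. Raft append entries) faster than it can persist it.
	if err := nc.Publish("orders.new", payload); err != nil {
		log.Fatal(err)
	}

	// JetStream publish: blocks until the stream acknowledges the message,
	// which naturally limits how much data is in flight at once.
	js, err := nc.JetStream()
	if err != nil {
		log.Fatal(err)
	}
	if _, err := js.Publish("orders.new", payload); err != nil {
		log.Fatal(err)
	}
}
```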
I see a buildup during writeMsgRecord as well.
Observed behavior
Running 2 streams on a JetStream cluster with 3 nodes.
The server config is as follows:
Stream configs are as follows:
I am running a test that publishes messages with a payload size of 5MB to the streams, but I saw that the servers got terminated due to being OOM killed.
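For illustration, a minimal sketch of such a publisher: roughly 5MB payloads at 50 msgs/sec via synchronous JetStream publishes. The subject, URL, and connection options are assumptions, not the actual test code.

```go
// Minimal sketch of the described load test: ~5MB messages at 50 msgs/sec.
package main

import (
	"log"
	"time"

	"github.com/nats-io/nats.go"
)

func main() {
	nc, err := nats.Connect(nats.DefaultURL)
	if err != nil {
		log.Fatal(err)
	}
	defer nc.Close()

	js, err := nc.JetStream()
	if err != nil {
		log.Fatal(err)
	}

	payload := make([]byte, 5<<20) // ~5MB message body

	// 50 messages per second => one publish every 20ms.
	ticker := time.NewTicker(20 * time.Millisecond)
	defer ticker.Stop()

	for range ticker.C {
		// Synchronous JetStream publish; the ack keeps the publisher from
		// outrunning the stream's ability to persist and replicate.
		if _, err := js.Publish("stream.one.msgs", payload); err != nil {
			log.Println("publish failed:", err)
		}
	}
}
```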
Expected behavior
I am running a test with a payload size of 5MB at 50 msgs/sec.
There are 2 streams with a replication factor of 3.
I see this OOM-killed error on 2 servers. I want to ask why I would be seeing this error when I'm using filestore as the storage mode. Will the streams be pulled into memory at runtime?
Is this high memory requirement expected? If yes, what would be a good way to determine the max resource allocation for the JetStream servers?
Server and client version
Version: 2.10.18
Host environment
No response
Steps to reproduce
No response