Limit policy maximum age didn't clean up, resulting in storage fill [v2.10.18] #5795
Comments
When something like that happens, we ask the developer to capture some profiles for us, specifically CPU, memory (heap), and stacksz / goroutines.
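Here is a minimal sketch of collecting those three captures over HTTP. It assumes the server has `prof_port` enabled (65432 below is only an example value) and HTTP monitoring on port 8222, and that both ports are reachable from where the script runs (e.g. through `kubectl port-forward`). `/debug/pprof/profile` and `/debug/pprof/heap` are the standard Go pprof endpoints served on the profiling port, and `/stacksz` is the monitoring endpoint that dumps goroutine stacks. The helper names and output filenames are hypothetical.

```python
from urllib.request import urlopen


def profile_urls(host: str, prof_port: int, monitor_port: int, cpu_seconds: int = 30) -> dict:
    """Build the URLs for the CPU, heap and goroutine-stack captures.

    Keys are the (hypothetical) local filenames each capture is saved under.
    """
    prof = f"http://{host}:{prof_port}/debug/pprof"
    return {
        "cpu.prof": f"{prof}/profile?seconds={cpu_seconds}",  # samples CPU for cpu_seconds
        "mem.prof": f"{prof}/heap",                           # heap profile snapshot
        "stacks.txt": f"http://{host}:{monitor_port}/stacksz",  # all goroutine stacks
    }


def capture(urls: dict, timeout: float = 60.0) -> None:
    """Download each capture to a local file named after its key."""
    for filename, url in urls.items():
        with urlopen(url, timeout=timeout) as resp, open(filename, "wb") as out:
            out.write(resp.read())
```

Calling `capture(profile_urls("localhost", 65432, 8222))` writes `cpu.prof`, `mem.prof` and `stacks.txt` to the current directory. The CPU request blocks for the whole sampling window (30 seconds by default here), which is why the timeout is set above that.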
@derekcollison here are screenshots of some metrics. I went through many memory metrics, and all of them look quite stable.
The stream info shows that the only limit you have in place, age, appears to be working correctly. What do you think is not working correctly? Also, do you properly set GOMEMLIMIT?
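GOMEMLIMIT is the Go runtime's soft memory limit; it accepts a non-negative integer with an optional `B`, `KiB`, `MiB`, `GiB` or `TiB` suffix. The sketch below derives a value from a pod's memory limit, assuming the common rule of thumb of leaving about 10% headroom. That fraction and the `gomemlimit` helper are illustrative assumptions, not official NATS guidance.

```python
MIB = 1 << 20


def gomemlimit(container_limit_bytes: int, fraction: float = 0.9) -> str:
    """Return a GOMEMLIMIT string set to `fraction` of the container memory limit.

    The result is rounded down to whole MiB so it stays under the target.
    The 0.9 default is a rule-of-thumb assumption, not a NATS recommendation.
    """
    if container_limit_bytes <= 0 or not 0 < fraction <= 1:
        raise ValueError("limit must be positive and fraction must be in (0, 1]")
    mib = int(container_limit_bytes * fraction) // MIB
    if mib == 0:
        raise ValueError("container limit too small to express in MiB")
    return f"{mib}MiB"
```

For a pod limited to 4 GiB, `gomemlimit(4 * 1024**3)` returns `"3686MiB"`, which could then be set as the `GOMEMLIMIT` environment variable on the server container.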
@derekcollison we do not have GOMEMLIMIT set. However, the issue is not with the pod's memory but with disk storage. This stream is replicated across 3 nodes, which means every message should be copied to all 3 nodes, and at any time each node should use the same amount of space (assuming all other streams also have a replication factor of 3). One of the nodes didn't follow this rule, as can be seen from the initial message, resulting in disk space leaking.
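The reasoning above (with every stream at replication factor 3 on a 3-node cluster, each node should hold roughly the same number of bytes) can be turned into a simple check. This is a hypothetical sketch: feed it per-node storage figures from whatever you already collect, and treat the 20% tolerance and the node names as arbitrary assumptions.

```python
from statistics import median


def find_outlier_replicas(usage_bytes: dict, tolerance: float = 0.2) -> list:
    """Return nodes whose storage deviates from the cluster median by more than `tolerance`.

    The median is used so that a single runaway node does not shift the baseline.
    """
    if not usage_bytes:
        return []
    mid = median(list(usage_bytes.values()))
    if mid == 0:
        # Baseline is empty; anything holding data is suspicious.
        return sorted(node for node, used in usage_bytes.items() if used > 0)
    return sorted(
        node for node, used in usage_bytes.items()
        if abs(used - mid) / mid > tolerance
    )
```

With usage like `{"nats-0": 1000, "nats-1": 1050, "nats-2": 6000}` the function flags `["nats-2"]`, mirroring the single runaway node described in this issue.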
Can you share a
@derekcollison
Observed behavior
We are using the Limits retention policy with a maximum age of 15 minutes. However, 1 of the 3 nodes didn't clean up its storage in time, so the storage filled up and the server crashed.
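Conceptually, a limits-based stream with a max age expires messages from the oldest end once they exceed that age. The model below is a deliberately simplified, hypothetical sketch of that rule (nats-server's file store works on message blocks, and its exact boundary handling may differ). It only illustrates why a healthy replica should never hold much more than 15 minutes of data.

```python
from collections import deque
from dataclasses import dataclass

MAX_AGE_SECONDS = 15 * 60  # the stream's max age from this report


@dataclass
class Msg:
    seq: int
    timestamp: float  # seconds when the message was stored


def expire(msgs: deque, now: float, max_age: float = MAX_AGE_SECONDS) -> int:
    """Drop messages older than max_age from the head of the log; return how many were removed.

    Messages are appended in time order, so only the head needs checking.
    """
    removed = 0
    while msgs and now - msgs[0].timestamp > max_age:
        msgs.popleft()
        removed += 1
    return removed
```

For example, with one message stored every 60 seconds for 30 minutes, calling `expire` at the 29-minute mark removes the 14 messages older than 15 minutes and keeps the remaining 16.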
In the screenshot below, you can see the storage usage of the 3 nodes. Notice that the blue one uses much more storage than the red and yellow nodes.
The screenshot below is from the NATS dashboard; you can see that the stream message count also rose significantly.
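A rough back-of-the-envelope check for the message count: if publishing runs at a steady rate of r messages per second and expiry keeps up, the stream should hold about r × max_age messages. The rate, message size and 1.5× slack below are hypothetical inputs, not figures from this report.

```python
def expected_steady_state(rate_per_sec: float, max_age_sec: float, avg_msg_bytes: float) -> tuple:
    """Approximate (message count, bytes) a max-age stream holds once expiry keeps pace."""
    count = rate_per_sec * max_age_sec
    return count, count * avg_msg_bytes


def expiry_lagging(observed_count: int, rate_per_sec: float, max_age_sec: float,
                   slack: float = 1.5) -> bool:
    """True if the stream holds noticeably more than max_age worth of messages."""
    return observed_count > slack * rate_per_sec * max_age_sec
```

With 200 msg/s of 1 KiB messages and a 900-second max age, `expected_steady_state(200, 900, 1024)` gives 180,000 messages and 184,320,000 bytes (about 176 MiB) per replica, so a count that keeps climbing well past that suggests expiry is not keeping up.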
The stream configuration is shown in the screenshot below. The stream was recreated during an attempt to fix the issue, but it has exactly the same settings. Note the max age of 15 minutes, as well as the typical byte size and message count.
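For reference, the settings described in this report (limits retention, 15-minute max age, 3 replicas, no message or byte limit) map onto a JetStream stream configuration roughly as follows. This is a sketch using the JetStream API's JSON field names; the stream name, subjects and file storage are assumptions, since the actual configuration screenshot is not reproduced here. Note that `max_age` is expressed in nanoseconds.

```python
import json

NANOS_PER_SECOND = 1_000_000_000


def stream_config(name: str, subjects: list, max_age_seconds: int = 15 * 60,
                  replicas: int = 3) -> dict:
    """Build a limits-based, file-backed stream config using JetStream API JSON field names.

    `name` and `subjects` are hypothetical placeholders supplied by the caller.
    """
    return {
        "name": name,
        "subjects": subjects,
        "retention": "limits",
        "storage": "file",
        "num_replicas": replicas,
        "max_age": max_age_seconds * NANOS_PER_SECOND,  # durations are in nanoseconds
        "max_msgs": -1,   # -1 means no message-count limit
        "max_bytes": -1,  # -1 means no byte limit; age is the only limit
        "discard": "old",
    }
```

`print(json.dumps(stream_config("EVENTS", ["events.>"]), indent=2))` prints the resulting JSON document, with `max_age` set to 900000000000.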
The logs showed the following errors (repeated several times):
Please let me know if you need any additional details.
Expected behavior
The Limits policy cleans up expired messages as expected on all replicas.
Server and client version
Server 2.10.18
Host environment
K8s
Steps to reproduce
Not clear.