Memory leak/OOM with "Received update for IP range I own" messages in log #3659
Comments
Here are the log files from the other two weave pods producing the "Received update for IP range I own" log messages. Only 3 of the 71 pods are producing this message (the other is in the original comment).
@sferrett thanks for reporting the issue and sharing the logs. The pprof heap output file seems to be corrupted; I can not use it. Looking at the logs, there is continuous activity: connections getting established to the peers and then subsequently shut down due to conflicts in the IPAM entries. There might be a resource leak during connection shutdown. Please share the heap output and I will investigate further.
This happened to us last week on our production cluster ... 👇 We noticed that there were two groups of IPAM lists with overlapping entries, so it wasn't a complete split-brain. We bit the bullet and terminated all the nodes in one group to recover. It was brutal. After looking at the memory profile, 200MB as a limit makes sense for
@murali-reddy - Here's how I'm generating the heap profile:
curl 'http://localhost:6784/debug/pprof/heap' > weave-net-6bgmx.dump
Then I gzip the file. Here's another one attached; it is a different size and the method I used to copy it from the system that generated it was slightly different, so please let me know whether this is a good one or not. It's from one of the more-rapidly-growing instances that got a restart over the weekend.
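For anyone following along, here is a rough sketch of that procedure as a script run directly on a node (the output name is just an example; weave's status/pprof endpoint listens on 127.0.0.1:6784 by default):
#!/bin/bash
# Sketch: grab a heap profile from the local weave router and gzip it for upload.
set -eu
out=${1:?usage: $0 <output-name, e.g. weave-net-6bgmx>}
curl -s 'http://localhost:6784/debug/pprof/heap' > "${out}.dump"
gzip -9 "${out}.dump"
echo "wrote ${out}.dump.gz"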
@itskingori - interesting. I ran the following on my prd cluster, which is having this symptom, and I also see two distinct sets of "status ipam" hosts returned from the cluster, each comprising about 50% of the nodes:
Here's the "02398" list The lists are identical except that the "57603" list has an additional entry in it:
I wonder if I can do something with that one host that's in one list but not the other, or if I do indeed need to restart all the systems in one of the lists. When you say you terminated the hosts in one of the groups, did you do anything to the remaining systems, or just terminate the nodes in one of the lists? Also, did the terminated nodes have anything special done on them beforehand (such as running weave reset or similar)? I'm also curious whether you did them one at a time and somehow observed that they came up with a consistent IPAM list. Basically I think I need to do the same thing here; I just want to make sure that deleting those nodes won't increase the amount of breakage, since node removal is what seems to have precipitated all of this in the first place. Thanks!
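One way to see which pods fall into which group is to fingerprint each pod's IPAM view. The sketch below is only an illustration of how such a check could look, assuming the standard kube-system/weave-net DaemonSet labels and a router container named weave; the checksum values will differ per cluster.
#!/bin/bash
# Sketch: checksum each weave pod's sorted `status ipam` peer list and count
# how many pods share each checksum. Two or more distinct checksums indicate
# a split IPAM view like the one described above.
set -u
for pod in $(kubectl get pods -n kube-system -l name=weave-net -o name); do
  name=${pod#pod/}
  sum=$(kubectl exec -n kube-system "${name}" -c weave -- \
        /home/weave/weave --local status ipam | awk '{print $1}' | sort | cksum)
  echo "${sum} ${name}"
done | sort | awk '{print $1}' | uniq -c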
OK, so I have this condition happening in two different clusters, so I guess that's good news: I can try things in one and compare. Anyhow, in one of the clusters I did the following:
So this sounds like somewhat of a success. However, I am not sure what the ramifications are of deleting the weave db file on those nodes and just restarting weave rather than the entire node. I am somewhat concerned about any ill effect on the pods still running on the nodes where weave was restarted, and whether any potential issue has been set up by not doing a full node restart. Pending negative feedback on the above, I will be looking to do the same in the other cluster soon.
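For reference, a minimal sketch of that per-node action, assuming the stock weave-net DaemonSet where the persistence DB is the hostPath-backed /weavedb/weave-netdata.db inside the pod:
#!/bin/bash
# Sketch: wipe the weave persistence DB on one node and recycle its weave pod
# so it rejoins with a clean slate and re-learns IPAM state from its peers.
set -eu
pod=${1:?usage: $0 <weave-net-pod-name>}
# /weavedb is a hostPath mount, so removing the file here clears it on the node too.
kubectl exec -n kube-system "${pod}" -c weave -- rm -f /weavedb/weave-netdata.db
# The DaemonSet recreates the pod after deletion.
kubectl delete pod -n kube-system "${pod}"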
Yes, this is more or less the position we were in. I'll borrow your commands as they seem similar to (if not better than) what I have to look through weave 👇
#!/bin/bash
set -u
context=$1
command=$2
pods=$(kubectl get pods -n=kube-system -lname=weave-net --context=${context} | tail -n+2 | awk '{print $1}')
for pod in ${pods[@]}
do
kubectl exec -n=kube-system --context=${context} ${pod} -- /home/weave/weave --local ${command}
done
I use something like
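A hypothetical invocation of the script above, assuming it is saved as weave-each.sh and a kubectl context named production:
./weave-each.sh production 'status ipam'
./weave-each.sh production 'status peers'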
Weave works by sharing state by consensus. The fact that there are two states breaks weave, and you need to bite the bullet and get rid of a group, or any weave pods that have inconsistent state. There are two ways to do this:
I ran with no. 2 because this was production, so recovery was critical. I didn't have time to write a script for no. 2. I pretty much figured out which group to terminate and terminated them all at the same time. I'm guessing it's better to terminate weave while deleting the database on the host, so that the new pod starts from a clean slate and gets its state from other correct weave pods.
I didn't do anything to the remaining group as their state was similar and correct. Once you get rid of the bad weave pods, everything goes back to normal and the state is now consistent among those remaining ... the problem goes away, and the cluster heals.
I didn't do anything to the ones remaining. The fact that they had similar state was all I needed.
I terminated them all at the same time because I could not risk them sharing state any longer. I figured doing it one by one might not work because a new pod might get its state from another bad one. I wanted all the 'bad' ones gone and only the 'good' ones left to share state.
It didn't cause more breakage for me, other than the time I had to wait for the autoscaling group to replace the nodes I terminated, and the disruption to the apps on those nodes as new pods came up.
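A hypothetical sketch of that all-at-once removal follows; the node-list file, the use of kubectl drain, and the autoscaling-group replacement are all assumptions, and the actual instance termination is cloud-specific.
#!/bin/bash
# Sketch: drain every node in the "bad" group in parallel, then terminate the
# underlying instances out-of-band and let the autoscaling group replace them.
set -eu
nodes_file=${1:?usage: $0 <file with one bad node name per line>}
while read -r node; do
  kubectl drain "${node}" --ignore-daemonsets --delete-local-data --force &
done < "${nodes_file}"
wait
echo "Drained. Now terminate the instances behind these nodes (e.g. via the AWS CLI/console)."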
@sferrett I am able to use the recent heap output. It does look like continuously establishing and shutting down connections is causing memory use to grow between Go GC runs, or there is a memory leak. Will investigate further.
No ramifications, from my understanding. Anyone from weave can correct me if I'm wrong, but when you delete the database and terminate weave, it just recreates it. In my opinion this is the right approach; I took the other approach out of the need for a quick win. If you have a script you can share for terminating weave pods and clearing the databases, please do. 😅
Yes, there are no ramifications. For now these manual steps will reconcile from a state where there were IPAM conflicts. In the 2.6 release, IPAM conflicts are resolved automatically (#3637).
@sferrett Now that you're on 2.5.2 and have been able to get the cluster stable, could you continue monitoring the IPAM lists to see if they remain consistent (assuming you're scaling up and down)? I really struggled to stay stable on that version for a couple of days (it would eventually get borked after scaling up and down), so I reverted to 2.5.1 as an experiment and have been stable since. I could be drawing the wrong conclusions, or there may be something unique about my setup (highly unlikely since I use kops) 🤷♂ ... it would be good to have another data point.
@murali-reddy here's another dump from one of the processes that is just about to surpass its 1GB heap limit, in case that's also useful.
@itskingori - I will keep an eye on it for sure. We're not doing a huge amount of scaling at the moment; however, we did just do some refactoring of instance types, so we had a fairly large number of hosts created and deleted, which is what set the current condition into motion. There are some more similar adjustments still outstanding, so I will check and let you know how things look as that happens. Also, thanks very much for your feedback and detail - it sounds like your process to recover normal operation was very similar to mine. The only difference seems to be that you rebooted/terminated the nodes, whereas I just recycled the weave pod on the node and left the other pods and the node itself alone. We'll be doing this same process in our prd cluster today, so if I cook up a noteworthy script I will share it here. Cheers -
OK, so since the weave nodes became consistent, things have been stable with no memory growth or OOM issues. I have done a minor amount of scaling (perhaps 10 nodes or so up/down) and things have remained consistent throughout with 2.5.2. I wrote a quick script. Both of those scripts were written quickly to address a particular condition here and are not intended as portable or good examples of coding, but they may be useful to someone, so here they are. @itskingori, @murali-reddy - thank you both for your help on this.
@sferrett Thanks for the update! Will try out 2.5.2 again and check out your scripts.
Note we found a leak which matches the OOM symptom here: #3807 |
What you expected to happen?
Memory usage of the weave process is expected to be stable and not grow unbounded over time.
What happened?
I had a stable 2.5.0 weave network in my Kubernetes 1.9 cluster of about 100 nodes. Weave was initially installed by kops and had a memory limit of 200MB set. There were no occurrences of "Received update for IP range I own" in the log files, and memory usage for the weave pods in the cluster had been very stable for weeks.
As part of refactoring some services, about 30 nodes were removed from the cluster (bringing the cluster size down to 71 nodes). After this action, the memory usage of the weave pods started growing until it exceeded the memory limit, at which point the pod was OOM-killed and restarted. These restarts result in brief disruption for the node on which the restart occurs. At this time the "Received update for IP range I own" message started appearing in the logs (although not from all pods; this nuance was not discovered until later).
After looking at some related tickets and such here (#3650, #3600, #2797), the following actions were taken:
Weave pods continue to grow in memory usage; the new 2.5.2 pods have not hit their 1GB limit yet but look to be heading that way. The "update for IP range I own" messages are still being seen; however, on closer inspection these messages are only coming from 3 of the 71 pods.
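For reference, a sketch of how such a limit bump can be applied in place, assuming the stock kube-system/weave-net DaemonSet with the router container named weave:
# Raise the weave router's memory limit; pods pick it up as they are recreated.
kubectl -n kube-system set resources daemonset/weave-net -c weave --limits=memory=1Gi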
How to reproduce it?
Have a working Kubernetes cluster and delete some nodes from it.
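A sketch of the kind of node removal involved, assuming a kops/AWS cluster (the node name is illustrative; the instance itself still has to be terminated afterwards):
# Repeat for each node being removed, then terminate the instances behind them.
kubectl drain ip-10-0-12-34.ec2.internal --ignore-daemonsets --delete-local-data
kubectl delete node ip-10-0-12-34.ec2.internal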
Anything else we need to know?
Versions:
Logs:
These are the logs from one of the weave pods showing the "Received update for IP range I own" messages: weave-net-q56hl.log
This is the pprof/heap output for the above node: weave-net-q56hl.heap.gz
This is status ipam from the above node: weave-net-q56hl.ipam.txt
This is status peers from the above node: weave-net-q56hl.peers.txt
These are the logs from one of the weave pods not showing that message: weave-net-9t7d8.log
This is the pprof/heap output for the above node: weave-net-9t7d8.heap.gz
And here's a picture showing the history of memory usage from these pods.
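A quick way to watch this without a dashboard, assuming metrics-server (or heapster on older clusters) is backing kubectl top:
# Per-container memory for every weave pod; run periodically to spot growth.
kubectl top pods -n kube-system -l name=weave-net --containers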