No logs available #112
Comments
Depending on your hosts, you could use something like
You could also use csysdig, but I agree there should be logs.
kube-state-metrics OOMing is likely due to the size of your cluster. It builds an in-memory cache of all objects in Kubernetes, so the larger your Kubernetes instance, the larger your kube-state-metrics memory limit should be. Unfortunately we have not benchmarked this extensively enough to come up with a formula for how much memory per how many objects, so I recommend running it on a large node without a limit, observing the memory usage, and setting the limit with a reasonable margin. I'm not sure I understand what kind of logs you are expecting when an application OOMs and is killed by the supervisor. Which version are you using? Because there should be at least some logs.
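As an illustration of the "run it without a limit and observe the memory usage" advice above, here is a minimal Go sketch of hypothetical instrumentation (not code from kube-state-metrics) that periodically logs the runtime's memory statistics, so a steady-state footprint can be read off before picking a container limit:

```go
// Sketch: hypothetical instrumentation that periodically logs Go runtime
// memory statistics, so the steady-state footprint can be observed before
// choosing a container memory limit. Not code from kube-state-metrics.
package main

import (
	"log"
	"runtime"
	"time"
)

func main() {
	go func() {
		for range time.Tick(30 * time.Second) {
			var m runtime.MemStats
			runtime.ReadMemStats(&m)
			// Sys is closer to what the kernel's OOM killer sees than HeapAlloc.
			log.Printf("heap_alloc=%d MiB sys=%d MiB", m.HeapAlloc>>20, m.Sys>>20)
		}
	}()
	select {} // stand-in for the application's real work
}
```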
Hi @brancz, we're running
There were actually some changes regarding logging in the latest release; the glog library was not properly configured. Can you upgrade to the latest
This mainly refers to
@andyxning yes, the informers/informer-framework more specifically.
Does that solve your issue, @Resisty?
Same issue here with v0.4.1. My deployment file comes from prometheus-operator.
How I fixed it? Changing the memory limit from 50Mi to 100Mi.
@brancz I also had to increase the memory limit to 80 MiB on a 6-node OpenShift cluster. Perhaps the memory settings in the deployment should be bumped? (kube-state-metrics/kubernetes/kube-state-metrics-deployment.yaml, lines 56 to 57 at 469f73f)
kube-state-metrics/kubernetes/kube-state-metrics-deployment.yaml, lines 38 to 41 at 469f73f, does not match README.md.
Also, having the default settings this tight will be an issue when more object types are added without the requirements being updated.
I'm OK with bumping the request and limit. What do you think would be an appropriate starting value then?
The ones we recommend in the README?
Yes, I don't recall why we didn't do that in the first place.
We came up with those independently of #200 and didn't backport them. I'm working on it; it's an easy change.
We're running on a pretty small cluster (50 pods), and kube-state-metrics is OOMing if I ask it to collect
I sort of wonder how in blazes I would even begin to trace this ...
Hrm ... quick follow-up: I did dump the goroutine list when the process was at low and high memory. The number of goroutines appears to be growing. But we've also been tracking anomalous latency in the API server (trying to get GOOG to look at that, since we don't run it (GKE)). Maybe overlapping requests because of delays?
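For anyone following along, one way to take such a goroutine dump from inside a Go process is via the standard runtime/pprof package; a minimal sketch (an illustration, not necessarily how the dump above was produced):

```go
// Sketch: dumping the current goroutine list from inside a Go process
// using the standard runtime/pprof package.
package main

import (
	"os"
	"runtime/pprof"
)

func main() {
	// debug=2 prints a full stack trace for every goroutine, which makes it
	// easy to diff dumps taken while memory is low versus high.
	if err := pprof.Lookup("goroutine").WriteTo(os.Stderr, 2); err != nil {
		panic(err)
	}
}
```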
We don't today, but it's probably time to add pprof endpoints so we can do proper profiling to see what's happening.
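A minimal sketch of what exposing pprof endpoints could look like, using only the standard library's net/http/pprof package (the side port and address are assumptions, not how kube-state-metrics is actually configured):

```go
// Sketch: exposing the standard net/http/pprof endpoints on a side port so
// heap, goroutine, and CPU profiles can be pulled from the running process.
// The address is arbitrary and purely illustrative.
package main

import (
	"log"
	"net/http"
	_ "net/http/pprof" // registers /debug/pprof/* on http.DefaultServeMux
)

func main() {
	go func() {
		log.Println(http.ListenAndServe("localhost:6060", nil))
	}()
	select {} // stand-in for the application's real work
}
```

With that in place, go tool pprof http://localhost:6060/debug/pprof/heap or a plain GET of /debug/pprof/goroutine?debug=2 can be run against the live process.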
@smparkes you mean that
This problem may be related to client-go, because kube-state-metrics itself stores nothing in memory.
Dump the goroutine list with
Agreed with @brancz, we need to add
We resolved the issue.
@brancz Does Prometheus scrape synchronously, i.e., will it wait for the previous scrape to finish before starting the next one? @caesarxuchao Does client-go sync with the apiserver synchronously, i.e., will it wait for the previous sync to finish before starting the next one?
I have kube-state-metrics running as a deployment via ansible on my clusters:
I've had to bump kube_state_mem_(req|lim) to 800Mi in order to get the pods to stay up; the pods had started OOMKilling/CrashLoopBackOff'ing.
I'd like to know why, but the containers are basically inscrutable. There's no way to shell in, and docker logs is empty. It'd be great if there were more information on what's going on, please and thanks!