
Adjust dropped metrics from cAdvisor #1396

Merged

Conversation

@philipgough (Contributor) commented Sep 24, 2021

This change drops the pod-centric cAdvisor metrics for series that carry a non-empty 'container' label, i.e. the per-container series are dropped and the pod-level series are kept.

Previously we dropped these metrics for series without a (pod, namespace) label set; however, those series can be critical for debugging.


Type of change

What type of change does your code introduce to kube-prometheus? Put an x in the boxes that apply.

  • CHANGE (fix or feature that would cause existing functionality to not work as expected)
  • FEATURE (non-breaking change which adds functionality)
  • BUGFIX (non-breaking change which fixes an issue)
  • ENHANCEMENT (non-breaking change which improves existing functionality)
  • NONE (if none of the other choices apply, e.g. tooling, build system, CI, docs, etc.)

Changelog entry

Please put a one-line changelog entry below. Later this will be copied to the changelog file.

Drops cAdvisor metrics `container_fs_.*` and `container_blkio_device_usage_total` with a non-empty `container` label
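
For illustration, a relabelling rule along the lines below would implement the drop described in this entry. This is only a sketch in ServiceMonitor `metricRelabelings` form; the actual change is made in the project's jsonnet, so the exact placement and syntax may differ.

```yaml
# Sketch only: drop the container-level series (non-empty "container" label)
# of the listed cAdvisor metrics, keeping the pod-level series (container="").
metricRelabelings:
  - sourceLabels: [__name__, container]
    regex: '(container_fs_.*|container_blkio_device_usage_total);.+'
    action: drop
```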

@simonpasquier (Contributor) commented:

lgtm

@philipgough (Contributor, Author) commented:

Tested in minikube:

Before: [screenshot]

After: [screenshot]

@dgrisonnet merged commit 211e758 into prometheus-operator:main on Sep 24, 2021
@smarterclayton (Contributor) commented:

lgtm, thanks (in general, IO behaves like CPU/memory, whereas the others are truly container-specific)

@mindw (Contributor) commented Sep 26, 2021

@philipgough, aren't those discarded metrics useful for debugging StatefulSets, or pods using hostPath volumes? Are these metrics available elsewhere?
If they aren't available elsewhere, won't it be enough to avoid dropping the `container_fs_.*` metrics altogether?
Thanks!

@philipgough (Contributor, Author) commented Sep 26, 2021

@mindw my understanding was that the dropped container_fs_.* metrics would still be available at the slice level. However, I'm open to correction, and if we are losing valuable metrics we can do as you suggested and avoid dropping container_fs_.* entirely.

cc @simonpasquier, @smarterclayton

@mindw (Contributor) commented Sep 26, 2021

@philipgough here is an example:

[screenshot]

@dgrisonnet (Contributor) commented:

Hi @mindw, thank you for raising your concerns. To give you more context about what we are trying to achieve here: we had previously discarded some of these metrics as part of an effort to reduce the number of series exposed by the kubelet, having identified some container-centric metrics for which pod-level information wasn't useful.
However, in the case of the container_fs_.* metrics, they make more sense at the pod level, since you can rely on the slice summarization from the cgroups, which should give you as much information as the previous container-centric approach whilst reducing the amount of series exposed by the kubelet.
Let me know if this makes sense to you or if you still have concerns about the new approach.
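
As an illustration of what "pod level" means in query terms: the pod-slice series are the ones whose `container` label is empty. A minimal sketch follows, with a hypothetical rule name and `container_fs_usage_bytes` used only as an example of a `container_fs_.*` metric:

```yaml
# Hypothetical recording rule: pod-level filesystem usage taken from the
# pod cgroup slice series (container="") rather than the per-container series.
groups:
  - name: cadvisor-pod-level.example
    rules:
      - record: namespace_pod:container_fs_usage_bytes:sum
        expr: 'sum by (namespace, pod) (container_fs_usage_bytes{container="", pod!=""})'
```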

@mindw (Contributor) commented Sep 27, 2021

@dgrisonnet thank you for the detailed explanation! My apologies, but I'm not sure how the container-level metrics make less sense than the pod-level ones, especially with multiple sidecars. Also, how does one get those metrics? See:
[screenshot]

@philipgough (Contributor, Author) commented:

@mindw - When I deployed the change to an OpenShift cluster, I queried a pod in a StatefulSet and the info is retained, so I'm trying to figure out what's different about your setup. Can you provide info on the following:

  1. kubernetes version
  2. kubelet version
  3. the container runtime

[screenshot]

@mindw (Contributor) commented Sep 27, 2021

@philipgough thank you for looking into this :)

kube_node_info{
container_runtime_version="docker://19.3.15", 
job               ="kube-state-metrics", 
kernel_version    ="5.10.55-flatcar", 
kubelet_version   ="v1.19.14", 
kubeproxy_version ="v1.19.14", 
node              ="ip-10-0-0-47.eu-west-1.compute.internal", 
os_image          ="Flatcar Container Linux by Kinvolk 2905.2.1 (Oklo)", 
pod_cidr          ="10.1.13.0/24", 
provider_id       ="aws:///eu-west-1a/i-06ed5017ff6f14e1f"
}

@smarterclayton (Contributor) commented:

Container level metrics don't completely represent the metrics in a pod - a slice is the sum of all containers plus any work directly handled in the pod slice, so you can't get the total resources accounted to a pod by summing containers.

All that being said, the primary cadvisor metrics are cpu, memory, disk io, and network io (although network is unreliable in most network implementations). Did we drop container level CPU and memory metrics (I had assumed that was part of the change, but looking at it now it does not look like it)? If we did not, removing container level io read/write operations/bytes might indeed impact regular users (that's container_fs_(reads|writes)(_bytes)_total).

Regarding the use of container_name vs container, that looks like a legacy docker/CRI inconsistency. I'm kind of surprised if that is so, but it may be a known bug (we'd need to check with cAdvisor in forum-node) that they did not change because it would break users. In general, cAdvisor should be returning the same results for different container runtimes because all implementations of the container runtime should be using cgroups in very similar ways.
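
To make the slice relationship described above concrete, here is a hypothetical pair of recording rules (names invented for illustration) contrasting the two views for one of the IO metrics mentioned. The pod-slice series can legitimately exceed the sum of the per-container series, because it also accounts for work charged directly to the pod cgroup:

```yaml
# Hypothetical rules: per-pod read throughput, once summed from the
# container-level series and once taken from the pod-slice series.
groups:
  - name: cadvisor-pod-vs-container.example
    rules:
      - record: namespace_pod:container_fs_reads_bytes:sum_rate_containers
        expr: 'sum by (namespace, pod) (rate(container_fs_reads_bytes_total{container!=""}[5m]))'
      - record: namespace_pod:container_fs_reads_bytes:sum_rate_pod_slice
        expr: 'sum by (namespace, pod) (rate(container_fs_reads_bytes_total{container="", pod!=""}[5m]))'
```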

@smarterclayton (Contributor) commented:

Regarding the pod-level metrics being zero, I believe that's a problem in either Docker or the distro's configuration of Docker. Pod cgroups are supposed to sum container cgroups, and it's possible this is a Docker-specific bug. You'd need to file a separate bug (probably with the distro) if this is an official configuration, or with Kubernetes if not. Usually bugs like this are due to cgroup inconsistencies between different Linux versions, or a missing configuration delivered to cgroups.

For the short run, including container_fs_* for container-level metrics is one option for working around this, but what you are describing is incorrect behavior of your Linux distro + container runtime + kubelet.

@philipgough (Contributor, Author) commented:

Based on @smarterclayton's evaluation, I did some investigation locally (minikube, macOS), reverting the drop rule to compare the output. I'm not sure if the fact that this is tested on macOS has any bearing.

Here is what we get on OpenShift, and it gives us what we expected to be available after this change was made:
[screenshot: OpenShift]


Checking locally with the docker runtime on minikube, the behaviour matches that reported by @mindw:

[screenshot: minikube + docker]


Changing the driver to cri-o locally produces the following:
[screenshot: minikube + cri-o]

Here we notice that we get no 0 value at all; the series without the container label disappears entirely.


Finally, with containerd we see the same behaviour as the docker driver:

[screenshot: minikube + containerd]

@philipgough (Contributor, Author) commented:

@simonpasquier, @dgrisonnet - based on the explanation by @smarterclayton, I believe this is probably a bug "somewhere", but given that the behaviour is inconsistent across runtimes at this point, I think we should revert the drop of container_fs_.* until we figure that out. Thoughts?
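
If the revert goes ahead, the drop rule sketched earlier would simply narrow, assuming the `container_blkio_device_usage_total` drop stays in place (again only an illustrative ServiceMonitor-style sketch; the real change would be made in the jsonnet):

```yaml
# Sketch only: keep container-level container_fs_.* series and continue
# dropping only container_blkio_device_usage_total at the container level.
metricRelabelings:
  - sourceLabels: [__name__, container]
    regex: 'container_blkio_device_usage_total;.+'
    action: drop
```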
