This repository has been archived by the owner on Nov 2, 2021. It is now read-only.

Does pod-devices-exporter work in Kubernetes v1.10? #20

Closed
tingzhang-ming opened this issue Mar 5, 2019 · 7 comments

Comments

@tingzhang-ming

I deployed node-device-exporter-daemonset.yaml, but it did not report any pod information for the GPUs. I realized I had not enabled KubeletPodResources, but the file /etc/default/kubelet does not exist on my node. Should I create it and add KUBELET_EXTRA_ARGS=--feature-gates=KubeletPodResources=true? Or is this related to the Kubernetes version? My Kubernetes version is v1.10.
Thank you!

@xiaozhouX

It seems that node-device-exporter needs the kubelet's pod-resources socket to learn which devices are assigned to which pod, and that socket was added in version 1.13:
https://github.com/kubernetes/kubernetes/blob/release-1.13/pkg/kubelet/config/defaults.go#L28
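As a rough illustration of the version boundary described above, here is a minimal Python sketch. It assumes the kubelet version is available as a string such as `v1.10.0` (for example, from the VERSION column of `kubectl get nodes`). The function names are hypothetical and not part of pod-devices-exporter. A version check alone cannot tell whether the KubeletPodResources feature gate is enabled, which may still be required on 1.13.

```python
import re

# Minimum (major, minor) kubelet version that serves the pod-resources socket,
# per the comment above (added in Kubernetes 1.13).
POD_RESOURCES_MIN = (1, 13)


def parse_k8s_version(version: str) -> tuple[int, int]:
    """Return (major, minor) from strings such as 'v1.10.0' or '1.13.4-gke.1'."""
    match = re.match(r"^v?(\d+)\.(\d+)", version.strip())
    if match is None:
        raise ValueError(f"unrecognized Kubernetes version: {version!r}")
    return int(match.group(1)), int(match.group(2))


def supports_pod_resources(version: str) -> bool:
    """True if this kubelet version is new enough to expose pod-resources."""
    # Tuple comparison orders by major first, then minor.
    return parse_k8s_version(version) >= POD_RESOURCES_MIN


for v in ["v1.10.0", "v1.11.5", "v1.13.0", "1.14.2"]:
    print(v, supports_pod_resources(v))
```

Comparing `(major, minor)` tuples avoids the classic string-comparison bug where `"1.10" < "1.9"`.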

@tingzhang-ming
Author

Thanks, got it. But does this code, or node-device-exporter-daemonset.yaml, work on Kubernetes v1.10 at all? Have you run it successfully?

Thank you!

@xiaozhouX

It can't work: in versions before 1.13, /var/lib/kubelet/pod-resources/kubelet.sock does not exist, so the exporter crashes. At least in my cluster, which runs 1.11, it doesn't work.
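Rather than letting the exporter crash on such a node, a startup check could detect the missing socket and exit with a readable message. The sketch below is hypothetical (it does not describe how pod-devices-exporter actually behaves) and assumes only the socket path quoted above; it checks that the path exists and is a Unix domain socket.

```python
import os
import stat
import sys

# Socket path quoted in the comment above; absent on kubelets older than 1.13.
POD_RESOURCES_SOCKET = "/var/lib/kubelet/pod-resources/kubelet.sock"


def pod_resources_available(path: str = POD_RESOURCES_SOCKET) -> bool:
    """Return True only if `path` exists and is a Unix domain socket."""
    try:
        mode = os.stat(path).st_mode
    except OSError:
        # Missing file, missing parent directory, or no permission to stat it.
        return False
    return stat.S_ISSOCK(mode)


def preflight(path: str = POD_RESOURCES_SOCKET) -> None:
    """Exit with status 1 and a message on stderr if the socket is unusable."""
    if not pod_resources_available(path):
        sys.exit(
            f"{path} is missing or not a socket: the kubelet is likely older than "
            "1.13, or the KubeletPodResources feature gate is disabled"
        )
```

Calling `preflight()` at process start would replace the crash on a 1.10 or 1.11 node with a one-line explanation in the pod logs. The DaemonSet pod would still restart, but the cause would be obvious.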

@tingzhang-ming
Author

tingzhang-ming commented Mar 5, 2019

Thanks. Do you know of any source code for a GPU exporter that works well with pods? I want to do some custom extension development. I know https://www.jianshu.com/p/1c7ddf18e8b2 works well, but I can't get the source code of its image, registry.cn-hangzhou.aliyuncs.com/acs/gpu-prometheus-exporter:0.1-f48bc3c, and I want to add some custom metrics.

@xiaozhouX

We can discuss the image. Do you have Slack, or would you rather just email me?

@tingzhang-ming
Author

I have sent you an email (to the address on your GitHub profile).

@guptaNswati
Contributor

The correct instructions for getting per-pod GPU metrics, which are only available from Kubernetes 1.13 onward, are here. Kubernetes added a kubelet gRPC server at /var/lib/kubelet/pod-resources that lets monitoring agents discover the devices assigned to a pod.

https://github.com/NVIDIA/gpu-monitoring-tools/tree/master/exporters/prometheus-dcgm/pod-devices-exporter#pod-device-metrics
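To show what a monitoring agent might do with that gRPC server's answer, the Python sketch below works on plain dicts that mirror the fields of the v1alpha1 `ListPodResourcesResponse` message: pods, their containers, and each container's `resource_name` and `device_ids`. Calling the socket over gRPC is left out, the sample data is made up, and treating `nvidia.com/gpu` device IDs as GPU UUIDs is an assumption about the NVIDIA device plugin.

```python
# Hypothetical sample mirroring the v1alpha1 ListPodResourcesResponse shape:
# pod_resources -> containers -> devices (resource_name, device_ids).
SAMPLE_RESPONSE = {
    "pod_resources": [
        {
            "name": "train-0",
            "namespace": "ml",
            "containers": [
                {
                    "name": "trainer",
                    "devices": [
                        {"resource_name": "nvidia.com/gpu",
                         "device_ids": ["GPU-aaaa", "GPU-bbbb"]},
                    ],
                }
            ],
        },
        {
            "name": "web-1",
            "namespace": "default",
            "containers": [{"name": "nginx", "devices": []}],
        },
    ]
}


def gpu_owners(response: dict, resource: str = "nvidia.com/gpu") -> dict:
    """Map each device ID of `resource` to its (namespace, pod, container)."""
    owners = {}
    for pod in response.get("pod_resources", []):
        for container in pod.get("containers", []):
            for dev in container.get("devices", []):
                # Skip devices from other device plugins (e.g. FPGAs, NICs).
                if dev.get("resource_name") != resource:
                    continue
                for device_id in dev.get("device_ids", []):
                    owners[device_id] = (pod["namespace"], pod["name"], container["name"])
    return owners


print(gpu_owners(SAMPLE_RESPONSE))
```

With such a lookup, per-GPU metrics keyed by device ID could be labeled with the owning namespace, pod, and container, which is the join a per-pod GPU exporter needs.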

Some links on the Kubernetes device-monitoring feature:
Description: https://github.com/kubernetes/enhancements/blob/master/keps/sig-node/kubelet-resource-metrics-endpoint.md
Proposal: kubernetes/community#2454
PR: kubernetes/kubernetes#70508
