Collect metrics #505
@seanmalloy I merged |
/cc |
@@ -97,19 +98,21 @@ func (pe *PodEvictor) TotalEvicted() int {
 // EvictPod returns non-nil error only when evicting a pod on a node is not
 // possible (due to maxPodsToEvictPerNode constraint). Success is true when the pod
 // is evicted on the server side.
-func (pe *PodEvictor) EvictPod(ctx context.Context, pod *v1.Pod, node *v1.Node, reasons ...string) (bool, error) {
+func (pe *PodEvictor) EvictPod(ctx context.Context, pod *v1.Pod, node *v1.Node, strategy string, reasons ...string) (bool, error) {
Is this new `strategy` param used anywhere? It's kind of confusing to have to update all the strategies with lines like `podEvictor.EvictPod(ctx, pod, node, "NodeAffinity", "NodeAffinity")`
I don't like it either. We might drop the `reasons` field, since I don't see it used for anything but passing a strategy name.
I think `reasons` is fine, but specifically the `strategy` param you added here... I don't see it used anywhere, and it seems like you're just using `reasons` anyway, right?
Oh, I see. My bad. `metrics.PodsEvicted.With(map[string]string{"result": "maximum number reached", "strategy": reason, "namespace": pod.Namespace}).Inc()` is supposed to have a `"strategy": strategy` pair in it. Right now, `reasons` can be basically anything meaningful. Yet we only ever set the `reasons` param to individual strategy names.
Ah yeah, I agree. Then we could either drop the `...reasons` param altogether, or add this `strategy` param and just use that in each strategy (since `reasons` is optional, the strategy can just pass its name, which will be used for metrics).
I can see a possible use case where one strategy may have different reasons for eviction, so I'm not sure it's worth getting rid of entirely yet. But it doesn't make much sense to keep if we're not using it. Wdyt?
> I'm not sure it's worth getting rid of entirely yet

Yeah, not completely sure either. I'll keep it there for the moment. Better than dropping it and re-introducing it later.
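To make the labelling point in this thread concrete, here is a minimal sketch of a counter keyed by `result`, `strategy`, and `namespace`. It is not the descheduler's code: `labelCounter`, `key`, and `recordEviction` are hypothetical names, and a plain map stands in for the real metrics library so the example runs with the standard library alone. The only point it illustrates is that the `strategy` label is fed from the strategy name rather than from the free-form reason string.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
	"sync"
)

// labelCounter is a hypothetical stand-in for a labelled counter vector.
// It keeps one integer per distinct label set.
type labelCounter struct {
	mu     sync.Mutex
	counts map[string]int
}

func newLabelCounter() *labelCounter {
	return &labelCounter{counts: make(map[string]int)}
}

// key renders a label set in sorted order so that equal label sets
// always map to the same series.
func key(labels map[string]string) string {
	names := make([]string, 0, len(labels))
	for name := range labels {
		names = append(names, name)
	}
	sort.Strings(names)
	parts := make([]string, 0, len(names))
	for _, name := range names {
		parts = append(parts, fmt.Sprintf("%s=%q", name, labels[name]))
	}
	return strings.Join(parts, ",")
}

// Inc adds one to the series identified by labels.
func (c *labelCounter) Inc(labels map[string]string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.counts[key(labels)]++
}

// Get returns the current value of the series identified by labels.
func (c *labelCounter) Get(labels map[string]string) int {
	c.mu.Lock()
	defer c.mu.Unlock()
	return c.counts[key(labels)]
}

// recordEviction (hypothetical) labels the series with the strategy name,
// not with the free-form reason text.
func recordEviction(c *labelCounter, result, strategy, namespace string) {
	c.Inc(map[string]string{"result": result, "strategy": strategy, "namespace": namespace})
}

func main() {
	c := newLabelCounter()
	recordEviction(c, "maximum number reached", "NodeAffinity", "default")
	recordEviction(c, "maximum number reached", "NodeAffinity", "default")
	fmt.Println(c.Get(map[string]string{
		"result":    "maximum number reached",
		"strategy":  "NodeAffinity",
		"namespace": "default",
	}))
}
```

Keeping the label value to a fixed strategy name also keeps the number of distinct series bounded, which a free-form reason string would not guarantee.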
Force-pushed from 0551154 to a112beb
After some time running over vanilla OpenShift 4.8 with the example policy config:
@@ -97,19 +98,21 @@ func (pe *PodEvictor) TotalEvicted() int {
 // EvictPod returns non-nil error only when evicting a pod on a node is not
 // possible (due to maxPodsToEvictPerNode constraint). Success is true when the pod
 // is evicted on the server side.
-func (pe *PodEvictor) EvictPod(ctx context.Context, pod *v1.Pod, node *v1.Node, reasons ...string) (bool, error) {
+func (pe *PodEvictor) EvictPod(ctx context.Context, pod *v1.Pod, node *v1.Node, strategy string, reasons ...string) (bool, error) {
 	var reason string
 	if len(reasons) > 0 {
 		reason = " (" + strings.Join(reasons, ", ") + ")"
Continuing from what we're talking about above (https://github.com/kubernetes-sigs/descheduler/pull/505/files#r584629100), maybe we can change this line to:
reason = strategy + " (" + strings.Join(reasons, ", ") + ")"
Then all the calls to `EvictPod` can be shortened to just 1 parameter (so, no longer passing in the optional `reasons` list, but still keeping that available):
podEvictor.EvictPod(ctx, pod, nodeMap[nodeName], "RemoveDuplicatePods")
This way, we don't have to remove the `reasons` parameter but we clean up those weird-looking function calls. We can then start some work to actually use `reasons` for each strategy in a more informative way than how it is currently. What do you think?
Good idea!!! Updated.
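The suggestion in this thread can be sketched as a small standalone helper. `formatReason` is a hypothetical name and not part of the descheduler. One assumption goes beyond the thread: when no optional reasons are passed, the sketch returns the strategy name alone, whereas the line quoted in the diff sits inside the `len(reasons) > 0` branch.

```go
package main

import (
	"fmt"
	"strings"
)

// formatReason (hypothetical) builds the eviction reason from a mandatory
// strategy name plus any optional, more specific reasons.
func formatReason(strategy string, reasons ...string) string {
	if len(reasons) == 0 {
		// Assumption: fall back to the bare strategy name.
		return strategy
	}
	return strategy + " (" + strings.Join(reasons, ", ") + ")"
}

func main() {
	// A call site passing only the strategy name.
	fmt.Println(formatReason("RemoveDuplicatePods"))
	// A call site that also supplies a more specific reason.
	fmt.Println(formatReason("NodeAffinity", "node no longer matches affinity"))
}
```

With this shape, a strategy can pass just its name today and start supplying finer-grained reasons later without another signature change.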
Force-pushed from a112beb to bba5fd2
Force-pushed from bba5fd2 to 74e2bed
/hold cancel
Not sure if |
"sigs.k8s.io/descheduler/pkg/apis/componentconfig" | ||
"sigs.k8s.io/descheduler/pkg/apis/componentconfig/v1alpha1" | ||
deschedulerscheme "sigs.k8s.io/descheduler/pkg/descheduler/scheme"
)

const (
DefaultDeschedulerPort = 10258
Can you mention somewhere that port `10258` is used for metrics by default, and that users can use the command line flag `--secure-port` to customize the port number? It's not easy to get this info from the help command.
Added some comments to the `--disable-metrics` flag and the readme.
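As an illustration of the two flags mentioned in this thread, the sketch below parses `--secure-port` (defaulting to 10258) and `--disable-metrics`. It uses the standard library `flag` package purely to stay self-contained, which is an assumption about nothing in the real command-line wiring. The `options` type, `parseOptions` function, and help strings are hypothetical.

```go
package main

import (
	"flag"
	"fmt"
)

// defaultDeschedulerPort mirrors the DefaultDeschedulerPort value in the diff.
const defaultDeschedulerPort = 10258

// options (hypothetical) holds the two settings discussed in the thread.
type options struct {
	securePort     int
	disableMetrics bool
}

// parseOptions (hypothetical) reads the flags from args. The standard
// flag package accepts both -name and --name spellings.
func parseOptions(args []string) (options, error) {
	var o options
	fs := flag.NewFlagSet("descheduler-sketch", flag.ContinueOnError)
	fs.IntVar(&o.securePort, "secure-port", defaultDeschedulerPort,
		"port on which metrics are served over HTTPS")
	fs.BoolVar(&o.disableMetrics, "disable-metrics", false,
		"do not serve metrics")
	err := fs.Parse(args)
	return o, err
}

func main() {
	o, err := parseOptions([]string{"--secure-port=8443"})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("port=%d metricsDisabled=%t\n", o.securePort, o.disableMetrics)
}
```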
+1 for providing a way to disable metrics.
Let's merge #513 first
Force-pushed from 74e2bed to 479620b
New metrics:
- build_info: Build info about descheduler, including Go version, Descheduler version, Git SHA, Git branch
- pods_evicted: Number of successfully evicted pods, by the result, by the strategy, by the namespace
Force-pushed from 479620b to 572d100
@@ -521,6 +521,16 @@ Setting `--v=4` or greater on the Descheduler will log all reasons why any pod i
 Pods subject to a Pod Disruption Budget(PDB) are not evicted if descheduling violates its PDB. The pods
 are evicted by using the eviction subresource to handle PDB.

+## Metrics
Can you update the table of contents too? (we should add a verify script for that...)
Other than that, lgtm
/approve
Done
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: damemi, ingvagabund
The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
Force-pushed from 572d100 to 701f224
/kind feature
/lgtm
Collect metrics
Fixes: #348
Have the descheduler collect metrics and serve them through the `/metrics` endpoint on secure port 10258.
How to test it? Run the descheduler with a short descheduling interval (e.g. 1s) and have some pods evicted. Then just run `curl https://localhost:10258/metrics -k`.
Metrics bits are mostly copy-pasted from kube-scheduler.
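For anyone who prefers to script the check rather than use curl, the following is a rough Go equivalent of the `curl -k` test described above. `fetchMetrics` is a hypothetical helper. It skips TLS certificate verification in the same way `-k` does, which is only reasonable for a local smoke test, and the 5-second timeout is an arbitrary choice.

```go
package main

import (
	"crypto/tls"
	"fmt"
	"io"
	"net/http"
	"time"
)

// fetchMetrics (hypothetical) GETs url and returns the response body.
// Certificate verification is skipped, mirroring curl's -k flag;
// do not do this outside of local testing.
func fetchMetrics(url string) (string, error) {
	client := &http.Client{
		Timeout: 5 * time.Second,
		Transport: &http.Transport{
			TLSClientConfig: &tls.Config{InsecureSkipVerify: true},
		},
	}
	resp, err := client.Get(url)
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return "", fmt.Errorf("unexpected status %s", resp.Status)
	}
	body, err := io.ReadAll(resp.Body)
	if err != nil {
		return "", err
	}
	return string(body), nil
}

func main() {
	// Default secure port from the PR description.
	body, err := fetchMetrics("https://localhost:10258/metrics")
	if err != nil {
		fmt.Println("fetch failed:", err)
		return
	}
	fmt.Print(body)
}
```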