Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add Support For Structured Logging #347

Closed
seanmalloy opened this issue Jul 15, 2020 · 20 comments · Fixed by #394 or #412
Closed

Add Support For Structured Logging #347

seanmalloy opened this issue Jul 15, 2020 · 20 comments · Fixed by #394 or #412
Assignees
Labels
kind/feature Categorizes issue or PR as related to a new feature.

Comments

@seanmalloy
Copy link
Member

Is your feature request related to a problem? Please describe.

As a user of descheduler I'd like the descheduler to support generating structured logs, so that my log aggregation tools can more easily parse the descheduler logs. The k/k repo has recently started adding support for structured logs.

Describe the solution you'd like

Add a new --logging-format CLI option to descheduler to allow choosing between text and json log formats. The text log format should be the default.

Update klog usage in descheduler to use the new InfoS and ErrorS methods, https://github.com/kubernetes/enhancements/tree/master/keps/sig-instrumentation/1602-structured-logging#new-klog-methods.

Describe alternatives you've considered

Using the descheduler logs as is without any changes.

What version of descheduler are you using?

descheduler version: v0.18.0

Additional context

See structured logging KEP: https://github.com/kubernetes/enhancements/tree/master/keps/sig-instrumentation/1602-structured-logging.

PR to add the --logging-format option to kube-scheduler: kubernetes/kubernetes#91522.

@seanmalloy seanmalloy added the kind/feature Categorizes issue or PR as related to a new feature. label Jul 15, 2020
@seanmalloy
Copy link
Member Author

@farah would you be interested in working on this?

@farah
Copy link
Contributor

farah commented Jul 15, 2020

Of course, thanks sean.

/assign

@seanmalloy
Copy link
Member Author

@farah thanks for your help with this. Let us know if you have any questions.

FYI @damemi and @ingvagabund I'm proposing adding structured logging support to descheduler. Please let me know if you have any concerns. Thanks!

@ingvagabund
Copy link
Contributor

ingvagabund commented Jul 15, 2020

Looks good. It will force us to write better log messages :) Also, opens a possibility to parse the logs properly and allows to build various models from the logs which can be used to see how descheduler behaves.

Can you still use InfoS and ErrorS with V(?)?

@damemi
Copy link
Contributor

damemi commented Jul 15, 2020

+1 for improving logging!

@seanmalloy
Copy link
Member Author

@farah thanks for helping to get the initial structured logging changes completed in #376. Do you want to submit a PR to add the --logging-format CLI option?

Also, I think there are still some log messages that need to be converted to use structured logging(InfoS and ErrorS). Are you interesting in submitting some PR's to finish converting to structured logging?

We would like to complete the conversion to structured logging as part of the descheduler v0.20.0(k8s v1.20.0) release cycle. Let me know if you need any help with this effort.

See below for examples of log statements that still need to be converted to use InfoS. Note this list is not comprehensive.

$ ag Infof pkg/
pkg/descheduler/descheduler.go
106:			klog.V(1).Infof("The cluster size is 0 or 1 meaning eviction causes service disruption or degradation. So aborting..")

pkg/descheduler/policyconfig.go
33:		klog.V(1).Infof("policy config file not specified")

pkg/descheduler/evictions/evictions.go
115:		klog.V(1).Infof("Evicted pod in dry run mode: %#v in namespace %#v%s", pod.Name, pod.Namespace, reason)
117:		klog.V(1).Infof("Evicted pod: %#v in namespace %#v%s", pod.Name, pod.Namespace, reason)
119:		eventBroadcaster.StartLogging(klog.V(3).Infof)
230:		klog.V(4).Infof("Pod %s in namespace %s is not evictable: Pod lacks an eviction annotation and fails the following checks: %v", pod.Name, pod.Namespace, errors.NewAggregate(checkErrs).Error())

pkg/descheduler/strategies/pod_antiaffinity.go
116:				klog.Infof("%v", err)

pkg/descheduler/strategies/lownodeutilization.go
101:	klog.V(1).Infof("Criteria for a node under utilization: CPU: %v, Mem: %v, Pods: %v",
105:		klog.V(1).Infof("No node is underutilized, nothing to do here, you might tune your thresholds further")
108:	klog.V(1).Infof("Total number of underutilized nodes: %v", len(lowNodes))
111:		klog.V(1).Infof("number of nodes underutilized (%v) is less than NumberOfNodes (%v), nothing to do here", len(lowNodes), strategy.Params.NodeResourceUtilizationThresholds.NumberOfNodes)
116:		klog.V(1).Infof("All nodes are underutilized, nothing to do here")
121:		klog.V(1).Infof("All nodes are under target utilization, nothing to do here")
125:	klog.V(1).Infof("Criteria for a node above target utilization: CPU: %v, Mem: %v, Pods: %v",
127:	klog.V(1).Infof("Total number of nodes above target utilization: %v", len(targetNodes))
255:		klog.V(2).Infof("AllPods:%v, nonRemovablePods:%v, removablePods:%v", len(node.allPods), len(nonRemovablePods), len(removablePods))
262:		klog.V(1).Infof("evicting pods based on priority, if they have same priority, they'll be evicted based on QoS tiers")

pkg/descheduler/node/node.go
45:		klog.V(2).Infof("node lister returned empty list, now fetch directly")
83:			klog.V(4).Infof("Ignoring node %v with %v condition status %v", node.Name, cond.Type, cond.Status)
86:			klog.V(4).Infof("Ignoring node %v with %v condition status %v", node.Name, cond.Type, cond.Status)
92:		klog.V(4).Infof("Ignoring node %v since it is unschedulable", node.Name)

pkg/descheduler/strategies/node_affinity.go
107:	klog.V(1).Infof("Evicted %v pods", podEvictor.TotalEvicted())

pkg/utils/predicates.go
73:			klog.V(10).Infof("Match for RequiredDuringSchedulingIgnoredDuringExecution node selector terms %+v", nodeSelectorTerms)
86:			klog.V(10).Infof("Failed to parse MatchExpressions: %+v, regarding as not match.", req.MatchExpressions)

@farah
Copy link
Contributor

farah commented Sep 3, 2020

Hey Sean, sure no problem. I'll get to them soon. I was actually about to create a new issue for tracking the logging format flag, but I guess we could do it here.

@seanmalloy
Copy link
Member Author

Hey Sean, sure no problem. I'll get to them soon. I was actually about to create a new issue for tracking the logging format flag, but I guess we could do it here.

@farah feel free to break this up into multiple smaller pull requests. You don't need to do all of the changes in a single PR. Thanks!

@farah
Copy link
Contributor

farah commented Sep 3, 2020

@seanmalloy Just to be clear, are you suggesting that all logs be structured? FWIW in the KEP it says migrating all logs to be structured is a non-goal.

The ones you posted I left out deliberately because I surmised that it didn't make sense to structure it. Having had another look I can see a few more that could in fact be structured, like these:

115:		klog.V(1).Infof("Evicted pod in dry run mode: %#v in namespace %#v%s", pod.Name, pod.Namespace, reason)
117:		klog.V(1).Infof("Evicted pod: %#v in namespace %#v%s", pod.Name, pod.Namespace, reason)

What about logs like these?

255: klog.V(2).Infof("AllPods:%v, nonRemovablePods:%v, removablePods:%v", len(node.allPods), len(nonRemovablePods), len(removablePods))

Basically I'm asking for some kind of criteria to decide whether a log should be structured as I'm not 100% on some of them (like abive)

@farah
Copy link
Contributor

farah commented Sep 3, 2020

i think I'm overcomplicating it, most could be structured. I will do them all and you can you review them.

@farah
Copy link
Contributor

farah commented Sep 4, 2020

@seanmalloy I put a PR to convert the remaining ones. I left this one out:

255: klog.V(2).Infof("AllPods:%v, nonRemovablePods:%v, removablePods:%v", len(node.allPods), len(nonRemovablePods), len(removablePods))

@seanmalloy
Copy link
Member Author

There are still some small changes required before this is 100% done.

/reopen

@k8s-ci-robot
Copy link
Contributor

@seanmalloy: Reopened this issue.

In response to this:

There are still some small changes required before this is 100% done.

/reopen

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot reopened this Sep 9, 2020
@seanmalloy
Copy link
Member Author

@farah it looks like we are very close to being 100% converted to structured logging. Would you be willing to convert these last few log statements?

Log statements that need to be converted to use klog.ErrorS().

$ ag klog.Errorf pkg/ cmd/
pkg/descheduler/evictions/evictions.go
109:		klog.Errorf("Error evicting pod: %#v in namespace %#v%s: %#v", pod.Name, pod.Namespace, reason, err)

pkg/descheduler/strategies/lownodeutilization.go
81:		klog.Errorf("LowNodeUtilization config is not valid: %v", err)

pkg/descheduler/strategies/node_affinity.go
90:					klog.Errorf("failed to get pods from %v: %v", node.Name, err)
104:			klog.Errorf("invalid nodeAffinityType: %v", nodeAffinity)

cmd/descheduler/app/server.go
46:				klog.Errorf("%v", err)

Log statements that need to be converted to use InfoS.

ag Infof pkg/ cmd/
pkg/descheduler/policyconfig.go
33:		klog.V(1).Infof("policy config file not specified")

pkg/descheduler/descheduler.go
106:			klog.V(1).Infof("The cluster size is 0 or 1 meaning eviction causes service disruption or degradation. So aborting..")

pkg/descheduler/evictions/evictions.go
119:		eventBroadcaster.StartLogging(klog.V(3).Infof)
230:		klog.V(4).Infof("Pod %s in namespace %s is not evictable: Pod lacks an eviction annotation and fails the following checks: %v", pod.Name, pod.Namespace, errors.NewAggregate(checkErrs).Error())

pkg/descheduler/strategies/pod_antiaffinity.go
115:				klog.Infof("%v", err)

pkg/descheduler/strategies/lownodeutilization.go
105:		klog.V(1).Infof("No node is underutilized, nothing to do here, you might tune your thresholds further")
111:		klog.V(1).Infof("number of nodes underutilized (%v) is less than NumberOfNodes (%v), nothing to do here", len(lowNodes), strategy.Params.NodeResourceUtilizationThresholds.NumberOfNodes)
116:		klog.V(1).Infof("All nodes are underutilized, nothing to do here")
121:		klog.V(1).Infof("All nodes are under target utilization, nothing to do here")
255:		klog.V(2).Infof("AllPods:%v, nonRemovablePods:%v, removablePods:%v", len(node.allPods), len(nonRemovablePods), len(removablePods))
262:		klog.V(1).Infof("evicting pods based on priority, if they have same priority, they'll be evicted based on QoS tiers")

pkg/descheduler/node/node.go
45:		klog.V(2).Infof("node lister returned empty list, now fetch directly")
83:			klog.V(4).Infof("Ignoring node %v with %v condition status %v", node.Name, cond.Type, cond.Status)
86:			klog.V(4).Infof("Ignoring node %v with %v condition status %v", node.Name, cond.Type, cond.Status)
92:		klog.V(4).Infof("Ignoring node %v since it is unschedulable", node.Name)

pkg/utils/predicates.go
86:			klog.V(10).Infof("Failed to parse MatchExpressions: %+v, regarding as not match.", req.MatchExpressions)

@farah
Copy link
Contributor

farah commented Sep 10, 2020

Some logs, like klog.Errorf("invalid nodeAffinityType: %v", nodeAffinity) can't be converted as klog.ErrorS requires an error as the first argument.

@damemi
Copy link
Contributor

damemi commented Sep 10, 2020

@farah it can be acceptable to use nil as an argument to klog.ErrorS, or simply print expected errors as an InfoS. See https://github.com/kubernetes/community/blob/master/contributors/devel/sig-instrumentation/migration-to-structured-logging.md#using-errors

@seanmalloy
Copy link
Member Author

@farah we are in the home stretch. Would you be able to finish these off and submit a PR to clean up these last few references?

Just two references to Infof left.

ag Infof pkg/ cmd/
pkg/descheduler/evictions/evictions.go
119:		eventBroadcaster.StartLogging(klog.V(3).Infof)

pkg/descheduler/strategies/pod_antiaffinity.go
115:				klog.Infof("%v", err)

A small number of references to Info and Error.

pkg/descheduler/strategies/toomanyrestarts.go
54:		klog.V(1).Info(err)

pkg/descheduler/strategies/pod_lifetime.go
61:		klog.V(1).Info(err)

pkg/descheduler/strategies/duplicates.go
61:		klog.V(1).Info(err)

pkg/descheduler/strategies/node_taint.go
52:		klog.V(1).Info(err)

pkg/descheduler/strategies/lownodeutilization.go
72:		klog.V(1).Info(err)

pkg/descheduler/strategies/pod_antiaffinity.go
53:		klog.V(1).Info(err)

pkg/descheduler/strategies/node_affinity.go
52:		klog.V(1).Info(err)

pkg/descheduler/node/node.go
128:		klog.Error(err)

@ingvagabund
Copy link
Contributor

pkg/descheduler/evictions/evictions.go
119: eventBroadcaster.StartLogging(klog.V(3).Infof)

This needs to say as it is since StartLogging does not know anything about the structured logging yet.

@farah
Copy link
Contributor

farah commented Sep 20, 2020

Ok guys, I've just added PR for those final conversions.

Lets keep this issue open as there's still the json log format flag remaining.

@seanmalloy
Copy link
Member Author

I found one more log message that needed to be updated to use structured logging. I submitted #411 to clean this up.

@farah I believe all that is left is to add the new --logging-format CLI option. Is this something you have time to do? Let us know if you have any questions or need any assistance. We are trying complete this for the descheduler v0.20.0 release which should sometime in December.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind/feature Categorizes issue or PR as related to a new feature.
Projects
None yet
5 participants