-
Notifications
You must be signed in to change notification settings - Fork 25
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Make logging more consumable #143
Conversation
I used busybox for the sidecar container but I don't think we want that. Any suggestion for a sidecar container? Also I ran into a funlen issue as the 50 statements/func limit has been exceeded in the main (now there are 53). So I am wondering whether we can move the flag parameters setting inside init func in order to decrease the number of statements in the main. |
Yes, by all means move to a separate function to keep the code readable. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some initial comments.
scripts/create-patch.sh
Outdated
- while true; do tail -n 1 /var/logs/try.log; sleep 5;done | ||
command: | ||
- /bin/sh | ||
image: busybox |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Rather than busybox, lets use one that we already use in the testsuite. The requirement for what is needed in the image is quite low, so most will do. I would suggest quay.io/helmpack/chart-testing:v3.3.1
or amazon/aws-cli:2.2.24
. Introducing a new image puts disk pressure on our CI runs as we already get pretty close to the limit.
# level is the minimum logging level. Valid values are: debug, info, warn, and error | ||
level: info | ||
# stdOut is used to enforce logging to stdout. | ||
stdOut: false |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
minor nit: we do camelcase all the parms, but standard out is usually always shortened to stdout. So lets use that.
stdOut: false | ||
|
||
# devEnables development mode if true and production mode otherwise | ||
dev: true |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
dev mode should not be the default for the helm chart. It should only be the default when we deploy through the makefile. That way it will stay off when used in production.
# stdOut is used to enforce logging to stdout. | ||
stdOut: false | ||
|
||
# devEnables development mode if true and production mode otherwise |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This relates to logging, so should dev be part of the logging group. Can we add some comments about how logging changes when in dev mode?
@@ -24,3 +24,10 @@ spec: | |||
- "--health-probe-bind-address=:8081" | |||
- "--metrics-bind-address=127.0.0.1:8080" | |||
- "--leader-elect" | |||
- "--filepath=" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I take it these are all placeholders that we patch in when applying the helm chart. How does this work for olm? Can we have a comment to explain this part a bit.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like the approach. Here are some remaining comments.
# level is the minimum logging level. Valid values are: debug, info, warn, and error | ||
level: info | ||
# stdOut is used to enforce logging to stdout. | ||
stdOut: false |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a hard parameter to explain. Instead of a new one, could we just piggyback on dev. If dev is true, we always log to stdout, even if a file path is choosen. If dev is false, we only log to the file or stdout never both.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes sounds good
scripts/create-patch.sh
Outdated
@@ -0,0 +1,42 @@ | |||
#!/bin/bash |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this need to be a script? Could it be just a yaml manifest in the custom-cert-webhook directory?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No. I thought it was not a good idea to change the dir structure. I will add a manifest in custom-cert-webhook
scripts/create-patch.sh
Outdated
name: shared-data | ||
- args: | ||
- -c | ||
- while true; do tail -n 1 /var/logs/try.log; sleep 5;done |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we be consistent with the log directory between the two containers? Any reason why couldn't make it /logs in both?
- script: | | ||
PN=$(kubectl get pods -n $NAMESPACE --no-headers -o custom-columns=":metadata.name" | grep verticadb ) \ | ||
&& kubectl wait --for=condition=Ready=true pod/$PN -n $NAMESPACE \ | ||
&& kubectl exec $PN -n $NAMESPACE -c sidecar-container -- tail -n 1 var/logs/try.log |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we add a leading '/' to the path?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes I forgot that, will add it.
cmd/operator/main.go
Outdated
// zapcore.Level. If the string level is invalid, it returns the default | ||
// level | ||
func getZapcoreLevel(lvl string) zapcore.Level { | ||
lvls := map[string]zapcore.Level{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you can use zapcore.UnmarshalText
here instead of having the map.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok I will take a look, thanks.
cmd/operator/main.go
Outdated
lvl := zap.NewAtomicLevelAt(zap.DebugLevel) | ||
if logArgs.FilePath != "" { | ||
w := getLogWriter(logArgs.FilePath, logArgs.MaxFileAge, logArgs.MaxFileSize, logArgs.MaxFileRotation) | ||
lvl = zap.NewAtomicLevelAt(getZapcoreLevel(logArgs.Level)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We will want the level to be configurable, even when not using FilePath.
cmd/operator/main.go
Outdated
writes := []zapcore.WriteSyncer{} | ||
lvl := zap.NewAtomicLevelAt(zap.DebugLevel) | ||
if logArgs.FilePath != "" { | ||
w := getLogWriter(logArgs.FilePath, logArgs.MaxFileAge, logArgs.MaxFileSize, logArgs.MaxFileRotation) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we just pass in the Logging struct instead of each individual field?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes I am already changing that.
cmd/operator/main.go
Outdated
"The maximum size in megabytes of the log file "+ | ||
"before it gets rotated.") | ||
flag.IntVar(&l.MaxFileAge, "maxfileage", DefaultMaxFileAge, | ||
"This is the maximum age, in days, of the logging "+ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can the help usage be this? It was taken from the lumberjack documentation:
The maximum number of days to retain old log files based on the timestamp encoded in the file.
cmd/operator/main.go
Outdated
@@ -73,6 +81,35 @@ func init() { | |||
//+kubebuilder:scaffold:scheme | |||
} | |||
|
|||
// setLoggingFlagArgs define logging flags with specified names and default values | |||
func (l *Logging) setLoggingFlagArgs() { | |||
flag.StringVar(&l.FilePath, "filepath", "", "The file logging will write to.") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can the help text say?
The path to the log file. If omitted, all logging will be written to stdout.
cmd/operator/main.go
Outdated
"This is the maximum age, in days, of the logging "+ | ||
"before log rotation gets rid of it.") | ||
flag.IntVar(&l.MaxFileRotation, "maxfilerotation", DefaultMaxFileRotation, | ||
"this is the maximum number of files that are kept in rotation before the old ones are removed.") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
minor: "The maximum number of files that are kept in rotation before the old ones are removed"
@@ -5,5 +5,11 @@ This helm chart will install the operator and an admission controller webhook. | |||
| image.name | The name of image that runs the operator. | vertica/verticadb-operator:1.0.0 | | |||
| webhook.caBundle | A PEM encoded CA bundle that will be used to validate the webhook's server certificate. If unspecified, system trust roots on the apiserver are used. | | | |||
| webhook.tlsSecret | The webhook requires a TLS certficate to work. By default we rely on cert-manager to be installed as we use it generate the cert. If you don't want to use cert-manager, you need to specify your own cert, which you can do with this parameter. When set, it is a name of a secret in the same namespace the chart is being installed in. The secret must have the keys: tls.key, ca.crt, and tls.crt. | | | |||
| logging.filePath | When specified, it is the file logging will write to. When empty, logging will write to standard out. | | | |||
| logging.maxFileSize | When logging to a file, it is the maximum size, in MB, of the logging file before log rotation occurs. | 500 | | |||
| logging.maxFileAge | When logging to a file, it is the maximum age, in days, of the logging before log rotation gets rid of it. | 7 | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can the description be:
The maximum number of days to retain old log files based on the timestamp encoded in the file. This is only applicable if logging to a file.
# example: | ||
# dev -> 2022-01-21T12:50:10.078Z INFO controller-runtime.certwatcher Starting certificate watcher | ||
# prod -> 1.6427683530752437e+09 info controller-runtime.certwatcher Starting certificate watcher | ||
dev: false |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think dev mode disables sampling of logging messages and produces stack traces more liberally. The are probably the key things to mention.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah. Those are more meaningful.
scripts/create-helm-charts.sh
Outdated
sed -i "s/--level=/--level={{ .Values.logging.level }}/" $TEMPLATE_DIR/verticadb-operator-controller-manager-deployment.yaml | ||
sed -i "s/--dev=/--dev={{ .Values.dev }}/" $TEMPLATE_DIR/verticadb-operator-controller-manager-deployment.yaml | ||
sed -i "s/--stdout=/--stdout={{ .Values.logging.stdOut }}/" $TEMPLATE_DIR/verticadb-operator-controller-manager-deployment.yaml | ||
sed -i "s/--maxfilesize=500/--maxfilesize={{ .Values.logging.maxFileSize }}/" $TEMPLATE_DIR/verticadb-operator-controller-manager-deployment.yaml |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We are duplicating these defaults around a lot. Can we do pattern matching for the value of each of the parms?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry what do you mean by pattern matching?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You mean with reg exp?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, add a reg exp portion so that we don't hard code the default in the pattern.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good.
operator pod | ||
time: 2022-01-25T15:03:23.6521708+01:00 | ||
custom: | ||
Issue: "116" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Lets use the PR issue number of 143. We can still find our way back to the original GitHub issue.
a2530f7
to
477d53e
Compare
Before, logging for the operator wrote everything to stdout. The logging levels were not tunable, all of the error, warning and info messages are written out.
Now thanks to new helm parameters, we can log to file too and we built log rotation into the operator. We add 2 extra parameters, stdout and devMode that we use respectively to enforce logging to stdout and to select developement/production mode.
The scenario are as follow:
First it was decided to define more helm parameters for sidecar and volume that are used for testing, as the manager container is distroless, but I decided instead to patch to the already installed operator the sidecar container and the volume and then verify that the log has been done to the file using that sidecar.
Closes #56
Closes #116