Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make logging more consumable #143

Merged
merged 4 commits into from
Jan 25, 2022
Merged

Conversation

roypaulin
Copy link
Collaborator

@roypaulin roypaulin commented Jan 20, 2022

Before, logging for the operator wrote everything to stdout. The logging levels were not tunable, all of the error, warning and info messages are written out.
Now thanks to new helm parameters, we can log to file too and we built log rotation into the operator. We add 2 extra parameters, stdout and devMode that we use respectively to enforce logging to stdout and to select developement/production mode.
The scenario are as follow:

  • if no filepath is specified, we log to stdout only and keep as level debug. So the ability to select the level is something supported only when a filepath is given. (That could be easily changed)
  • if filepath is invalid or you do not have access to the file you will have error msgs all over stdout .
  • if a valid filepath is specififed and stdout is false, the logging will be only to file, if stdout is true, the logging will be to both file and stdout

First it was decided to define more helm parameters for sidecar and volume that are used for testing, as the manager container is distroless, but I decided instead to patch to the already installed operator the sidecar container and the volume and then verify that the log has been done to the file using that sidecar.

Closes #56
Closes #116

@roypaulin
Copy link
Collaborator Author

roypaulin commented Jan 20, 2022

I used busybox for the sidecar container but I don't think we want that. Any suggestion for a sidecar container?

Also I ran into a funlen issue as the 50 statements/func limit has been exceeded in the main (now there are 53). So I am wondering whether we can move the flag parameters setting inside init func in order to decrease the number of statements in the main.

@spilchen
Copy link
Collaborator

Also I ran into a funlen issue as the 50 statements/func limit has been exceeded in the main (now there are 53). So I am wondering whether we can move the flag parameters setting inside init func in order to decrease the number of statements in the main.

Yes, by all means move to a separate function to keep the code readable.

Copy link
Collaborator

@spilchen spilchen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some initial comments.

- while true; do tail -n 1 /var/logs/try.log; sleep 5;done
command:
- /bin/sh
image: busybox
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rather than busybox, lets use one that we already use in the testsuite. The requirement for what is needed in the image is quite low, so most will do. I would suggest quay.io/helmpack/chart-testing:v3.3.1 or amazon/aws-cli:2.2.24. Introducing a new image puts disk pressure on our CI runs as we already get pretty close to the limit.

# level is the minimum logging level. Valid values are: debug, info, warn, and error
level: info
# stdOut is used to enforce logging to stdout.
stdOut: false
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

minor nit: we do camelcase all the parms, but standard out is usually always shortened to stdout. So lets use that.

stdOut: false

# devEnables development mode if true and production mode otherwise
dev: true
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

dev mode should not be the default for the helm chart. It should only be the default when we deploy through the makefile. That way it will stay off when used in production.

# stdOut is used to enforce logging to stdout.
stdOut: false

# devEnables development mode if true and production mode otherwise
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This relates to logging, so should dev be part of the logging group. Can we add some comments about how logging changes when in dev mode?

@@ -24,3 +24,10 @@ spec:
- "--health-probe-bind-address=:8081"
- "--metrics-bind-address=127.0.0.1:8080"
- "--leader-elect"
- "--filepath="
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I take it these are all placeholders that we patch in when applying the helm chart. How does this work for olm? Can we have a comment to explain this part a bit.

helm-charts/verticadb-operator/values.yaml Show resolved Hide resolved
Copy link
Collaborator

@spilchen spilchen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like the approach. Here are some remaining comments.

# level is the minimum logging level. Valid values are: debug, info, warn, and error
level: info
# stdOut is used to enforce logging to stdout.
stdOut: false
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a hard parameter to explain. Instead of a new one, could we just piggyback on dev. If dev is true, we always log to stdout, even if a file path is choosen. If dev is false, we only log to the file or stdout never both.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes sounds good

@@ -0,0 +1,42 @@
#!/bin/bash
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this need to be a script? Could it be just a yaml manifest in the custom-cert-webhook directory?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No. I thought it was not a good idea to change the dir structure. I will add a manifest in custom-cert-webhook

name: shared-data
- args:
- -c
- while true; do tail -n 1 /var/logs/try.log; sleep 5;done
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we be consistent with the log directory between the two containers? Any reason why couldn't make it /logs in both?

- script: |
PN=$(kubectl get pods -n $NAMESPACE --no-headers -o custom-columns=":metadata.name" | grep verticadb ) \
&& kubectl wait --for=condition=Ready=true pod/$PN -n $NAMESPACE \
&& kubectl exec $PN -n $NAMESPACE -c sidecar-container -- tail -n 1 var/logs/try.log
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we add a leading '/' to the path?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes I forgot that, will add it.

// zapcore.Level. If the string level is invalid, it returns the default
// level
func getZapcoreLevel(lvl string) zapcore.Level {
lvls := map[string]zapcore.Level{
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you can use zapcore.UnmarshalText here instead of having the map.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok I will take a look, thanks.

lvl := zap.NewAtomicLevelAt(zap.DebugLevel)
if logArgs.FilePath != "" {
w := getLogWriter(logArgs.FilePath, logArgs.MaxFileAge, logArgs.MaxFileSize, logArgs.MaxFileRotation)
lvl = zap.NewAtomicLevelAt(getZapcoreLevel(logArgs.Level))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We will want the level to be configurable, even when not using FilePath.

writes := []zapcore.WriteSyncer{}
lvl := zap.NewAtomicLevelAt(zap.DebugLevel)
if logArgs.FilePath != "" {
w := getLogWriter(logArgs.FilePath, logArgs.MaxFileAge, logArgs.MaxFileSize, logArgs.MaxFileRotation)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we just pass in the Logging struct instead of each individual field?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes I am already changing that.

"The maximum size in megabytes of the log file "+
"before it gets rotated.")
flag.IntVar(&l.MaxFileAge, "maxfileage", DefaultMaxFileAge,
"This is the maximum age, in days, of the logging "+
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can the help usage be this? It was taken from the lumberjack documentation:

The maximum number of days to retain old log files based on the timestamp encoded in the file.

@@ -73,6 +81,35 @@ func init() {
//+kubebuilder:scaffold:scheme
}

// setLoggingFlagArgs define logging flags with specified names and default values
func (l *Logging) setLoggingFlagArgs() {
flag.StringVar(&l.FilePath, "filepath", "", "The file logging will write to.")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can the help text say?

The path to the log file. If omitted, all logging will be written to stdout.

"This is the maximum age, in days, of the logging "+
"before log rotation gets rid of it.")
flag.IntVar(&l.MaxFileRotation, "maxfilerotation", DefaultMaxFileRotation,
"this is the maximum number of files that are kept in rotation before the old ones are removed.")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

minor: "The maximum number of files that are kept in rotation before the old ones are removed"

@@ -5,5 +5,11 @@ This helm chart will install the operator and an admission controller webhook.
| image.name | The name of image that runs the operator. | vertica/verticadb-operator:1.0.0 |
| webhook.caBundle | A PEM encoded CA bundle that will be used to validate the webhook's server certificate. If unspecified, system trust roots on the apiserver are used. | |
| webhook.tlsSecret | The webhook requires a TLS certficate to work. By default we rely on cert-manager to be installed as we use it generate the cert. If you don't want to use cert-manager, you need to specify your own cert, which you can do with this parameter. When set, it is a name of a secret in the same namespace the chart is being installed in. The secret must have the keys: tls.key, ca.crt, and tls.crt. | |
| logging.filePath | When specified, it is the file logging will write to. When empty, logging will write to standard out. | |
| logging.maxFileSize | When logging to a file, it is the maximum size, in MB, of the logging file before log rotation occurs. | 500 |
| logging.maxFileAge | When logging to a file, it is the maximum age, in days, of the logging before log rotation gets rid of it. | 7 |
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can the description be:

The maximum number of days to retain old log files based on the timestamp encoded in the file. This is only applicable if logging to a file.

# example:
# dev -> 2022-01-21T12:50:10.078Z INFO controller-runtime.certwatcher Starting certificate watcher
# prod -> 1.6427683530752437e+09 info controller-runtime.certwatcher Starting certificate watcher
dev: false
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think dev mode disables sampling of logging messages and produces stack traces more liberally. The are probably the key things to mention.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah. Those are more meaningful.

sed -i "s/--level=/--level={{ .Values.logging.level }}/" $TEMPLATE_DIR/verticadb-operator-controller-manager-deployment.yaml
sed -i "s/--dev=/--dev={{ .Values.dev }}/" $TEMPLATE_DIR/verticadb-operator-controller-manager-deployment.yaml
sed -i "s/--stdout=/--stdout={{ .Values.logging.stdOut }}/" $TEMPLATE_DIR/verticadb-operator-controller-manager-deployment.yaml
sed -i "s/--maxfilesize=500/--maxfilesize={{ .Values.logging.maxFileSize }}/" $TEMPLATE_DIR/verticadb-operator-controller-manager-deployment.yaml
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We are duplicating these defaults around a lot. Can we do pattern matching for the value of each of the parms?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry what do you mean by pattern matching?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You mean with reg exp?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, add a reg exp portion so that we don't hard code the default in the pattern.

Copy link
Collaborator

@spilchen spilchen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good.

operator pod
time: 2022-01-25T15:03:23.6521708+01:00
custom:
Issue: "116"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lets use the PR issue number of 143. We can still find our way back to the original GitHub issue.

@spilchen spilchen merged commit ff10c43 into vertica:main Jan 25, 2022
@roypaulin roypaulin deleted the improve-logging branch March 10, 2022 20:33
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Vertica operator: Write logs to a file Configurable log level
2 participants