Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

gRPC storage plugins - how to debug? #1529

Closed
jacobmarble opened this issue May 10, 2019 · 7 comments · Fixed by #1962
Closed

gRPC storage plugins - how to debug? #1529

jacobmarble opened this issue May 10, 2019 · 7 comments · Fixed by #1962
Labels
help wanted Features that maintainers are willing to accept but do not have cycles to implement

Comments

@jacobmarble
Copy link
Contributor

Requirement - what kind of business use case are you trying to solve?

I'm writing a gRPC storage plugin for InfluxDB #272

Problem - what in Jaeger blocks you from solving the requirement?

Originally, the InfluxDB storage plugin was developed in a fork of the Jaeger repo. I copied that code into a new repo, which uses the gRPC framework instead.
https://github.com/influxdata/jaeger-influxdb

The plugin has a bug, which manifests as an effectively empty message:

$ SPAN_STORAGE_TYPE=grpc-plugin ./cmd/all-in-one/all-in-one- --grpc-storage-plugin.binary /Users/jacobmarble/jaeger-influxdb/cmd/jaeger-influxdb/jaeger-influxdb-darwin --grpc-storage-plugin.configuration-file /Users/jacobmarble/jaeger-influxdb/config-example-v1.yaml 
{"level":"info","ts":1557523300.436537,"caller":"flags/service.go:113","msg":"Mounting metrics handler on admin server","route":"/metrics"}
{"level":"info","ts":1557523300.437811,"caller":"flags/admin.go:108","msg":"Mounting health check on admin server","route":"/"}
{"level":"info","ts":1557523300.4378781,"caller":"flags/admin.go:114","msg":"Starting admin HTTP server","http-port":14269}
{"level":"info","ts":1557523300.437897,"caller":"flags/admin.go:100","msg":"Admin server started","http-port":14269,"health-status":"unavailable"}
{"level":"fatal","ts":1557523300.497668,"caller":"all-in-one/main.go:102","msg":"Failed to init storage factory","error":"error attempting to connect to plugin rpc client: Unrecognized remote plugin message: \n\nThis usually means that the plugin is either invalid or simply\nneeds to be recompiled to support the latest protocol.","stacktrace":"main.main.func1\n\t/Users/jacobmarble/go/src/github.com/jaegertracing/jaeger/cmd/all-in-one/main.go:102\ngithub.com/jaegertracing/jaeger/vendor/github.com/spf13/cobra.(*Command).execute\n\t/Users/jacobmarble/go/src/github.com/jaegertracing/jaeger/vendor/github.com/spf13/cobra/command.go:762\ngithub.com/jaegertracing/jaeger/vendor/github.com/spf13/cobra.(*Command).ExecuteC\n\t/Users/jacobmarble/go/src/github.com/jaegertracing/jaeger/vendor/github.com/spf13/cobra/command.go:852\ngithub.com/jaegertracing/jaeger/vendor/github.com/spf13/cobra.(*Command).Execute\n\t/Users/jacobmarble/go/src/github.com/jaegertracing/jaeger/vendor/github.com/spf13/cobra/command.go:800\nmain.main\n\t/Users/jacobmarble/go/src/github.com/jaegertracing/jaeger/cmd/all-in-one/main.go:167\nruntime.main\n\t/Users/jacobmarble/.gvm/gos/go1.12.4/src/runtime/proc.go:200"}

In order to figure out what's going on, I changed this the plugin client log level from Warn to Debug. Without this change, I have zero ability to debug/diagnose plugin problems:

client := plugin.NewClient(&plugin.ClientConfig{
HandshakeConfig: shared.Handshake,
VersionedPlugins: map[int]plugin.PluginSet{
1: shared.PluginMap,
},
Cmd: cmd,
AllowedProtocols: []plugin.Protocol{plugin.ProtocolGRPC},
Logger: hclog.New(&hclog.LoggerOptions{
Level: hclog.Warn,
}),
})

Proposal - what do you suggest to solve the problem or improve the existing situation?

  1. The plugin client log level should match (if possible) the Jaeger service log level, and that the plugin log should be piped to the service's logger. It's a little awkward because hashicorp's plugin library uses hclog while Jaeger services use zap.

  2. Add plugin client log level as a service option. For me, this allows stack traces through, which is very helpful.

  3. Looks like this is a known issue, from the last section of the README in Create README for developing grpc storage plugins #1518 #1525. I'll certainly modify my plugin accordingly, but maybe (2) above is a suitable workaround for the exception described?

If the plugin uses a hclog ("github.com/hashicorp/go-hclog") logger then any logs created at the WARN or above level will be included in the log output of the all-in-one application. Bear in mind that any logging performed before grpc.Serve is called will not be included. As well as this grpc.Serve should likely be called at the end of your main function as once running Jaeger may start calling to read/write spans straight away. This can mean that logging output during any setup can be challenging.

@jacobmarble
Copy link
Contributor Author

jacobmarble commented May 10, 2019

After changing my plugin to use hclog, and only logging at level Warn and above, I still see zero useful logs.

The following log was produced with the grpc/config/config.go hack described earlier. Without the hack, the non-JSON lines are omitted:

{"level":"info","ts":1557529077.3812459,"caller":"flags/service.go:113","msg":"Mounting metrics handler on admin server","route":"/metrics"}
{"level":"info","ts":1557529077.381747,"caller":"flags/admin.go:108","msg":"Mounting health check on admin server","route":"/"}
{"level":"info","ts":1557529077.381844,"caller":"flags/admin.go:114","msg":"Starting admin HTTP server","http-port":14269}
{"level":"info","ts":1557529077.381883,"caller":"flags/admin.go:100","msg":"Admin server started","http-port":14269,"health-status":"unavailable"}
2019-05-10T15:57:57.385-0700 [DEBUG] starting plugin: path=/Users/jacobmarble/jaeger-influxdb/cmd/jaeger-influxdb/jaeger-influxdb-darwin args=[/Users/jacobmarble/jaeger-influxdb/cmd/jaeger-influxdb/jaeger-influxdb-darwin, --config, /Users/jacobmarble/jaeger-influxdb/config-example-v1.yaml]
2019-05-10T15:57:57.387-0700 [DEBUG] plugin started: path=/Users/jacobmarble/jaeger-influxdb/cmd/jaeger-influxdb/jaeger-influxdb-darwin pid=12972
2019-05-10T15:57:57.387-0700 [DEBUG] waiting for RPC address: path=/Users/jacobmarble/jaeger-influxdb/cmd/jaeger-influxdb/jaeger-influxdb-darwin
2019-05-10T15:57:57.458-0700 [DEBUG] jaeger-influxdb-darwin: 2019-05-10T15:57:57.457-0700 [ERROR] jaeger-influxdb/main.go:39: jaeger-influxdb: failed to parse configuration file
2019-05-10T15:57:57.458-0700 [DEBUG] plugin process exited: path=/Users/jacobmarble/jaeger-influxdb/cmd/jaeger-influxdb/jaeger-influxdb-darwin pid=12972 error="exit status 1"
{"level":"fatal","ts":1557529077.458667,"caller":"all-in-one/main.go:102","msg":"Failed to init storage factory","error":"error attempting to connect to plugin rpc client: Unrecognized remote plugin message: \n\nThis usually means that the plugin is either invalid or simply\nneeds to be recompiled to support the latest protocol.","stacktrace":"main.main.func1\n\t/Users/jacobmarble/go/src/github.com/jaegertracing/jaeger/cmd/all-in-one/main.go:102\ngithub.com/jaegertracing/jaeger/vendor/github.com/spf13/cobra.(*Command).execute\n\t/Users/jacobmarble/go/src/github.com/jaegertracing/jaeger/vendor/github.com/spf13/cobra/command.go:762\ngithub.com/jaegertracing/jaeger/vendor/github.com/spf13/cobra.(*Command).ExecuteC\n\t/Users/jacobmarble/go/src/github.com/jaegertracing/jaeger/vendor/github.com/spf13/cobra/command.go:852\ngithub.com/jaegertracing/jaeger/vendor/github.com/spf13/cobra.(*Command).Execute\n\t/Users/jacobmarble/go/src/github.com/jaegertracing/jaeger/vendor/github.com/spf13/cobra/command.go:800\nmain.main\n\t/Users/jacobmarble/go/src/github.com/jaegertracing/jaeger/cmd/all-in-one/main.go:167\nruntime.main\n\t/Users/jacobmarble/.gvm/gos/go1.12.5/src/runtime/proc.go:200"}

Logger instantiated like this:

	logger := hclog.New(&hclog.LoggerOptions{
		Name:            "jaeger-influxdb",
		Level:           hclog.Warn, // Jaeger only captures >= Warn, so don't bother logging below Warn
		IncludeLocation: true,
	})

@yurishkuro
Copy link
Member

yurishkuro commented May 10, 2019

@gianarb & @jacobmarble are you in a friendly competition about InfluxDB plugin? :-)

@yurishkuro yurishkuro added the help wanted Features that maintainers are willing to accept but do not have cycles to implement label May 10, 2019
@yurishkuro
Copy link
Member

// Jaeger only captures >= Warn, so don't bother logging below Warn

this sounds like a potential enhancement to the CLI flags so that the log level can be passed to the plugin binary from jaeger-component

@gianarb
Copy link

gianarb commented May 11, 2019

Hello Yuri! Nop same project but far as timezones. As I told you via email I was able to run tt successfully locally. But probably when I left and Jacob tried by himself I was not able to get the same behavior and he starts his troubleshooting. I think it will be nice to have the --log-level flag propagated btw.

Anyway. We have @goller with his team as verifying that code and data structure all good, and probably next week you are gonna have a look as well! There is a general feeling of excitement about being able to distribute influxdb as a backend for influxdb. We have a fork that runs in our system since a couple of weeks now!

@yyyogev
Copy link
Contributor

yyyogev commented Dec 2, 2019

// Jaeger only captures >= Warn, so don't bother logging below Warn

this sounds like a potential enhancement to the CLI flags so that the log level can be passed to the plugin binary from jaeger-component

I want to help with this one. As I see it there are two ways to do this

  • add a boolean CLI flag like --grpc-storage-plugin.propagate-log-level which will determine if the grpc logger will have the same level as the logger of the grpc factory.
  • add a CLI flag --grpc-storage-plugin.log-level to set what level the grpc plugin will omit.

I think the first option is better only if it will be true by default. It seems more intuitive, I would expect that if I set the log-level, I would get the corresponding logs from the storage plugin as well.
@yurishkuro what do you think?

@yurishkuro
Copy link
Member

I think option 2 allows for better control. If you are debugging plugin issues you may not need all debug logs from Jaeger.

I'm not clear if this will address the startup logs issue.

@rleiwang
Copy link

This is just an other option in addition to logging solution, attach debugger to a running gRPC plugin:

  1. build plugin with additional debug info
go build -gcflags=all="-N -l"
  1. start Jaeger
SPAN_STORAGE_TYPE=grpc-plugin ./all-in-one --grpc-storage-plugin.binary=[PLUGIN] --grpc-storage-plugin.configuration-file=[config]
  1. attach dlv to running plugin
dlv attach ${PLUGIN_PID} --headless --listen=:${DEBUG_PORT} --api-version 2 --log

or
use the script https://github.com/chronowave/opentelemetry/blob/main/scripts/debug-plugin.sh I added

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
help wanted Features that maintainers are willing to accept but do not have cycles to implement
Projects
None yet
Development

Successfully merging a pull request may close this issue.

6 participants