-
Notifications
You must be signed in to change notification settings - Fork 1
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Configure service telemetry #5
Configure service telemetry #5
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Left some feedback, I know this is in our fork but I'd like @jpkrohling to sign off on whether this is going roughly in the right direction.
We should aim to upstream this as soon as we can so we don't have to keep using the fork, and the current state makes me a little nervous that keeping it in sync will be a bit of a pain.
@@ -84,7 +84,7 @@ func (cm *configProvider) Get(ctx context.Context, factories Factories) (*Config | |||
} | |||
|
|||
var cfg *configSettings | |||
if cfg, err = unmarshal(conf, factories); err != nil { | |||
if cfg, err = Unmarshal(conf, factories); err != nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we have to expose this method? It seems like downstream we could use NewConfigProvider.Get
instead to do the same thing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ConfigProvider
can only get config from URIs. In our case the Agent parses its own config file, and we can't give the whole file to ConfigProvider
because it contains non-Otel config. We could theoretically add new functionality to ConfigProvider
but I don't think it'll be accepted upstream since what we do here is not really what it is meant to do.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure I have the full context, but you could have another config provider specific to agent. We have a few config providers already, like the file provider, HTTP provider, and so on. Look at confmap/provider
for example of implementations.
@@ -29,7 +29,7 @@ type configSettings struct { | |||
|
|||
// unmarshal the configSettings from a confmap.Conf. | |||
// After the config is unmarshalled, `Validate()` must be called to validate. | |||
func unmarshal(v *confmap.Conf, factories Factories) (*configSettings, error) { | |||
func Unmarshal(v *confmap.Conf, factories Factories) (*configSettings, error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See above, I'm not sure if we need to expose this.
@@ -91,8 +91,8 @@ func newColTelemetry(useOtel bool, disableHighCardinality bool, extendedConfig b | |||
} | |||
} | |||
|
|||
func (tel *telemetryInitializer) init(res *resource.Resource, settings component.TelemetrySettings, cfg telemetry.Config, asyncErrorChannel chan error) error { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The diff in this file is pretty significant. Is this the absolute smallest unit of change we could be doing?
I'm worried about the overhead of maintaining the fork with the number of merge conflicts this is susceptible to.
@@ -39,18 +43,29 @@ type Settings struct { | |||
} | |||
|
|||
// New creates a new Telemetry from Config. | |||
func New(_ context.Context, set Settings, cfg Config) (*Telemetry, error) { | |||
func New(_ context.Context, set Settings, cfg Config, tracerProvider trace.TracerProvider) (*Telemetry, error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Have you discussed these changes with @jpkrohling?
Something feels a bit off to me here with how the service-level providers are being injected down through the call stack; it doesn't feel like it's on the right trajectory for code that could eventually be merged.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have not discussed this particular change with @jpkrohling. I only discussed with him the changes in the upstream PR.
I know that passing a TracerProvider is a bit hacky. It is not something I expect to be merged upstream. The reason I'm doing it here is because I noticed the Agent used to use a noop TracerProvider before, and I wanted to match this post-upgrade because I don't think we need it. As far as I can tell Otel uses it for the zpagesextension which we don't use in the Agent.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We have an effort upstream to instrument the collector, including for tracing. I'm quite confident the solution upstream is going to be quite different than what is here, but again, @mar4uk has a better context when it comes to this effort.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is a proposal to make the collector observable open-telemetry#7532 upstream.
But PRs are not moving for quite some time: open-telemetry#7644
So it is difficult to say when this proposal will be implemented and go live. Probably not very soon
@rfratto Honestly I doubt that the upstream PR will get merged, for reasons already mentioned in its description. On Slack it was mentioned that most likely the way forward is going to be in not using the The one change which maybe has a chance of being upstreamed is the one for disabling Collector's |
@@ -84,7 +84,7 @@ func (cm *configProvider) Get(ctx context.Context, factories Factories) (*Config | |||
} | |||
|
|||
var cfg *configSettings | |||
if cfg, err = unmarshal(conf, factories); err != nil { | |||
if cfg, err = Unmarshal(conf, factories); err != nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure I have the full context, but you could have another config provider specific to agent. We have a few config providers already, like the file provider, HTTP provider, and so on. Look at confmap/provider
for example of implementations.
@@ -57,6 +59,14 @@ type Settings struct { | |||
// LoggingOptions provides a way to change behavior of zap logging. | |||
LoggingOptions []zap.Option | |||
|
|||
OtelMetricViews []sdkmetric.View |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I know @mar4uk has been following the discussions around instrumenting the collector itself. Perhaps she could provide some feedback to this part of the PR?
@@ -39,18 +43,29 @@ type Settings struct { | |||
} | |||
|
|||
// New creates a new Telemetry from Config. | |||
func New(_ context.Context, set Settings, cfg Config) (*Telemetry, error) { | |||
func New(_ context.Context, set Settings, cfg Config, tracerProvider trace.TracerProvider) (*Telemetry, error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We have an effort upstream to instrument the collector, including for tracing. I'm quite confident the solution upstream is going to be quite different than what is here, but again, @mar4uk has a better context when it comes to this effort.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Alright, I don't want to block this, and we anticipate it to be temporary anyway (especially since Flow doesn't need it). Let's get it merged for now; we can always revisit in future updates if there's a different approach we could take if we're having a hard time maintaining this.
To resolve the govulncheck reports: ``` Vulnerability #1: GO-2023-1987 Large RSA keys can cause high CPU usage in crypto/tls More info: https://pkg.go.dev/vuln/GO-2023-1987 Standard library Found in: crypto/[email protected] Fixed in: crypto/[email protected] Example traces found: Error: #1: service/internal/proctelemetry/config.go:299:27: proctelemetry.initOTLPgRPCExporter calls otlpmetricgrpc.New, which eventually calls tls.Conn.Handshake Error: #2: service/internal/proctelemetry/config.go:156:39: proctelemetry.InitPrometheusServer calls http.Server.ListenAndServe, which eventually calls tls.Conn.HandshakeContext Error: #3: service/service.go:251:36: service.buildResource calls uuid.NewRandom, which eventually calls tls.Conn.Read Error: #4: service/config.go:35:13: service.Config.Validate calls fmt.Printf, which eventually calls tls.Conn.Write Error: #5: service/telemetry/telemetry.go:32:28: telemetry.Telemetry.Shutdown calls trace.TracerProvider.Shutdown, which eventually calls tls.Dialer.DialContext ``` https://github.com/open-telemetry/opentelemetry-collector/actions/runs/5753675727/job/15597394973?pr=8144
Grafana Agent uses code from the Collector. However, in order to integrate with it we need features which are not available in the Collector yet, hence the need for this forked repo. So far we dealt with this by exporting all Collector internals. This PR is an attempt at a more lightweight approach, where we only modify the minimum bits of functionality that we need.
There is also an upstream PR to merge this functionality into the Collector itself, or at least to get some feedback on how appropriate this approach is.