
Configure service telemetry #5

Merged
merged 4 commits into grafana:0.79-grafana on Jun 30, 2023

Conversation

@ptodev (Author) commented Jun 20, 2023

Grafana Agent uses code from the Collector. However, in order to integrate with it we need features which are not yet available in the Collector, hence the need for this forked repo. So far we have dealt with this by exporting all Collector internals. This PR is an attempt at a more lightweight approach, where we modify only the minimum bits of functionality that we need.

There is also an upstream PR to merge this functionality into the Collector itself, or at least to get some feedback on how appropriate this approach is.

@rfratto (Member) left a comment

Left some feedback, I know this is in our fork but I'd like @jpkrohling to sign off on whether this is going roughly in the right direction.

We should aim to upstream this as soon as we can so we don't have to keep using the fork, and the current state makes me a little nervous that keeping it in sync will be a bit of a pain.

```diff
@@ -84,7 +84,7 @@ func (cm *configProvider) Get(ctx context.Context, factories Factories) (*Config
 	}

 	var cfg *configSettings
-	if cfg, err = unmarshal(conf, factories); err != nil {
+	if cfg, err = Unmarshal(conf, factories); err != nil {
```
Member:

Do we have to expose this method? It seems like downstream we could use NewConfigProvider.Get instead to do the same thing.

Author:

ConfigProvider can only get config from URIs. In our case the Agent parses its own config file, and we can't give the whole file to ConfigProvider because it contains non-OTel config. We could theoretically add new functionality to ConfigProvider, but I don't think it would be accepted upstream, since what we do here is not really what it is meant to do.

Comment:

I'm not sure I have the full context, but you could have another config provider specific to the Agent. We have a few config providers already, like the file provider, HTTP provider, and so on. Look at confmap/provider for examples of implementations.

```diff
@@ -29,7 +29,7 @@ type configSettings struct {

 // unmarshal the configSettings from a confmap.Conf.
 // After the config is unmarshalled, `Validate()` must be called to validate.
-func unmarshal(v *confmap.Conf, factories Factories) (*configSettings, error) {
+func Unmarshal(v *confmap.Conf, factories Factories) (*configSettings, error) {
```
Member:

See above, I'm not sure if we need to expose this.

```diff
@@ -91,8 +91,8 @@ func newColTelemetry(useOtel bool, disableHighCardinality bool, extendedConfig b
 	}
 }

 func (tel *telemetryInitializer) init(res *resource.Resource, settings component.TelemetrySettings, cfg telemetry.Config, asyncErrorChannel chan error) error {
```
Member:

The diff in this file is pretty significant. Is this the absolute smallest unit of change we could be doing?

I'm worried about the overhead of maintaining the fork with the number of merge conflicts this is susceptible to.

```diff
@@ -39,18 +43,29 @@ type Settings struct {
 }

 // New creates a new Telemetry from Config.
-func New(_ context.Context, set Settings, cfg Config) (*Telemetry, error) {
+func New(_ context.Context, set Settings, cfg Config, tracerProvider trace.TracerProvider) (*Telemetry, error) {
```
Member:

Have you discussed these changes with @jpkrohling?

Something feels a bit off to me here with how the service-level providers are being injected down through the call stack; it doesn't feel like it's on the right trajectory for code that could eventually be merged.

Author:

I have not discussed this particular change with @jpkrohling. I only discussed with him the changes in the upstream PR.

I know that passing a TracerProvider is a bit hacky. It is not something I expect to be merged upstream. The reason I'm doing it here is that I noticed the Agent used to use a no-op TracerProvider before, and I wanted to match this post-upgrade because I don't think we need it. As far as I can tell, OTel uses it for the zpages extension, which we don't use in the Agent.

Comment:

We have an effort upstream to instrument the collector, including for tracing. I'm quite confident the solution upstream is going to be quite different than what is here, but again, @mar4uk has a better context when it comes to this effort.

Comment:

There is a proposal upstream to make the collector observable: open-telemetry#7532.
But the PRs have not been moving for quite some time: open-telemetry#7644.
So it is difficult to say when this proposal will be implemented and go live. Probably not very soon.

@ptodev (Author) commented Jun 20, 2023

> We should aim to upstream this as soon as we can so we don't have to keep using the fork, and the current state makes me a little nervous that keeping it in sync will be a bit of a pain.

@rfratto Honestly, I doubt that the upstream PR will get merged, for reasons already mentioned in its description. On Slack it was mentioned that the way forward is most likely not to use the service package at all; instead, the graph (previously pipelines) package will be made public so that people can build their own services.

The one change which may have a chance of being upstreamed is the one for disabling the Collector's /metrics endpoint. But IMO it'd need to be implemented differently, so that it's configured through YAML.
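For reference, the Collector's service telemetry section already exposes a metrics verbosity level in YAML, and setting it to none is one plausible shape for such a change. Whether upstream would accept a dedicated knob for disabling the /metrics server is an open question, so treat the exact keys below as a sketch rather than a settled interface.

```yaml
service:
  telemetry:
    metrics:
      # "none" turns off the collector's self-metrics pipeline, which in
      # effect leaves nothing for the Prometheus /metrics endpoint to serve.
      level: none
```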

```diff
@@ -57,6 +59,14 @@ type Settings struct {
 	// LoggingOptions provides a way to change behavior of zap logging.
 	LoggingOptions []zap.Option

+	OtelMetricViews []sdkmetric.View
```

Member:

I know @mar4uk has been following the discussions around instrumenting the collector itself. Perhaps she could provide some feedback to this part of the PR?

@rfratto (Member) left a comment

Alright, I don't want to block this, and we expect it to be temporary anyway (especially since Flow doesn't need it). Let's get it merged for now; we can always revisit in future updates if we're having a hard time maintaining this and there's a different approach we could take.

@ptodev merged commit 1841af9 into grafana:0.79-grafana on Jun 30, 2023
wildum pushed a commit that referenced this pull request on Sep 25, 2023:
To resolve the govulncheck reports:
```
Vulnerability #1: GO-2023-1987
    Large RSA keys can cause high CPU usage in crypto/tls
  More info: https://pkg.go.dev/vuln/GO-2023-1987
  Standard library
    Found in: crypto/[email protected]
    Fixed in: crypto/[email protected]
    Example traces found:
Error:       #1: service/internal/proctelemetry/config.go:299:27: proctelemetry.initOTLPgRPCExporter calls otlpmetricgrpc.New, which eventually calls tls.Conn.Handshake
Error:       #2: service/internal/proctelemetry/config.go:156:39: proctelemetry.InitPrometheusServer calls http.Server.ListenAndServe, which eventually calls tls.Conn.HandshakeContext
Error:       #3: service/service.go:251:36: service.buildResource calls uuid.NewRandom, which eventually calls tls.Conn.Read
Error:       #4: service/config.go:35:13: service.Config.Validate calls fmt.Printf, which eventually calls tls.Conn.Write
Error:       #5: service/telemetry/telemetry.go:32:28: telemetry.Telemetry.Shutdown calls trace.TracerProvider.Shutdown, which eventually calls tls.Dialer.DialContext
```


https://github.com/open-telemetry/opentelemetry-collector/actions/runs/5753675727/job/15597394973?pr=8144