Agents should report supported components #198

andykellr · 2024-07-09T18:03:55Z

I'll describe this in the context of OpenTelemetry, but similar issues will exist for other agents.

OpenTelemetry distributions may have different components included during the build. Before sending a configuration to an agent, the management server will want to know what components are available. Sending a configuration with unsupported components will cause an error. It may be desirable to avoid errors and only allow configurations to be applied to agents that are able to use them.

This is similar to agent capabilities but will need more than a bitmask to support arbitrary component names and versions.

This is similar to what is available in component health, but components not in use (but still supported) would not be reported.

This is similar to package statuses but components are not necessarily packages.

This could be reported with remote configuration but doesn't feel right.

andykellr · 2024-07-09T18:23:30Z

Related issue in the collector: open-telemetry/opentelemetry-collector#10570

andykellr · 2024-07-09T18:27:25Z

We discussed in the SIG meeting that a generic map<string, string> metadata could be added to the AgentDescription message and the structure documented for OpenTelemetry-based agents.

We also discussed making a separate top level AgentMetadata message with a separate agent capability ReportsMetadata.

BinaryFissionGames · 2024-07-09T18:58:36Z

I think logically, you'd consider this part of the AgentDescription. I'd personally advocate for adding it there if possible.

I think the primary advantage we discussed for the separate message was that it did not need to be sent if the OpAMP server does not use it. I'm not sure if that's a huge advantage or not though, given the AgentDescription is generally static over the lifetime of an agent and status compression allows it to be omitted most of the time.

BinaryFissionGames · 2024-07-09T19:08:25Z

Regarding the structure of the field, how would you propose representing the components for the collector in a map[string]string?

Additionally, I feel like arbitrary metadata describing agent is already covered in the identifying/non-identifying attributes, how does this differ?

BinaryFissionGames · 2024-07-23T18:37:33Z

We discussed this again today at the SIG.

We primarily discussed adding this information to the ComponentHealth message, or to the AgentDescription.

BinaryFissionGames · 2024-08-14T15:13:03Z

Here's what I'm thinking in terms of concrete details:

We could imagine doing something like this:

message AgentDescription {
	// ... snip ... //

	// The list of components that are available for this agent to configure. 
	repeated AvailableComponent available_components = 4
}

message AvailableComponent {
	// The ID of the component. This ID MUST be unique within the list of
	// available components.
	string id = 1
	
	// Extra key/value pairs that may be used to describe the component.
	// The key/value pairs are according to semantic conventions, see:
	// https://opentelemetry.io/docs/specs/semconv/
	repeated KeyValue metadata = 2


}

The available components is pretty free-form; You are only required to specify an ID for each component. In the case of the collector, ID would map to the type string + class of the component (e.g. receiver/hostmetrics). The type + class of a component are unique within the collector.

For extra metadata, we'd report the following:

component module path
component version

We could report these either as two separate fields, or as one field:

two fields

component.module - "github.com/open-telemetry/opentelemetry-collector-contrib/receiver/hostmetricsreceiver"
component.version - "v0.107.0"

one field

component.module - "github.com/open-telemetry/opentelemetry-collector-contrib/receiver/hostmetricsreceiver v0.107.0"
or maybe
code.namespace - "github.com/open-telemetry/opentelemetry-collector-contrib/receiver/[email protected]"?
see code semconv, not sure if it fits.

Additionally, the component type string and class can be derived from the ID, but you could consider also reporting these in separate fields in the metadata so we don't have to rely on IDs being formatted a specific way.

I think we were debating putting this on ComponentHealth; I think the two are a little different. ComponentHealth is more about instances of components, as opposed to the types of components.

In addition, ComponentHealth would be sent pretty frequently, which could end up bloating network traffic. I'd expect AgentDescription to be sent very infrequently (essentially only at agent startup or when agent identity changes), which would help keep the amount of network data sent down.

BinaryFissionGames · 2024-08-15T15:17:29Z

@tigrannajaryan @andykellr Seeing any immediate issues with the above? If not, I can write it up and put it in a PR.

andykellr · 2024-08-15T15:29:12Z

I think the general structure of the message and making it a part of AgentDescription makes sense. I agree that it could be very verbose and unlikely to change and therefore should not be part of ComponentHealth.

An alternative would be using a map where the id field is removed from the AvailableComponent message and used as a key in a map. This would follow the pattern of ComponentHealth, PackagesAvailable, etc.

message AgentDescription {
	// ... snip ... //

	// The list of components that are available for this agent to configure. 
	map<string, AvailableComponent> available_components = 4
}

message AvailableComponent {
	// Key/value pairs that may be used to describe the component.
	// The key/value pairs are according to semantic conventions, see:
	// https://opentelemetry.io/docs/specs/semconv/
	repeated KeyValue metadata = 1
}

I am less certain on the best semantic conventions to follow for the module name and version fields, but that is specific to the implementation in the OpAMP extension. It would be useful to show an example using go in the spec, but in theory this could be any agent in any language reporting its available components.

andykellr · 2024-08-15T15:34:45Z

Since name and version are likely fields of the AvailableComponent message, they could be pulled out of the metadata to provide some more structure.

message AvailableComponent {
	string name = 1
	string version = 2

	// Key/value pairs that may be used to describe the component.
	// The key/value pairs are according to semantic conventions, see:
	// https://opentelemetry.io/docs/specs/semconv/
	repeated KeyValue metadata = 3
}

tigrannajaryan · 2024-08-15T18:58:30Z

I agree it belongs to AgentDescription. It would be nice if we could somehow make it uniform with ComponentHealth unless we think type and name of the component are fundamentally different and warrant a different structure (e.g. list instead of tree).

What would be some other agent examples (besides Otel Collector) where there is a concept of components and which could use this new data structure? Is there a type concept as well?

BinaryFissionGames · 2024-08-20T16:41:02Z

Mirroring ComponentHealth definitely makes sense, I think that would end up something like this:

message AgentDescription {
	// ... snip ... //

	ComponentDetails component_details = 4;
}

message ComponentDetails {
        // A map of component ID to sub components details. It can nest as deeply as needed to
        // describe the underlying system.
        map<string, ComponentDetails> sub_component_map = 1;

	// Extra key/value pairs that may be used to describe the component.
	// The key/value pairs are according to semantic conventions, see:
	// https://opentelemetry.io/docs/specs/semconv/
	repeated KeyValue metadata = 2;
}

This would be the closes to mirroring ComponentHealth; One thing to note here is that there is a single "root" component.

On type and name:

I think there are a lot of "pipeline"-based agents out there that have this concept of type, e.g.:

Logstash for instance has a concept of filter plugins of varying types.
Fluent Bit is much the same, having various types of parsers/filters/inputs

So I think it could make sense to have some concept of type.

One thing I'm not clear on is type VS name VS the map key; What should each of these be and how do they differ?

You would already think type is unique, so wouldn't that just be the map key?
For name, I'm having trouble conceptualizing exactly what the semantics of it are and how it differs from the type.

I'll write this up into a PR, I think it'll be easier to comment on in that format.

andykellr · 2024-09-11T19:07:12Z

Thinking about this some more, I'm leaning toward making this a separate message with a separate flag for ReportComponents or something similar.

The issue is that this will likely be a large message and if you deploy 1,000 agents of the same type and version, they will likely all have the same components. I think it should be up to the server to decide that it needs the agent to report this information instead of assuming that it should always be sent as part of AgentDescription.

tigrannajaryan · 2024-09-12T13:59:45Z

The issue is that this will likely be a large message and if you deploy 1,000 agents of the same type and version, they will likely all have the same components. I think it should be up to the server to decide that it needs the agent to report this information instead of assuming that it should always be sent as part of AgentDescription.

This is a good point.

We can add ReportComponents flag to ServerToAgentFlags with semantics similar to ReportFullState.

BinaryFissionGames · 2024-09-17T18:25:43Z

This makes sense to me. I think we'd use the same structure that exists in the current PR right now, just move it out of AgentDetails to the top level:

message AgentToServer {
  // -- snip --
  map<string, ComponentDetails> available_components = 14;
}

We would expand ServerToAgentFlags to include a ReportComponents flag:

enum ServerToAgentFlags {
    ServerToAgentFlags_ReportComponents = 0x00000002;
}

In this case, the components would be requested separately from other state. That is, ReportFullState does NOT imply ReportComponents, but rather the two are entirely separate and may be specified with each-other or alone.

The assumption would be that if service.name and service.version (+ optional service.namespace) are the same across two agents, these agents contain the same components.

From the above, I'm wondering:

Should the available_components field in the AgentToServer be wrapped in another message, or is it OK to have it as a map?
Is the assumption that service.name + service.version (+ optional service.namespace) indicates a single list of components reasonable? To me, it seems reasonable to assume that the service name + version indicates a particular build of the code, but we'll need to make it clear that this is the assumption if we want to be able to take advantage of not requesting the components for every agent.

rbg · 2024-09-17T18:41:39Z

would adding a hash for the entire message be of use? - Meaning the server could compare and only take whatever action it desired should the hash be different than the previous reporting.

BinaryFissionGames · 2024-09-17T18:55:24Z

would adding a hash for the entire message be of use? - Meaning the server could compare and only take whatever action it desired should the hash be different than the previous reporting.

Including a hash could help mitigate cases where the service name + version are the same, but there are actually different components there, which could definitely be useful.

If we were to do that, I would definitely want another message wrapping the component map:

message AvailableComponents {
   map<string, ComponentDetails> available_components = 1;
   bytes hash = 2;
}

In this scheme, the hash would be reported when the ReportFullState flag is sent, but the available_components would only be populated if ReportComponents is specified.

If we feel that the name + version assumption is not solid enough to rely on, I think this doesn't that much extra complexity to it for some extra safety.

andykellr · 2024-09-24T15:34:26Z

I like the separate AvailableComponents message. It allows us to add additional fields in the future as needed.

I also like reporting the hash, but I think the spec should be clear that we expect the message with the hash on every request and the message with the full component list when the flag is set. I would propose changing the flag to ReportAvailableComponents to keep it in sync with the message.

BinaryFissionGames · 2024-09-25T13:25:26Z

@tigrannajaryan Does this make sense to you? I'll update the PR if you're good with it.

tigrannajaryan · 2024-09-25T14:52:26Z

What's the typical expected agent<->server interaction here? Is it going to be this:

Agent connects and sends AgentDescription. available_components is unspecified.
Server replies with ReportAvailableComponents.
Agent sends available_components.
Server replies with RemoteConfig.
Agent replies with EffectiveConfig.

I am a bit worried about the number of roundtrips, especially for http transport, where it does not necessarily use the same connection.

Should we make it a client choice to include available_components in the first AgentToServer if the client wishes so? That way we eliminate one roundtrip. Agents which know they have a very large list of components may choose to omit it in the first message.

It would be also useful to quantify this. How many / how large is the available_components list for Otel Collector contrib?

BinaryFissionGames · 2024-09-25T15:39:35Z

I imagine the interaction to be like this:

Agent connects and sends AgentDescription. available_components is specified with only the hash present.
If the hash mismatches what is known for the agent type/version, the server replies with ReportAvailableComponents.
Agent sends available_components.
Server replies with RemoteConfig.
Agent replies with EffectiveConfig.

So in reality the extra roundtrips would only be on initial connection for a given service.name + service.version + hash. In reality, I imagine there being a small set of unique combinations at any given time.

I do think it would be fine for the agent to decide to send the available components before the server asks for them.

As for quantification, in contrib, there are:
28 extensions
47 exporters
24 processors
91 receivers
10 connectors

Which makes a total of 200 separate components. I'm not sure where the cutoff of what would be considered "large" would be, but this seems on the large side of things for sure.

rbg · 2024-09-25T15:45:56Z

In addition the server is not required to post a ServerToAgent with ReportAvailableComponents

BinaryFissionGames · 2024-09-30T17:44:58Z

@tigrannajaryan If you don't have any objections with the above, I'll get started on updating the PR to capture what was discussed here.

tigrannajaryan · 2024-09-30T18:18:27Z

@tigrannajaryan If you don't have any objections with the above, I'll get started on updating the PR to capture what was discussed here.

SGTM

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Agents should report supported components #198

Agents should report supported components #198

andykellr commented Jul 9, 2024 •

edited

Loading

andykellr commented Jul 9, 2024

andykellr commented Jul 9, 2024 •

edited

Loading

BinaryFissionGames commented Jul 9, 2024 •

edited

Loading

BinaryFissionGames commented Jul 9, 2024

BinaryFissionGames commented Jul 23, 2024

BinaryFissionGames commented Aug 14, 2024 •

edited

Loading

BinaryFissionGames commented Aug 15, 2024

andykellr commented Aug 15, 2024 •

edited

Loading

andykellr commented Aug 15, 2024

tigrannajaryan commented Aug 15, 2024

BinaryFissionGames commented Aug 20, 2024

andykellr commented Sep 11, 2024

tigrannajaryan commented Sep 12, 2024

BinaryFissionGames commented Sep 17, 2024

rbg commented Sep 17, 2024

BinaryFissionGames commented Sep 17, 2024

andykellr commented Sep 24, 2024

BinaryFissionGames commented Sep 25, 2024

tigrannajaryan commented Sep 25, 2024

BinaryFissionGames commented Sep 25, 2024

rbg commented Sep 25, 2024

BinaryFissionGames commented Sep 30, 2024

tigrannajaryan commented Sep 30, 2024

Agents should report supported components #198

Agents should report supported components #198

Comments

andykellr commented Jul 9, 2024 • edited Loading

andykellr commented Jul 9, 2024

andykellr commented Jul 9, 2024 • edited Loading

BinaryFissionGames commented Jul 9, 2024 • edited Loading

BinaryFissionGames commented Jul 9, 2024

BinaryFissionGames commented Jul 23, 2024

BinaryFissionGames commented Aug 14, 2024 • edited Loading

two fields

one field

BinaryFissionGames commented Aug 15, 2024

andykellr commented Aug 15, 2024 • edited Loading

andykellr commented Aug 15, 2024

tigrannajaryan commented Aug 15, 2024

BinaryFissionGames commented Aug 20, 2024

andykellr commented Sep 11, 2024

tigrannajaryan commented Sep 12, 2024

BinaryFissionGames commented Sep 17, 2024

rbg commented Sep 17, 2024

BinaryFissionGames commented Sep 17, 2024

andykellr commented Sep 24, 2024

BinaryFissionGames commented Sep 25, 2024

tigrannajaryan commented Sep 25, 2024

BinaryFissionGames commented Sep 25, 2024

rbg commented Sep 25, 2024

BinaryFissionGames commented Sep 30, 2024

tigrannajaryan commented Sep 30, 2024

andykellr commented Jul 9, 2024 •

edited

Loading

andykellr commented Jul 9, 2024 •

edited

Loading

BinaryFissionGames commented Jul 9, 2024 •

edited

Loading

BinaryFissionGames commented Aug 14, 2024 •

edited

Loading

andykellr commented Aug 15, 2024 •

edited

Loading