Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Agents should report supported components #198

Open
andykellr opened this issue Jul 9, 2024 · 23 comments
Open

Agents should report supported components #198

andykellr opened this issue Jul 9, 2024 · 23 comments

Comments

@andykellr
Copy link
Contributor

andykellr commented Jul 9, 2024

I'll describe this in the context of OpenTelemetry, but similar issues will exist for other agents.

OpenTelemetry distributions may have different components included during the build. Before sending a configuration to an agent, the management server will want to know what components are available. Sending a configuration with unsupported components will cause an error. It may be desirable to avoid errors and only allow configurations to be applied to agents that are able to use them.

This is similar to agent capabilities but will need more than a bitmask to support arbitrary component names and versions.

This is similar to what is available in component health, but components not in use (but still supported) would not be reported.

This is similar to package statuses but components are not necessarily packages.

This could be reported with remote configuration but doesn't feel right.

@andykellr
Copy link
Contributor Author

Related issue in the collector: open-telemetry/opentelemetry-collector#10570

@andykellr
Copy link
Contributor Author

andykellr commented Jul 9, 2024

We discussed in the SIG meeting that a generic map<string, string> metadata could be added to the AgentDescription message and the structure documented for OpenTelemetry-based agents.

We also discussed making a separate top level AgentMetadata message with a separate agent capability ReportsMetadata.

@BinaryFissionGames
Copy link

BinaryFissionGames commented Jul 9, 2024

I think logically, you'd consider this part of the AgentDescription. I'd personally advocate for adding it there if possible.

I think the primary advantage we discussed for the separate message was that it did not need to be sent if the OpAMP server does not use it. I'm not sure if that's a huge advantage or not though, given the AgentDescription is generally static over the lifetime of an agent and status compression allows it to be omitted most of the time.

@BinaryFissionGames
Copy link

Regarding the structure of the field, how would you propose representing the components for the collector in a map[string]string?

Additionally, I feel like arbitrary metadata describing agent is already covered in the identifying/non-identifying attributes, how does this differ?

@BinaryFissionGames
Copy link

We discussed this again today at the SIG.

We primarily discussed adding this information to the ComponentHealth message, or to the AgentDescription.

@BinaryFissionGames
Copy link

BinaryFissionGames commented Aug 14, 2024

Here's what I'm thinking in terms of concrete details:

We could imagine doing something like this:

message AgentDescription {
	// ... snip ... //

	// The list of components that are available for this agent to configure. 
	repeated AvailableComponent available_components = 4
}

message AvailableComponent {
	// The ID of the component. This ID MUST be unique within the list of
	// available components.
	string id = 1
	
	// Extra key/value pairs that may be used to describe the component.
	// The key/value pairs are according to semantic conventions, see:
	// https://opentelemetry.io/docs/specs/semconv/
	repeated KeyValue metadata = 2


}

The available components is pretty free-form; You are only required to specify an ID for each component. In the case of the collector, ID would map to the type string + class of the component (e.g. receiver/hostmetrics). The type + class of a component are unique within the collector.

For extra metadata, we'd report the following:

  1. component module path
  2. component version

We could report these either as two separate fields, or as one field:

two fields

component.module - "github.com/open-telemetry/opentelemetry-collector-contrib/receiver/hostmetricsreceiver"
component.version - "v0.107.0"

one field

component.module - "github.com/open-telemetry/opentelemetry-collector-contrib/receiver/hostmetricsreceiver v0.107.0"
or maybe
code.namespace - "github.com/open-telemetry/opentelemetry-collector-contrib/receiver/[email protected]"?
see code semconv, not sure if it fits.

Additionally, the component type string and class can be derived from the ID, but you could consider also reporting these in separate fields in the metadata so we don't have to rely on IDs being formatted a specific way.


I think we were debating putting this on ComponentHealth; I think the two are a little different. ComponentHealth is more about instances of components, as opposed to the types of components.

In addition, ComponentHealth would be sent pretty frequently, which could end up bloating network traffic. I'd expect AgentDescription to be sent very infrequently (essentially only at agent startup or when agent identity changes), which would help keep the amount of network data sent down.

@BinaryFissionGames
Copy link

@tigrannajaryan @andykellr Seeing any immediate issues with the above? If not, I can write it up and put it in a PR.

@andykellr
Copy link
Contributor Author

andykellr commented Aug 15, 2024

I think the general structure of the message and making it a part of AgentDescription makes sense. I agree that it could be very verbose and unlikely to change and therefore should not be part of ComponentHealth.

An alternative would be using a map where the id field is removed from the AvailableComponent message and used as a key in a map. This would follow the pattern of ComponentHealth, PackagesAvailable, etc.

message AgentDescription {
	// ... snip ... //

	// The list of components that are available for this agent to configure. 
	map<string, AvailableComponent> available_components = 4
}

message AvailableComponent {
	// Key/value pairs that may be used to describe the component.
	// The key/value pairs are according to semantic conventions, see:
	// https://opentelemetry.io/docs/specs/semconv/
	repeated KeyValue metadata = 1
}

I am less certain on the best semantic conventions to follow for the module name and version fields, but that is specific to the implementation in the OpAMP extension. It would be useful to show an example using go in the spec, but in theory this could be any agent in any language reporting its available components.

@andykellr
Copy link
Contributor Author

Since name and version are likely fields of the AvailableComponent message, they could be pulled out of the metadata to provide some more structure.

message AvailableComponent {
	string name = 1
	string version = 2

	// Key/value pairs that may be used to describe the component.
	// The key/value pairs are according to semantic conventions, see:
	// https://opentelemetry.io/docs/specs/semconv/
	repeated KeyValue metadata = 3
}

@tigrannajaryan
Copy link
Member

I agree it belongs to AgentDescription. It would be nice if we could somehow make it uniform with ComponentHealth unless we think type and name of the component are fundamentally different and warrant a different structure (e.g. list instead of tree).

What would be some other agent examples (besides Otel Collector) where there is a concept of components and which could use this new data structure? Is there a type concept as well?

@BinaryFissionGames
Copy link

Mirroring ComponentHealth definitely makes sense, I think that would end up something like this:

message AgentDescription {
	// ... snip ... //

	ComponentDetails component_details = 4;
}

message ComponentDetails {
        // A map of component ID to sub components details. It can nest as deeply as needed to
        // describe the underlying system.
        map<string, ComponentDetails> sub_component_map = 1;

	// Extra key/value pairs that may be used to describe the component.
	// The key/value pairs are according to semantic conventions, see:
	// https://opentelemetry.io/docs/specs/semconv/
	repeated KeyValue metadata = 2;
}

This would be the closes to mirroring ComponentHealth; One thing to note here is that there is a single "root" component.

On type and name:

I think there are a lot of "pipeline"-based agents out there that have this concept of type, e.g.:

So I think it could make sense to have some concept of type.

One thing I'm not clear on is type VS name VS the map key; What should each of these be and how do they differ?

You would already think type is unique, so wouldn't that just be the map key?
For name, I'm having trouble conceptualizing exactly what the semantics of it are and how it differs from the type.

I'll write this up into a PR, I think it'll be easier to comment on in that format.

@andykellr
Copy link
Contributor Author

Thinking about this some more, I'm leaning toward making this a separate message with a separate flag for ReportComponents or something similar.

The issue is that this will likely be a large message and if you deploy 1,000 agents of the same type and version, they will likely all have the same components. I think it should be up to the server to decide that it needs the agent to report this information instead of assuming that it should always be sent as part of AgentDescription.

@tigrannajaryan
Copy link
Member

The issue is that this will likely be a large message and if you deploy 1,000 agents of the same type and version, they will likely all have the same components. I think it should be up to the server to decide that it needs the agent to report this information instead of assuming that it should always be sent as part of AgentDescription.

This is a good point.

We can add ReportComponents flag to ServerToAgentFlags with semantics similar to ReportFullState.

@BinaryFissionGames
Copy link

This makes sense to me. I think we'd use the same structure that exists in the current PR right now, just move it out of AgentDetails to the top level:

message AgentToServer {
  // -- snip --
  map<string, ComponentDetails> available_components = 14;
}

We would expand ServerToAgentFlags to include a ReportComponents flag:

enum ServerToAgentFlags {
    ServerToAgentFlags_ReportComponents = 0x00000002;
}

In this case, the components would be requested separately from other state. That is, ReportFullState does NOT imply ReportComponents, but rather the two are entirely separate and may be specified with each-other or alone.

The assumption would be that if service.name and service.version (+ optional service.namespace) are the same across two agents, these agents contain the same components.

From the above, I'm wondering:

  1. Should the available_components field in the AgentToServer be wrapped in another message, or is it OK to have it as a map?
  2. Is the assumption that service.name + service.version (+ optional service.namespace) indicates a single list of components reasonable? To me, it seems reasonable to assume that the service name + version indicates a particular build of the code, but we'll need to make it clear that this is the assumption if we want to be able to take advantage of not requesting the components for every agent.

@rbg
Copy link

rbg commented Sep 17, 2024

would adding a hash for the entire message be of use? - Meaning the server could compare and only take whatever action it desired should the hash be different than the previous reporting.

@BinaryFissionGames
Copy link

would adding a hash for the entire message be of use? - Meaning the server could compare and only take whatever action it desired should the hash be different than the previous reporting.

Including a hash could help mitigate cases where the service name + version are the same, but there are actually different components there, which could definitely be useful.

If we were to do that, I would definitely want another message wrapping the component map:

message AvailableComponents {
   map<string, ComponentDetails> available_components = 1;
   bytes hash = 2;
}

In this scheme, the hash would be reported when the ReportFullState flag is sent, but the available_components would only be populated if ReportComponents is specified.

If we feel that the name + version assumption is not solid enough to rely on, I think this doesn't that much extra complexity to it for some extra safety.

@andykellr
Copy link
Contributor Author

I like the separate AvailableComponents message. It allows us to add additional fields in the future as needed.

I also like reporting the hash, but I think the spec should be clear that we expect the message with the hash on every request and the message with the full component list when the flag is set. I would propose changing the flag to ReportAvailableComponents to keep it in sync with the message.

@BinaryFissionGames
Copy link

@tigrannajaryan Does this make sense to you? I'll update the PR if you're good with it.

@tigrannajaryan
Copy link
Member

What's the typical expected agent<->server interaction here? Is it going to be this:

  1. Agent connects and sends AgentDescription. available_components is unspecified.
  2. Server replies with ReportAvailableComponents.
  3. Agent sends available_components.
  4. Server replies with RemoteConfig.
  5. Agent replies with EffectiveConfig.

I am a bit worried about the number of roundtrips, especially for http transport, where it does not necessarily use the same connection.

Should we make it a client choice to include available_components in the first AgentToServer if the client wishes so? That way we eliminate one roundtrip. Agents which know they have a very large list of components may choose to omit it in the first message.

It would be also useful to quantify this. How many / how large is the available_components list for Otel Collector contrib?

@BinaryFissionGames
Copy link

I imagine the interaction to be like this:

  1. Agent connects and sends AgentDescription. available_components is specified with only the hash present.
  2. If the hash mismatches what is known for the agent type/version, the server replies with ReportAvailableComponents.
  3. Agent sends available_components.
  4. Server replies with RemoteConfig.
  5. Agent replies with EffectiveConfig.

So in reality the extra roundtrips would only be on initial connection for a given service.name + service.version + hash. In reality, I imagine there being a small set of unique combinations at any given time.

I do think it would be fine for the agent to decide to send the available components before the server asks for them.

As for quantification, in contrib, there are:
28 extensions
47 exporters
24 processors
91 receivers
10 connectors

Which makes a total of 200 separate components. I'm not sure where the cutoff of what would be considered "large" would be, but this seems on the large side of things for sure.

@rbg
Copy link

rbg commented Sep 25, 2024

  • In addition the server is not required to post a ServerToAgent with ReportAvailableComponents

@BinaryFissionGames
Copy link

@tigrannajaryan If you don't have any objections with the above, I'll get started on updating the PR to capture what was discussed here.

@tigrannajaryan
Copy link
Member

@tigrannajaryan If you don't have any objections with the above, I'll get started on updating the PR to capture what was discussed here.

SGTM

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants