Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

rcvCommand should send update to Server on err #186

Open
srikanthccv opened this issue Jul 6, 2023 · 5 comments · May be fixed by #187
Open

rcvCommand should send update to Server on err #186

srikanthccv opened this issue Jul 6, 2023 · 5 comments · May be fixed by #187

Comments

@srikanthccv
Copy link
Member

The rcvCommand in receivedProcessor ignores the returned error from the callback OnCommand. When the error is non-nil it should be reported to Server with AgentHealth.healhy=false along with lastError (@tigrannajaryan, I think this is the right way to report the error?)

func (r *receivedProcessor) rcvCommand(command *protobufs.ServerToAgentCommand) {
if command != nil {
r.callbacks.OnCommand(command)
}
}

@srikanthccv srikanthccv linked a pull request Jul 6, 2023 that will close this issue
@tigrannajaryan
Copy link
Member

The spec does not say anything about what the behavior should be.

I am a bit reluctant about using AgentHealth to report the error, especially with healthy=false flag. The Agent may fail to restart (because for example the Supervisor does something wrong), but if the Agent continues running then setting healthy=false is misleading. Of course if the Agent actually fails to start after the attempted restart then we should report healthy=false, but that will happen independently from rcvCommand processing anyway.

Probably a compromise is to report AgentHealth but with healthy=true and last_error set to whatever the error was from the OnCommand callback.

I think it would be useful to try to implement restarting in our Supervisor example to get a better feel of how this machinery should work.

In the meantime I think we should keep the Command capability in "beta". I will add it to open-telemetry/opamp-spec#147

@srikanthccv
Copy link
Member Author

I think it would be useful to try to implement restarting in our Supervisor example to get a better feel of how this machinery should work.

I agree. That is partly what got me to create this issue. I was looking into this open-telemetry/opentelemetry-collector-contrib#21077 where the commander returns an error https://github.com/open-telemetry/opentelemetry-collector-contrib/blob/afb5b9b93fe3e8be5bb396335f8d959f6ad415a8/cmd/opampsupervisor/supervisor/commander/commander.go#L84-L90 which I thought could occur when the agent actually fails to start, but I see your point about Supervisor.

@tigrannajaryan
Copy link
Member

I think it would be great if you can ahead and experiment a bit with this and then based on your feedback perhaps we need to refine the spec so that it tells what the expected behavior in response to commands is.

@srikanthccv
Copy link
Member Author

I think this equally applies to OpampConnectionSettings. Capabilities like config and packages have status fields to indicate the ErrorMessage. How is it expected that error from the new connection settings offer will be reported to the server? AgentHealth doesn't seem like a good way to report such errors.

@tigrannajaryan
Copy link
Member

I think this equally applies to OpampConnectionSettings. Capabilities like config and packages have status fields to indicate the ErrorMessage. How is it expected that error from the new connection settings offer will be reported to the server? AgentHealth doesn't seem like a good way to report such errors.

Good observation. It is a missing feature.

tigrannajaryan added a commit to tigrannajaryan/opamp-go that referenced this issue May 14, 2024
Updates OpAMP spec to 0.9.0 and implements spec change open-telemetry/opamp-spec#186

The example server still accepts old-style ULID agent instances to
demonstrate how this change can be handled in a backward compatible way.
Replace ULIDs by 16 byte ids and recommend UUID v7 (open-telemetry#186)
tigrannajaryan added a commit that referenced this issue May 27, 2024
Updates OpAMP spec to 0.9.0 and implements spec change open-telemetry/opamp-spec#186

The example server still accepts old-style ULID agent instances to
demonstrate how this change can be handled in a backward compatible way.
Replace ULIDs by 16 byte ids and recommend UUID v7 (#186)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants