Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[component] Move component status reporting public API to new componentstatus module #10730

Merged
Merged
25 changes: 25 additions & 0 deletions .chloggen/componentstatus-new-module.yaml
Original file line number Diff line number Diff line change
@@ -0,0 +1,25 @@
# Use this changelog template to create an entry for release notes.

# One of 'breaking', 'deprecation', 'new_component', 'enhancement', 'bug_fix'
change_type: new_component

# The name of the component, or a single word describing the area of concern, (e.g. otlpreceiver)
component: componentstatus

# A brief description of the change. Surround your text with quotes ("") if it needs to start with a backtick (`).
note: Adds new componentstatus module that will soon replace status content in component.

# One or more tracking issues or pull requests related to the change
issues: [10730]

# (Optional) One or more lines of additional information to render under the primary note.
# These lines will be padded with 2 spaces and then inserted directly into the document.
# Use pipe (|) for multiline entries.
subtext:

# Optional: The change log or logs in which this entry should be included.
# e.g. '[user]' or '[user, api]'
# Include 'user' if the change is relevant to end users.
# Include 'api' if there is a change to a library API.
# Default: '[user]'
change_logs: [api]
2 changes: 2 additions & 0 deletions Makefile
Original file line number Diff line number Diff line change
Expand Up @@ -259,6 +259,7 @@ check-contrib:
-replace go.opentelemetry.io/collector=$(CURDIR) \
-replace go.opentelemetry.io/collector/client=$(CURDIR)/client \
-replace go.opentelemetry.io/collector/component=$(CURDIR)/component \
-replace go.opentelemetry.io/collector/component/componentstatus=$(CURDIR)/component/componentstatus \
-replace go.opentelemetry.io/collector/config/configauth=$(CURDIR)/config/configauth \
-replace go.opentelemetry.io/collector/config/configcompression=$(CURDIR)/config/configcompression \
-replace go.opentelemetry.io/collector/config/configgrpc=$(CURDIR)/config/configgrpc \
Expand Down Expand Up @@ -321,6 +322,7 @@ restore-contrib:
-dropreplace go.opentelemetry.io/collector \
-dropreplace go.opentelemetry.io/collector/client \
-dropreplace go.opentelemetry.io/collector/component \
-dropreplace go.opentelemetry.io/collector/component/componentstatus \
-dropreplace go.opentelemetry.io/collector/config/configauth \
-dropreplace go.opentelemetry.io/collector/config/configcompression \
-dropreplace go.opentelemetry.io/collector/config/configgrpc \
Expand Down
1 change: 1 addition & 0 deletions component/componentstatus/Makefile
Original file line number Diff line number Diff line change
@@ -0,0 +1 @@
include ../../Makefile.Common
34 changes: 34 additions & 0 deletions component/componentstatus/go.mod
Original file line number Diff line number Diff line change
@@ -0,0 +1,34 @@
module go.opentelemetry.io/collector/component/componentstatus

go 1.21.0

require (
github.com/stretchr/testify v1.9.0
go.opentelemetry.io/collector/component v0.106.1
)

require (
github.com/davecgh/go-spew v1.1.1 // indirect
github.com/gogo/protobuf v1.3.2 // indirect
github.com/pmezard/go-difflib v1.0.0 // indirect
go.opentelemetry.io/collector/config/configtelemetry v0.106.1 // indirect
go.opentelemetry.io/collector/pdata v1.12.0 // indirect
go.opentelemetry.io/otel v1.28.0 // indirect
go.opentelemetry.io/otel/metric v1.28.0 // indirect
go.opentelemetry.io/otel/trace v1.28.0 // indirect
go.uber.org/multierr v1.11.0 // indirect
go.uber.org/zap v1.27.0 // indirect
golang.org/x/net v0.26.0 // indirect
golang.org/x/sys v0.21.0 // indirect
golang.org/x/text v0.16.0 // indirect
google.golang.org/genproto/googleapis/rpc v0.0.0-20240528184218-531527333157 // indirect
google.golang.org/grpc v1.65.0 // indirect
google.golang.org/protobuf v1.34.2 // indirect
gopkg.in/yaml.v3 v3.0.1 // indirect
)

replace go.opentelemetry.io/collector/config/configtelemetry => ../../config/configtelemetry

replace go.opentelemetry.io/collector/component => ../

replace go.opentelemetry.io/collector/pdata => ../../pdata
78 changes: 78 additions & 0 deletions component/componentstatus/go.sum

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

16 changes: 16 additions & 0 deletions component/componentstatus/instance.go
Original file line number Diff line number Diff line change
@@ -0,0 +1,16 @@
// Copyright The OpenTelemetry Authors
// SPDX-License-Identifier: Apache-2.0

package componentstatus // import "go.opentelemetry.io/collector/component/componentstatus"

import "go.opentelemetry.io/collector/component"

// InstanceID uniquely identifies a component instance
//
// TODO: consider moving this struct to a new package/module like `extension/statuswatcher`
// https://github.com/open-telemetry/opentelemetry-collector/issues/10764
type InstanceID struct {
ID component.ID
Kind component.Kind
PipelineIDs map[component.ID]struct{}
}
139 changes: 139 additions & 0 deletions component/componentstatus/status.go
Original file line number Diff line number Diff line change
@@ -0,0 +1,139 @@
// Copyright The OpenTelemetry Authors
// SPDX-License-Identifier: Apache-2.0

// Package componentstatus is an experimental module that defines how components should
// report health statues, how collector hosts should facilitate component status reporting,
// and how extensions should watch for new component statuses.
//
// This package is currently under development and is exempt from the Collector SIG's
// breaking change policy.
package componentstatus // import "go.opentelemetry.io/collector/component/componentstatus"

import (
"time"
)

// Watcher is an extra interface for Extension hosted by the OpenTelemetry
// Collector that is to be implemented by extensions interested in changes to component
// status.
//
// TODO: consider moving this interface to a new package/module like `extension/statuswatcher`
// https://github.com/open-telemetry/opentelemetry-collector/issues/10764
type Watcher interface {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would move this + the InstanceId to the "extension/statuswatcher` or something like that to be consistent with storage and other type of extensions.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you're probably correct, but I'd prefer to encompass all public component status reporting features behind componentstatus for now while it is experimental and subject to lots of breaking API discussions. Once we feel better about its API we could move it into extension/storage as a final location. Likely other aspects of componentstatus will move around in the future as well.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not suggesting to extension/storage but to extension/statuswatcher :)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am confused about why we do this then. Is this to just remove unstable code (that is ok)? But still you should at least document all these things as todos (file issues) and then we can do it in a separate PR.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removing unstable code from component and extension is the goal. Issue created: #10764

// ComponentStatusChanged notifies about a change in the source component status.
// Extensions that implement this interface must be ready that the ComponentStatusChanged
// may be called before, after or concurrently with calls to Component.Start() and Component.Shutdown().
// The function may be called concurrently with itself.
ComponentStatusChanged(source *InstanceID, event *Event)
}

type Status int32

// Enumeration of possible component statuses
const (
// StatusNone indicates absence of component status.
StatusNone Status = iota
// StatusStarting indicates the component is starting.
StatusStarting
// StatusOK indicates the component is running without issues.
StatusOK
// StatusRecoverableError indicates that the component has experienced a transient error and may recover.
StatusRecoverableError
// StatusPermanentError indicates that the component has detected a condition at runtime that will need human intervention to fix. The collector will continue to run in a degraded mode.
StatusPermanentError
// StatusFatalError indicates that the collector has experienced a fatal runtime error and will shut down.
StatusFatalError
// StatusStopping indicates that the component is in the process of shutting down.
StatusStopping
// StatusStopped indicates that the component has completed shutdown.
StatusStopped
)

// String returns a string representation of a Status
func (s Status) String() string {
switch s {
case StatusStarting:
return "StatusStarting"
case StatusOK:
return "StatusOK"
case StatusRecoverableError:
return "StatusRecoverableError"
case StatusPermanentError:
return "StatusPermanentError"
case StatusFatalError:
return "StatusFatalError"
case StatusStopping:
return "StatusStopping"
case StatusStopped:
return "StatusStopped"
}
return "StatusNone"

Check warning on line 70 in component/componentstatus/status.go

View check run for this annotation

Codecov / codecov/patch

component/componentstatus/status.go#L70

Added line #L70 was not covered by tests
}

// Event contains a status and timestamp, and can contain an error
type Event struct {
status Status
err error
// TODO: consider if a timestamp is necessary in the default Event struct or is needed only for the healthcheckv2 extension
// https://github.com/open-telemetry/opentelemetry-collector/issues/10763
timestamp time.Time
}

// Status returns the Status (enum) associated with the Event
func (ev *Event) Status() Status {
return ev.status
}

// Err returns the error associated with the Event.
func (ev *Event) Err() error {
return ev.err
}

// Timestamp returns the timestamp associated with the Event
func (ev *Event) Timestamp() time.Time {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Timestamp seem to be use-less, because it is always "now", which means consumers (watchers) can also read if they need it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is important for the healthcheck extension, which also needs to synthesize events. When synthesizing events the timestamp needs to be settable. We used to do this in core, but have removed it, at least temporarily. I was proposing we add an option to set the timestamp in a followup: #10730 (comment).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about, if this is needed for that extension, we can have StatusWithTime struct {Status, Time} in the healthcheck and not here. No point in having it here.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, until we discuss the next PR, I would rather remove this for the moment.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I prefer to keep this PR a 1-to-1 move of stuff related to component status reporting into componentstatus. In my opinion we can have API discussions about componentstatus in future issues/PRs. I can update the godocs to state that everything in componentstatus is experimental and currently exempt from our deprecation process if you'd like.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I prefer to keep this PR a 1-to-1 move of stuff related to component status reporting into componentstatus. In my opinion we can have API discussions about componentstatus in future issues/PRs. I can update the godocs to state that everything in componentstatus is experimental and currently exempt from our deprecation process if you'd like.

Then update the title to "move code" instead of adding a new module which to me means we can do a better job.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think a timestamp will be useful for other status watchers and worth considering for the generated status event. Otherwise, if there are multiple watchers that care about the timestamp, each one will have to wrap the event in a custom struct and generate timestamps independently.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think a timestamp will be useful for other status watchers and worth considering for the generated status event. Otherwise, if there are multiple watchers that care about the timestamp, each one will have to wrap the event in a custom struct and generate timestamps independently.

When in doubt leave it out. I am not convinced yet, and until we have 2-3 use-cases we should not add it, since adding later is backwards compatible, but currently we pay the cost of reading the time as well as the maintenance cost.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Followup issue created: #10763

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will still suggest the timestamp has value. I wrote the following on the issue:

I believe that we should keep the timestamp as the observed time is not the same time as the observation. Since status watchers are dispatched in sequence, any status watcher that does a non-trivial amount of work, or something that could block (like writing to an event to a channel) could delay the observed time for subsequent watchers. This would be especially problematic if the delay was variable between invocations. Currently the timestamp is used by the health check extension for time calculations.

return ev.timestamp
}

// NewEvent creates and returns a Event with the specified status and sets the timestamp
// time.Now(). To set an error on the event for an error status use one of the dedicated
// constructors (e.g. NewRecoverableErrorEvent, NewPermanentErrorEvent, NewFatalErrorEvent)
func NewEvent(status Status) *Event {
TylerHelmuth marked this conversation as resolved.
Show resolved Hide resolved
return &Event{
status: status,
timestamp: time.Now(),
}
}

// NewRecoverableErrorEvent wraps a transient error
// passed as argument as a Event with a status StatusRecoverableError
// and a timestamp set to time.Now().
func NewRecoverableErrorEvent(err error) *Event {
ev := NewEvent(StatusRecoverableError)
ev.err = err
return ev
}

// NewPermanentErrorEvent wraps an error requiring human intervention to fix
// passed as argument as a Event with a status StatusPermanentError
// and a timestamp set to time.Now().
func NewPermanentErrorEvent(err error) *Event {
ev := NewEvent(StatusPermanentError)
ev.err = err
return ev
}

// NewFatalErrorEvent wraps the fatal runtime error passed as argument as a Event
// with a status StatusFatalError and a timestamp set to time.Now().
func NewFatalErrorEvent(err error) *Event {
ev := NewEvent(StatusFatalError)
ev.err = err
return ev
}

// StatusIsError returns true for error statuses (e.g. StatusRecoverableError,
// StatusPermanentError, or StatusFatalError)
func StatusIsError(status Status) bool {
return status == StatusRecoverableError ||
status == StatusPermanentError ||
status == StatusFatalError
}
Loading
Loading