Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add per-metric stability guidelines #5487

Closed

Conversation

djaglowski
Copy link
Member

Resolves (contrib)#9963.

Possibly these should be defined in contrib or in a separate file. I'd appreciate feedback on the guidelines themselves as well as appropriate placement in the documentation.

@codecov
Copy link

codecov bot commented Jun 7, 2022

Codecov Report

Merging #5487 (7f39f31) into main (0bc2777) will not change coverage.
The diff coverage is n/a.

@@           Coverage Diff           @@
##             main    #5487   +/-   ##
=======================================
  Coverage   90.79%   90.79%           
=======================================
  Files         193      193           
  Lines       11430    11430           
=======================================
  Hits        10378    10378           
  Misses        830      830           
  Partials      222      222           

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 0bc2777...7f39f31. Read the comment docs.

README.md Outdated

The component is planned to be removed in a future version and no further support will be provided. Note that new issues will likely not be worked on. When a component enters "deprecated" mode, it is expected to exist for at least two minor releases. See the component's readme file for more details on when a component will cease to exist.

### Metric stability levels

Receivers that produce metrics should declare a stability level for each metric. A for each metric may not exceed the The status for each metric should be listed in the receiver's `documentation.md` file according to the following definitions:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A for each metric something sounds off here.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

### Stability level progression

Comonents and metric stability levels should progress according to the following guidelines:
- A component that produces metrics should produce at least one metric of the same stability level. For example, a Stable component should produce at least one Stable metric.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why? If we did not have metrics in a stable component, and start adding them now? How do I get to stable directly?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My intention here is to address primarily how it "should" be. Then we would take steps to get there.

As to the reasoning here, this just seems necessary in order to give meaning to the component status. I would question whether a scraper can be stable if it produces no stable metrics.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:))) Is this about own "observability" metrics, or about "scrapers" like components :))) I feel that I was just confusing everything

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was mostly trying to address scrapers, but didn't use the term because I'm not sure it's a user-facing term. (I said receivers that produce metrics instead, but maybe this isn't specific enough.)

That said, I think all of this would apply to own telemetry as well, with the exception of this line.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How is this different than "receiver that produce logs" or "spans" should we actually call this "telemetry"?

I would probably have a different term for "own observability telemetry" vs "telemetry on behalf of others" :))

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would probably have a different term for "own observability telemetry" vs "telemetry on behalf of others" :))

Makes sense. I think "own telemetry" would be clear and is already used elsewhere (eg OpAMP). Perhaps "scraped telemetry" then for the other term.

How is this different than "receiver that produce logs" or "spans" should we actually call this "telemetry"?

The way I see it is that metrics have named instances, each of which has its own detailed data model. Maybe I'm overlooking something, but I don't think logs and spans have an equivalent really.

In any case, I can try to organize this PR to leave room for future stability guidelines for other signals, but I only have detailed thoughts for metrics at this point. What do you think of organizing it like this:

## Metric stability levels

### "Own metrics" vs "Scraped metrics"

// TODO describe the difference and give a couple examples

The following levels and guidelines apply to "own metrics" and "scraped metrics", except where otherwise noted.

...

### Scraped metrics

Receivers whose sole purpose is to scrape metrics (ie. "scrapers") should produce at least one metric of the same stability level. For example, a Stable component should produce at least one Stable metric.

Then if someone can reuse some of these guidelines for logs and/or spans, they can extract them into a "telemetry" header.


The component is planned to be removed in a future version and no further support will be provided. Note that new issues will likely not be worked on. When a component enters "deprecated" mode, it is expected to exist for at least two minor releases. See the component's readme file for more details on when a component will cease to exist.

### Stability level progression
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you allowed to remove a deprecated metric, or disable to save cost (backend cost as well as runtime cost)?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the reminder to incorporate default enabled state as well. I will add considerations for this.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've proposed some guidelines for enabled status within each stability level.

@github-actions
Copy link
Contributor

This PR was marked stale due to lack of activity. It will be closed in 14 days.


The component is planned to be removed in a future version and no further support will be provided. Note that new issues will likely not be worked on. When a component enters "deprecated" mode, it is expected to exist for at least two minor releases. See the component's readme file for more details on when a component will cease to exist.

### Metric stability levels
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about adding a "Since" property, stating the version of the collector the metric started being reported?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that's a great idea.

@djaglowski
Copy link
Member Author

A closely related issue was opened on the spec repo at roughly the same time as this proposal was drafted. I think that issue should be resolved before this moves forward.

@github-actions
Copy link
Contributor

This PR was marked stale due to lack of activity. It will be closed in 14 days.

@github-actions
Copy link
Contributor

This PR was marked stale due to lack of activity. It will be closed in 14 days.

@github-actions github-actions bot added the Stale label Aug 13, 2022
@github-actions
Copy link
Contributor

Closed as inactive. Feel free to reopen if this PR is still being worked on.

@github-actions github-actions bot closed this Aug 27, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add per metric stability
3 participants