[k8scluster] switch k8s.node.condition to mdatagen #27617
Comments
Pinging code owners:
See Adding Labels via Comments if you do not have permissions to add labels yourself. |
Will this work with the existing metric naming convention for the condition metrics? The current metric names have the condition in them. So for the condition type |
I see, I guess we can leave it as is, given that we don't have a full list of node conditions. |
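For context, a rough sketch of how the current names are derived (the exact set depends on which conditions a node reports, so the list below is illustrative):

```yaml
# Current convention: the condition type is part of the metric name itself.
current_metric_names:   # illustrative, generated from the condition type
  - k8s.node.condition_ready
  - k8s.node.condition_memory_pressure
  - k8s.node.condition_disk_pressure
# mdatagen expects every metric to be declared up front in metadata.yaml,
# which is why dynamically generated names are awkward to migrate.
```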
I'm going to close since it seems like we can't do this with our current functionality, but please let me know if I've misunderstood something here. |
I would like to continue considering what it would take to do this. Since it is a scraper receiver, it would be nice if it could use mdatagen for everything, even if that means updating mdatagen. At the moment these metrics cannot be enabled/disabled via the metrics config and the docs are not auto-generated. Both are inconsistencies with the rest of the metrics and present a degraded user experience. |
@jinja2 I agree with everything you said. We'd have to follow a deprecation process, but since the condition value isn't really an enum I think handling it as a datapoint attribute is appropriate. |
Yeah, this sounds good to me too. To wrap everything up:
I think we should keep the same values as we have now. Regarding Unknown, the Kubernetes docs state:
It looks like these are the possible values the K8s API returns -> https://github.com/kubernetes/api/blob/master/core/v1/types.go#L2705-L2716. Enabled by default, reports all conditions. If people don't want some condition, they can write some OTTL to remove the unneeded attributes or keep the ones they want (see the sketch after this comment).
If this config is set, throw a warning that it is deprecated and tell users to use the new metric. |
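For the OTTL option mentioned above, a minimal sketch using the filter processor, assuming the merged metric is named `k8s.node.condition` and the condition type lives in a `condition` datapoint attribute:

```yaml
processors:
  filter/node-conditions:
    error_mode: ignore
    metrics:
      datapoint:
        # Drop every node-condition datapoint except Ready.
        - metric.name == "k8s.node.condition" and attributes["condition"] != "Ready"
```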
FYI: Opened a PR with the 1st part (#27838), let me know what you think :) |
**Description:** Add new k8s.node.condition metric, so that we can deprecate the k8s.node.condition_* metrics.
**Link to tracking Issue:** #27617
**Testing:** added unit tests
**Documentation:** added docs
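A possible shape for the mdatagen `metadata.yaml` entry backing this PR; a sketch only, with the attribute name, unit, default, and value encoding all assumed rather than taken from the final definition:

```yaml
attributes:
  condition:
    description: The name of the node condition type (e.g. Ready, MemoryPressure).
    type: string

metrics:
  k8s.node.condition:
    enabled: false
    description: The condition of a particular Node.
    unit: "{condition}"
    gauge:
      value_type: int
    attributes: [condition]
```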
Can we 🙏 please 🙏 stop perpetuating this anti-pattern of encoding states as metric values? See k8s.pod.phase for another (worse) instance of this. Maybe this works well for Splunk but I don't think it is idiomatic in general, and doesn't work well for other backends that I have used (Datadog, New Relic, etc.). Instead, the
As an example, if I want to get the count of |
I don't follow your comment. The actual suggestion and refactor of k8s.node.condition does this:
The plan is to remove the `node_conditions_to_report` config and the old metric. So what is the issue with this? |
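To make the refactor concrete, a sketch of the datapoints a single node might produce under this proposal; the value encoding shown is an assumption:

```yaml
# One metric; the condition type becomes a datapoint attribute.
# Assumed value encoding: 1 = True, 0 = False, -1 = Unknown.
- metric: k8s.node.condition
  datapoints:
    - attributes: {condition: Ready}
      value: 1
    - attributes: {condition: MemoryPressure}
      value: 0
    - attributes: {condition: DiskPressure}
      value: -1
```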
This
For example, in Datadog with its
|
Let me ask the question a different way... Why not follow the same pattern as From the OpenMetrics spec:
|
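For comparison, the OpenMetrics StateSet-style encoding being argued for would look roughly like this; attribute names and values are illustrative:

```yaml
# One series per (condition, status) pair; the value is 1 for the current
# status and 0 for the others, rather than encoding the status in the value.
- metric: k8s.node.condition
  datapoints:
    - attributes: {condition: Ready, status: "true"}
      value: 1
    - attributes: {condition: Ready, status: "false"}
      value: 0
    - attributes: {condition: Ready, status: "unknown"}
      value: 0
```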
I agree with this change ☝️. My issue is with encoding the |
I see, I'm not against encoding status = true|false to 0 and 1 values, but we really need an OTel way of defining enumerations and then implement it the same way in all metrics. Reference issue: open-telemetry/opentelemetry-specification#1712. Also see a similar discussion with arguments against having one state per metric: #24425 (comment) |
Edit: we actually merged a PR which encodes status =
So it's actually fine. |
I'm not following. Isn't this encoding what I was arguing against? |
@TylerHelmuth @jinja2 @crobert-1 - any thoughts on this? |
I disagree with those arguments
The payload doesn't have to be larger if you do not emit metrics with
Since there is no inherent semantic meaning to the numeric values, this seems like a pretty weak argument. For This seems like a pretty intuitive way to view changes in conditions over time.
|
Sorry, I misread things; now I think I understand what you are trying to say. I would suggest you move this discussion into an actual issue about OTel enums and statesets. |
One last example of why the current formulation is problematic. Many metrics backends will store pre-aggregated time-based rollups to reduce storage and increase performance for queries over longer time intervals. Consider these two scenarios of an hour's worth of data for the same condition.
Formulation 1 - encoding status as metric value:
Formulation 2 - encoding status as dimension:
With
At query time, most frontends will apply an
This gives the false impression that Contrast this with the suggested
With this encoding, the query on the rolled-up data with
|
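A concrete, made-up hour of data (numbers invented purely for illustration) shows the rollup difference being described:

```yaml
# 60 one-minute datapoints for one node condition: 50 minutes true, 10 false.
formulation_1:   # status encoded in the metric value
  series: 'k8s.node.condition{condition="Ready"}'
  datapoints: fifty 1s followed by ten 0s
  hourly_rollups: {avg: 0.83, min: 0, max: 1}   # avg is neither state
formulation_2:   # status encoded as a datapoint attribute
  series:
    'k8s.node.condition{condition="Ready", status="true"}':
      datapoints: fifty 1s
      hourly_rollups: {sum: 50}
    'k8s.node.condition{condition="Ready", status="false"}':
      datapoints: ten 1s
      hourly_rollups: {sum: 10}
# After rollup, formulation 2 still answers "how many minutes was the
# condition false?"; formulation 1's avg/min/max cannot.
```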
Fair... but I don't understand why you're assuming that the way it's encoded here is the current best practice and that we need a committee decision / spec change to use the suggested alternative. Can you point to any other metrics in the OTel Collector (other than
My suggested alternative has precedent in
Take a look at the `hw.status` metric in the OTel Semantic Conventions, which uses the value-as-attribute approach I'm suggesting. In particular, see this note
|
This metric is actually pretty close to the recommended formulation. I think if you just remove the |
The time-based rollup example I gave is a bit trickier, but a simpler example is spatial aggregation. Let's say I want to get a count of nodes in my cluster having
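A small sketch of the spatial-aggregation point, with node names and attribute names assumed:

```yaml
# Three nodes, condition Ready: node-a True, node-b True, node-c False.
nodes: {node-a: "True", node-b: "True", node-c: "False"}
# Status as attribute: "how many nodes are not Ready?" is a plain sum:
#   sum(k8s.node.condition{condition="Ready", status="false"})  -> 1
# Status in the value: the same count requires filtering on datapoint values
#   (value == 0), which many backends cannot express once series have been
#   aggregated together.
```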
This issue has been inactive for 60 days. It will be closed in 60 days if there is no activity. To ping code owners by adding a component label, see Adding Labels via Comments, or if you are unsure of which component this issue relates to, please ping
Pinging code owners:
See Adding Labels via Comments if you do not have permissions to add labels yourself. |
This issue has been closed as inactive because it has been stale for 120 days with no activity. |
Component(s)
receiver/k8scluster
Describe the issue you're reporting
As noted in #27586 (comment), we should switch the k8s.node.condition_* metrics to mdatagen.
I think we would need to deprecate this config:
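For reference, the existing receiver option in question looks roughly like this (values illustrative):

```yaml
receivers:
  k8s_cluster:
    auth_type: serviceAccount
    # List-based switch that would be deprecated in favour of per-metric
    # enabled flags generated by mdatagen:
    node_conditions_to_report: [Ready, MemoryPressure]
```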
We could keep the existing default, i.e. k8s.node.condition_ready enabled and the other conditions disabled.
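If the per-condition metrics were declared in metadata, keeping today's default could look something like this under the generated metrics config (metric names illustrative):

```yaml
receivers:
  k8s_cluster:
    metrics:
      k8s.node.condition_ready:
        enabled: true          # matches today's default
      k8s.node.condition_memory_pressure:
        enabled: false         # other conditions stay opt-in
```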