Mistake while applying the logic in #2617 from #2589: process.network.io and system.network.* should not have been split #2726
Comments
/cc @open-telemetry/technical-committee I think this should block the spec release until we clarify this. |
@bogdandrutu just to clarify, are you saying #2617 should be reverted? That PR split |
In the discussion surrounding this issue, we debated the usefulness and meaningfulness of either having or not having a Migration into OTel from all of the old metrics technologies, where precedent favors separate metrics, was looking difficult. We concluded that the meaningfulness was outweighed by a strong precedent and low utility. If the question boils down to "how often do you find yourself plotting the sum of directions?", the answer was "not often". If the question boils down to "how much value is there in binding the two directions together in one metric?" the answer was "not much". Since the PR in question merged, I found another potential reason to oppose the "change of direction". If we're going to review the arguments, here's another one. In OTEP 176 Columnar Encoding for OTLP specifically this section, @lquerel observes that in a multi-variate metrics system, it makes a lot of sense to maintain the connection that is being lost in #2617. In the example used in the OTEP, it's the
Would we make the same statement about network bytes? IMO the case is not as strong, but I'm speaking as a person who has never experienced networking hardware become a bottleneck. It's true that you could identify machines with overloaded NICs by examining the sum of the two metrics. To address this in the case of The point being made by the OTEP is that when converting from uni-variate to multi-variate metrics, these attributes are distributed across columns of a single row and instead of contributing to multiple rows. This leaves me undecided because we could encode the same relationship between univariate metrics in several ways. Even if we accept the point made in the OTEP, a naming rule or other metadata about the instruments could inform the system about these relationships. For example, we don't necessarily need to encode this information in the attributes; if we have metrics named |
@tigrannajaryan I think based on the discussion in the issue, the @jmacd I think there are lots of reasons to split or not. For me, we have a rule, and the rule says that if the total is meaningful we should do it this way, and for the network it is. Also, grouping them as one metric helps with things like "load balancing/sharding": you get all your points in the same shard, whereas with split metrics you require a more complicated sharding scheme. There are other places where this grouping is lost for no good reason. An extra argument is that points grouped in one metric can be split into different metrics much more easily than vice versa, so we should not emit them separately, because it will become almost impossible to re-group them, especially in a distributed system. |
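To make the split-versus-regroup asymmetry concrete, here is a minimal Python sketch. It is not code from the spec or any SDK: the tuple layout, the helper names (`total_by`, `split_by`, `regroup`), the sample values, and the split metric names derived by appending the direction to the metric name are all assumptions for illustration only.

```python
from collections import defaultdict

# Illustrative data points: (metric name, attributes, value).
# The names and values are made up for this sketch.
points = [
    ("system.network.io", {"device": "eth0", "direction": "transmit"}, 120),
    ("system.network.io", {"device": "eth0", "direction": "receive"}, 300),
    ("system.network.io", {"device": "eth1", "direction": "transmit"}, 50),
    ("system.network.io", {"device": "eth1", "direction": "receive"}, 70),
]


def total_by(points, key):
    """Sum values per `key`, collapsing every other attribute (e.g. direction)."""
    totals = defaultdict(int)
    for _name, attrs, value in points:
        totals[attrs[key]] += value
    return dict(totals)


def split_by(points, key):
    """Grouped -> split: move attribute `key` into the metric name.

    Needs no outside knowledge; the attribute itself says what to split on.
    """
    out = []
    for name, attrs, value in points:
        rest = {k: v for k, v in attrs.items() if k != key}
        out.append((f"{name}.{attrs[key]}", rest, value))
    return out


def regroup(points, key, suffixes):
    """Split -> grouped: only possible if the caller already knows which
    name suffixes encode `key` (a naming rule or other metadata)."""
    out = []
    for name, attrs, value in points:
        base, _, suffix = name.rpartition(".")
        if suffix in suffixes:
            out.append((base, {**attrs, key: suffix}, value))
        else:
            out.append((name, dict(attrs), value))
    return out


per_device_total = total_by(points, "device")
round_trip = regroup(split_by(points, "direction"), "direction",
                     {"transmit", "receive"})
```

In this sketch `split_by` works from the data alone, while `regroup` has to be told the set of suffixes up front, and in a distributed system it would also need both halves to arrive at the same place.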
@reyang you were able to find for disk I/O metrics that there is not only "read" and "write" data points but also "other" data point. Do you know if something similar exists for network I/O metrics? |
We need to have the same logic for disk and network. Sometimes network and storage metrics blend, like for iSCSI protocol, FC adapters, etc. Also, we can't go back and forth with these specifications. Personal anecdote: my company started developing a hardware monitoring specialized Otel collector, first using the There are pros and cons for both options, but we have to stick to one. |
@open-telemetry/specs-approvers This needs to be discussed in the Spec SIG. The PR in question that may need to be reverted is #2617 |
@codeboten FYI. |
Discussed in Spec SIG. @reyang will try to find an example of NIC metric where there is also another direction, other than "read" or "write". |
I've put my findings here #2589 (comment). |
@reyang @tigrannajaryan one thing is definite: this mistake opens a huge can of worms, see #2706 :) Should that be an attribute or not? When do we apply the "exception"? I am very confused, and we are going to make very random decisions based on the personal preference of the TC member who approves the PR. Because of that I continue to believe that it is a mistake and we should revert, maybe including disk, until we have clear rules that I can follow when I review a PR like that. It does not seem that we are making progress, and we are only blocking the release; maybe it is not that bad to roll back, re-discuss that change, and make the corresponding spec changes to the rules about when to use an attribute or a different metric. |
I believe semantic convention is indeed subjective (e.g. should it be
I consider unblocking the spec release as a high-order bit, so if we're blocked here, I would give +1 to unblock it. |
I agree that reverting is the best right now. We should revert and make a release. |
I agree that given the current state the best is to revert, and figure out a clear recommendation on this front. That being said, you seem to imply approvers do not review the actual content, including TC members, so please be more vocal about it, so we avoid misunderstandings. |
I think this discussion raises a fundamental issue that is not yet effectively addressed by the existing OTEL protocol, namely native support for multivariate time-series. This is one of the improvements proposed by OTEP 156 Columnar Encoding for OTLP (not 176). The attributes "state" and "direction" are artificial ways to encode relationships between metrics that are part of the same multivariate time-series. Unfortunately this encoding, in terms of volume of data generated, is not efficient. The attributes are duplicated for all the metrics participating in the same relationship. Moving the "direction" attribute into a declaration within a schema will not remove this duplication of attributes. In my opinion, we are surrounded by multivariate time-series (see examples in the OTEP). It is quite common to have a metric whose behavior is influenced by a collection of other metrics (and vice versa) or whose interpretation is not possible without knowledge of a collection of other metrics (and vice versa). We collect these metrics in order to interpret and analyze them, so it seems to me that it is useful not to break these relationships between the metrics, because otherwise the transport protocol makes these tasks more complex. My answer to this problem is OTEP 156, which proposes to represent all OTLP entities in columnar form and to extend the metric entities to natively support multivariate time series. This is possible without breaking the existing ecosystem, and a phasing is proposed to 1) optimize the bandwidth, 2) optimize the processing at the collector and backend level. If you think this proposal will be approved, then I think an important thing to consider in this discussion is to choose the option that will allow us to automate the reconstruction of multivariate time series in the future. I don't have a strong opinion on the "direction" attribute or the use of a statement within a schema. 
I think that in both cases, this information can be used to perform this reconstruction. |
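As a rough illustration of the reconstruction mentioned above, the following Python sketch pivots univariate points that share a name, timestamp, and attribute set into one row with one column per "direction" value. It is not the OTEP 156 encoding: the tuple layout, the `to_rows` helper, and the sample values are hypothetical and only show how shared attributes end up stored once per row instead of once per point.

```python
def to_rows(points, pivot_key):
    """Pivot univariate points into multivariate rows.

    Points that share a metric name, timestamp, and all attributes other
    than `pivot_key` land in the same row; each value of `pivot_key`
    becomes a column of that row.
    """
    rows = {}
    for name, attrs, timestamp, value in points:
        # Shared context, made hashable so it can serve as part of the row key.
        shared = tuple(sorted((k, v) for k, v in attrs.items() if k != pivot_key))
        row = rows.setdefault((name, shared, timestamp), {})
        row[attrs[pivot_key]] = value
    return rows


# Illustrative univariate points: (name, attributes, timestamp, value).
univariate = [
    ("system.network.io", {"device": "eth0", "direction": "transmit"}, 1000, 120),
    ("system.network.io", {"device": "eth0", "direction": "receive"}, 1000, 300),
    ("system.network.io", {"device": "eth1", "direction": "transmit"}, 1000, 50),
    ("system.network.io", {"device": "eth1", "direction": "receive"}, 1000, 70),
]

rows = to_rows(univariate, "direction")

# The shared attributes appear once per point before the pivot
# and once per row after it.
shared_attr_copies_before = len(univariate)
shared_attr_copies_after = len(rows)
```

The same pivot could in principle be driven by a schema declaration rather than an attribute, as long as something tells the system which series belong to the same row.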
My English was bad, I meant to say based on a situation when some TC members that are looking at that specific PR have an opinion, and others who have a different opinion don't look at that specific PR (a.k.a a random situation). |
@lquerel I don't agree with the statement that the format influenced any of the split decisions. The decision to split was mostly because of "total being or not being useful" or because of "ability to search/look for that", so I am confused about how that OTEP would ever influence a decision like this. I do agree that the current OTLP may have some downsides in terms of wire size, but I am not sure that matters in this discussion, since OTel semantics are independent of the wire format. PS: @jmacd you also mentioned that OTEP; I am still not able to see the relevance. |
@bogdandrutu I do agree with you that OTEL semantics in general have nothing to do with this OTEP. However, IMO the “state” and “direction” attributes are a bit special as they encode some kind of relationship between several metrics. It is only for this specific context that I believe there is a connection with the concept of multivariate time-series if we agree on the following informal definition. A multivariate time-series is a collection of interdependent metrics measured at the same time and sharing the same context (i.e. attributes in the OTEL context). So, if the underlying format natively supports multivariate time-series then I don’t think these two specific attributes would exist. For example in OTEP 156 the metric system.cpu.time is transformed as follows: Existing OTLP representation (partial representation focusing on the metric definition):
Proposed representation:
I may be completely off base, as I haven't followed OTEL semantics topic too much but I wanted to bring a slightly different perspective. |
+1 This is why I changed my opinion, splitting metrics when they ought to be part of a single row in a multi-variate representation moves us in the wrong direction. |
…ry#2617)" Contributes to open-telemetry#2726 This reverts open-telemetry#2617 We are reverting open-telemetry#2617 until we are certain how to resolve issue open-telemetry#2726
…ry#2617)" Contributes to open-telemetry#2726 This reverts open-telemetry#2617 We are reverting open-telemetry#2617 until we are certain how to resolve issue open-telemetry#2726 Also reverts the corresponding schema file changes done in open-telemetry#2688
…ection`" (#2748) Please review this carefully. It is not an automatic reverting, I had to fix merge conflicts manually and may have made mistakes, so a thorough review is needed. Changes: - This reverts #2617. We are reverting it until we are certain how to resolve issue #2726 - Also reverts the corresponding schema file changes done in #2688 - Also reverts #2675 ~Note that this does not revert #2675 which I believe to still be valid. If you think otherwise please speak.~ [UPDATE: discussed in Spec SIG and decided to revert 2675 too]. Contributes to #2726
… "add metrics to replace metrics with `direction`" (open-telemetry#2748) Please review this carefully. It is not an automatic reverting, I had to fix merge conflicts manually and may have made mistakes, so a thorough review is needed. Changes: - This reverts open-telemetry#2617. We are reverting it until we are certain how to resolve issue open-telemetry#2726 - Also reverts the corresponding schema file changes done in open-telemetry#2688 - Also reverts open-telemetry#2675 ~Note that this does not revert open-telemetry#2675 which I believe to still be valid. If you think otherwise please speak.~ [UPDATE: discussed in Spec SIG and decided to revert 2675 too]. Contributes to open-telemetry#2726
Is there any further discussion required for this issue? If so, can we close it w/ the resolution of keeping the direction and protocol attributes as they are in the spec today? This of course could be changed in the future, if someone comes along with a compelling enough case for splitting it out again, but I think for the short term, keeping the specification as is is preferable to going back and forth yet again. It's also worth mentioning that if a user would like to split metrics based on those attributes, they could define their own schemas to accomplish this. |
@codeboten we also reverted the splitting of the network metric, but I think that change was more justified. We reverted simply to begin from a clean slate. We can close this issue and re-open the network metric issue (I can't find it). |
@tigrannajaryan i couldn't find an issue for it, here's the PR. I can create a new issue if that's the correct way to move forward. |
Yes, please open an issue so that we can discuss and decide what we want to do about it. |
@tigrannajaryan open-telemetry/semantic-conventions#649 is now open. |
I'll take care of the hardware metrics |
I am closing this issue since the problem it uncovered is now fixed (in the sense that the offending PRs are reverted). Further discussion on the topic will happen on the new issues that are already created. |
…ection`" (#2748) Please review this carefully. It is not an automatic reverting, I had to fix merge conflicts manually and may have made mistakes, so a thorough review is needed. Changes: - This reverts open-telemetry/opentelemetry-specification#2617. We are reverting it until we are certain how to resolve issue #2726 - Also reverts the corresponding schema file changes done in open-telemetry/opentelemetry-specification#2688 - Also reverts open-telemetry/opentelemetry-specification#2675 ~Note that this does not revert open-telemetry/opentelemetry-specification#2675 which I believe to still be valid. If you think otherwise please speak.~ [UPDATE: discussed in Spec SIG and decided to revert 2675 too]. Contributes to open-telemetry/opentelemetry-specification#2726
The issue #2589 explains why the direction for disk is not the right thing, which makes total sense. But we took that blindly and applied it to every "attribute" called "direction" without discussing whether that makes sense or not. Luckily, I think that is not yet released, and hopefully not implemented by users. As an example, the logic in #2589 does not apply to network metrics, so why did we deprecate that? It makes total sense to know the total network bytes.
PS: If you have an argument about why I did not say anything, it is because in June/July I was on PTO, so ...