-
Notifications
You must be signed in to change notification settings - Fork 890
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Standardization of Units needed #705
Comments
@MrAlias in the spirit of limiting the 1.0 GA scope do you think this is necessary for GA or can be postponed? |
Will it be possible to change the units of the OTLP after GA? If so I think it would be reasonable to postpone. |
The current wording in the proto is pretty specific in what should be in the unit field. It is a string complying with UCUM standard. That's all it says. If you envision a change that contradicts this statement then we cannot change it after we GA. I am not quite sure what changes would be needed though. The current wording sounds good enough to my unsophisticated eye. Some of the nuanced topics that you touched, such as aggregation of unit seem to fit, e.g. I haven't put much thought into this, that's probably why I cannot foresee any possible breaking changes needed. That's also the reason I asked if you see reasons since I assume you know more about this topic than I do :-) |
It seems like we should make a decision on what unit standard we want to use prior to GA then. I think we should use the UCUM standard. The UCUM standard is designed with the goal to facilitate machine-to-machine communication. It uses a 7-bit US-ASCII character set exclusively for encoding, includes all units defined in ISO 1000, ISO 2955-1983, ANSI X3.50-1986, HL7 and ENV 12435, addresses the naming conflicts and ambiguities in those standards to resolve them, and through its use of annotations provides a way to deal with If we can agree to this standard for the specification the other details (e.g. how to encode Based on this agreement a PR to the metric API to capture this decision is all that is needed for the GA work. |
Works for me. @MrAlias if you or one of @open-telemetry/specs-metrics-approvers want to submit a PR or would like to have a closer look at it it would be great. Let's try to either make this happen quickly or postpone to after GA. |
I'm curious how the API/SDK would be expected to enforce this. For example, would this mean that the Java API would gain a dependency on an external unit library that defines all the possibilities? Or, would there be some subset that we would put into the API code as an enum? |
I think this enforcement is the part we would be punting on. I think the addition to the spec would be of the form "the API will accept units that can be represented in a format compliant with the UCUM standard".
Yeah, this sounds like something I can take on. Should be doable in the near term to get something in. |
@MrAlias can you clarify what is different between this statement ^^^ and "Follows the format |
I would not try to constrain the set of units to a large set or make them types. Having unit be a free string is fine and low-dependency, but I think OpenCensus struck the right balancing in both specifying UCUM strings and providing a few predefined values in the libraries. I think it would be nice to specify modestly larger default set, with common bytes and timing measurements spelled out (question 1). If we ever need to support more than UCUM in the future, I imagine we could prefix the unit name with the schema (question 2, question 4). My impression is that UCUM aims to make the translation from a single string into the prefix and base unit format unambiguous. This means we don't need to separate units and prefixes (question 3).
I don't think so. The UCUM spec talks about some non-units named like these (e.g., "total"), but the unit is "1" in these case. It uses the term "pseudo-unit". I feel there's still more for me to digest in UCUM, but it looks solid to me and I feel we can safely postpone pinning this down until post GA. Getting back to my first statement, "I would not try to constrain the set of units", I think the SDKs themselves should just pass a string provided by the user, who may have used one of the common predefined unit names or may have passed something more obscure. Hopefully I will not have to pass "ns" in Go, e.g., I'd like that to be predefined. 😀 |
Here are my thoughts on this:
IMO, I think this should be at the same level of "enforcement" as the rest of the semantic conventions. That is to say, no real enforcement, but a strong recommendation that if you do use the conventions, you could possibly get a better experience, depending on your metric backend and visualization tools. I also think that our semantic conventions should specify the units in which various semantically-defined metric instruments are to use. So, my recommendation would be one of two directions:
|
I don't think I was following your points in the SIG meeting today, John, but I think I agree with what you've just posted -- no real enforcement, but strong recommendations. And +1 to including units in our various semantic conventions. I'm in favor of option 2 here.
Regardless, I believe this semantic convention belongs in the spec, and it's important that it gets added before GA so we don't end up with bizarre base units that we have to support because of backward compatibility. |
👍 for APIs to accept any string with no real enforcement but with strong recommendations and documented conventions. I feel strongly that units of measure should be included in GA so that visualizations can be created that accurately represents the telemetry, even if this means that I need to loosely couple them out of band. I would be OK in accepting errata that any SDK, collector or exporter is unit unaware when performing aggregation that requires merging measurements in different units. Simply merge them as if they were the same unit. |
This could effect the value type (float or int) of the metric instruments, should we recommend using an int of the original precision? For example If a stopwatch measures nanoseconds, use an instrument with int value type and nano second units. |
It would be great to have units produced alongside with other semantic conventions as more users start producing metrics. |
UCUM is really too broad and doesn't specify unit names in a non-ambiguous way. How do I express that my histogram is in nanoseconds? Can be second or s, because there is nothing telling what table 2's print, c/s and c/i means. Oops, I have to rework ALL my code because I used long histograms in nanoseconds. Not having a clear convention for units will cause exporting from an histogram with times to Prometheus unreliable. Any query relying on time will have to be reworked to consider the values in an arbitrary unit depending on whatever the instrumented code used. |
heads up, I have sent a PR to mark the "Instrumentation Units" section of the general metric semantic conventions as stable I believe it will close this issue, please comment here if you think there are parts of this issue which are not closed by #3294 and I can unlink it so it won't close this issue |
One thing I don't understand about the current Instrument Units is which version of a unit's representation should be used? UCUM defines the following:
Which do we use? It looks like we a lowercase version of the c/i definition, but I don't believe we actually document that anywhere. |
Another issue was raised by @pirgeo's #3294 (comment)
|
It is the case sensitive code, so |
I'll open a PR to clarify. |
That is going to depend on the context. If you have a situation where there is already a standard, like networking where they use megabits per second, it would be a mistake to define our units using the base bits per second. Also, if there is an That said, I think where there is no clear precedent or logical requirements using the base units is advisable given parsing prefixes from units is not trivial when decoding a unit downstream. |
@MrAlias do you think we should add something clarifying in the spec about this? |
Makes sense to me. I can put something together on Monday. |
Looking across language implementation the unit field for the
metric instrument definition
is implemented differently.In some cases it is implemented as a definite type, restricting it to well defined values:
But, in most cases it is an unconstrained string:
I wasn't able to find a part of the specification that specifies a unit. Given the metric SDK is still a work in progress, I'm guessing that it would eventually at least be referenced there.
However, the proto does give some specification on what values a unit can have: https://github.com/open-telemetry/opentelemetry-proto/blob/b54688569186e0b862bf7462a983ccf2c50c0547/opentelemetry/proto/metrics/v1/metrics.proto#L117-L119
This uses the Unified Code For Units of Measure.
Questions
count
,total
, orsum
be included in the unit specification?Additional context.
This is all motivated by a need for interoperability between instrumentation and exporters. An exporter needs to know how instrumentation will define its units so it can identify, display, and possibly convert values correctly.
For example, if instrumentation sends time data and the exporter wants all time data in milliseconds. If the unit of time is unspecified and instrumentation sends data as
microseconds
,micro-sec
,us
, andμs
the will have an insurmountable problem to solve doing this interpretation and conversion.Additionally, if instrumentation records in one unit (say mmhg) but the exporter exports in another (i.e. kPa) the end user will not have their data interpreted appropriately.
The text was updated successfully, but these errors were encountered: