Experimental Semantic Convention Attribute Registration #224

Closed
115 changes: 115 additions & 0 deletions text/0224-experimental-semantic-conventions.md
@@ -0,0 +1,115 @@
# Allowing experimental extensions to Semantic Conventions

A process that allows experimental / non-stable semantic conventions to exist within stable versions and how they evolve into stable semantic conventions.

Member

Nit: in the below comments I used the term "provisional arguments", but I think it would be better to pick a single term and use it consistently. Is there a semantic difference between provisional and experimental?

Member Author

I believe the newly limited scope of the proposal to an attribute dictionary should address this concern. Please resolve this conversation if you agree.


## Motivation

Today, semantic conventions are released as a single "all or nothing" bundle.
OpenTelemetry struggles to ensure that semantic conventions represent a true "generic" need in observability while also providing a mechanism to "discover" or "experiment" with a standard.
This [Slack thread on #otel-specification](https://cloud-native.slack.com/archives/C01N7PP1THC/p1667997213067169?thread_ts=1667933561.313819&cid=C01N7PP1THC) highlights the problem in the context of a specific semantic convention pull request.
What should OpenTelemetry do with semantic convention proposals when there's unclear evidence that a particular signal is necessary or ubiquitous?

## Explanation

This proposal generally follows the recommendations of [RFC6648 section 4 “Recommendations for Protocol Designers.”](https://www.rfc-editor.org/rfc/rfc6648#section-4)
RFC6648 recommends a registry of names be established which standardizes usage and prevents collisions.
We already have such a registry in the form of our semantic conventions.
RFC6648 further recommends that names with an X- prefix (or a similar construct) MUST NOT be defined as inherently experimental or nonstandard, and that names without the X- prefix MUST NOT be defined as inherently stable or standard.

## Definitions

**Attribute**: For the purposes of this document, an attribute may refer to any component of any signal emitted by OpenTelemetry instrumentation.
This may include span or metric attribute keys and values, span or metric names, log or event attribute names and values, or any other value emitted by OTel instrumentation.

**Private Attribute**: An attribute is private if it is meant only for the exclusive private use of the instrumentation author, and is not expected to be emitted by any public library or understood by any public backend tools.

**Registered Attribute**: A registered attribute is any attribute which is in the provisional registry outlined in the proposal or the semantic conventions.

## Proposal

For the purposes of this document only attributes are discussed, but it should be understood that the TC may decide to make other telemetry components such as metric names, event names, or other components registerable.
This proposal SHOULD also apply to any such components at the discretion of the TC.

A procedure will be established by the technical committee (TC), or an entity designated by the TC, for registration of provisional semantic convention attributes.
Any non-private attributes which can be registered SHOULD be registered in the provisional registry or an official OpenTelemetry semantic convention.

Member

I am somewhat uncomfortable with the vagueness here. What, realistically, is going to be an implementation of a registry? Are we going to maintain an online database with an API? Are we going to bifurcate the "registration" process of the official and provisional attributes?

Why not just make a call right now and say that provisional attributes (which I think is a better name than "registered") are going to use the exact same mechanism of registration as the official attributes, i.e. the semantic conventions yaml file in the OTEL repo. And we need to decide if it's going to be a separate repo or a separate directory. Or should this even be the same directory and just an extra label in the yaml definition?

Member

A related topic is that provisional attributes should also be subject to the OpenTelemetry Schemas versioning mechanism (which also suggests we should not bifurcate the registration process).

Member Author

We already have the concept of "experimental" and "stable" semconv. It was my intention that if the TC wanted to, this OTEP could be interpreted to mean simply that the bar is drastically lowered for "experimental" semantic convention. In that interpretation there would be no difference between experimental and provisional.

I didn't want to specify that here, because I didn't know if the TC would prefer to keep the provisional registry separate as it may grow very large very quickly.

Contributor

Practically, there would need to be a YAML file of sorts, the same we have for semantic conventions.

I wanted to chime in on this one, although I haven't had a chance to solidify my thoughts. Effectively, such a registry of "provisional" conventions is an inevitability. The question is whether we limit this to an exclusive group of "expert committees" like we're doing for HTTP, or if we want to allow more organic growth (like what @dyladan and I propose here).

I think currently, we're using expert committees for OTEL because of the following reasons:

  • There are a lot of high-value, high-impact conventions for observability (e.g. HTTP, Databases, Message-Passing).
  • We CAN draw an expert committee
  • The cost of failure is high.

Once we whittle down the big hitters, we have the issue of the "long-tail" of observability and our ability to attract enough experts and bandwidth. A process like this allows us to rapidly expand to help o11y users.

I guess I'm suggesting I think the need for this registry is inevitable, but it could also be "not now". I.e. we may want to first ensure we have a solid and rich core offering of conventions before we solve long-tail issues.

This registry does solve the issue of "How do OpenTelemetry instrumentation authors create instrumentation that's not part of the semantic conventions". However, this registry is also, in my view, the only possible path for semconv stability and also one that relies on having semconv be a widely adopted, successful standard already, with a meaningful starting point (like HTTP).

I.e. my current thinking is:

  • We should enable this registry, after we've solidified semantic conventions for a few key areas.
  • We probably need a different solution for "How do OpenTelemetry instrumentation authors create instrumentation that's not part of the semantic conventions"

Member Author

If the answer is "not now" we could easily simplify this to a policy change where we allow/encourage contrib and community authors to experiment with instrumentation even if it isn't covered by semantic conventions. The end result should hopefully be the same with the best solutions bubbling to the top and becoming the most popular. It would mean different instrumentations for the same module might not export data the same way, but maybe that sort of competition is healthy.

Member Author

I believe the newly limited scope of the proposal to an attribute dictionary should address this concern. Please resolve this conversation if you agree.

The provisional registry MAY be separate from the semantic conventions, a status applied to some entries in the existing semantic conventions, or some other mechanism specified by the TC.

Instrumentation authors SHOULD assume that any attribute they create may become standardized, public, commonly deployed, or usable across multiple OpenTelemetry clients, distributions, or receivers.
Instrumentation authors SHOULD choose attribute keys that are descriptive, relevant, and that they have reason to believe are not currently in use or reserved by the TC for future use.
Before creating a new attribute, the attribute author SHOULD review the semantic conventions and provisional registry to see if an attribute already exists which fits the intended use case.
If such a registered attribute exists, then it SHOULD be preferred over the creation of new attributes.
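
The review step described here could be sketched roughly as follows. This is only an illustration under stated assumptions: the `REGISTRY` dictionary, its status strings, and the `find_registered` and `choose_attribute` helpers are hypothetical and do not correspond to any existing OpenTelemetry tooling, and the keys are sample data borrowed from the examples in this document.

```python
from typing import Optional

# Hypothetical in-memory registry: attribute key -> registration status.
# The real registry format is left to the TC; this dict is an assumption.
REGISTRY = {
    "mydatabase.attribute": "semantic_convention",
    "http.headers": "provisional",
}


def find_registered(key: str) -> Optional[str]:
    """Return the registration status of `key`, or None if it is unregistered."""
    return REGISTRY.get(key)


def choose_attribute(candidate: str, existing_equivalent: Optional[str] = None) -> str:
    """Prefer an already registered attribute over creating a new one."""
    if existing_equivalent is not None and find_registered(existing_equivalent):
        return existing_equivalent
    return candidate
```

An author who finds that `http.headers` already covers their use case would emit that key rather than a private spelling; with no registered equivalent, the candidate key is used unchanged.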

An entry MUST NOT be refused without credible reason to believe that registration would be harmful.
Some reasonable grounds for refusal might be frivolous use of the registry, TC consensus that the attribute may be a security or privacy risk, conflict with another attribute or a name reserved by the TC for future use, or that the proposal is sufficiently lacking in purpose, or misleading about its purpose, that it can be held to be a waste of time and effort.
Note that disagreement about technical details is not considered grounds to refuse listing in the provisional registry.

A provisional registry entry which has gained enough support, use, and market penetration SHOULD be promoted to the semantic conventions following existing procedures for review and inclusion.
When proposing a new semantic convention, existing provisional registry entries SHOULD be considered, and a semantic convention proposal SHOULD NOT break compatibility with an existing provisional registry entry without sufficient reasoning.
If a semantic convention is added which is not compatible with existing provisional registry entries, a new name SHOULD be created.
For example, if a provisional entry exists for http.headers, a semantic convention proposal which breaks compatibility might choose the name http.message_headers in order to avoid a conflict.
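
A minimal sketch of that naming rule, assuming the set of provisional keys is available as a plain Python set. The function name and its parameters are hypothetical and serve only to illustrate the `http.headers` example.

```python
from typing import Set


def name_for_new_convention(preferred: str, alternative: str,
                            provisional_keys: Set[str],
                            compatible_with_provisional: bool) -> str:
    """Pick the key for a new semantic convention.

    The preferred key is reused only when it is unregistered or when the new
    convention stays compatible with the provisional entry of the same name.
    """
    if preferred not in provisional_keys or compatible_with_provisional:
        return preferred
    return alternative
```

With `http.headers` already in the provisional registry, an incompatible proposal would end up with `http.message_headers`, while a compatible proposal keeps `http.headers`.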

This document explicitly does NOT define a process for removal of a registered attribute from the provisional registry or semantic conventions.
Because absence of evidence does not constitute evidence of absence, there is no way to know if any registered attribute is in active use.
If a registered attribute is deemed by the TC to be harmful in some way, it SHOULD be marked deprecated, and sufficient reasoning for the deprecation should be included in the registry.
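
One way to picture "deprecate, never remove" is a registry entry that carries a deprecation flag and a mandatory reason. The `RegistryEntry` class and `deprecate` helper below are hypothetical; the actual registry schema is not defined by this document.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RegistryEntry:
    key: str
    deprecated: bool = False
    deprecation_reason: Optional[str] = None


def deprecate(entry: RegistryEntry, reason: str) -> RegistryEntry:
    """Mark an entry deprecated in place; entries are never deleted."""
    if not reason.strip():
        raise ValueError("a deprecation must record its reasoning")
    entry.deprecated = True
    entry.deprecation_reason = reason
    return entry
```

There is deliberately no `remove` operation in this sketch, which mirrors the absence of a removal process in the proposal.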

Consumers of telemetry SHOULD consider attributes and their values to be untrusted input.
Attribute values SHOULD be validated before assuming they conform to the format specified in the semantic conventions or the provisional registry.
Consumers SHOULD consider any security and privacy implications of displaying unhidden attribute values.
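
As an illustration of treating attribute values as untrusted input, a consumer might validate a value before relying on its documented format. The sketch below assumes a hypothetical attribute whose registered format is a three-digit HTTP status code; the helper name and the format rule are assumptions made for this example.

```python
import re
from typing import Any, Optional

# Assumed format for the example: exactly three ASCII digits, first digit 1-5.
_STATUS_CODE = re.compile(r"[1-5][0-9]{2}")


def validate_status_code(value: Any) -> Optional[int]:
    """Return the value as an int if it looks like an HTTP status code, else None."""
    if isinstance(value, bool):
        # bool is a subclass of int in Python; reject it explicitly.
        return None
    if isinstance(value, int):
        value = str(value)
    if isinstance(value, str) and _STATUS_CODE.fullmatch(value):
        return int(value)
    return None
```

A backend would drop or quarantine any value for which this returns `None` instead of assuming the emitter followed the registry.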

## Alternatives Considered

### "x." prefix

This alternative involves establishing an “x.” prefix convention for experimental conventions.
For example, "x.mydatabase.attribute" may be an experimental version of the eventual "mydatabase.attribute" attribute key.
See [X- convention for HTTP message headers](#x--http-header-convention) below for more details about why this alternative was discarded.
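
One practical cost of this alternative can be sketched from the consumer's side: once an "x."-prefixed key is promoted, readers have to keep accepting both spellings. The helper below is hypothetical and uses the `mydatabase.attribute` example from above.

```python
from typing import Any, Mapping, Optional

EXPERIMENTAL_PREFIX = "x."


def get_with_legacy_prefix(attributes: Mapping[str, Any], key: str) -> Optional[Any]:
    """Look up `key`, falling back to its "x."-prefixed experimental spelling."""
    if key in attributes:
        return attributes[key]
    return attributes.get(EXPERIMENTAL_PREFIX + key)
```

Every consumer would need a fallback of this kind for as long as older instrumentation continues to emit the prefixed key.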

## Prior Art

### HTTP Header field and media type registration procedures

[RFC6648](https://www.rfc-editor.org/rfc/rfc6648), which serves as the primary inspiration for this OTEP, recommends [RFC3864](https://www.rfc-editor.org/rfc/rfc3864) "Registration Procedures for Message Header Fields" and [RFC4288](https://www.rfc-editor.org/rfc/rfc4288) "Media Type Specifications and Registration Procedures" as positive examples of "simpler registration rules."

Much of this OTEP draws inspiration from [RFC3864 Section 4](https://www.rfc-editor.org/rfc/rfc3864#section-4), which is itself an implementation suggested by RFC6648 section 4.

### X- HTTP header convention

* Deprecated in [RFC6648](https://datatracker.ietf.org/doc/html/rfc6648)

For many years, HTTP headers had a convention of using an X- prefix for headers which were either not yet standardized or not meant to ever be standardized.
The practice, which began as an informal suggestion and has since permeated many web practices and technologies, was officially deprecated in RFC6648.
A fuller accounting of the history can be found in [RFC6648 Appendix A](https://www.rfc-editor.org/rfc/rfc6648#appendix-A), but a short version is included here for completeness.

In 1975, Brian Harvey suggested “an initial letter X to be used for really local idiosyncrasies [sic]” with regard to FTP parameters.
The convention was later adopted for several standard and nonstandard uses including email as user extension fields, SIP as P- prefix headers, iCalendar x- tokens, and HTTP X- headers.
The practice has since been deprecated in FTP fields, email, SIP, and HTTP.

The inclusion of X- prefixed headers, and similar constructs in other standards, introduced a series of problems.
The first, and most severe, issue is that X- prefixed names leaked into the set of standardized headers.
An example of this can be seen in the HTTP media type standard where x-gzip and x-compress are considered equivalent to gzip and compress respectively, or the X-Archived-At message header field which MAY be parsed but MUST NOT be generated.
An exhaustive list does not exist and is not provided here, but an engineer who has worked with web technologies for long enough is likely to be familiar with the phenomenon, which unnecessarily complicates specifications and standards.

The second problem is that including X- prefixes in a name encodes, either explicitly or implicitly, an understanding that the field in question is experimental or nonstandard in nature.
It implies a level of instability which may or may not be present.
Even a field which starts as an experiment or nonstandard may gain enough popularity or market use that it becomes a de facto standard.
When this happens, the understanding of the experimental nature of the field becomes a false assumption.
Subsequent efforts to formally standardize the field must take such use into account and often result in standards which either codify the X- version as the standard, or state that it is to be treated as equivalent, leading to the first issue again.

#### Examples

##### Email extension fields

* Implemented in [RFC822](https://www.rfc-editor.org/rfc/rfc822)
* Removed in [RFC2822](https://www.rfc-editor.org/rfc/rfc2822)

##### SIP P- headers

[RFC5727](https://www.rfc-editor.org/rfc/rfc5727)

##### iCalendar x-token

[RFC5545](https://www.rfc-editor.org/rfc/rfc5545)

## Open Questions

## Future Possibilities