Telemetry Schema and Resource #161

Open
wants to merge 7 commits into
base: main
203 changes: 203 additions & 0 deletions text/0161-telemetry-schema-and-resource.md
@@ -0,0 +1,203 @@
# Telemetry Schema and Resource

Specify a mechanism that lets users of Resource interact with it through a
specific Telemetry Schema version, giving them a stable API.

## Motivation

In the [current definition](https://github.com/open-telemetry/opentelemetry-specification/pull/1692)
of Resource + Telemetry schema interaction, the specification focuses mostly on
how to provide a consistent "storage" of Resource within the SDK. However,
from a usage standpoint, how do consumers gain stability across changes to
schema?

- Telemetry Schema allows the notion of "renaming" attributes in a schema.
Can a consumer of Resource "lock" in a specific version of the schema to
ensure no renaming occurs?
- If resource detection is provided by libraries external to OpenTelemetry, do
we require ALL resource detectors to target the same telemetry schema
version? This would inhibit upgrades, as you would need to wait
for every resource detector to upgrade, even if a (stable) conversion exists
from the resource's version.

Specifically, we need a way for exporters that rely on Resource labels to ensure
various version differences across OpenTelemetry components do not cause
breakages by leveraging the powers provided by telemetry schema.

As an example, before schema versions and stability guarantees existed, PR
[#1495](https://github.com/open-telemetry/opentelemetry-specification/pull/1495)
renamed `cloud.zone` to `cloud.availability_zone`. In a Schema world, this
change could have been defined with the following schema rule:

```yaml
next.version:
  resources:
    changes:
      - rename_attributes:
          attribute_map:
            cloud.zone: cloud.availability_zone
previous.version:
```

If an exporter expects to find `cloud.zone` on `previous.version`,
then once the SDK is bumped to `next.version` it will no longer be able to see
`cloud.zone`.

Specifically, we assume the following components could have independent
versioning + release cycles:

```
+-------------------+     +-----+     +----------+
| Resource Detector | <-> | SDK | <-> | Exporter |
+-------------------+     +-----+     +----------+
```

## Explanation

We propose a new set of SDK features:

- A mechanism to "migrate" a resource up/down compatible schema changes.
Member

I think we should provide only the "go up" version, and enforce everyone to use the most recent (latest) schema. The exporter in your case should be able to adopt (maybe) the rules to work on the Resource schema (if newer) vs downgrading the Resource.

Contributor Author

We can start by implementing only go-up, but as I stated, there's an issue with mismatch between various sourced components. If we're not somewhat compatible in a forwards way, we could cause issues across the ecosystem.

Member

We should be comfortable with both up and down directions. The schema files unambiguously define whether it is safe to go in a particular direction.

- A mechanism to determine if two schema URLs allow compatible changes.
Member

Is this a long-term issue or just a short-term need? I had the impression that we should always make backwards-compatible changes, so we can always go up, though sometimes we may not be able to go down.

Contributor Author

This is an issue of how users consume exporters + resource detectors in the SDK. Both are pluggable components and we can expect resource detection to advance ahead of exporters, so the forward-compatible behavior is needed in that kind of distribution environment.


### Consuming resources at specific schema version

The first is done by providing a new method on `Resource` in the SDK that
will transform the attributes as necessary to match a given schema version.

For example, in Java you might have the following for `Resource`:

```java
import java.io.File;

public abstract class Resource {
Member

Should we say that similar changes will be applied to other places where schemaurl is used, like spans/metrics/logs?

Contributor Author

Eventually, that would be the goal.

  // ... existing methods ...

  /** The current schema version. */
  public abstract String getSchemaUrl();

  /**
   * Returns a new instance of Resource with labels compatible with the
   * given schema url.
   *
   * <p>Returns an invalid Resource if conversion is not possible.
   */
  public abstract Resource convertToSchema(String schemaUrl);
Member

Nice! This is an expected capability of schemas, so no surprises here.
I believe this should also be able to return an error since some conversions are not possible (e.g. between different schema families or if the versions are incompatible).

Contributor

Since Resource resolution happens at application startup, I'm nervous about having SDKs have to go and fetch multiple schemas from the public internet as a part of the startup process, in order to perform the conversion. There are failure modes here, and general startup-time issues to be considered (especially, for example, in mobile or FAAS implementations).

Member

> Since Resource resolution happens at application startup, I'm nervous about having SDKs have to go and fetch multiple schemas from the public internet as a part of the startup process, in order to perform the conversion.

That's a good point. The SDKs can have the schema file that they correspond to embedded with the source code so that there is no need to fetch anything for detectors that are part of the SDK. If each SDK includes the latest Otel schema file then any Otel schema conversions can happen without network access.
Custom, non-Otel schema conversions will still require a network fetch. Perhaps we also provide an alternate convertToSchema function which accepts a schema file instead of a schema URL?

Member

I don't think that an alternate conversion method should be required to work from a file, but an alternate URL->bytes function could be provided. I'd imagine possibly a three tier lookup: 1) embedded in the application by the SDK, 2) on disk in a well-known or specified location 3) over the network through traditional URL resolution.

Member

Per this thread conversation, it may be better to start with a solution where the schema "file"/"bytes" are provided, and leave it to the user of the API to ensure they are available somewhere. Some users may embed the files in their application, some may embed the schema bytes, etc.

@jkwatson had a very good point, and probably we should postpone the API that fetches from the internet.

Member

I agree with the notion of providing the local schema instead of fetching from the URL. We may go a step further and define Schema as an opaque object that is loadable from file or from a URL. This opens up the possibility of performance improvements (parse and compile the schema once such that the subsequent conversion calls are fast). See e.g. prototype here that has "parse/compile" and "convert" as separate operations.

  /**
   * Returns a new instance of Resource with labels compatible with the
   * given schema file.
   *
   * <p>Returns an invalid Resource if conversion is not possible.
   */
  public abstract Resource convertToSchema(File schemaFile);
}
```

This would allow a consumer of resources to ensure a stable view of resource
labels, e.g.

```java
public static final String MY_SCHEMA_VERSION = "https://opentelemetry.io/schemas/1.1.0";
Member

Not necessarily a related question: is this the expected value for the proto fields, or just the version string "1.1.0"?

/cc @tigrannajaryan

public void myMetricMethod(MetricData metric) {
  useStableResource(metric.getResource().convertToSchema(MY_SCHEMA_VERSION));
}
```

### Understanding schema compatibility

The second will be done via new (public) SDK methods that report whether
two schemas are compatible and which one is the "newer" version.

e.g. in Java you would have:

```java
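public abstract class Schemas {
  /** Whether the two schema URLs allow compatible changes. (Signature only; body omitted.) */
  public static boolean isCompatible(String oldUrl, String newUrl);
  /** Whether testUrl is a newer version than baseUrl. (Signature only; body omitted.) */
  public static boolean isNewer(String testUrl, String baseUrl);
}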
```

This could be used in the `Resource.merge` method to determine if resources
can be converted to a compatible schema. Specifically, resources can be merged
if and only if:

- The schemas are compatible
Member

Same question: is this something that we need to worry about? Should we assume that converting to the "newest" is always possible?

- Converting all `Resource` instances to the "newest" schema version returns
valid `Resource` objects.

## Internal details

There are three major changes to SDKs for this proposal.

### Understanding schema compatibility
Member

Is this section needed because currently the schema url does not respect semver for the version number? I thought that was implicit since the schema url version number corresponds to the specs version which follows semver, but probably my assumption is wrong.

Contributor Author

The Schema URL could be for a different specification entirely. This OTEP proposes a level of compatibility we could not assume before.

This is needed so we CAN assume semver for the default otel schema.


Here we propose simply providing an implementation of
[SemVer](https://semver.org/) for isCompatible, and using the
Member

I'm a little confused on this. If you had a schema rule like the one above:

  next.version:
    resources:
      changes:
        - rename_attributes:
            attribute_map:
              cloud.zone: cloud.availability_zone
  previous.version:

since there is a transform between versions, would the two schemas be compatible and have the same major version?

Contributor Author

It's not clear to me what the answer to this is, which is why I opened: #162

If the answer to the question is "yes, they're compatible", then I think this OTEP resolves some major pain that this kind of change would cause.

[ordering rules](https://semver.org/#spec-item-11) for `isNewer`. This would
pull the version number out of the schema URL.
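
A minimal sketch of how the two checks could look, assuming the schema URL ends in a plain `major.minor.patch` segment and that "compatible" means same URL prefix and same major version (whether a rename keeps the major version is still an open question, see #162). The class mirrors the proposed `Schemas` signatures; the regular expression and the handling of unparseable URLs are assumptions for illustration, not part of any existing SDK.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sketch of the proposed Schemas helper; not an existing SDK class. */
final class Schemas {
  // Assumption: the URL ends in "/<major>.<minor>.<patch>", for example
  // https://opentelemetry.io/schemas/1.1.0. Pre-release tags are not handled.
  private static final Pattern VERSIONED_URL =
      Pattern.compile("^(.+)/(\\d{1,9})\\.(\\d{1,9})\\.(\\d{1,9})$");

  private Schemas() {}

  /** Compatible when both URLs share the same prefix (schema family) and major version. */
  static boolean isCompatible(String oldUrl, String newUrl) {
    Matcher a = VERSIONED_URL.matcher(oldUrl);
    Matcher b = VERSIONED_URL.matcher(newUrl);
    return a.matches()
        && b.matches()
        && a.group(1).equals(b.group(1))
        && Integer.parseInt(a.group(2)) == Integer.parseInt(b.group(2));
  }

  /** True when testUrl has strictly higher SemVer precedence than baseUrl. */
  static boolean isNewer(String testUrl, String baseUrl) {
    Matcher t = VERSIONED_URL.matcher(testUrl);
    Matcher b = VERSIONED_URL.matcher(baseUrl);
    if (!t.matches() || !b.matches()) {
      return false; // unparseable URLs are never considered newer
    }
    // Compare major, then minor, then patch, numerically.
    for (int group = 2; group <= 4; group++) {
      int cmp =
          Integer.compare(Integer.parseInt(t.group(group)), Integer.parseInt(b.group(group)));
      if (cmp != 0) {
        return cmp > 0;
      }
    }
    return false; // identical versions
  }
}
```

Because the components are compared as numbers, `1.10.0` is treated as newer than `1.9.0`, which a plain string comparison would get wrong.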

### Resource merge logic

Resource merge logic is updated as follows (new mechanisms in bold):

The resulting resource will have the Schema URL calculated as follows:

- If the old resource's Schema URL is empty then the resulting
resource's Schema URL will be set to the Schema URL of the updating resource
- Else if the updating resource's Schema URL is empty then the resulting
resource's Schema URL will be set to the Schema URL of the old resource,
- Else if the Schema URLs of the old and updating resources are the same then
that will be the Schema URL of the resulting resource,
- **Else if the Schema URLs of the old and updating resources are compatible**
**then the older resource will be converted to the newer Schema URL and merged.**
Member

It seems that "older resource" may be used in two senses here. One being the target of a merge operation and the other being the resource with the lesser schema version. Utilizing a different term for one or both of those senses may help avoid confusion.

- Else this is a merging error (this is the case when the Schema URL of the old
and updating resources are not empty and are not compatible).
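
The rules above can be sketched as a single function. `MergeSchemaUrl` is a hypothetical name; the two predicates stand in for the proposed `Schemas.isCompatible` and `Schemas.isNewer`, an empty string models a missing Schema URL, and an empty `Optional` models the merging error.

```java
import java.util.Optional;
import java.util.function.BiPredicate;

/** Hypothetical sketch of Schema URL selection during Resource merge. */
final class MergeSchemaUrl {
  private MergeSchemaUrl() {}

  /**
   * Returns the Schema URL of the merged resource, or Optional.empty() when the
   * two resources cannot be merged.
   */
  static Optional<String> resolve(
      String oldUrl,
      String updatingUrl,
      BiPredicate<String, String> isCompatible,
      BiPredicate<String, String> isNewer) {
    if (oldUrl.isEmpty()) {
      return Optional.of(updatingUrl);
    }
    if (updatingUrl.isEmpty()) {
      return Optional.of(oldUrl);
    }
    if (oldUrl.equals(updatingUrl)) {
      return Optional.of(oldUrl);
    }
    if (isCompatible.test(oldUrl, updatingUrl)) {
      // New rule: the resource on the lower schema version is converted up,
      // so the merged resource carries the newer of the two URLs.
      return Optional.of(isNewer.test(updatingUrl, oldUrl) ? updatingUrl : oldUrl);
    }
    return Optional.empty(); // both URLs set, different, and not compatible
  }
}
```

Note that the result depends on which schema version is newer, not on which resource is the "old" one in the merge, which is the distinction the reviewer asks for above.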

### Resource conversion logic

SDKs will be expected to provide a mechanism to apply transformations
(both backwards and forwards) listed in `changes` for any compatible schemas.
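
As a rough illustration of applying `changes` in both directions, the sketch below handles only `rename_attributes` and identifies versions by their position in a chain rather than by URL. `SchemaChain` is a hypothetical class, and inverting a rename map assumes the renames are one-to-one.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: walks a chain of per-version rename maps in either direction. */
final class SchemaChain {
  /** steps.get(i) holds the renames that lead from version i to version i + 1. */
  private final List<Map<String, String>> steps;

  SchemaChain(List<Map<String, String>> steps) {
    this.steps = steps;
  }

  /** Converts attributes from version index {@code from} to version index {@code to}. */
  Map<String, String> convert(Map<String, String> attrs, int from, int to) {
    Map<String, String> current = attrs;
    // Forwards: apply each step in order, oldest first.
    for (int v = from; v < to; v++) {
      current = rename(current, steps.get(v));
    }
    // Backwards: undo each step with its inverted map, newest first.
    for (int v = from; v > to; v--) {
      current = rename(current, invert(steps.get(v - 1)));
    }
    return current;
  }

  private static Map<String, String> rename(
      Map<String, String> attrs, Map<String, String> renames) {
    Map<String, String> out = new LinkedHashMap<>();
    attrs.forEach((key, value) -> out.put(renames.getOrDefault(key, key), value));
    return out;
  }

  // Assumption: no two old names map to the same new name.
  private static Map<String, String> invert(Map<String, String> renames) {
    Map<String, String> inverse = new LinkedHashMap<>();
    renames.forEach((oldKey, newKey) -> inverse.put(newKey, oldKey));
    return inverse;
  }
}
```

An exporter pinned to an older version would then call `convert` with `to` below `from`, and would see `cloud.zone` again even though the detector emitted `cloud.availability_zone`.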

**Note: Until conversion logic is available in SDKs, any change to telemetry**
**schema that relies on `changes` will be considered breaking, i.e.**
**it will lead to a major version bump of the schema version.**
Member

"major version bump" of what?

Contributor Author

the schema version


## Trade-offs and mitigations

The major drawback of this proposal is the requirement to make all schema
migration implementations available in all SDKs. This is mitigated by the
following:

- Initially, we could prevent any `rename` alterations to schema for compatible
version numbers. `convertToSchema` would check for semver compatibility and
just return the original resource if compatible.
- Alternatively, we can check the newer schema for any transformations in the
`all` or `resources` section before failing to convert.
Member

So this would be a "poor man's" conversion function, which just figures out that the conversion is a no-op and returns the original, and fails otherwise, right? I believe this should cover a very large number of real-world use cases of existing resource detectors.

Contributor Author

Exactly.

Member

Would this check be limited to attributes actually present on the resources being evaluated, or would the presence of any transformation, even for non-present attributes, prevent merging/migration?


This should allow SDKs to roll out the implementation gradually, increasing the
number of `convertToSchema` operations that succeed over time.
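
The check described in the second mitigation could look roughly like the following. `NoopConversion` and the shape of `changesBetween` are assumptions made for illustration; a real SDK would work from parsed schema files. The section names `all` and `resources` follow the YAML example earlier in this document.

```java
import java.util.List;
import java.util.Map;
import java.util.Optional;

/** Hypothetical sketch of a conversion that only succeeds when it is a no-op. */
final class NoopConversion {
  private NoopConversion() {}

  /**
   * changesBetween holds one entry per schema version between source and target.
   * Each entry maps a section name ("all", "resources", "spans", ...) to the
   * transformations listed in that section.
   *
   * Returns the attributes unchanged when no version touches resources,
   * or Optional.empty() when a real transformation would be required.
   */
  static Optional<Map<String, String>> convert(
      Map<String, String> attrs, List<Map<String, List<String>>> changesBetween) {
    for (Map<String, List<String>> version : changesBetween) {
      boolean touchesResources =
          !version.getOrDefault("all", List.of()).isEmpty()
              || !version.getOrDefault("resources", List.of()).isEmpty();
      if (touchesResources) {
        return Optional.empty();
      }
    }
    return Optional.of(attrs);
  }
}
```

This version fails on any resource transformation, including ones for attributes that are not present on the resource, which is the stricter of the two behaviors raised in the review thread above.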

## Prior art and alternatives

It's a common practice in protocols to negotiate version semantics and for
newer code to provide backwards-compatible "views" of older APIs for
compatibility. This is intended as a stop-gap to prevent immediate breakages
as the ecosystem of components evolves to the new protocol on independent
timelines. Specifically, "core" components can update without forcing all
downstream components to also update.

### Alternatives

One alternative we can take here is to force all exporters to always be on the
same version of telemetry schema as the SDK and find a mechanism to warn/prevent
older exporters from using a newer SDK.

Another alternative is for SDKs to continue propagating resources as-is and
have exporters issue errors when resource versions don't align with
expectations.

## Open questions

- What kinds of operations on Resource should be considered compatible?
Member

This is defined by schema OTEP. Conversion between families is incompatible. Ambiguous "renames" are incompatible.

- Is "downgrading" a schema version going to be "safe" for exporters?
Member

I think it is safe. Any reason to believe it may not?


## Future possibilities

The notion of telemetry-schema conversion could be expanded and adapted into
its own set of functionality that SDKs or SDK-extensions (like resource
Member

@tigrannajaryan tigrannajaryan Jun 3, 2021

I think we should make them an independent part of SDKs, consumable separately if possible by code that does not care about the rest of Otel but needs to deal with schemas. It would be very nice, though, to make this an official part of the Otel libraries.

Member

You need this because you want to be able to also convert Spans/Metrics/Logs that also have schemaURL.

detectors and exporters) can make use of.