-
Notifications
You must be signed in to change notification settings - Fork 164
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Telemetry Schema and Resource #161
base: main
Are you sure you want to change the base?
Telemetry Schema and Resource #161
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Very nice OTEP. Spot on with the spirit of schemas and nicely builds on top of it.
Makefile
Outdated
.PHONY: check | ||
check: markdown-lint misspell markdown-link-check | ||
echo "All checks complete" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: unrelated change.
* | ||
* <p> Returns an invalid Resource if conversion is not possible. | ||
*/ | ||
public Resource convertToSchema(String schemaUrl); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice! This is an expected capability of schemas, so no surprises here.
I believe this should also be able to return an error since some conversions are not possible (e.g. between different schema families or if the versions are incompatible).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since Resource resolution happens at application startup, I'm nervous about having SDKs have to go and fetch multiple schemas from the public internet as a part of the startup process, in order to perform the conversion. There are failure modes here, and general startup-time issues to be considered (especially, for example, in mobile or FAAS implementations).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since Resource resolution happens at application startup, I'm nervous about having SDKs have to go and fetch multiple schemas from the public internet as a part of the startup process, in order to perform the conversion.
That's a good point. The SDKs can have the schema file that they correspond to embedded with the source code so that there is no need to fetch anything for detectors that are part of the SDK. If each SDK includes the latest Otel schema file then any Otel schema conversions can happen without network access.
Custom, non-Otel schema conversions will still require a network fetching. Perhaps we also provide an alternate convertToSchema
function which accepts a schema file instead of schema URL?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think that an alternate conversion method should be required to work from a file, but an alternate URL->bytes function could be provided. I'd imagine possibly a three tier lookup: 1) embedded in the application by the SDK, 2) on disk in a well-known or specified location 3) over the network through traditional URL resolution.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Per this thread conversation, it may be better to start with a solution that "file"/"bytes" for the schema are provided, and delegate to the user of the API to ensure that is available somewhere to grab. Some users may embed the files in their application, some may embed the schema bytes, etc.
@jkwatson had a very good point, and probably we should postpone the API that fetches from the internet.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree with the notion of providing the local schema instead of fetching from the URL. We may go a step further and define Schema
as an opaque object that is loadable from file or from a URL. This opens up the possibility of performance improvements (parse and compile the schema once such that the subsequent conversion calls are fast). See e.g. prototype here that has "parse/compile" and "convert" as separate operations.
The notion of telemetry-schema conversion could be expanded and adapted into | ||
its own set of functionality that SDKs or SDK-extensions (like resource |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should make them independent part of SDKs, consumable separately if possible by code that does not care about the rest of Otel but needs to deal with schemas. It will be very nice though to make this an official part of Otel libaries.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You need this because you want to be able to also convert Spans/Metrics/Logs that also have schemaURL.
|
||
## Open questions | ||
|
||
- What kinds of operations on Resource should be considered compatible? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is defined by schema OTEP. Conversion between families is incompatible. Ambiguous "renames" are incompatible.
## Open questions | ||
|
||
- What kinds of operations on Resource should be considered compatible? | ||
- Is "downgrading" a schema version going to be "safe" for exporters? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it is safe. Any reason to believe it may not?
Some general questions I have about this proposal:
|
It will be used by the SDK itself to correctly merge the resources. It can be also used by an external code to do schema translation (e.g. the Collector or a backend).
Yes, you can have your own schema that is not Otel schema. You will need to choose a schema family URL and publish the schema files for each version you release.
They are different schema families, not comparable. "Newer" only makes sense for schemas that belong to the same family.
No, not compatible, since they are different families.
You can't. Note: I planned a concept of parent schemas for the future. This would allow deriving Oracle schema from Otel schema and in that case possibly allow conversion. But it is not part of the current Schemas concept yet.
There is no code needed to be introduced when you update the conventions. The schema conversion code is capable of generically handling changes to convention. The rules defined in schema file drive this generic schema conversion code. |
I'm trying to understand how an actual implementation would do this. So does this mean that we are going to either embed the schema file or go fetch the schema file when needed? Then the code that would be written for this would be an interpreter of the schema to dynamically convert one version to the next? |
Both are valid approaches.
Yes, exactly. If you haven't read the Schemas OTEP it may be useful to do it since it answers some of your questions: https://github.com/open-telemetry/oteps/blob/main/text/0152-telemetry-schemas.md |
Yes, we need this for GCP exporters.
SDK != End users. Instrumentation library = API. This needs to be in the SDK for exporters, and it's a code-only thing. End users should be interacting MOSTLY via configuration or 'code configuration' on the sdk.
We also need this in the collector. This is an interface that should be provided to consumers of
I think tigran responded to most of these.
Totally agree these are good questions. I see this migration being a temporary solution to prevent ecosystem breakage as everyone adopts the latest telemetry schema. Let's say a user is using the following components:
If we take the approach we have today for Resources, it means I need to upgrade them ALL in lockstop for correct behavior. To mitigate this, we've kept Resource detection components + GCP exporters in our own repositories and release them together, so it's easier to know if you have an issue. However, if folks use any other Resource Detector component besides the one we provide, even if it abides by semantic conventions, we can't tell if we're compatible. Renaming attributes is silent breakage to some extent, and worse if the exporter is unable to declare which version of the schema it was using. |
So, let me start off I like the ideas presented here, but I think there are a few assumptions that have been made that aren't stated, and it's a bit light on how external conventions interact with the conventions discussed. And as I'm thinking more on this I think we are solving two separate issues, How to merge Resource Conventions, and How to merge Telemetry Conventions. They are both the same problem, but where we can solve them is different, as well as how we should solve them might be different. We might want to split the issue of resource migration and telemetry migration into different proposals. The major concern right now with useability. If I want to guarantee any single convention outside of the current semantic conventions, anything that uses that is unable to be mixed in with other data. Meaning even if my convention is a superset of a semconv there is no way to upgrade other libraries to my convention. Nor can I downgrade to a different family of conventions. The other large concern is that this seems to be leading us to dynamic processing of arbitrary schemas. While this can be made safe, it is a very challenging task to do so safely, and I would highly recommend against it.
For the resource portion, the user/library writers will know when they write the application what conventions will be used. I think we shouldn't have any complicated logic around transforming those but push for a way for an application author to know:
For the telemetry portion, I think a stand-alone library is much more appropriate. SDKs don't provide any ingest logic and this is an ingest problem. |
### Understanding schema compatibility | ||
|
||
Here we propose simply providing an implementation of | ||
[SemVer](https://semver.org/) for isCompatible, and using the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm a little confused on this. If you had a schema rule like the one above:
next.version:
resources:
changes:
- rename_attributes:
attribute_map:
cloud.zone: cloud.availability_zone
previous.version:
since there is a transform between versions, would the two schemas be compatible and have the same major version?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's not clear to me what the answer to this is, which is why I opened: #162
IF the answer to the quesiton is "yes they're compatible", then I think this OTEP resolves some major pain that kind of change would cause.
@MadVikingGod You state:
I'd disagree with the later part. |
@jsuereth do you plan to continue working on this OTEP? |
@tigrannajaryan Yes, tabled it temporarily to work more directly with metrics. I still think this is an unresolved issue w/ semantic conventions + telemetry. |
@jsuereth I believe we need this change. It evolves the schemas in the right direction. |
* | ||
* <p> Returns an invalid Resource if conversion is not possible. | ||
*/ | ||
public Resource convertToSchema(String schemaUrl); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think that an alternate conversion method should be required to work from a file, but an alternate URL->bytes function could be provided. I'd imagine possibly a three tier lookup: 1) embedded in the application by the SDK, 2) on disk in a well-known or specified location 3) over the network through traditional URL resolution.
- **Else if the Schema urls of the old and updating resources are compatible** | ||
**then the older resource will be converted to newer schema url and merged.** |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems that "older resource" may be used in two senses here. One being the target of a merge operation and the other being the resource with the lesser schema version. Utilizing a different term for one or both of those senses may help avoid confusion.
- Initially, we could prevent any `rename` alterations to schema for compatible | ||
version numbers. `convertToSchema` would check for semver compatibility and | ||
just return the original resource if compatible. | ||
- Alternatively, we can check the newer schema for any transformations in the | ||
`all` or `resource` section before failing to convert. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would this check be limited to attributes actually present on the resources being evaluated, or would the presence of any transformation, even for non-present attributes, prevent merging/migration?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall LGTM, I think definitely a prototype is needed :)
My major concern is about providing the downgrading functionality since right now we require changes to be backwards compatible (semver does the same) but not forward compatible.
Please consider to "resolve" all the comments if the discussion got to the resolution point.
text/0149-exponential-histogram.md
Outdated
@@ -0,0 +1,162 @@ | |||
# Add exponential bucketing to histogram protobuf |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why this change?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
merge conflict.
We propose a new set of SDK features: | ||
|
||
- A mechanism to "migrate" a resource up/down compatible schema changes. | ||
- A mechanism to determine if two schema URLs allow compatible changes. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this a long term issue or just a short term need, I had the impression that we should always do backwards compatible changes. So we can always go up, sometimes we may not be able to go down.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is an issue of how users consume exporters + resource detecters in the SDK. Both are pluggable components and we can expect resource detection to advance ahead of exporters, so the forward-compatible behavior is needed in that kind of distribution environment.
|
||
We propose a new set of SDK features: | ||
|
||
- A mechanism to "migrate" a resource up/down compatible schema changes. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should provide only the "go up" version, and enforce everyone to use the most recent (latest) schema. The exporter in your case should be able to adopt (maybe) the rules to work on the Resource schema (if newer) vs downgrading the Resource.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can start by implementing only go-up, but as I stated, there's an issue with mismatch between various sourced components. If we're not somewhat compatible in a forwards way, we could cause issues across the ecosystem.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should be comfortable with both up and down directions. The schema files unambiguously defines whether it is safe to go in a particular direction.
* | ||
* <p> Returns an invalid Resource if conversion is not possible. | ||
*/ | ||
public Resource convertToSchema(String schemaUrl); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Per this thread conversation, it may be better to start with a solution that "file"/"bytes" for the schema are provided, and delegate to the user of the API to ensure that is available somewhere to grab. Some users may embed the files in their application, some may embed the schema bytes, etc.
@jkwatson had a very good point, and probably we should postpone the API that fetches from the internet.
labels, e.g. | ||
|
||
```java | ||
public static String MY_SCHEMA_VERSION="https://opentelemetry.io/schemas/1.1.0"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not necessary related question: Is this the expected value for the proto fields or just the string number "1.1.0"?
/cc @tigrannajaryan
can be converted to a compatible schema. Specfically resource can be merged | ||
IFF: | ||
|
||
- The schemas are compatible |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same question, is this something that we need to worry? Should we assume that converting to the "newest" is always possible?
|
||
There are three major changes to SDKs for this proposal. | ||
|
||
### Understanding schema compatibility |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this section needed because currently the schema url does not respect semver for the version number? I though that was implicit since the schema url version number corresponds to the specs version which follows semver, probably my assumption is wrong.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The Schema URL could be for a different specification entirely. This OTEP proposed a level of compatibility we could not assume before.
This is needed so we CAN assume semver for the default otel schema.
e.g. In Java you might have the following for resource: | ||
|
||
```java | ||
public abstract class Resource { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we say that similar changes will be applied to other places where schemaurl is used, like spans/metrics/logs?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Eventually, that would be the goal.
The notion of telemetry-schema conversion could be expanded and adapted into | ||
its own set of functionality that SDKs or SDK-extensions (like resource |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You need this because you want to be able to also convert Spans/Metrics/Logs that also have schemaURL.
Co-authored-by: Nikita Salnikov-Tarnovski <[email protected]>
Co-authored-by: Reiley Yang <[email protected]>
5ea71c0
to
71155a1
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, this is well thought out.
@jsuereth do you plan to continue working on this OTEP? |
@jsuereth I assume we still need this OTEP? |
@tedsuo I think we still need it but I've put effort on hold to address higher-priority items. I think we still have time to revisit this post some semconv stability, and this needs two more approvers. |
@jsuereth we are cleaning up stale OTEP PRs. If there is no further action at this time, we will close this PR in one week. Feel free to open it again when it is time to pick it back up. |
Specify a mechanism for users of Resource to interact with Resource leveragingTelemetrySchema version for a stable API.
Goals of this OTEP: