Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Changed messaging.system attribute type to an open enum #517

Merged
merged 8 commits into from
Nov 16, 2023

Conversation

AlexanderWert
Copy link
Member

@AlexanderWert AlexanderWert commented Nov 14, 2023

Changes

Follow-up from #490 (comment)

For consistency with database.system changing the messaging.system attribute type from string to enum. This also helps with documenting well-known values to be used for different messaging systems.

Merge requirement checklist

Copy link
Member

@joaopgrassi joaopgrassi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess this requires a breaking change changelog entry?

@AlexanderWert
Copy link
Member Author

I guess this requires a breaking change changelog entry?

Yeah, was not sure whether that is practically a breaking change or not. Since it still allows for custom values, so any string values being used before should still work, right?

@arminru
Copy link
Member

arminru commented Nov 14, 2023

I guess this requires a breaking change changelog entry?

Yeah, was not sure whether that is practically a breaking change or not. Since it still allows for custom values, so any string values being used before should still work, right?

It could be considered breaking for an instrumentation that used to report amazon_sqs before but would be supposed to change this to amazonsqs now with the MUST wording.

@AlexanderWert
Copy link
Member Author

It could be considered breaking for an instrumentation that used to report amazon_sqs before but would be supposed to change this to amazonsqs now with the MUST wording.

The value for Amazon SQS is still AmazonSQS (which is what the implementations use , at least Java does) not amazonsqs (that is only the yaml ID). Though, I don't like that AmazonSQS value because it's not following the lowercase pattern of the other messaging systems. But kept it unchanged here to not break instrumentations.

@Oberon00
Copy link
Member

Oberon00 commented Nov 14, 2023

But kept it unchanged here to not break instrumentations.

I think it would be better to change to the desired value right away, because different instrumentations are using different values here. I have even seen differences in Lambda SQS event vs. SDK SQS usage. So it will break someone anyway.

Signed-off-by: Alexander Wert <[email protected]>
Copy link
Contributor

@pyohannes pyohannes left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not convinced by the advantages of the approach taken for database.system. Open enums are a good choice for a sufficiently stable set of values, but not for a set of values that is expected to expand continuously.

  • We will end up with an ever-growing list that is never complete (there are already 51 entries for databases), with many outdated or irrelevant values that aren't of use to anybody.
  • Any time we want to add support for a new system, we'll need to change the system-independent part of semantic conventions. That's bad design in my opinion.
  • We blur the distinctions regarding ownership and stability between system-specific and system-independent conventions. This might become tricky once system-independent semantic conventions are stable, but system-specific ones aren't.
  • Technically, adding new values to this list can be seen as a breaking change. Somebody who is using the identifier AmazonSNS is compliant according to the proposed settings, but once we add amazon_sns they aren't anymore.

We need to be consistent between [messaging|database|rpc].system attributes, and I don't want to block this change at any price. However, I'd like a discussion where we settle on a consistent approach for all of those attributes before doing any changes here, as the benefits aren't immediately apparent to me.

@AlexanderWert
Copy link
Member Author

@pyohannes Thanks for your review! There are some really good points and I agree we should discuss them more, but also have different opinions on some of the points.

Any time we want to add support for a new system, we'll need to change the system-independent part of semantic conventions. That's bad design in my opinion.

It's a good point, but highly depends on whether we consider adding a new value to an open enum a breaking change or not. If not, I don#t see that as a problem (even if the tech-independent part would be stable), because the changes would be additive only.

  • Technically, adding new values to this list can be seen as a breaking change. Somebody who is using the identifier AmazonSNS is compliant according to the proposed settings, but once we add amazon_sns they aren't anymore.

I think that's what I see as a big practical problem with the current approach for messaging.system + the fact that we already have different implementation with AmazonSQS, amazon_sqs, ..., so there's no consistency just because it is not explicitly documented what values to use for known messaging systems.

IMHO, the enum is a great way to have an overview in one place and because it's an open enum users can still use custom values when needed.

We blur the distinctions regarding ownership and stability between system-specific and system-independent conventions. This might become tricky once system-independent semantic conventions are stable, but system-specific ones aren't.

That's a good point. BUT, we are talking here about one value per tech-specific convention. And practically it doesn't matter what the value is as long as it is being used consistently for a given technology. So, once defined it should not break that often / at all?

We will end up with an ever-growing list that is never complete (there are already 51 entries for databases), with many outdated or irrelevant values that aren't of use to anybody.

Why is that a problem? Even if not complete, if I'm looking for what value I need to use (as an instrumentor or user) for (let's say) MongoDB I can look up in that list and I don't care if there are other values in there as long as I can find what to use for that specific technology.

@lmolkova
Copy link
Contributor

lmolkova commented Nov 14, 2023

I like that we can unify a system string for one messaging system across different client libraries/languages.
Also, I don't believe that the enum has to be exhaustive.

We can use the same principle as for the error.type where we can define some values in the spec and (once tooling allows), extend that enum in the messaging-system-specific extension.

E.g. if I add an extension for Azure Service Bus, it'd do something like

  - ref: messaging.system
    type:
       members:
         - id: servicebus
         ... # no other members

This way core messaging spec may remain slim and we can still have documented name for each messaging system.

Signed-off-by: Alexander Wert <[email protected]>
@pyohannes
Copy link
Contributor

It's a good point, but highly depends on whether we consider adding a new value to an open enum a breaking change or not.

I checked the current stability definitions, according to that it's not a breaking change:

"Specifically for enums the list of allowed values is expected to change overtime."

We will end up with an ever-growing list that is never complete (there are already 51 entries for databases), with many outdated or irrelevant values that aren't of use to anybody.

Why is that a problem? Even if not complete, if I'm looking for what value I need to use (as an instrumentor or user) for (let's say) MongoDB I can look up in that list and I don't care if there are other values in there as long as I can find what to use for that specific technology.

Do you see it as a possibility to have these lists of systems in separate files and refer to those? As those lists might get very long (thinking years ahead maybe dozens, or in the case of databases maybe hundreds of values) and need to be duplicated for metrics and span documents.

Admittedly readability isn't a primary goal of specifications, however I think this could make it easier to consume those documents, especially the messaging document which is already quite long and complex.

@AlexanderWert
Copy link
Member Author

Do you see it as a possibility to have these lists of systems in separate files and refer to those? As those lists might get very long (thinking years ahead maybe dozens, or in the case of databases maybe hundreds of values) and need to be duplicated for metrics and span documents.

Admittedly readability isn't a primary goal of specifications, however I think this could make it easier to consume those documents, especially the messaging document which is already quite long and complex.

I agree that the list can grow long. And I like @lmolkova 's proposal on how to extend the values in tech-specific conventions (that would be in multiple files). But that's not supported yet by tooling. So how about the following?

  1. we start with this list here
  2. we start working on realising @lmolkova 's proposal in the tooling
  3. we move over to a "distributed" list once tooling supports it (this would be only a refactoring)

One think we have to keep in mind though: If there are hundreds of databases / messaging systems that we want to provide a value for (and I thing we should specify those values to avoid inconsistencies) then a simple list is the most compact approach. Writing these values into separate semantic convention files would result in hundreds of individual semantic convention pages which might be even worse, especially if the only purpose for such a page would be to document that value (without any additional attributes or conventions defined).

@pyohannes
Copy link
Contributor

pyohannes commented Nov 16, 2023

One think we have to keep in mind though: If there are hundreds of databases / messaging systems that we want to provide a value for (and I thing we should specify those values to avoid inconsistencies) then a simple list is the most compact approach. Writing these values into separate semantic convention files would result in hundreds of individual semantic convention pages which might be even worse, especially if the only purpose for such a page would be to document that value (without any additional attributes or conventions defined).

I agree. I was thinking about a separate file like known-messaging-systems.md (or any better name), which can then be linked from wherever messaging.system is used (same for db.system and rpc.system). I'm not sure that's covered by @lmolkova's proposal.

I'm willing to work on tooling for that, if there's a consensus that we want to go that route.

@AlexanderWert
Copy link
Member Author

AlexanderWert commented Nov 16, 2023

I was thinking about a separate file like known-messaging-systems.md (or any better name), which can then be linked from wherever messaging.system is used (same for db.system and rpc.system).

I like that idea!

So are you fine with making it an enum in this PR and then we'll work on making your proposal possible?

@pyohannes
Copy link
Contributor

pyohannes commented Nov 16, 2023

So are you fine with making it an enum in this PR and then we'll work on making your proposal possible?

Yes, sounds good to me.

@AlexanderWert AlexanderWert merged commit 971383f into open-telemetry:main Nov 16, 2023
9 checks passed
pyohannes pushed a commit to pyohannes/semantic-conventions that referenced this pull request Jan 17, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Archived in project
Development

Successfully merging this pull request may close these issues.

7 participants