Allow any unicode characters in the TagValue #71
I feel that we are being unreasonably US-centric in restricting tag values to Latin characters. It is very easy to imagine useful tags that contain non-Latin characters, for example the names of Chinese metro areas. We should not force people to Latinize all of these just to use OpenCensus.
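To make the proposed change concrete, here is a minimal Python sketch contrasting the two validation rules. It assumes the current rule is "printable ASCII only, up to a fixed length limit"; the 255-character limit and the function names `is_valid_ascii_tag_value` and `is_valid_unicode_tag_value` are hypothetical illustrations, not the OpenCensus API. Under the proposed rule, any string that can be encoded as UTF-8 is accepted.

```python
# Hypothetical length limit for illustration; the spec's actual value may differ.
MAX_TAG_VALUE_LEN = 255


def is_valid_ascii_tag_value(value: str) -> bool:
    """Assumed current rule: printable ASCII (0x20-0x7E) only."""
    return len(value) <= MAX_TAG_VALUE_LEN and all(
        0x20 <= ord(c) <= 0x7E for c in value
    )


def is_valid_unicode_tag_value(value: str) -> bool:
    """Proposed rule (sketch): any Unicode text that can be encoded as UTF-8."""
    try:
        value.encode("utf-8")  # rejects lone surrogates such as "\ud800"
    except UnicodeEncodeError:
        return False
    return len(value) <= MAX_TAG_VALUE_LEN
```

With these definitions, `"Beijing"` passes both checks, while `"北京"` passes only the Unicode one.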
/cc @dinooliva
I would like to hear what others think about this. I feel this should be fine.
Fixes #53.
SGTM
Might need a rebase.
@Ramonza I would like to know whether the set of characters you propose here is accepted by a few of the major backends that we try to support:
I could not find a conclusive answer for Stackdriver labels (it just says "A variable-length string" in the proto), but filters explicitly allow UTF-8 strings. Empirically, writing a UTF-8 label value and reading it back works correctly.
/cc @g-easy
@bogdandrutu
Prometheus permits any Unicode characters in label values (see https://prometheus.io/docs/concepts/data_model/#metric-names-and-labels). In gRPC propagation, the tags are binary-encoded as UTF-8 strings, so there should be no problem there. For HTTP, we can use MIME encoding.
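A minimal Python sketch of both propagation paths. For the binary case it assumes a simple hypothetical layout in which each key and value is written as a protobuf-style varint length followed by its UTF-8 bytes; this is not the actual OpenCensus wire format. For HTTP it uses RFC 2047 MIME encoded-words via Python's standard `email.header` module, which is one possible reading of "MIME encoding". All helper names are hypothetical.

```python
from email.header import Header, decode_header
from typing import Dict, Tuple


def encode_varint(n: int) -> bytes:
    """Encode a non-negative int as a protobuf-style base-128 varint."""
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)  # high bit set: more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def decode_varint(buf: bytes, pos: int) -> Tuple[int, int]:
    """Decode a varint starting at buf[pos]; return (value, next_pos)."""
    result = shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def encode_tags(tags: Dict[str, str]) -> bytes:
    """Hypothetical layout: <len><UTF-8 key><len><UTF-8 value>, repeated."""
    out = bytearray()
    for key, value in tags.items():
        for field in (key, value):
            data = field.encode("utf-8")
            out += encode_varint(len(data)) + data
    return bytes(out)


def decode_tags(buf: bytes) -> Dict[str, str]:
    """Inverse of encode_tags."""
    fields, pos = [], 0
    while pos < len(buf):
        length, pos = decode_varint(buf, pos)
        fields.append(buf[pos:pos + length].decode("utf-8"))
        pos += length
    # Fields alternate key, value, key, value, ...
    return dict(zip(fields[0::2], fields[1::2]))


def encode_http_value(value: str) -> str:
    """Encode a tag value as ASCII-only RFC 2047 encoded-word(s)."""
    return Header(value, "utf-8").encode()


def decode_http_value(encoded: str) -> str:
    """Decode RFC 2047 encoded-word(s) back to a str."""
    return "".join(
        part.decode(charset or "ascii") if isinstance(part, bytes) else part
        for part, charset in decode_header(encoded)
    )
```

A real implementation would keep the existing binary format. The sketch is only meant to show that UTF-8 bytes round-trip losslessly, and that the HTTP form stays pure ASCII.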
I am fine with this as long as we document, in all languages, that some backends may impose different restrictions and that users need to be aware of this limitation.
Pragmatically, we didn't allow this because of compatibility issues internal to Google, and because restricting values to ASCII allows fast encoding/decoding in languages that don't natively use UTF-8 (e.g. Java). More generally, we've had discussions about making TagValue an uninterpreted sequence of bytes, which probably makes more sense. Making this change requires some rollout effort, so it is better to design the solution and do it only once.
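The encoding-cost point can be illustrated with a small Python sketch. Python's `str` is not Java's `String`, so this shows the principle rather than Java's actual cost. For ASCII input, the UTF-8 byte length equals the character count, so no per-character work is needed. For arbitrary Unicode, each code point takes 1 to 4 UTF-8 bytes. Java's `String` API counts UTF-16 code units, which is a third measure again. The function names `utf8_length` and `utf16_units` are hypothetical.

```python
def utf8_length(value: str) -> int:
    """Bytes needed to encode value as UTF-8 (assumes no lone surrogates)."""
    if value.isascii():
        # Fast path: every ASCII character is exactly one UTF-8 byte,
        # so the byte length equals the character count.
        return len(value)
    total = 0
    for ch in value:
        cp = ord(ch)
        if cp < 0x80:
            total += 1
        elif cp < 0x800:
            total += 2
        elif cp < 0x10000:
            total += 3
        else:
            total += 4  # supplementary planes, e.g. emoji
    return total


def utf16_units(value: str) -> int:
    """Length as a Java String would report it (UTF-16 code units)."""
    return len(value.encode("utf-16-le")) // 2


for sample in ["Beijing", "北京", "\U0001F600"]:
    # code points, UTF-8 bytes, UTF-16 code units
    print(repr(sample), len(sample), utf8_length(sample), utf16_units(sample))
```

For `"北京"` this reports 2 code points, 6 UTF-8 bytes, and 2 UTF-16 units. Any length limit therefore has to say which of these it counts.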
@dinooliva I think it would make for very strange APIs if tag values were uninterpreted bytes. Perhaps we should just remove the statement about the length limit and, instead of treating an over-long value as an error, allow the user to truncate values at the point of serializing them (where the limit really matters)? cc @bogdandrutu
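If values are truncated at serialization time rather than rejected, one subtlety is that a byte-based limit must not cut a multi-byte UTF-8 character in half. A minimal Python sketch, assuming the limit is expressed in UTF-8 bytes; `truncate_utf8` is a hypothetical helper, not an existing OpenCensus function.

```python
def truncate_utf8(value: str, max_bytes: int) -> str:
    """Truncate value so its UTF-8 encoding fits in max_bytes,
    never splitting a multi-byte character."""
    if max_bytes < 0:
        raise ValueError("max_bytes must be non-negative")
    data = value.encode("utf-8")
    if len(data) <= max_bytes:
        return value
    cut = max_bytes
    # data[cut] is the first byte that gets dropped. If it is a continuation
    # byte (0b10xxxxxx), the cut falls inside a character, so back up to that
    # character's lead byte and drop the whole character.
    while cut > 0 and (data[cut] & 0xC0) == 0x80:
        cut -= 1
    return data[:cut].decode("utf-8")
```

For example, `truncate_utf8("北京", 4)` returns `"北"`, because the second character needs 3 more bytes and would not fit.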