Ephemeral Resource Attributes #208
There are two types of resource attributes, **permanent** and **ephemeral**. Attributes that are labeled as permanent in the semantic conventions must be present when the SDK is initialized. They cannot be added or updated at a later date.

Resources are managed via a ResourceProvider. Setting an attribute on a ResourceProvider will cause that attribute value to be included in the resource attached to any signal generated in the future. Spans which have already been started, along with any telemetry which has already been passed to the export pipeline, will not have the new attribute value. Optionally, a check can be added to ensure that permanent resources are not modified after the SDK has started.
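As a rough illustration of the behavior described above, the sketch below models a provider that hands out an immutable snapshot to each signal and optionally rejects changes to permanent attributes once started. The class shape, the `permanentKeys` and `validate` options, and the `start()` method are assumptions made for this example; they are not defined by the OTEP or by any SDK.

```javascript
// Hypothetical sketch: names and option shapes are illustrative, not an SDK API.
class ResourceProvider {
  constructor(initialAttributes = {}, { permanentKeys = [], validate = false } = {}) {
    // Each snapshot is frozen, so a resource already attached to a span never changes.
    this.current = Object.freeze({ ...initialAttributes });
    this.permanentKeys = new Set(permanentKeys);
    this.validate = validate;
    this.started = false;
  }

  start() {
    this.started = true;
  }

  getResource() {
    return this.current;
  }

  setAttribute(key, value) {
    // Optional check: permanent attributes may not change after the SDK has started.
    if (this.validate && this.started && this.permanentKeys.has(key)) {
      throw new Error(`permanent resource attribute "${key}" cannot change after SDK start`);
    }
    // Build a new snapshot instead of mutating the old one.
    this.current = Object.freeze({ ...this.current, [key]: value });
  }
}

const provider = new ResourceProvider(
  { 'service.name': 'foo', 'session.id': '1' },
  { permanentKeys: ['service.name'], validate: true },
);
provider.start();
const before = provider.getResource(); // what an already-started span would hold
provider.setAttribute('session.id', '2');
const after = provider.getResource(); // what future signals would receive
```

Because every update produces a new frozen object, telemetry that captured `before` keeps the old value, which matches the "signals generated in the future" wording.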
> Optionally, a check can be added to ensure that permanent resources are not modified after the SDK has started

If the nested attributes proposal is accepted, then one way to simplify validation of ephemeral resources is to have just one attribute called `ephemeral`. The ResourceProvider then allows any modification to the value of this attribute and does not need to look up which attributes are permanent. This also avoids the need to mark the resource attributes as permanent in the semantic conventions YAML files.
I don't see how this would simplify things? You then still have an attribute that needs special handling. Whether it is by name or with an explicit label would not make things more/less simple, would it?
The nested attributes proposal also does not require SDKs to implement them. If we want ephemeral attributes to depend on that, it would mean that SDKs could also not implement ephemeral attributes.
If I understand the proposal correctly, it requires that the permanent attributes be marked so in the semantic conventions. This is the part that will not be required if we limit the special handling to only one attribute with a known name.
Consider the following resource. The `ResourceProvider` can allow modifications at any time to the key-value pairs within the `ephemeral` attribute.
{
service.name: foo,
service.instance.id: 123,
browser.user_agent: bar,
ephemeral: {
session.id: 456
}
}
Anyway, this is an optimization step. Let's ignore this initially until the larger proposal gets acceptance.
Marking something in the semantic conventions is just that: A convention. If we want something to be conventionally ephemeral, we still need to have a note about that in the semantic conventions one way or another.
I agree that it would simplify things if ephemeral resources were kept separate from other resources.
A validator is also something that can be run in development but disabled in production, which would work as an optimization.
Also, one aside on nested attributes: my assumption is that attribute values wouldn't be merged, they would be replaced.
In other words, there is still only a single string key per attribute, but with the option of storing an object, map, or array as the value for that attribute. If you set a new value for the key, it would throw the old value away.
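To make the replace-not-merge assumption concrete, here is a small sketch that uses a plain `Map` as a stand-in for an attribute collection. The `enduser.id` entry and its value are made up for the demonstration, and nothing here is an SDK API.

```javascript
// A Map stands in for the attribute collection: one string key, one value.
const attributes = new Map();
attributes.set('ephemeral', { 'session.id': '456', 'enduser.id': 'alice' });

// Setting the same key again throws the old nested value away; nothing is merged.
attributes.set('ephemeral', { 'session.id': '789' });

const current = attributes.get('ephemeral');
const keptOldField = 'enduser.id' in current; // false under replace semantics
```

Under these semantics a caller who wants to change one nested field has to write the whole nested value back.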
An alternative to ephemeral resources would be to create span, metrics, and log processors which attach these ephemeral attributes to every instance of every signal. This would not require a modification to the specification.
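A minimal sketch of what such a processor might look like for spans follows. The hook names loosely follow the usual span processor shape (`onStart`, `onEnd`, `forceFlush`, `shutdown`), but the class itself, the callback it takes, and the stand-in span object are assumptions for illustration only.

```javascript
// Hypothetical processor: copies the current ephemeral attributes onto every span at start.
class EphemeralAttributeSpanProcessor {
  constructor(getEphemeralAttributes) {
    this.getEphemeralAttributes = getEphemeralAttributes;
  }

  onStart(span) {
    for (const [key, value] of Object.entries(this.getEphemeralAttributes())) {
      span.setAttribute(key, value);
    }
  }

  onEnd() {}

  forceFlush() {
    return Promise.resolve();
  }

  shutdown() {
    return Promise.resolve();
  }
}

// Stand-in span used only for this demonstration.
const fakeSpan = {
  attributes: {},
  setAttribute(key, value) {
    this.attributes[key] = value;
    return this;
  },
};

const sessionState = { 'session.id': '456' };
const processor = new EphemeralAttributeSpanProcessor(() => sessionState);
processor.onStart(fakeSpan);
```

Every span (and, with equivalent processors, every log record and metric point) ends up carrying its own copy of the attributes.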
There are two problems with this approach. One is that the duplication of attributes is very inefficient. This is a problem on clients, which have limited network bandwidth. This problem is compounded by a lack of support for gzip and other compression algorithms on the browser.
It would be great to quantify this. How inefficient is it? A benchmark demonstrating this would be a strong argument in favour of the proposed approach.
If processors can change scope attributes, they might be a good candidate to solve this as well.
> lack of support for gzip and other compression algorithms on the browser
I'm not an expert on browser stuff but can you expand on this? On its surface it seems wrong since gzipped static resources show up everywhere on the internet and there are js implementations of gzip (like this). This stackoverflow post suggests that a part of it is because a browser client can't know if the server can accept gzipped data, but OTLP requires gzip support.
Yes, it's uncommon because clients may not know if the server can accept compressed data. It is not clear to me if the gzip support in the OTLP spec refers to responses only (common for web services to provide) or requests as well (uncommon).
I think there is also a danger of an attack on the server: compressed data could expand into a very large payload. And lastly, gzip compression is not native to browsers, so there is CPU overhead, which is important to consider for its impact on user experience, especially when sending data while the page is unloading.
Aside from that, I think that session ID specifically does not belong on individual signals. The session is a context for many signals in a given time period; it does not vary from signal to signal.
Never mind on the OTLP gzip support, I see it says that clients MAY gzip the content.
@tigrannajaryan Regarding the limited network bandwidth, the sendBeacon() API has a payload limit of 64KB. Assume a session.id attribute that looks like this when sent over the wire:
{"key":"session.id","value":{"stringValue":"8fded6726f630a327ee3be41174a8a91"}}
It adds 79 bytes to each signal. The number of spans/events per export will depend on the type of application and which instrumentations are present. But assuming that 100 is plausible, this adds almost 8 kB to the payload.
This will further increase if we add additional context attributes (user attributes, URL etc.).
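The arithmetic above can be reproduced with a few lines. The batch size of 100 comes from the comment; reading "64KB" as 64 × 1024 bytes is an assumption made here.

```javascript
// The attribute as it would appear in an OTLP/JSON payload.
const attribute = {
  key: 'session.id',
  value: { stringValue: '8fded6726f630a327ee3be41174a8a91' },
};

// Serialized size in bytes of one copy of the attribute.
const bytesPerSignal = new TextEncoder().encode(JSON.stringify(attribute)).length; // 79

const signalsPerExport = 100; // batch size assumed in the comment
const addedBytes = bytesPerSignal * signalsPerExport; // 7900

const beaconLimitBytes = 64 * 1024; // assumption: "64KB" means 65536 bytes
const shareOfLimit = addedBytes / beaconLimitBytes;

console.log(bytesPerSignal, addedBytes, shareOfLimit.toFixed(2));
```

One repeated attribute therefore consumes roughly an eighth of the beacon budget before any span data is counted.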
> While decompressing is common in browsers (be it gzip or brotli), none of the current request APIs expose a way to have browser compress the request (MDN: XHR, fetch)

This is incorrect. The CompressionStreams API provides a native solution for this and is supported in Chromium-based browsers already.

> This does mean that indeed you have to bring your own compression methods. More code = larger bundle that the browser needs to download, parse and execute. First phase is network bound (but does benefit from compression itself), while second and third are CPU bound.

It also benefits from caching. In a network-constrained situation the cost of retrieving the additional code is paid once and the result cached. Conditional requests and ETags are your friend.

> Also in most cases instrumentation is required to be loaded ASAP (sometimes even before rest of the content on the page), causing site loading to be blocked until code is downloaded (should it be in the `<head>` of the page)

The additional code for compression is only needed to export telemetry and does not need to be loaded at the same time as the code enabling instrumentation. Deferring until an export is required can increase the time-to-export but would not impact time-to-interaction or any other user-focused timing.
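For reference, native compression as mentioned at the start of this comment could look roughly like the sketch below. It relies on the `CompressionStream`, `DecompressionStream`, `Blob`, and `Response` globals, so it only runs where those exist; the helper names `gzip` and `gunzip` are made up for this example.

```javascript
// Gzip a string using the platform's CompressionStream, without any library code.
async function gzip(text) {
  const stream = new Blob([text]).stream().pipeThrough(new CompressionStream('gzip'));
  return new Uint8Array(await new Response(stream).arrayBuffer());
}

// Inverse helper, used here only to check the round trip.
async function gunzip(bytes) {
  const stream = new Blob([bytes]).stream().pipeThrough(new DecompressionStream('gzip'));
  return new Response(stream).text();
}

// A repetitive payload, similar in spirit to a batch with a repeated attribute.
const payload = JSON.stringify(
  Array.from({ length: 100 }, () => ({ key: 'session.id', value: { stringValue: '456' } })),
);
const compressed = gzip(payload); // a Promise of the gzipped bytes
```

An exporter would still need to send the result with a `Content-Encoding: gzip` header and fall back to uncompressed export where the API is missing.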
> I propose adding an entirely new field called ephemeral_resource as a sibling to resource in ScopeSpans and ScopeLogs - this way, the original resource remains immutable and the new field can be used for the ephemeral attributes of the resource.

This is inverted. `ResourceSpans` contain `ScopeSpans`, not the other way around. Ephemeral resource attributes could be added as `Scope*` attributes on each `Scope*` produced during the time when the ephemeral resource attributes are active, but I'm not sure I see how changing the OTLP data structure advances the conversation in a safe way. That is the most invasive way of going about this that I could think of.
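For orientation, the nesting described here can be sketched as an OTLP/JSON-style trace payload. The scope name, attribute values, and span name are placeholders, and placing `session.id` on the scope is shown only to illustrate the option mentioned above, not as a recommendation.

```javascript
// ResourceSpans contain ScopeSpans, which in turn contain the spans themselves.
const exportRequest = {
  resourceSpans: [
    {
      resource: {
        attributes: [{ key: 'service.name', value: { stringValue: 'foo' } }],
      },
      scopeSpans: [
        {
          scope: {
            name: 'example-instrumentation', // placeholder scope name
            // Option discussed above: carry the ephemeral value at scope level.
            attributes: [{ key: 'session.id', value: { stringValue: '456' } }],
          },
          spans: [{ name: 'click' /* remaining span fields omitted */ }],
        },
      ],
    },
  ],
};

const firstScope = exportRequest.resourceSpans[0].scopeSpans[0];
```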
@Aneurysm9 sorry, my bad, I meant `ResourceSpans` and `ResourceLogs`, not `ScopeSpans` and `ScopeLogs`. I corrected this in my previous comment; can you check if it makes sense this time?
The cost to generate, serialize, and compress that many spans is also not a synchronous process that takes X milliseconds, but many small processes which each take a small fraction of X. It is most important to ensure that each individual step doesn't impact user experience. In the given example of 100 spans (OTLP -> protobuf -> pako) on the Pixel 4a, the whole process takes 4.393 ms, but you have 2 chances to yield to the event loop to ensure user experience is not affected.
> This is incorrect. The CompressionStreams API provides a native solution for this and is supported in Chromium-based browsers already.

I missed that, but I generally don't consider new browser features a solution unless usage is above 90% (and Safari has a monopoly on iOS, so...). Even 90% is probably low considering how often RUM products are asked for IE11 support, but those users already have a miserable experience from using IE in the current year, so making it optional is worth consideration.

> but you have 2 chances to yield to the event loop to ensure user experience is not affected.

There is one "but": not when the user is leaving the page. Though generally you don't have 100 spans then, which makes it a question of how much you want to maintain 2 different code paths (a sync one and an "async" one).
@tigrannajaryan I just want to emphasize what @scheler said, that the purpose of this OTEP is not to avoid compression or gain efficiency, but to extend our data model in a way that correctly represents these attributes.
If we don't want to extend the current Resource concept, we could add a new concept, call it ProcessScope or something similar, and have it work in effectively the same manner.
Personally, I'd prefer we extend resources over adding a new scope. But I prefer both over an approach that makes it impossible to cleanly implement RUM using OpenTelemetry.
In other words, I'm against "just tack on the process scope as span/event attributes" the same way I'd be opposed to "just tack on the instrumentation scope as span/event attributes." In both cases, yes it would "work." But it would create a headache for implementers and confusion for users.
We should strive for a clean data model, where everything is explained just by looking at the data structure.
## Trade-offs and mitigations

This change should be fully backwards compatible, with one potential exception: fingerprinting. It is possible that an analysis tool which accepts OTLP may identify individual services by creating an identifier by hashing all of the resource attributes.
There is another issue: Exporters right now may be implemented to assume they only ever deal with spans with the same resource. With this proposal, they could receive a batch of mixed spans.
Such an exporter may then misbehave and e.g. use the resource from the first/last span for everything.
Implementing sorting of spans by resource can be a bit costly.
Also there may be exporters for protocols that only support a single resource per connected agent. They would then probably need to stamp the ephemeral attributes on every single telemetry item.
Similar issues may apply to span processors.
(And possibly samplers that receive a resource in their constructor, but I don't think that will be a problem in practice open-telemetry/opentelemetry-specification#1658)
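One way an exporter could cope with a mixed batch without sorting is to group spans by resource identity in a single pass. The sketch below assumes each span holds a reference to an immutable resource object, so reference equality is enough; the span and resource shapes are simplified stand-ins.

```javascript
// Group a mixed batch by resource instead of assuming one resource per batch.
function groupByResource(spans) {
  const groups = new Map(); // resource object -> spans that share it
  for (const span of spans) {
    const group = groups.get(span.resource);
    if (group) {
      group.push(span);
    } else {
      groups.set(span.resource, [span]);
    }
  }
  return groups;
}

const resourceA = Object.freeze({ 'session.id': '1' });
const resourceB = Object.freeze({ 'session.id': '2' });
const batch = [
  { name: 'a', resource: resourceA },
  { name: 'b', resource: resourceB },
  { name: 'c', resource: resourceA },
];
const grouped = groupByResource(batch);
```

Each group can then be emitted as its own resource-level envelope. Exporters for protocols that allow only one resource per connection would still need a different strategy, as noted above.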
Actually, exporters must deal with more than one resource already, which is what made this change so simple!
There is an issue open for that: open-telemetry/opentelemetry-specification#1690
Right now, I don't think it's clear, and Dynatrace exporters take a shortcut here and always use the resource of the first item, assuming it will be the same for every item in the batch (everything else is an absolute edge case today)
Ok, I agree this should be clarified. My understanding is that a BatchSpanProcessor may be shared across multiple SDKs within the same process, and that is done in order to have different sets of resources for different sub-processes. So there is no guarantee that all spans in a batch have the same resource. I know that @MSNev has examples of this pattern.
But, I think that this pattern is extremely rare, so it doesn't surprise me that Dynatrace and other exporters could take a shortcut without anyone noticing.
Our examples (currently) use our internal (non-OpenTelemetry) SDKs on clients where multiple teams provide different components to the same "view" (page etc.) and need or want to report telemetry to their own backends.
In some runtimes we have a single shared batching system, rather than having each component on the view create its own SDK instance with all of the overhead and batching mechanisms. This reduces the runtime impact on client resources (CPU, memory, etc.).
This change should be fully backwards compatible, with one potential exception: fingerprinting. It is possible that an analysis tool which accepts OTLP may identify individual services by creating an identifier by hashing all of the resource attributes.

In this case, it is recommended that these systems modify their behavior, and choose a subset of permanent resources to use as a hash identifier.
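A sketch of that recommendation: hash only an allow-list of permanent attributes so that changes to ephemeral ones do not alter the identifier. The allow-list below is a hypothetical choice, and the 32-bit FNV-1a hash is a stand-in for whatever hash a backend actually uses.

```javascript
// Hypothetical allow-list; the OTEP leaves the choice of subset to each backend.
const IDENTIFYING_KEYS = ['service.name', 'service.instance.id'];

// 32-bit FNV-1a over the string's UTF-16 code units, used here as a simple stand-in hash.
function fnv1a(text) {
  let hash = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return hash.toString(16).padStart(8, '0');
}

function fingerprint(resourceAttributes) {
  // Fixed key order keeps the serialization stable across calls.
  const subset = IDENTIFYING_KEYS.map((key) => [key, resourceAttributes[key] ?? null]);
  return fnv1a(JSON.stringify(subset));
}

const idBefore = fingerprint({ 'service.name': 'foo', 'service.instance.id': '123', 'session.id': '456' });
const idAfter = fingerprint({ 'service.name': 'foo', 'service.instance.id': '123', 'session.id': '789' });
const sameIdentity = idBefore === idAfter; // true: session.id is not part of the subset
```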
That might be a pretty big deal for some, if they only allow storing one set of resource attributes per hash.
@open-telemetry/specs-approvers Please take a look - I suspect we may need a lot of eyes, in case somebody relies on this right now.
It seems crazy to me to use a resource hash as an identifier, given that there is no requirement that the items within it would uniquely identify a service...
But I'm throwing it out there as a possibility, just to cover all the bases.
You should be using something that doesn't exist yet, instead of hashing the whole resources: open-telemetry/opentelemetry-specification#1034 (EDIT: To clarify: We don't do/need this at Dynatrace, I don't know anybody who does. Just a side note)
Yes I agree! There are various attributes which could count as a unique identifier. We could clarify in the spec which ones are currently defined.
One possibility: by default, the SDK could generate a unique ID every time it starts, which would be a reliable identifier because we generate it ourselves. However, this identifier would not be stable across restarts. So there are limits to what can be provided without user input.
There are two problems with this approach. One is that the duplication of attributes is very inefficient. This is a problem on clients, which have limited network bandwidth. This problem is compounded by a lack of support for gzip and other compression algorithms on the browser.

The second problem is that it becomes difficult to distinguish between ephemeral resources and other types of attributes.
Is it necessary to distinguish them by type? Usually the attribute keys should be all you need. E.g. if you have a `session.id` attribute, would you care whether it is an ephemeral resource or a span/event attribute?
In the browser, the overhead of applying the `session.id` as an attribute on every span and event would be untenable.
As far as the need to differentiate, putting data in the proper envelope helps backend systems use it more effectively.
You might ask, why have resources at all in OTLP? Why not simply apply resources as attributes on every span and event? Besides the inefficiency, it would make life very difficult for backend systems which want to apply different analysis to resources and span attributes.
> In the browser, the overhead of applying the session.id as an attribute on every span and event would be untenable.
Citation needed 😃
Would these arguments also apply against #207?
Please see this thread (#208 (comment)) for a lengthy discussion on data limitations in the browser.
I don't think these arguments apply to #207, that proposal would be helpful imho. Just not a solution for ephemeral resources, since many of the events which need these resources happen when there is no trace present.
An alternative to ephemeral resources would be to create span, metrics, and log processors which attach these ephemeral attributes to every instance of every signal. This would not require a modification to the specification.

There are two problems with this approach. One is that the duplication of attributes is very inefficient. This is a problem on clients, which have limited network bandwidth. This problem is compounded by a lack of support for gzip and other compression algorithms on the browser.
In situations where at least one of the ephemeral attributes changes very often, telemetry items are created between the changes, and there are lots of permanent attributes, attaching them to the telemetry items ("signal instance") could even be more efficient.
Generally, I wonder how many ephemeral attributes we expect relative to permanent ones.
We are not expecting large numbers of ephemeral attributes, nor are we expecting them to change with great frequency.
The expectation is that there would be between 1 and 10 ephemeral attributes set on a client, which may update after 15 minutes of inactivity, after the application reawakens, or in response to a change in user or user settings.
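As an illustration of one of those update triggers, the sketch below rotates `session.id` when activity resumes after 15 minutes of inactivity. The helper name, its parameters, and the injected clock and ID generator are all assumptions for the example; a real client would pass its resource provider's setter and a proper ID source.

```javascript
const INACTIVITY_LIMIT_MS = 15 * 60 * 1000;

// Hypothetical helper: sets session.id once, then replaces it after long inactivity.
function createSessionTracker(setAttribute, newId, now = Date.now) {
  let lastActivity = now();
  setAttribute('session.id', newId());
  return function onActivity() {
    const current = now();
    if (current - lastActivity > INACTIVITY_LIMIT_MS) {
      setAttribute('session.id', newId());
    }
    lastActivity = current;
  };
}

// Demonstration with a fake clock and counter-based IDs.
let clock = 0;
let counter = 0;
const resourceAttributes = {};
const onActivity = createSessionTracker(
  (key, value) => { resourceAttributes[key] = value; },
  () => String(++counter),
  () => clock,
);

clock += 5 * 60 * 1000; // short pause: session is kept
onActivity();
const afterShortPause = resourceAttributes['session.id'];

clock += 16 * 60 * 1000; // long pause: session is rotated
onActivity();
const afterLongPause = resourceAttributes['session.id'];
```

With updates this infrequent, each change yields one new resource snapshot rather than a per-signal cost.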
@@ -0,0 +1,78 @@
# Ephemeral Resource Attributes

Define a new type of resource attribute, ephemeral resources, which are allowed to change over the lifetime of the process. Existing resources are redefined as permanent resources, which must be present at SDK initialization and cannot be changed.
Thanks! That looks like a good proposal, but the context scope still presumes a transactional scope within a server handling many independent transactions.
For clients, all telemetry emitted, including logs which are not bounded by a span, is related. That is why the resource scope appears to be the correct one for things like this.
I think there is a continuum of use cases here, where some are better addressed by this OTEP and others better by #207. If one added the possibility to set a new context as root context (where the default is the empty context), we could have something that applies to everything.
Though the browser usually only has one thread of execution of which everything is a child context (I believe), so you probably would only need to set the attributes you want as active before starting your root spans, and it would stick.
That might work... but it might be better to keep the concept of a "process scope" and a "context scope" separate. I see these attributes as more similar to resources and instrumentation scopes - they represent the environment the transaction is occurring within.
Because contexts are immutable, and there are no rules as to when child contexts may be created, there would be synchronization issues between when ephemeral resources are updated and when they would be applied, if they only change the root context and thus only affect transactions which start from a new root context.
@tedsuo Thanks - I feel like some examples would be great, as it seems the Validator is the one separating Resources/Attributes into permanent and ephemeral?
Sure, no problem @carlosalberto. Would you want an example implementation? Or an example use case?
Added an example implementation and example use case.
Yes? No? What should we do here? Based on these requirements, it would be good to understand how the TC would like to move forward.
@tedsuo the spec defined Resource like this:
This text is in a Stable spec document. How do we reconcile this OTEP with the spec's stance on immutability of the Resource? Are you suggesting that we break a Stable spec document? Or do you not think this is a breaking change?
This doesn't change anything about current resource immutability: an update on the resource provider would end up in a new resource instance. To speak in code:

    // ResourceProvider is the API proposed in this OTEP; it does not exist in the SDK today.
    const resourceProvider = new ResourceProvider({
      // Initial set of attributes, internally does a new Resource(attrs) and stores it as current value
      'session.id': '1',
    });
    const tracerProvider = new TracerProvider({ resourceProvider });
    const tracer = tracerProvider.getTracer(/* irrelevant */);
    const span1 = tracer.startSpan(/* ... */);
    // internally span.resource = tracer.tracerProvider.resourceProvider.getResource()

    // Some time later, user logs in and their identity is known
    resourceProvider.setAttribute('enduser.id', 'superadmin');
    // internally currentResource = currentResource.merge(new Resource(newAttrs)), which as per the current spec
    // returns a new Resource with merged attrs
    // That new Resource is set as the current value in ResourceProvider

    // Or session expires and a new one is set
    resourceProvider.setAttribute('session.id', '2');
    const span2 = tracer.startSpan(/* ... */);

    // The two spans hold different, individually immutable Resource instances
    assert.notStrictEqual(span1.resource, span2.resource);
    assert.deepStrictEqual(span1.resource.attributes, { 'session.id': '1' });
    assert.deepStrictEqual(span2.resource.attributes, { 'session.id': '2', 'enduser.id': 'superadmin' });
I disagree. This is not just about a Resource instance in memory. It is about the Resource that is emitted by the instrumented application. The recipients of telemetry expect that the resource is immutable, i.e. its attributes do not change over time. The OTEP talks about this in the "Trade-offs and mitigations" section. I think this is a breaking change. It breaks the contract between OTel sources and telemetry destinations. The OTEP text even recommends this:
I don't think this is acceptable. We are saying that "yes, we broke the contract, deal with it". IMO, we cannot do that.
I thought a bit more about this, I want to find a solution. I don't think we can delete the requirement which says the Resource is immutable. I think this needs to stay otherwise we are breaking the contract. Additionally, unfortunately the spec says we are not allowed to change the association of the Resource and TracerProvider once that association is established:
However, let's step back for a moment. I don't think recipients of telemetry care about the association inside the SDK. The recipients care about the data model, and the data model certainly allows the SDK to emit telemetry associated with different Resources. A new TracerProvider can be created with a new Resource and can be used to emit telemetry that was previously emitted using a different TracerProvider, and this is completely legal.

Given the above, I do not see any clause in the spec that directly prohibits us from introducing a new way for a TracerProvider to be associated with some proxy object which itself is associated with a Resource, and allowing that association to change over time. Yes, this is in a sense cheating, but it allows us to introduce this new way such that it is not a breaking change for the SDK. That's what the proxy ResourceProvider here does. To me the following questions remain:
It is an attribute that applies to all telemetry coming out of the application. It does not change from signal to signal, nor is it scoped to a specific instrumentation. I don't think there is any other place it could go than the resource level (given the current data model).
I think this is an attempt to alleviate the contract between OTel sources and destinations. If there is a real reason that backends need to have an immutable set of resource attributes per application instance, then this would make it possible by defining in the semantic conventions which attributes are permanent and which can change. We assumed that the only reason backends would be relying on this contract is if they were doing something like hashing all the resource attributes (e.g. to identify the instance). Yes, this would force these backends to be updated, but it would provide them with a way to continue using the hashing. Also, since the TracerProvider can be recreated within the same application instance, defining which attributes are permanent or ephemeral is just making it explicit.
I'm not sure I see it the same way. Does it truly apply to all telemetry coming out of the application? Is it not possible for the same application instance to have two sessions active? Doesn't the fact that it can change while the application is running necessarily mean that it does not apply to all telemetry? Yes, the "session ID" attribute as a concept does, but not any given value. That is different from all other resource attributes.

As for not being scoped to a specific instrumentation, it is akin to the trace ID in that it can be used for correlation of signals. How would it be useful with distribution metrics? Do I really care to have a timeseries for every user session to track load times, or do I want to have a more general metric that has exemplars pointing at potentially interesting sessions?

As for where else it could go, it could certainly be added as a scope attribute. This would require a bit more bookkeeping on the part of the instrumentor to keep a map of sessions to tracers, etc., or to store them in session-scoped storage, but is feasible. More appropriate, perhaps, would be in the context where it would be available to all signals. Propagation across process boundaries to allow for correlation (I assume a session can be serviced by application elements that are outside of the immediately user-facing process) is still an issue.

I think, though, that this all reinforces my belief that session ID and trace ID are synonymous and that sessions are simply long traces. Do we really need a new concept, and to contort ourselves to find ways to claim that we're not breaking compatibility with a stable specification, to handle something that the existing concepts can already handle?
Note: I originally started this as part of response to open-telemetry/opentelemetry-specification#2500 (comment) but the first section ended up being more related to this otep being stuck, so here it is! Let's eliminate the confusion of what a session means for a bit. There are some other attributes that are
Let's bring in Currently it's defined as a span level identifying attribute. Which hey, makes total sense in a server side environment. You've got a server side service that you can have one server serve all of the users of the application. If you'd want to have enduser that caused a request set on all of the child spans, yes context makes a lot of sense since the entire server isn't dedicated to just one enduser. Anyways got my 3rd condition Let's jump to client side. I open up local food delivery app, and it's instrumented to generate telemetry. Alright, what's the resource attributes. Well you've got
I try to order something but suddenly app runs into a bug and crashes. Smash cut, support person is messaging devop team "hey got this guy going crazy over not being able to order, can you figure out what's going on there, why his app keeps crashing". Devops looks up data based on my name, sees an attempt to order 100 kebabs in tracing spans that caused Somehow you're next to me and mention you uninstalled the app due to constant crashing a while ago and now have a please come back discount on your account. I hand my phone to you, you log my account out and log in with your account. And manage to successfully order after a more reasonable order size. Now the logged in account has changed, so if the logged in account is in resource, the above telemetry should be over 3 resources: Data from me, data from anonymous user, data from you. (so now we've fulfilled condition 2) Other than local food delivery app, some other examples:
But other potential attributes:
So I think something people who haven't built a RUM need to consider is that a major difference between backend services and client side apps is that apps have (a lot more) state. A lot of this state is global (not scoped to parts of the app like within one request that is forgotten once the request is fulfilled), it changes over time (due to time, user interactions, or completely external actions) and in a lot of the cases it's useful or needed to assist in debugging using gathered telemetry (who, what device, what screen/url, what isp, geolocation) A lot of these attributes are also what you'd want to query data by. Already mentioned looking up data based on app user info, but let's consider some of the RUM use cases:
These add considerations for efficient data ingest and storage. Now every vendor will probably have different opinions on this based on how they use and store data. In July I got some knowledge from @mdubbyap on the splunk/signalfx ingest side about our use cases (and probably should have used this knowledge earlier so I don't accidentally misremember it, but oh well, Ted's been on vacation anyway so it wouldn't have helped move this forward): For ingesting, the best case is to minimise the number of bytes that need to be read in order to determine where to pipe the data to (be it partitioning, buckets or whatever optimises your infra). The worst case is having to read deep enough to get into each span/log/metric and check its attributes for the value. If it's a value on the resource, ingest only needs to read the resource's bytes before determining where to send the data, not needing to parse the rest of the payload. (Since we focus on showing session experience, then obviously for us the session.id attribute is 👀👀) Fulfilling legal requirements can also be easier if these attributes are more easily readable, e.g. indexing data based on enduser info to make deleting data on user request (such as GDPR) easier. Also linking open-telemetry/opentelemetry-specification#2775, as it's gone into the topic of descriptive or identifying attributes, which has been one of the reasons against this otep so far.
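As a rough illustration of the ingest argument above, the sketch below picks a partition from the resource attributes alone and never looks inside the spans. The batch layout (`{"resource": ..., "spans": ...}`) and the function names are assumptions made for this example; they do not describe OTLP decoding or any vendor's actual pipeline.

```python
import zlib


def pick_partition(resource_attrs, num_partitions, key="session.id"):
    """Choose a partition using only a resource attribute."""
    value = str(resource_attrs.get(key, ""))
    # crc32 is stable across processes, unlike Python's built-in hash() for strings.
    return zlib.crc32(value.encode("utf-8")) % num_partitions


def route(batch, num_partitions):
    """Return (partition, batch); only batch["resource"] is inspected."""
    # batch["spans"] is forwarded untouched and never parsed here.
    return pick_partition(batch["resource"], num_partitions), batch
```

If `session.id` lived on each span instead, the router would have to walk every span in the payload before it could decide where to send the data.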
Hi, wanted to give an update on this topic, since some of us from the client-side-telemetry SIG have asked a few TC members to help us on the topic further. Copying the message that @jack-berg posted on slack -
This is what I proposed in OTEP #207 to be a blessed concept with its own API, by the way. But as you said, it is in principle implementable today.
Closing this in favor of a new proposal coming from the RUM/Client group.
This is a proposal to address Resource and Entity data model interactions, including a path forward to address immediate friction and issues in the current resource specification.

The proposal includes all links and context needed to justify it, but duplicating a snapshot here:

## Motivation

This proposal attempts to focus on the following problems within OpenTelemetry to unblock multiple working groups:

- Allowing mutating attributes to participate in Resource ([OTEP 208](#208)).
- Allow Resource to handle entities whose lifetimes don't match the SDK's lifetime ([OTEP 208](#208)).
- Provide support for async resource lookup ([spec#952](open-telemetry/opentelemetry-specification#952)).
- Fix current Resource merge rules in the specification, which most implementations violate ([oteps#208](#208), [spec#3382](open-telemetry/opentelemetry-specification#3382), [spec#3710](open-telemetry/opentelemetry-specification#3710)).
- Allow semantic convention resource modeling to progress ([spec#605](open-telemetry/opentelemetry-specification#605), [spec#559](open-telemetry/opentelemetry-specification#559), etc).

---------

Co-authored-by: Tigran Najaryan <[email protected]>
Co-authored-by: jack-berg <[email protected]>
Co-authored-by: Arve Knudsen <[email protected]>
Co-authored-by: David Ashpole <[email protected]>
This OTEP is part of the RUM/Client initiative.
Currently, we are missing a place to put important client information which applies to all telemetry emitted by an SDK. This information includes attributes such as session ID, language preference, locality/timezone, and other types of user data.
Normally, these attributes would be recorded as resources. However, on client processes, there are times when this information changes without the SDK re-initializing. For example:
In all of these cases, the application/SDK is not restarted. Currently, the resource associated with the SDK cannot be changed after it is started. This makes it very difficult to record these needed attributes.
This OTEP proposes a mechanism for updating the SDK with a new resource, which will be applied to all future telemetry created by the SDK. The proposal attempts to do this while preserving important characteristics already defined for resources:
If there are other backwards compatibility requirements for resources that I have missed, please let me know.
Cheers,
-Ted
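For readers who want something concrete, here is a minimal sketch of the behavior described in this proposal: setting an attribute on a ResourceProvider affects only telemetry started afterwards, and an optional check rejects changes to permanent attributes. The class shapes, method names (`set_attribute`, `get_resource`), and the toy `Span` type are assumptions for illustration, not the API defined by the OTEP or by any SDK.

```python
import threading
from types import MappingProxyType


class ResourceProvider:
    """Holds the current resource; updates apply only to telemetry created later."""

    def __init__(self, initial_attrs, permanent_keys=()):
        self._lock = threading.Lock()
        self._permanent = frozenset(permanent_keys)
        self._attrs = dict(initial_attrs)

    def set_attribute(self, key, value):
        with self._lock:
            # Optional check: permanent attributes cannot change after startup.
            if key in self._permanent:
                raise ValueError(f"{key!r} is permanent and cannot be updated")
            # Copy-on-write: snapshots already handed out keep the old dict.
            self._attrs = {**self._attrs, key: value}

    def get_resource(self):
        with self._lock:
            # Read-only view over a dict that is never mutated in place.
            return MappingProxyType(self._attrs)


class Span:
    """Toy span that captures the resource at start time."""

    def __init__(self, name, provider):
        self.name = name
        self.resource = provider.get_resource()
```

In this sketch, a span started before `set_attribute("session.id", ...)` keeps the old session ID, and a span started afterwards sees the new one.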