Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open questions around url.template and url.full #1095

Open
antonfirsov opened this issue May 30, 2024 · 26 comments
Open

Open questions around url.template and url.full #1095

antonfirsov opened this issue May 30, 2024 · 26 comments
Assignees
Labels
area:http enhancement New feature or request

Comments

@antonfirsov
Copy link

antonfirsov commented May 30, 2024

Area(s)

area:http

Is your change request related to a problem? Please describe.

#675 introduced a new http attribute url.template, which is supposed to be a low-cardinality variant of url.full, however I see several open questions around the expected behavior of uri.template in regards of implementation challanges and it's relation to url.full.

From .NET's HttpClient instrumentation POV we would need to have these questions resolved and the specification adjusted before considering implementing the attribute.

Relationship with url.full

According to the uri.full spec

Sensitive content provided in url.full SHOULD be scrubbed when instrumentations can identify it.

In my understanding an ideal scrubbing implementation would make url.full == url.template. If this is the case, do we still need both properties? If this is not the case, how exactly do we imagine url.full to look like after scrubbing?

url.template has been introduced to have a low cardinality url attribute, but does it mean that we are ok keeping url.full high-cardinality? Moreover, if an instrumented library is unable to scrub sensitive data is it the right choice to define url.full as Required from privacy POV?

Wouldn't it be wiser to rather merge the two properites and state that if an implementation is enlightened to templatize/sanitize url.full it should be emitted in a template form?

Redirects

According to the http client span spec

Instrumentations SHOULD create an HTTP span for each attempt to send an HTTP request over the wire.

in case a user provides a url.template and a redirect happens, what value do we expect to be logged for url.template for the redirect span? If the scrubbing mechanism for url.full is based on the template, what do we expect to be logged in url.full?

Describe the solution you'd like

One possible resolution of the listed problems is to define a standard templating syntax, allowing instrumented libraries to pattern-match a list of pre-defined templates to the actual uri, and log the template in url.full if a match is found. Do not log url.full, or log a placeholder value otherwise.

Another, simpler and cheaper solution is to:

  • Log the template in url.full for the first request.
  • Do not log url.full for redirects or log a placeholder value.

I don't see value keeping url.full and url.template separate with neither of my proposals. Moreover, regardless of the solution we pick, I'd recommend to make url.full Conditionally Required, allowing implementations to omit it if there is no scrubbing mechanism implemented.

/cc @klauco @trask @lmolkova @joaopgrassi

@lmolkova
Copy link
Contributor

lmolkova commented May 30, 2024

Relationship with url.full

From OTel side, we're more concerned with sensitive data in the query params - #860.

The generic scrubbing is not expected to templatize the URL, it's only expected to scrub sensitive params. We don't yet know how it'll look like (blocklist, allowlist, configurable, etc). Sanitized URL may look like this https://host:port/documents/42?api-version=1.2.3,foo=bar,secret=REDACTED.

Scrubbing process is not expected to remove all dynamic components from the url.full. The result of scrubbing is not intended to have low cardinality - it's not used on metrics.

Users benefit from having both - template and full url - full provides path and safe-to-record properties that are necessary for observability. It's great to also have url.template to group metrics.

TL;DR: url.template is a result of some transformation on url.full, but both are necessary for different reasons.

@lmolkova
Copy link
Contributor

Redirects

In the perfect world we should have span/metric measurement per each request, i.e. each request would have different template.

If it's not always the case, then I'd say url.template should should be consistent with the url.full reported on the span.

@antonfirsov
Copy link
Author

The result of scrubbing is not intended to have low cardinality - it's not used on metrics.

Does this mean that the addition of url.full to client metrics like http.client.request.duration will be never considered? Isn't it a problem to have high-cardinality attributes in distributed tracing?

Regarding url.template: considering #675 mentions metrics and the server metric has http.route, shouldn't url.template be added to the Http client metrics spec?

If it's not always the case, then I'd say url.template should should be consistent with the url.full reported on the span.

The implementation we are planning for .NET will omit both on redirects. I still think that url.full should be Conditionally Required given that it's not always possible to sanitize it.

@lmolkova
Copy link
Contributor

lmolkova commented Jun 3, 2024

Does this mean that the addition of url.full to client metrics like http.client.request.duration will be never considered? Isn't it a problem to have high-cardinality attributes in distributed tracing?

Correct, url.full cannot be added to metrics due to cardinality issues.
url.template is added to client metrics (in a separate table since it's experimental)

| [`url.template`](/docs/attributes-registry/url.md) | string | The low-cardinality template of an [absolute path reference](https://www.rfc-editor.org/rfc/rfc3986#section-4.2). [1] | `/users/{id}`; `/users/:id`; `/users?id={id}` | `Opt-In` | ![Experimental](https://img.shields.io/badge/-experimental-blue) |

@lmolkova
Copy link
Contributor

lmolkova commented Jun 3, 2024

The implementation we are planning for .NET will omit both on redirects. I still think that url.full should be Conditionally Required given that it's not always possible to sanitize it.

So there would be no information about the URL for redirects? Also, which span do you refer to when talking about redirects:

  • first request(s) resulting in 30X and location header?
  • the final request to the location?
  • one span describing the whole sequence?

@antonfirsov
Copy link
Author

antonfirsov commented Jun 3, 2024

one span describing the whole sequence

There is no such span in SocketsHttpHandler, only separate spans for each request.

which span do you refer to when talking about redirects

I meant the auto-redirect span(s) following the initial 30x response.

So there would be no information about the URL for redirects?

Actually, we are considering multiple sanitization mechanisms for .NET 9. One of them is to replace url.full with url.template.
This means that when redirect happens, the original url.template becomes invalid so logging that would be problematic. There will be other options available which will allow logging the url on redirects: log the uriginal one (for debugging), without query string, do not log.

Note that currently no attributes are emitted for the http spans, adding them is part of dotnet/runtime#93019. I would only consider adding url.full if we can do it in a safe and future-proof manner.

@lmolkova
Copy link
Contributor

lmolkova commented Jun 4, 2024

Actually, we are considering multiple sanitization mechanisms for .NET 9. One of them is to replace url.full with url.template.
This means that when redirect happens, the original url.template becomes invalid so logging that would be problematic. There will be other options available which will allow logging the url on redirects: log the uriginal one (for debugging), without query string, do not log.

Note that currently no attributes are emitted for the http spans, adding them is part of dotnet/runtime#93019. I would only consider adding url.full if we can do it in a safe and future-proof manner.

So If I understand correctly:

  • redirects
    • span1: url1, status_code: 30X
    • span2: url2, status_code: e.g. 200

there will be multiple modes allowing to:

  1. use template1 (instead of url) on span1, but not possible on span2
  2. use original url1 on span1 and original url2 on span2
    a. remove query string from URLs
    b. don't include URLs at all

I'd like to suggest the following:

  • if you can add a template, always add it as a url.template. If you can't add it - don't add it.
  • always add url.full on spans - it's required, without both - url and template, HTTP client spans are kinda useless?
    • if recording URL is not safe, record it as http(s)://host:port/REDACTED/REDACTED/REDACTED?a=REDACTED (or http(s)://host:port/*/*/*?a=*, etc
    • let users record full urls without sanitization - they can do sanitization later on in the telemetry pipeline

Essentially, don't consider url and template as mutually exclusive - they should be treated independently.

@antonfirsov
Copy link
Author

antonfirsov commented Jun 4, 2024

So If I understand correctly

Yes, you understood it correctly.

http(s)://host:port/REDACTED/REDACTED/REDACTED?a=REDACTED

Building that string does not feel to be worthy of the cost, it would be simpler just to emit http(s)://host:port/REDACTED. In some cases even http(s)://REDACTED might be desirable. But honestly, I don't see how does it provide more information to the user over just omitting url.full.

if you can add a template, always add it as a url.template

Do you mean by this that if a template is provided by the user, we should only add it as url.template and not substitute it into url.full? Should url.full be the http(s)://REDACTED thingie then? I don't see how/why is this better.

without both - url and template, HTTP client spans are kinda useless?

With proper usage, url and/or template would be present in most cases. Redirections are very rare in server to server communication. If they happen during a strict logging setup, the spans would still indicate their presence. Users can switch to less restrictive logging modes to debug them.

@noahfalk
Copy link
Contributor

noahfalk commented Jun 5, 2024

if you can add a template, always add it as a url.template. If you can't add it - don't add it.

By 'template' I assume this means only strings that we have a specific expectation are low cardinality right? For example if we had some hypothetical API like this:

HttpRequestMessage.SetLoggedUrl(string urlToLog)

Users could use that API to give any string they wanted, not necessarily a low cardinality one. I assume that string would not count as a template and we should log it as the sanitized url.full, not url.template?

if you can add a template ... If you can't

I'm not sure what determines whether we can or can't? It feels like the spec implies url.template should relate to the URL the request was sent to, but exactly what is allowable and who is being proposed to enforce that relationship wasn't clear to me. A couple examples:

  1. The app sends a request to "contuso.com/userId=50" and the app code specifies that url.template="USER_PAGE". Did the dev mess up? Was the http client expected to do verification on it? Perhaps this is perfectly legitimate use of url.template despite not being so close to the spec examples.
  2. The app sends a request to "contuso.com/userId=180" which is redirected to "new.contuso.org/billingapp/users/id=180". The app developer's code specified url.template = "/userId={id}" and it was invisible to that code that a redirect even happened. Is it legal for the 2nd span tracking the redirected request to also include the same url.template value as the first span?

@lmolkova
Copy link
Contributor

lmolkova commented Jun 5, 2024

Building that string does not feel to be worthy of the cost, it would be simpler just to emit http(s)://host:port/REDACTED. In some cases even http(s)://REDACTED might be desirable. But honestly, I don't see how does it provide more information to the user over just omitting url.full.

having url.full=https://host:port/REDACTED is still better than dropping it. It tells users:

  1. It's a valid HTTP span (along with other attributes), without url.full it's not a valid HTTP client span.
  2. There was url.full populated, it's not a bug of the instrumentation which forgot/failed to stamp it.
  3. There are some sanitization settings that are applied that resulted in redaction, users can explore how to change them

@lmolkova
Copy link
Contributor

lmolkova commented Jun 5, 2024

Do you mean by this that if a template is provided by the user, we should only add it as url.template and not substitute it into url.full? Should url.full be the http(s)://REDACTED thingie then? I don't see how/why is this better.

Yes, there is no relationship between url.template and the url.full sanitization in otel semconv. Why would there be? I want url.template to group URLs by in metrics and url.full to have details on the individual spans. I can sanitize or not sanitize URL full.

@lmolkova
Copy link
Contributor

lmolkova commented Jun 5, 2024

With proper usage, url and/or template would be present in most cases. Redirections are very rare in server to server communication. If they happen during a strict logging setup, the spans would still indicate their presence. Users can switch to less restrictive logging modes to debug them.

I don't know where the assumption about server applications (you can do .NET client code) or that redirects are rare is coming from. It's common to do redirects to regional endpoints, we have some services in Azure that do redirection to other services e.g. to object store to download something.
There are applications that have tons of redirects and others which have little.

I don't know how .NET will allow users to provide the url.template, but if they provided it and there was a redirect, why not set it on both - how do you know which one they wanted to group by? Or maybe users would want to provide a callback to templatize each URL?

@lmolkova
Copy link
Contributor

lmolkova commented Jun 5, 2024

@noahfalk
I hope #1095 (comment) explains the (lack of) relationship between URL sanitization and the template.

As for relationship between the URL itself and the template - there is no requirement or enforcement for them to be strictly related.
If user provides the template they can do absolutely anything and it's not .NET or other instrumentation responsibility to check it. User in this case would be the source of high cardinality or the mismatch between the URL and the template.

Could you please point to some things in the spec that feel wrong/insufficient?

  • is it the naming? is template word confusing creating the assumption that it has to be strictly related to URL? Should it be a wider thing? (I believe http.operation.name was also considered)
  • is it just lack of details? (how user can provide the template or how it can be extracted, what to do for redirects, etc?)
  • should we call out something about the cardinality?
  • something else?

@lmolkova
Copy link
Contributor

lmolkova commented Jun 5, 2024

BTW, is there a plan to add HttpRequestMessage.SetLoggedUrl API to control the value of url.full and/or url.template on spans and metrics?
it sounds very confusing to refer to tracing/metrics options as logging options or use "log url on span or metrics" in docs. Let's try to find a better terminology.

@noahfalk
Copy link
Contributor

noahfalk commented Jun 5, 2024

As for relationship between the URL itself and the template - there is no requirement or enforcement for them to be strictly related.
If user provides the template they can do absolutely anything and it's not .NET or other instrumentation responsibility to check it.

That answered it nicely, thanks! As for my confusion with the spec, its mostly because the spec doesn't say this. If you add similar text to the spec clarifying that the string can be anything and its completely user responsibility to set it and interpret it that covers most of it. I think it also would be nice one way or another to recommend whether url.template the user specifies for the initial request should be included on redirects.

BTW, is there a plan to add HttpRequestMessage.SetLoggedUrl API to control the value of url.full and/or url.template on spans and metrics?

I just made up a random name so I don't expect the API will be named that, but yes @antonfirsov is investigating APIs in that space. Given that url.template is currently marked experimental I suggested it might be more appropriate that our API for now is just about sanitizing the URL in url.full and url.template will be handled separately once it is considered stable in the future.

@lmolkova
Copy link
Contributor

lmolkova commented Jun 5, 2024

@noahfalk

thanks!
The spec does not say where the template comes from, it can come from the user, smart regex in the telemetry processing pipeline, etc.
E.g. Spring WebClient allows this:

 // GET http://abc.com/v1/accounts/43
 Mono<Account> result = client.get()
         .uri("/accounts/{id}", 43)
         .exchange()
         .then(response -> response.bodyToMono(Account.class));

We probably should do a better job documenting it, but OTel semconv does not apply any normative requirements to user input.

I agree on the redirect - we should clarify it.

Given that url.template is currently marked experimental I suggested it might be more appropriate that our API for now is just about sanitizing the URL in url.full and url.template will be handled separately once it is considered stable in the future.

One option to solve it is to let users specify both - the attribute name and the value. There is a common ask from users to be able enrich the HTTP span with something (standard or app-specific).

It'd be nice for users to be able to do something like

var httpRequest = new ...();
httpRequest.SetTelemetryContext({{"url.template":"/users/{id}"}, {"foo": "bar"}})

this removes responsibility from the native HTTP instrumentation to populate or sanitize these extra attributes. Related to open-telemetry/oteps#207 - OTel might define a common mechanism like this in the future.

@antonfirsov
Copy link
Author

this removes responsibility from the native HTTP instrumentation to populate or sanitize these extra attributes

This looks very similar to what we call enrichment (except the sanitization part). Enrichment is already supported for metrics in .NET and we plan to add support for distributed tracing.

@lmolkova
Copy link
Contributor

lmolkova commented Jun 5, 2024

This looks very similar to what we call enrichment (except the sanitization part). Enrichment is already supported for metrics in .NET and we plan to add support for distributed tracing.

Nice! Is it static or dynamic enrichment?

@noahfalk
Copy link
Contributor

noahfalk commented Jun 7, 2024

Dynamic enrichment. Its quietly hiding out here: https://source.dot.net/#System.Net.Http/System/Net/Http/Metrics/HttpMetricsEnrichmentContext.cs,78

Originally we were expecting the Microsoft.Extensions libraries were going to build some higher level API support over the top of it but that didn't happen in .NET 8 at least. The underlying mechanism exists now but isn't publicized much.

@lmolkova
Copy link
Contributor

lmolkova commented Jun 11, 2024

Dynamic enrichment. Its quietly hiding out here: https://source.dot.net/#System.Net.Http/System/Net/Http/Metrics/HttpMetricsEnrichmentContext.cs,78

Originally we were expecting the Microsoft.Extensions libraries were going to build some higher level API support over the top of it but that didn't happen in .NET 8 at least. The underlying mechanism exists now but isn't publicized much.

Cool! So effectively .NET does not have to do anything special to support url.template - it can be done by end users with enrichment both on metrics (since .NET 8) and on traces (starting .NET 9), correct?

@noahfalk
Copy link
Contributor

.NET 8 for metrics yes, .NET 9 conditional on enrichment for tracing APIs getting completed :)

@lmolkova
Copy link
Contributor

@antonfirsov @noahfalk thank you for the context and all the questions!

I created #1143 to clarify redirects.

Is there something else here or could we close the issue?

@antonfirsov
Copy link
Author

On top of addressing #1143, it would be great if the spec could somehow elabarote on the differences between url.template and redacted url.full. There are users who want to handle it with the same logic, and may view it as the same thing. /cc @klauco as an example.

@trask
Copy link
Member

trask commented Jun 11, 2024

I'm not sure I see the connection between url.template and redacted url.full (apologies though I haven't read all the comments above so probably missed some context)

in my view:

  • url.template is about having a way to split HTTP client metrics
  • redacting url.full is about removing sensitive data from HTTP client spans

@noahfalk
Copy link
Contributor

I think @antonfirsov means it is possible for those two concepts to be conflated because one possible way to generate the value for url.full is reuse the same value as url.template. In practice making a string low-cardinality has the effect of removing sensitive data.

I'm not aware the spec says anything that is specifically misleading. I think its a suggestion that we could make some proactive clarifications about how the goals of redacted url.full differ from url.template and why you might want each attribute to have a different value even though they don't have to be different.

@klauco
Copy link
Contributor

klauco commented Jun 12, 2024

I understand that URL sanitization and making URL as a low-cardinality dimension are two seemingly different requirements. However, I think that from the perspective of OTel they should be kept separate. But I am not so sure, if in reality, users will want to separately handle url redaction and url template setting, and in even worse case, handle it per telemetry pillar - separately for metrics, separately for traces, by using pillar-specific enrichment.

I think that it is safe to say that in most cases, setting a low-cardinality template for URL (or redirect URL) could be considered as a way of removing sensitive data. If users want to have the low cardinality template, and redact the URL differently i.e. by doing a one-way hashing of the sensitive data, it would be great to have that option to handle these cases separately. But in both cases, users should not be forced to set url template more than once for a given URL (in particular - they should not be forced to enrich url.template for metrics, traces, and potentially even for logs, separately. Even in OTel, the low-cardinality attribute is defined as part of "URL" scope, so semantically, it is a piece of information attached to the URL, which is reused across pillars.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area:http enhancement New feature or request
Projects
Development

No branches or pull requests

7 participants