Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Limit HTTP request method cardinality: canonical method (Option B/D) #34

Closed
wants to merge 4 commits into from

Conversation

lmolkova
Copy link
Contributor

Implements option b (or d) from #17

Copy link
Member

@joaopgrassi joaopgrassi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm favor of this option because of:

  1. http.request.method is kept "as-is". We don't add any automatic value to it in case it has "not known" methods like Get or xyz. Existing dashboards, queries customers may have on them will continue to work
  2. Addresses the issue with metric cardinality

Although I can see this may be a more general problem will appear in other attributes so I'm unsure if we should go ahead with it.

brief: 'HTTP request method.'
- id: request.canonical_method
type:
allow_custom_values: true
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this allow custom values though? If we want to offer this as a "control" for cardinality, I feel maybe we have to lock it down?

If the HTTP request method is not known to the instrumentation, it MUST set the `http.request.canonical_method` attribute to `other` and SHOULD
populate the exact method passed by client on `http.request.method` attribute.

HTTP method names are case-sensitive and `http.request.canonical_method` attribute value MUST match a standard (or documented elsewhere) HTTP method name exactly.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we say it must match the enum we defined here? Otherwise it seems much like another open attribute if the standard is not clearly defined by OTel.

@trask
Copy link
Member

trask commented May 25, 2023

  1. Existing dashboards, queries customers may have on them will continue to work

we're already breaking these dashboards in the next http semconv release with the rename from http.method to http.request.method

also, I think the canonicalized http method may be better for dashboards (where you are often want to aggregate things better)

I think the primary use case that is hurt by the canonicalized value being the "default" (http.request.method) is the security use cases where you are specifically interested in invalid http.request.method values since those can be a sign of an attack

@lmolkova
Copy link
Contributor Author

lmolkova commented May 25, 2023

@joaopgrassi The problem with this approach is that high-quality instrumentations that avoid duplication would only put invalid methods into http.request.method. This PR is intentionally vague to hide this fact:)

If we populate http.request.method all the time, then we'd duplicate values on both http.request.method and http.request.canonical_method and it'll be a pure waste of resources for the absolute majority of requests.

Essentially this option makes everyone pay more for their telemetry with no clear benefit (compared to option A).

@joaopgrassi
Copy link
Member

joaopgrassi commented May 25, 2023

we're already breaking these dashboards in the next http semconv release with the rename from http.method to http.request.method

@trask Ah yes, good point forgot that was also changed recently 👍

I think the primary use case that is hurt by the canonicalized value being the "default" (http.request.method) is the security use cases where you are specifically interested in invalid http.request.method values since those can be a sign of an attack

But isn't it what the PR accommodates for today here:?

If the HTTP request method is not known to the instrumentation, it MUST set the http.request.canonical_method attribute to other and SHOULD populate the exact method passed by client on http.request.method attribute.

If there are attacks with invalid methods, those will surface and be populated on http.request.method, enabling such security use cases to know about them, no?

@lmolkova

The problem with this approach is that high-quality instrumentations that avoid duplication would only put invalid methods into http.request.method. This PR is intentionally vague to hide this fact:)

If we populate http.request.method all the time, then we'd duplicate values on both http.request.method and http.request.canonical_method and it'll be a pure waste of resources for the absolute majority of requests.

I 100% agree, we shouldn't duplicate tings. But if the goal is to surface invalid/malicious values I feel the way it is written today is enough?

What is not clear with the text in the PR is what to do with Get for example. Should instrumentation apply a case-insensitive comparison with the enum here in order to determine if the method is invalid or not and then populate http.request.method is so? To me, Get should not be considered invalid, but I may of course not have the full context on how many languages/frameworks deal with this.

@lmolkova
Copy link
Contributor Author

@joaopgrassi

But if the goal is to surface invalid/malicious values I feel the way it is written today is enough?

the problem is that with this PR we essentially reserve http.request.method for invalid methods and edge cases.

What is not clear with the text in the PR is what to do with Get for example.

HTTP method is case-sensitive, i.e. according to the standard Get is a custom method. If I send Get request to Spring app, it will return an error. If I record it as GET on the telemetry, it will be very confusing why the service returned 405 (method not allowed).

So the proposal is to allow instrumentations to take into account if their web framework supports case-insensitive methods and if yes, they should record Get as GET. I'm happy to clarify it better after we finish the discussion on the overall approach.

@lmolkova
Copy link
Contributor Author

Closing this proposal because of duplication concerns.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants