Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

disable_tracing_hostname for URL exclude lists not covering reasonable use cases #779

Closed
cnnradams opened this issue Jun 4, 2020 · 5 comments · Fixed by #872
Closed
Labels
good first issue Good first issue

Comments

@cnnradams
Copy link
Member

cnnradams commented Jun 4, 2020

There are two URL exclude lists for webservers:

disable_tracing_path

Disables paths based on a regex after (not including) the hostname. Should be able to include/exclude any combination of paths, but does not include hostname

disable_tracing_hostname

Disable path based on the exact URL being included in a list.

disable_tracing_hostname seems to be significantly less useful than disable_tracing_path, since if there is any URL parameters in the endpoint, eg http://site.com/abc?param=2 it will not have a use (specific parameter values can be disabled by disable_tracing_path, if desired). Personally I would support disable_tracing_hostname performing a regex on everything before the path, or if there are cases where we need a combination of hostname + path, it could regex the whole URL.

For example, if a user wanted to exclude all endpoints that were not called with http:// or https://, this could not be done with the current exclude lists. Or if the user wanted to exclude requests to admin.website.com/path but not website.com/path, this could also not be done.

There may also be value to using the endpoint pattern instead of the actual URL, ie for endpoint /hello/{helloid}/info a user could just add /hello/{helloid}/info to the exclude list instead of needing to write a regex. However this would also give disable_tracing_path less functionality with regards to excluding specific values passed in, so this would need some more thinking.

Thoughts?

@cnnradams cnnradams changed the title Endpoint exclude lists do not cover some reasonable patterns disable_tracing_hostname for exclude lists not covering reasonable use cases Jun 4, 2020
@cnnradams cnnradams changed the title disable_tracing_hostname for exclude lists not covering reasonable use cases disable_tracing_hostname for URL exclude lists not covering reasonable use cases Jun 4, 2020
@lzchen
Copy link
Contributor

lzchen commented Jun 4, 2020

I think originally, the logic taken from OpenCensus was to cover the use cases that were most common (users would match exact urls that they do not want to be traced with hostnames, or urls under a certain path would not want to be traced). I'm not to certain whether or not these included things like param. Regardless, if the new goal with OpenTelemetry exclude listing is to provide a mechanism in which to cover ALL uses cases, and not be very tedious for users to use (theoretically we can do exact matching for everything), then yes, the current design is lacking in some areas. However, I don't know if matching all use cases makes sense practicality wise (how often would someone want to exclude admin.website.com/path but not website.com/path.

Could this be as simple as removing disable_tracing_hostnames and change disable_tracing_path to cover the whole url instead of just the part after the domain/hostname?

Also, what did you mean by " ie for endpoint /hello/{helloid}/info a user could just add /hello/{helloid}/info to the exclude list instead of needing to write a regex."?
I believe we expect the user to only input strings to the list instead of regex?

@cnnradams
Copy link
Member Author

I think removing hostnames and making path extend to the whole URL will solve most reasonable use cases of exclude lists.

If the user isn't supposed to use regex then they can't generalize paths to be endpoints, for example if they wanted to block /hello/{helloid}/info but not /hello or /info, or block based on the value of a query parameter which could be in any order (eg, /path?a=1&b=2). The current implementation of disable_tracing_path supports regex (encoded as strings) so I assumed it was a designed feature. What I mentioned above is related to regex maybe not being the most end-user-friendly system for disabling tracing, so checking against the endpoint instead of the URL would resolve most use cases for regex.

@lzchen
Copy link
Contributor

lzchen commented Jun 5, 2020

@ocelotl Thoughts?

@ocelotl
Copy link
Contributor

ocelotl commented Jun 27, 2020

Well, after taking into consideration all the details like params and protocols, I am more inclined to use a single environment variable that uses a regex and let the user just write the regex to do the exclusions. It is the most powerful approach, since it should cover all possible cases. While regexes can be not too user friendly, I guess that in most of the cases they won't be too different from what we have now, for example, the regex to exclude abc is abc. I think that for other parts of the regex .* will do for most situations. We could compile them to minimize the performance hit that we would take by using regexes.

@lzchen
Copy link
Contributor

lzchen commented Jun 28, 2020

Sounds good. I think we'll use this as the desired solution.

@lzchen lzchen added good first issue Good first issue ext labels Jun 28, 2020
srikanthccv pushed a commit to srikanthccv/opentelemetry-python that referenced this issue Nov 1, 2020
* chore: fix API docs

* update WEB Readme: WebTracer => WebTracerProvider
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
good first issue Good first issue
Projects
None yet
3 participants