Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support configurable error codes in contrib/net/http #2882

Open
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

nakkamarra
Copy link
Contributor

What does this PR do?

This PR adds a configurable WithStatusCheck(...) and isStatusError to contrib/net/http so that the ServeMux and TraceAndServe(...) can reference that to then decide whether or not to set an error tag on a span based on user defined function for status code checking.

Motivation

As outlined in #2876, I feel this makes things a bit more configurable in the case that someone wants to reporting some other status code as an error that isn't handled by current logic (500 - 600 status code) or wants to omit a specific status code in the current logic for some reason (i.e a service is expected to send a 500 or some other status in an expected case and don't want that to add to affect SLOs or some other contrived case)

Reviewer's Checklist

  • Changed code has unit tests for its functionality at or near 100% coverage.
  • System-Tests covering this feature have been added and enabled with the va.b.c-dev version tag.
  • There is a benchmark for any new code, or changes to existing code.
  • If this interacts with the agent in a new way, a system test has been added.
  • Add an appropriate team label so this PR gets put in the right place for the release notes.
  • Non-trivial go.mod changes, e.g. adding new modules, are reviewed by @DataDog/dd-trace-go-guild.

Unsure? Have a question? Request a review!

@nakkamarra nakkamarra requested review from a team as code owners September 19, 2024 23:55
@nakkamarra nakkamarra force-pushed the configurable-error-codes-nethttp branch from 376b457 to 9d6c3af Compare September 19, 2024 23:57
@nakkamarra nakkamarra force-pushed the configurable-error-codes-nethttp branch from 9d6c3af to bc82905 Compare September 20, 2024 00:12
@mtoffl01 mtoffl01 self-assigned this Sep 24, 2024
@mtoffl01
Copy link
Contributor

mtoffl01 commented Sep 25, 2024

Hello, thanks for your contribution!

We're currently working on support for two related features, DD_TRACE_HTTP_CLIENT_ERROR_STATUSES and DD_TRACE_HTTP_SERVER_ERROR_STATUSES. These environment variables allow customization over which HTTP "error" codes on HTTP client and HTTP server spans, respectively. Note: If set, the options will apply to all http-based integrations, not just net-http.

Would this address your use-case? Thanks!

@nakkamarra
Copy link
Contributor Author

Hello, thanks for your contribution!

We're currently working on support for two related features, DD_TRACE_HTTP_CLIENT_ERROR_STATUSES and DD_TRACE_HTTP_SERVER_ERROR_STATUSES. These environment variables allow customization over which HTTP "error" codes on HTTP client and HTTP server spans, respectively. Note: If set, the options will apply to all http-based integrations, not just net-http.

Would this address your use-case? Thanks!

Hey Mikayla, thanks for the response. Can you explain a bit further, i.e this environment variable would be something like DD_TRACE_HTTP_SERVER_ERROR_STATUSES=403,500,502 and we would need to declare each code that we want to error with? Also, do you have a timeline on when that change would land?

@mtoffl01
Copy link
Contributor

Hey @nakkamarra ,
Yes, your understanding is mostly correct, however, the environment variable also supports value ranges. So if, for example, you wanted all codes between 400 and 412 to be considered error, along with code 414, you could specify like so:
DD_TRACE_HTTP_SERVER_ERROR_STATUSES=400-412,414

The work is currently in progress, so ideally the change would land in the next dd-trace-go version, although we don't currently have an ETA for the release.

What do you think?

@nakkamarra
Copy link
Contributor Author

Hey @nakkamarra , Yes, your understanding is mostly correct, however, the environment variable also supports value ranges. So if, for example, you wanted all codes between 400 and 412 to be considered error, along with code 414, you could specify like so: DD_TRACE_HTTP_SERVER_ERROR_STATUSES=400-412,414

The work is currently in progress, so ideally the change would land in the next dd-trace-go version, although we don't currently have an ETA for the release.

What do you think?

@mtoffl01 Sorry for the late response. Yeah I do prefer to be able to do it programmatically rather than having to update the env vars of my datadog pods in my cluster, but if the plan is that this will then remove all of those similar isStatusError / WithStatusCheck functions from the other packages in the repo and this will be the only method for overriding error code decisioning then I guess it works for me.

@mtoffl01
Copy link
Contributor

mtoffl01 commented Oct 1, 2024

@nakkamarra , your preference makes sense, however, the work to support env var configuration is already in progress.
Adding a programatic option to toggle error codes for net-http is relatively trivial, so if we have the bandwidth to include this, we will do. Otherwise, we will keep you posted on the DD_TRACE_HTTP_SERVER_ERROR_STATUSES progress.

A note: DD_TRACE_HTTP_SERVER_ERROR_STATUSES will apply globally to all http [server] integrations. If the env var is configured in conjunction with an integration-level WithStatusCheck, the latter will take precedence for the given integration's spans.
E.g, you have DD_TRACE_HTTP_SERVER_ERROR_STATUSES=400 and the go-chi integration configured with WithStatusCheck(func(code int) bool { return statusCode == 404 }):

  • a go-chi request with status code 400 will not be labeled as an error
  • a go-chi request with status code 404 will be error
  • spans from any other http integration will error for only 400 status code

Let me know if you have additional questions.

@nakkamarra
Copy link
Contributor Author

@nakkamarra , your preference makes sense, however, the work to support env var configuration is already in progress. Adding a programatic option to toggle error codes for net-http is relatively trivial, so if we have the bandwidth to include this, we will do. Otherwise, we will keep you posted on the DD_TRACE_HTTP_SERVER_ERROR_STATUSES progress.

A note: DD_TRACE_HTTP_SERVER_ERROR_STATUSES will apply globally to all http [server] integrations. If the env var is configured in conjunction with an integration-level WithStatusCheck, the latter will take precedence for the given integration's spans. E.g, you have DD_TRACE_HTTP_SERVER_ERROR_STATUSES=400 and the go-chi integration configured with WithStatusCheck(func(code int) bool { return statusCode == 404 }):

  • a go-chi request with status code 400 will not be labeled as an error
  • a go-chi request with status code 404 will be error
  • spans from any other http integration will error for only 400 status code

Let me know if you have additional questions.

Oh I see, the plan is to keep the existing integration-level checks but default to the environment variables if they are set? And there's no plans to remove those integration-level checks after the fact? If that's the case then I'd definitely prefer that there is an option here to specify one for net/http like this PR does, as it solves my use case and doesn't require reconfiguring and redeploying the agents and operators in my cluster.

@mtoffl01
Copy link
Contributor

mtoffl01 commented Oct 3, 2024

Hey @nakkamarra ,
We need to think some more about how integration-level configs should interact with the global error statuses setting; I'll follow up here once I have more info.

@mtoffl01
Copy link
Contributor

Hi @nakkamarra ,
Just letting you know that work is in progress to eventually support this. Will update you when we have more information.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants