-
Notifications
You must be signed in to change notification settings - Fork 8.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
revise Datadog trace sampling configuration #10151
revise Datadog trace sampling configuration #10151
Conversation
✅ Deploy Preview for kubernetes-ingress-nginx canceled.
|
This issue is currently awaiting triage. If Ingress contributors determines this is a relevant issue, they will accept it by applying the The Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
Hi @dgoffredo. Thanks for your PR. I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with Once the patch is verified, the new status will be reflected by the I understand the commands that are listed here. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/ok-to-test
Thanks for your contributions.
Error apparently was due to a github action problem, triggering here |
/lgtm |
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: dgoffredo, rikatz The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
Thanks, @rikatz! |
/cherry-pick release-1.8 |
@strongjz: new pull request created: #10224 In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
Datadog customers have begun to report that trace sampling is not behaving as expected when using ingress-nginx.
With other Datadog integrations, the default sampling behavior is to consult the Datadog Agent for a sample rate, which changes dynamically. This way, trace volume can be centrally controlled. A customer may optionally specify a fixed sampling rate; but if they don't, the default behavior is to let the Datadog Agent figure it out.
I made a change in Datadog's library last March that changed the meaning of
sample_rate
in the library's configuration.sample_rate
corresponds toDatadogSampleRate
in ingress-nginx. Previously,sample_rate
was ignored by Datadog's library. This was a bug, but not a severe one, because the concept of "sampling rules" had since superceded whatsample_rate
used to configure. My change in March repurposedsample_rate
to mean "append a sampling rule that matches all traces."What I overlooked was the fact that ingress-nginx still uses
sample_rate
, and that it always specifies a value for it in/etc/nginx/opentracing.json
, defaulting to1.0
.This means that Datadog customers, since my change, have no way to say "use the rates calculated by the Datadog Agent." They can set
DatadogSampleRate
, and if they don't, they still get1.0
instead of the desired default behavior.The changes that I propose in this PR should have been proposed last March, but I didn't then notice this interaction.
These changes remove the
DatadogPrioritySampling
flag (which has not done anything for quite a long time), and change the type ofDatadogSampleRate
fromfloat32
to*float32
. This way, the default value isnil
rather than1.0
, and we can detect this when constructing/etc/nginx/opentracing.json
.Conditionally including
"sample_rate"
in the generated JSON required me to rearrange the code that produces/etc/nginx/opentracing.json
. Previously, the file content was chosen from one of multipletext/template
templates. Such templates cannot, as far as I know, express conditionally included text based on the value of a pointer. Instead, I useencoding/json
in a dedicated function to generate the Datadog JSON.This will change the default sampling behavior of the Datadog integration, which is something that I'd like to mention in ingress-nginx's release notes should these changes be merged in their current form.
Types of changes
Which issue/s this PR fixes
The issue was not with ingress-nginx, but with behavior change in ingress-nginx brought on by a change in dd-opentracing-cpp.
The concern was raised in Datadog's support channels.
How Has This Been Tested?
Manual integration testing involved a few Datadog-specific files:
DaemonSet
resource that uses the image built byagent.dockerfile
.Service
and a correspondingIngress
.Run
make dev-env
.Apply the files above (this involves
load
ing the built Docker image into the cluster, similarly to what is done inmake dev-env
).Now requests made to the host's port 80 will flow through the NGINX ingress to httpbin.
Edit the ingress controller's
Deployment
to expose the node's IP address. That's where the mock Datadog Agent will be listening (because it's aDaemonSet
, there's an instance on each node):Edit the ingress controller's
ConfigMap
to enable Datadog tracing:Verify that httpbin's
/headers
endpoint shows Datadog tracing propagation headers:Verify that the default
/etc/nginx/opentracing.json
omits"sample_rate"
:Edit the ingress controller's
ConfigMap
to specify an explicit sampling rate:Send another request to httpbin, and check the log output of the mock Datadog Agent. Verify that the tagged sampling rate (
_dd.rule_psr
) is as configured.Note that it's
0.42
, as expected.Checklist: