datadog chart makes bad decision on Service internalTrafficPolicy setting in K8s/EKS 1.22 #625
Comments
Hi @rodalli, maybe the doc is not up to date, but I checked the release notes again, and the feature moved to beta thanks to this PR: kubernetes/kubernetes#103462. I'm guessing the issue may be something else.
Yes, there is a Daemonset running, and the Service has the correct endpoints on each node in the cluster.
I know for sure it's not that the pods aren't there or are misconfigured or anything like that, as I've been troubleshooting this issue with Datadog support for over a week now. Now I'm starting to wonder if for some reason AWS didn't get the memo on |
As far as I can tell, the feature is enabled by default in 1.22. Even though the doc I originally linked says otherwise, the Feature Gate doc for v1.22 states that it's at the "beta" stage and enabled by default in 1.22. However, if it somehow isn't, AWS EKS v1.22 doesn't enable it in its feature flags. Here's the relevant API log message from my EKS 1.22 cluster
|
Regardless of what the default behavior is/isn't in EKS 1.22, it seems like bad design to force the setting of But, if for some reason the setting causes issues in a k8s cluster (like it appears to be doing in mine), the chart doesn't provide a values setting to control which way this gets set to |
We do have an option to disable the service creation, but we don't want to use the service with the "Cluster" option. I'll let you read the comment I made in another issue, which explains why. If the local traffic policy is not available, the two other solutions are the hostPort or the UDS socket. Either way, it is very important to target the agent on the same node to get all the features working as expected.
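For readers following along, the difference between the two policies can be sketched roughly as below. This is a simplified Python model of how a Service's internalTrafficPolicy narrows the candidate endpoints, not kube-proxy's actual code; the `Endpoint` type, the `eligible_endpoints` function, and the node and pod names are all hypothetical and exist only for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Endpoint:
    # Hypothetical model of one agent pod backing the Service.
    pod: str
    node: str


def eligible_endpoints(policy: str, client_node: str,
                       endpoints: List[Endpoint]) -> List[Endpoint]:
    """Return the endpoints that in-cluster traffic from client_node may reach."""
    if policy == "Cluster":
        # Any ready endpoint in the cluster is a candidate.
        return list(endpoints)
    if policy == "Local":
        # Only endpoints on the same node as the client are candidates.
        return [e for e in endpoints if e.node == client_node]
    raise ValueError(f"unknown internalTrafficPolicy: {policy!r}")


# Illustrative data: one agent pod on each of two nodes.
agents = [Endpoint("agent-a", "node-1"), Endpoint("agent-b", "node-2")]

print(eligible_endpoints("Local", "node-1", agents))
print(eligible_endpoints("Cluster", "node-1", agents))
print(eligible_endpoints("Local", "node-3", agents))
```

With `Local` and no agent on the client's node, the list is empty, which corresponds to the traffic being dropped rather than forwarded to an agent on another node. That is why the same-node requirement matters here.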
Gotcha, so this is actually a hard requirement. That makes sense. Back to the drawing board on why this doesn't seem to be working as expected in my EKS 1.22 cluster, I suppose. |
Unfortunately, yes. Could you please contact our support so we can better track the issue and have someone try to reproduce the problem on EKS 1.22? 🙇
@rodalli Were you able to address this issue on EKS 1.22 in the end? |
No, the Datadog Support team and I were not able to figure out the issue. I actually have a support case open with AWS now. Still no definitive answer yet, but it seems like it might have something to do with self-managed nodes vs. using EKS managed node groups (where |
We seem to be having similar problems to this issue. We believe that we're hitting a bug in Kubernetes which is causing it to not delete the conntrack entry for the traffic for a stale connection.
We can also reliably reproduce this issue in all of our kubernetes clusters, on both AWS and GKE. I have opened an issue with Datadog support too. |
@adrianmoisey did you happen to find anything? I think I'm hitting the same issue |
Yup, I think this bug is fixed in Kubernetes 1.29 with this PR: kubernetes/kubernetes#119394. Datadog made this change too soon and should have made it configurable.
The existence of the service is not configurable because it's harmless. It is then up to the sender whether to use it or not. When senders are configured through our admission controller, you can use |
Describe what happened:
Running on EKS 1.22, the Datadog chart automatically enables the agent service's internal traffic policy for local routing, stating that the feature gate for this is beta and automatically enabled in K8s 1.22+. This is incorrect. The feature is still in alpha state in 1.22.

This causes all requests to the agent service the Helm chart creates to fail in K8s 1.22, unless the ServiceInternalTrafficPolicy feature gate is enabled by the K8s admin.

What's worse, this is impossible to do on EKS, as EKS does not support alpha feature gates at all, and it is not possible to enable them manually.
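The version gating reported here, together with the kind of override the issue asks for, could be sketched along these lines. This is a hypothetical Python model, not the chart's actual template logic: `parse_minor`, `internal_traffic_policy`, and the `override` parameter are invented names, and the chart is reported in this issue not to expose such a setting.

```python
from typing import Optional, Tuple


def parse_minor(version: str) -> Tuple[int, int]:
    """Parse strings such as 'v1.22.6-eks-1' or '1.22+' into (major, minor)."""
    major, minor = version.lstrip("v").split(".")[:2]
    # Keep only digits so that an EKS-style minor such as "22+" still parses.
    digits = "".join(ch for ch in minor if ch.isdigit())
    return int(major), int(digits)


def internal_traffic_policy(version: str,
                            override: Optional[str] = None) -> Optional[str]:
    """Decide the Service's internalTrafficPolicy; None means leave it unset."""
    if override is not None:
        # Hypothetical user-supplied value that wins over the version check.
        return override
    # Behaviour as reported in this issue: Local is forced on 1.22 and later.
    return "Local" if parse_minor(version) >= (1, 22) else None


print(internal_traffic_policy("v1.22.6-eks-1"))
print(internal_traffic_policy("v1.22.6-eks-1", override="Cluster"))
print(internal_traffic_policy("1.21"))
```

An explicit override keeps the version check as the default while letting an operator pick the value on a cluster where the default misbehaves.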
Describe what you expected:
The chart should have correct logic and should not deploy the Datadog agent Service resource with spec.internalTrafficPolicy: Local when running on K8s 1.22. On this version of K8s, this should be an opt-in, not an opt-out.

Steps to reproduce the issue:
agents.localService.*. Full values file contents for the curious:
spec.internalTrafficPolicy value from Local to Cluster:

Additional environment details (Operating System, Cloud provider, etc):
AWS EKS 1.22 on Bottlerocket w/ containerd