Datadog-Agent service isn't being created and thus APM isn't working... #527
Comments
I would point you to this spot in the code, where the chart is not naming the service port correctly. It should be traceport if you want it to actually work. Some documentation about this setting would also be nice, since this is why the service wasn't being created for me automatically the way the daemonset install does it. helm-charts/charts/datadog/values.yaml Line 1228 in 24a7912
Until the chart gets fixed, if anyone else is having this problem: pull the agent-services.yaml out of the daemonset yaml, change the name of the apm service port to traceport, apply it, and things will start working.
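For anyone applying that workaround, here is a minimal sketch of the kind of Service it ends up creating by hand. The metadata name, labels, and selector are illustrative placeholders, not values taken from the chart:

apiVersion: v1
kind: Service
metadata:
  name: datadog-agent            # hypothetical name; match your release
spec:
  selector:
    app: datadog-agent           # must select the agent DaemonSet pods
  ports:
    - name: traceport            # renamed from "apm" per the workaround
      port: 8126
      targetPort: 8126
      protocol: TCP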
Hi @jurschel, could you share the command and values you used to install the chart? Thanks
helm install datadog-agent -f values.yaml --set datadog.clusterName='non-prod' --set datadog.site='datadoghq.com' --set datadog.apiKey='xxx' datadog/datadog

Kubernetes version: 1.21.3
The settings in the values yaml for APM don't make it work, because the service is never created unless you use the force setting, and when it is created it gets the wrong port name (apm vs traceport), which, if you look in the node agent pods, is what the trace-agent service is sending all its traffic to. Also, we are using .NET tracers, so the UDS socket method isn't available to us per the NOTE in the install documentation. I tried it anyway; it just didn't work.
Thanks for the information.

To give more context, the service needs to be configured with … If you are using a Kubernetes version < 1.22 with the setup you describe, you will need to communicate through a hostPort bound by the Datadog agent:

datadog:
  dogstatsd:
    useHostPort: true
  apm:
    portEnabled: true

For the second issue you mentioned, about the port name in the service: the trace-agent container port is defined in helm-charts/charts/datadog/templates/_container-trace-agent.yaml Lines 18 to 23 in f1820b7, and the service port name in helm-charts/charts/datadog/templates/agent-services.yaml Lines 90 to 93 in f1820b7.
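For readers following those two references, a rough sketch of where the two names live (illustrative only, not the chart's exact templates): the trace-agent container declares a named containerPort, and the Service declares its own port name, optionally pointing at the container port by name.

# trace-agent container (sketch)
ports:
  - containerPort: 8126
    hostPort: 8126
    name: traceport
    protocol: TCP

# agent Service (sketch)
spec:
  ports:
    - name: traceport        # the name the issue reports as "apm" in the chart
      port: 8126
      targetPort: traceport  # resolves to the containerPort named "traceport" above
      protocol: TCP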
My version is up there in my post; I edited it right after I posted. 1.21.3. So is what you're telling me that I have to run 2 copies of the agent? Which I was doing, and which was working, but that obviously costs me 2x the money in agent fees. If I went with the daemonset install of this agent, it would in fact install the service I have had to install manually, so I don't quite understand the version difference. Nowhere that I saw in any of the install instructions on datadoghq.com does it say ONLY if you're on 1.22 or above. I get that deep in the notes of the chart it mentions it, but there is a disparity between the helm and daemonset installs, I believe.
Sorry if my previous answers weren't clear; I think there is a misunderstanding. To better understand your previous setup, and why deploying the agent with the chart doesn't work for you, could you please answer these questions: …
In parallel, to try to better explain how the agent should be deployed on Kubernetes with the helm chart: the recommended deployment is only one agent (pod) per node/host, which is what the chart's DaemonSet provides.
With the …
Thanks for the questions. Initially, I just had a standard Ubuntu node agent installed on these cluster nodes. However, I wasn't getting all the information I desired, so I went ahead and used the helm chart install, all per the instructions. APM worked... obviously, because there was an Ubuntu NODE agent installed on the machine answering on port 8126. However, when I removed that agent, APM stopped working; the helm-installed agents all kept sending logs, just APM stopped. Now that there is no agent on the node answering at the hostIP on port 8126, it ceases to function. I think you either have to state in the documentation that it REQUIRES two agent installs to get APM to work right, or explain exactly how it is supposed to work if there is no agent listening on the hostIP that is returned by the downward API. When I did a netstat and grepped for 8126, I did see dynamic node ports assigned on the hostIP that came from the k8s pod on a 10. address at port 8126, but those are random dynamic ports that will never match hostIP:8126.
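As background on the hostIP:8126 path mentioned above, the usual pattern is for the application pod to discover the node IP through the downward API and point its tracer at it. A minimal sketch, assuming a tracer that honors the standard DD_AGENT_HOST / DD_TRACE_AGENT_PORT environment variables:

env:
  - name: DD_AGENT_HOST              # tracer sends traces to this host
    valueFrom:
      fieldRef:
        fieldPath: status.hostIP     # node IP exposed via the downward API
  - name: DD_TRACE_AGENT_PORT
    value: "8126"                    # only reachable if the agent binds 8126 on the host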
Sorry, the PR was closed due to the GitHub automation. And thanks for the explanation, it is clearer now. The good news is that the helm chart already handles this 😃 having 2 agents is definitely not needed. To have the agent deployed by the helm chart bind host port 8126, you just need to enable it with the option:

datadog:
  apm:
    portEnabled: true

It should work as it did when the agent was installed on the Ubuntu host. In addition, if you want the same visibility on the host, you can also use the option … Let me know if this recommendation solves the issue.
Thanks for responding. What I guess you're missing is that I've already set portEnabled: true and it doesn't work without a NODE agent installed. I can try it with agents.useHostNetwork: true, as that hasn't been tried. That might be the key, since it sounds like it would force the node agent to use the host networking instead of the pod networking. Standby, I'll try that.
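A sketch of the values this attempt corresponds to (option paths as named in the thread; verify against the chart version in use):

datadog:
  apm:
    portEnabled: true        # expose 8126 via a hostPort on each node
agents:
  useHostNetwork: true       # run the agent pods on the host network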
Interestingly enough, it looks like traces are still flowing even though the trace-agent sidecar is now failing its liveness check...

Warning Unhealthy 106s (x16 over 6m46s) kubelet Liveness probe failed: dial tcp 192.168.4.23:8126: connect: connection refused

root@platform908:~# netstat | grep 8126
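The event above looks like a TCP liveness probe dialing the pod IP on 8126; roughly this shape (a sketch with illustrative timings, not the chart's exact probe):

livenessProbe:
  tcpSocket:
    port: 8126               # the kubelet dials <pod IP>:8126, which is refused
                             # if nothing is listening on that address
  initialDelaySeconds: 15    # illustrative values
  periodSeconds: 15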
Nope. So far the only thing that works with the chart I've got is to enable the APM setting portEnabled: true and set agents.localService.forceLocalServiceEnabled to true, as well as apply a service file that changes the name of the tcp port to traceport. Then APM works and doesn't fail liveness checks. I get the feeling this is being tested on nodes that have the full host agent installed, because if you were doing what I'm doing you would see it doesn't work unless you do it the way I'm doing it.
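Putting that combination into values form (a sketch of what is described above; the hand-edited Service with the traceport port name still has to be applied separately):

datadog:
  apm:
    portEnabled: true                    # expose 8126 on the host
agents:
  localService:
    forceLocalServiceEnabled: true       # force creation of the node-local service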
I think we are getting closer to a solution. We will need an … Also, if you can check in the DaemonSet spec whether the … Thanks
You can review the several flares I've sent up over this problem in support ticket 656356.
As an update here, it appears that the root cause of these problems might actually be related to an open Calico issue.
Hi all, I heard back from Google. The following change within the CNI plugin is the root cause of the issue: … The fix is going to be available in the versions below: … I will close this for now; we can follow up in the upstream repo if necessary.
Describe what happened:
Installed the latest helm chart and APM will not function, as I only have the helm-deployed datadog-agent and NO host-level datadog-agent installed.
Describe what you expected:
I install the helm chart with the appropriate APM settings and then APM works.
Steps to reproduce the issue:
Ensure no host datadog-agent is installed on the node
Install the current helm chart with apm settings enabled
Run kubectl get service -A and see that no datadog-agent service exists for traceport:8126 traffic to flow to
Additional environment details (Operating System, Cloud provider, etc):
Bare-Metal
Ubuntu 20.04
Datadog-Agent 7.33.0
Helm Chart - 2.30.0