"Error: socket hang up" in Lambda #3670
Comments
We're seeing this too. We're just executing in a normal NodeJS context outside of Lambda, so perhaps it's more widespread. However, it also throws an exception to the caller, so it crashes the executing context for us. Using NodeJS nodejs20.9.0
|
@Harmonickey do you know what led to the exception? For example, an outgoing request? |
We are also suffering from this issue, although we are not hitting it only on lambdas. We are using ddtrace-js v5.1.0
|
We are also seeing this issue, we see it from lambda doing POST calls to DynamoDB.
CDK Construct: v1.8.0 |
Same issue here. Not from a lambda, just when doing dynamodb calls.
|
Hi, we are suffering from it too, using 3.33.0 |
All from the AWS-SDK and all doing dynamo calls? Which version of the aws-sdk is everyone using? |
@astuyve here we have 2 different versions:
and
|
@astuyve We use version 3.362.0, that is provided by the lambda nodejs runtime. |
@astuyve Here you have: "@aws-sdk/client-dynamodb": "3.474.0",
"@aws-sdk/util-dynamodb": "3.474.0", |
So far everyone is using the v3 sdk, has anyone reproduced this with v2? |
@astuyve can we do something for v3 in the meantime, while no one with v2 has answered here? 🙏🏻 |
Hi @viict - I'm not sure there's something specific we can do right now. I was hoping someone could replicate with AWS SDK v2 or demonstrate definitively that ddtrace is causing this issue. Instead, it seems that ddtrace is recording that the TCP connection was closed by the server without a response. I noticed other users reporting the same issue. The aws-sdk author also closed this issue. I could certainly be wrong here, but I'm still not sure what exactly we'd change in this project at this time. Does anyone have a minimally reproducible example? Does removing dd-trace solve this definitively? Does this impact application code, or is it successful on retries? Thanks! |
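As a starting point for the minimal reproduction asked for above, here is a hedged sketch that uses only Node's built-in `node:http` module and no tracer at all. It assumes the failure mode described in the comment: the server closes the TCP connection without sending a response. The function name `requestAgainstHangingServer` is hypothetical and the local server stands in for the real remote endpoint.

```typescript
import * as http from "node:http";
import type { AddressInfo } from "node:net";

// Hypothetical helper: starts a local server that drops every connection
// without replying, sends one request to it, and resolves with the error
// the client receives.
function requestAgainstHangingServer(): Promise<NodeJS.ErrnoException> {
  return new Promise((resolve, reject) => {
    const server = http.createServer((req) => {
      // Close the socket without writing any response bytes.
      req.socket.destroy();
    });

    server.listen(0, "127.0.0.1", () => {
      const { port } = server.address() as AddressInfo;

      // agent: false gives this request its own one-off connection.
      const req = http.get({ host: "127.0.0.1", port, agent: false }, () => {
        server.close();
        reject(new Error("unexpected response"));
      });

      req.on("error", (err: NodeJS.ErrnoException) => {
        server.close();
        resolve(err);
      });
    });
  });
}

requestAgainstHangingServer().then((err) => {
  console.log(err.code, err.message);
});
```

If the same error appears with and without dd-trace loaded, that would suggest the tracer is only recording the failure rather than causing it.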
@astuyve oh I understand that of course. I'll see what I can do to improve and share here as well if I'm able to answer any of these questions. |
It was an outgoing request from the dd-trace library to DataDog sending an 'info' message. Here is my initial configuration in case that helps.
Then, at runtime, calling logger.info('some string message') is when it threw the exception. The message is a static string and it does not always throw. Because I haven't seen this error in a while, I suspect it was due to DataDog intake servers being overloaded, so the connection wasn't responded to quickly enough and threw the socket hang up error. Perhaps DataDog has fixed it since then and improved their response times. |
@tlhunter any updates here?
[HPM] ECONNRESET: Error: socket hang up
    at connResetException (node:internal/errors:720:14)
    at Socket.socketCloseListener (node:_http_client:474:25)
    at Socket.emit (node:events:529:35)
    at Socket.emit (node:domain:552:15)
    at Socket.emit (/usr/src/app/node_modules/@letsdeel/init/node_modules/dd-trace/packages/datadog-instrumentations/src/net.js:69:25)
    at TCP.<anonymous> (node:net:350:12)
    at TCP.callbackTrampoline (node:internal/async_hooks:128:17) {
  code: 'ECONNRESET'
} |
We are also experiencing this issue using:
|
we are having the same issue with latest version of dd-trace |
Did you switch from node18 to node20? In Node 19 they changed the keep-alive default - https://nodejs.org/en/blog/announcements/v19-release-announce#https11-keepalive-by-default. We see this around calls to AWS services: SNS, SQS, etc. (all self-heal with the SDK retry logic). What is unclear to me is whether this is an error from dd-trace, or whether dd-trace is just logging the issue from the AWS call.
|
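For calls that do not go through an SDK with built-in retries, the "self-heal on retry" behaviour mentioned above can be approximated by hand. This is a minimal sketch, not the AWS SDK's actual retry logic: the helper name `retryOnConnReset`, the attempt count, and the backoff values are all assumptions chosen for illustration.

```typescript
// Hypothetical helper: retries an async operation only when it fails with
// ECONNRESET (the code attached to "socket hang up"), using exponential backoff.
async function retryOnConnReset<T>(
  operation: () => Promise<T>,
  maxAttempts = 3,
  baseDelayMs = 100,
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return await operation();
    } catch (err) {
      lastError = err;
      const code = (err as NodeJS.ErrnoException | undefined)?.code;
      // Rethrow anything that is not a connection reset, or the final failure.
      if (code !== "ECONNRESET" || attempt === maxAttempts) {
        throw err;
      }
      // Wait baseDelayMs, then 2x, 4x, ... before the next attempt.
      const delayMs = baseDelayMs * 2 ** (attempt - 1);
      await new Promise<void>((resolve) => setTimeout(resolve, delayMs));
    }
  }
  // Only reachable if maxAttempts is less than 1.
  throw lastError;
}
```

Only retry operations that are safe to repeat; a reset can arrive after the server has already processed the request.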
@astuyve we are experiencing the same problem but not related to an AWS SDK issue, and I've been able to track it down to a timeout on an API call. We are using:
{
  "dependencies": {
    // ...
    "axios": "^1.6.7",
    // ...
    "datadog-lambda-js": "^7.96.0",
    "dd-trace": "^4.26.0",
    // ...
    "serverless": "^3.38.0",
    // ...
  },
  "devDependencies": {
    // ...
    "serverless-plugin-datadog": "^5.56.0",
    // ...
  },
  // ...
}
We deploy with:
# ...
frameworkVersion: '3'
plugins:
  - serverless-plugin-datadog
provider:
  name: aws
  architecture: arm64
  runtime: nodejs16.x
custom:
  version: '1'
  datadog:
    addExtension: true
    apiKey: ${env:DD_API_KEY, ''}
    service: public-charging-api
    env: ${opt:stage}
    version: ${env:DD_VERSION, ''}
    enableDDTracing: true
# ...
We have some API call that uses:
const response: AxiosResponse = await axios.request({
  method: 'GET',
  url,
  headers: { authorization },
  timeout: 20000,
});
(That's wrapped in a try/catch, so we know exactly what we are logging in any case.)
Functionally: we have a Lambda that makes ~50 HTTP requests in a very short amount of time, and sometimes a dozen of them will take too long to resolve, so in that Lambda execution we are timing out those requests. For every request that is aborted by axios due to timeout, we are getting this "Error: socket hang up" log. The "third party frames" makes me suspect that it's the DataDog layer adding these. |
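The timeout observation above can be illustrated without axios or dd-trace. The sketch below uses only `node:http` and assumes the client reacts to a timeout by destroying the in-flight request; whether axios does exactly this internally is not established in this thread. The function name `getWithClientTimeout` and the local never-responding server are hypothetical stand-ins.

```typescript
import * as http from "node:http";
import type { AddressInfo } from "node:net";

// Hypothetical helper: sends a request to a local server that never answers,
// destroys the request once the socket has been idle for timeoutMs, and
// resolves with the error Node emits on the request.
function getWithClientTimeout(timeoutMs: number): Promise<NodeJS.ErrnoException> {
  return new Promise((resolve, reject) => {
    // The handler intentionally never writes a response.
    const server = http.createServer(() => {});

    server.listen(0, "127.0.0.1", () => {
      const { port } = server.address() as AddressInfo;

      const req = http.get(
        { host: "127.0.0.1", port, agent: false, timeout: timeoutMs },
        () => {
          server.close();
          reject(new Error("unexpected response"));
        },
      );

      // The 'timeout' event only signals idleness; the caller must abort.
      req.on("timeout", () => req.destroy());

      req.on("error", (err: NodeJS.ErrnoException) => {
        server.close();
        resolve(err);
      });
    });
  });
}

getWithClientTimeout(200).then((err) => {
  console.log(err.code, err.message);
});
```

Here the error originates on the client side, which would be consistent with the Lambda finishing successfully while the tracer still records a socket-level error for each aborted request.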
Thanks Tobias!! that's a great clue, @tlhunter any thoughts here? |
I can confirm @saibotsivad's observations as well. |
We are getting this same issue with EventBridge calls on Node18 lambdas. Lambdas execute with no issues, but dd-trace throws up the same 'socket hang up' error in our Traces |
Any updates on this? facing the same issue |
Hi, I can see this issue has popped up a few times in the past, but it seems like it's been resolved, so I am opening a new issue.
We are experiencing multiple "Error: socket hang up" errors in traces BUT not in logs. Our Lambda finishes successfully, and there are no errors in the logs. However, the issue is quite visible in APM. We have thousands of similar errors across most of our services. We went to analyze our code and really cannot seem to find an issue. Additionally, if this were an issue in our code, it would break, no?
We are on Lambda
using NodeJS
nodejs16.x
Installed library version is
[email protected]
Installed DD constructs
"datadog-cdk-constructs-v2": "1.7.4",
We are using SST v2 (Serverless Stack) to deploy our lambda code
Our DD config looks like this