
"Error: socket hang up" in Lambda #3670

Open
Max101 opened this issue Oct 2, 2023 · 25 comments

@Max101

Max101 commented Oct 2, 2023

Hi, I can see this issue has popped up a few times in the past, but it seems like it's been resolved, so I am opening a new issue.

We are experiencing multiple "Error: socket hang up" errors in traces, but not in logs. Our Lambda finishes successfully and there are no errors in the logs. The issue is quite visible in APM, however: we have thousands of similar errors across most of our services.


We analyzed our code and really cannot find an issue. Additionally, if this were an issue in our code, it would break, no?

We are on Lambda
Runtime: nodejs16.x
Installed library version: [email protected]
Installed DD constructs: "datadog-cdk-constructs-v2": "1.7.4"

We are using SST v2 (Serverless Stack) to deploy our Lambda code.

Our DD config looks like this:

const dd = new Datadog(stack, `${stack.stackName}-datadog`, {
    nodeLayerVersion: 91, // Releases: https://github.com/DataDog/datadog-lambda-js/releases
    addLayers: true,
    extensionLayerVersion: 43, // Releases: https://github.com/DataDog/datadog-lambda-extension/releases
    captureLambdaPayload: true,
    enableColdStartTracing: true,
    apiKey: process.env.DATADOG_API_KEY,
    site: 'datadoghq.com',
    enableDatadogTracing: true,
    enableDatadogLogs: true,
    injectLogContext: true,
    env: process.env.NODE_ENV,
    service: stack.stackName,
    version: getDeploymentId(),
  } satisfies DatadogPropsV2);
Max101 added the bug label Oct 2, 2023
@Harmonickey

Harmonickey commented Nov 20, 2023

We're seeing this too, but we're executing in a normal Node.js context outside of Lambda, so perhaps it's more widespread. However, it throws an exception to the caller as well, so it crashes the executing context for us.

Using Node.js 20.9.0
Installed library version is [email protected]

Error: socket hang up
    at connResetException (node:internal/errors:721:14)
    at TLSSocket.socketOnEnd (node:_http_client:519:23)
    at TLSSocket.emit (node:events:526:35)
    at TLSSocket.emit (node:domain:488:12)
    at TLSSocket.emit (/app/node_modules/dd-trace/packages/datadog-instrumentations/src/net.js:61:25)
    at endReadableNT (node:internal/streams/readable:1408:12)
    at process.processTicksAndRejections (node:internal/process/task_queues:82:21)

@tlhunter
Member

@Harmonickey do you know what led to the exception? For example, an outgoing request?

@viict

viict commented Jan 26, 2024

We are also suffering from this issue, although we are not hitting it only on Lambdas.
Most of the time it is related to DynamoDB calls.

We are using dd-trace-js v5.1.0

Error: socket hang up
    at connResetException (node:internal/errors:720:14)
    at TLSSocket.socketOnEnd (node:_http_client:525:23)
    at TLSSocket.emit (node:events:529:35)
    at TLSSocket.emit (/usr/src/api/node_modules/dd-trace/packages/datadog-instrumentations/src/net.js:69:25)
    at endReadableNT (node:internal/streams/readable:1400:12)
    at process.processTicksAndRejections (node:internal/process/task_queues:82:21)

@luxion-lennart

luxion-lennart commented Jan 26, 2024

We are also seeing this issue; we see it from a Lambda doing POST calls to DynamoDB.

Error: socket hang up
    at connResetException (node:internal/errors:720:14)
    at TLSSocket.socketCloseListener (node:_http_client:474:25)
    at TLSSocket.emit (node:events:529:35)
    at TLSSocket.emit (/opt/nodejs/node_modules/dd-trace/packages/datadog-instrumentations/src/net.js:61:25)
    at node:net:350:12
    at TCP.done (node:_tls_wrap:614:7)
    at TCP.callbackTrampoline (node:internal/async_hooks:130:17)

CDK Construct: v1.8.0
Extension version: 48

@carmargut

Same issue here. Not from a Lambda, just when doing DynamoDB calls.
Using 5.2.0

Error: socket hang up
    at connResetException (node:internal/errors:720:14)
    at TLSSocket.socketOnEnd (node:_http_client:525:23)
    at TLSSocket.emit (node:events:529:35)
    at TLSSocket.emit (/app/node_modules/dd-trace/packages/datadog-instrumentations/src/net.js:61:25)
    at endReadableNT (node:internal/streams/readable:1400:12)
    at process.processTicksAndRejections (node:internal/process/task_queues:82:21)

@IgnacioFerreras

Hi, we are suffering from it too, using 3.33.0.

Error: socket hang up
    at connResetException (node:internal/errors:705:14)
    at TLSSocket.socketCloseListener (node:_http_client:467:25)
    at TLSSocket.emit (node:events:525:35)
    at TLSSocket.emit (/var/task/node_modules/dd-trace/packages/datadog-instrumentations/src/net.js:61:25)
    at node:net:301:12
    at TCP.done (node:_tls_wrap:588:7)
    at TCP.callbackTrampoline (node:internal/async_hooks:130:17)

@astuyve
Collaborator

astuyve commented Feb 1, 2024

All from the AWS SDK, and all doing DynamoDB calls? Which version of the aws-sdk is everyone using?

@viict

viict commented Feb 1, 2024

@astuyve here we have 2 different versions:

    "@aws-sdk/client-dynamodb": "^3.387.0",
    "@aws-sdk/lib-dynamodb": "^3.387.0",
    "@aws-sdk/smithy-client": "^3.374.0",

and

   "@aws-sdk/client-dynamodb": "=3.40.0",
    "@aws-sdk/lib-dynamodb": "=3.40.0",
    "@aws-sdk/smithy-client": "=3.40.0",

@luxion-lennart

@astuyve We use version 3.362.0, which is provided by the Lambda Node.js runtime.

@carmargut

@astuyve Here you have:

"@aws-sdk/client-dynamodb": "3.474.0",
"@aws-sdk/util-dynamodb": "3.474.0",

@astuyve
Collaborator

astuyve commented Feb 1, 2024

So far everyone is using the v3 SDK. Has anyone reproduced this with v2?

@viict

viict commented Feb 20, 2024

@astuyve can we do something for v3 in the meantime, while no one with v2 answers here? 🙏🏻

@astuyve
Collaborator

astuyve commented Feb 20, 2024

Hi @viict - I'm not sure there's anything specific we can do right now. I was hoping someone could replicate this with AWS SDK v2, or demonstrate definitively that dd-trace is causing this issue.

Instead, it seems that dd-trace is recording that the TCP connection was closed by the server without a response. I noticed other users reporting the same issue. The aws-sdk author also closed this issue as something that can happen.

I could certainly be wrong here, but I'm still not sure what exactly we'd change in this project at this time.

Does anyone have a minimally reproducible example? Does removing dd-trace solve this definitively? Does this impact application code, or is it successful on retries?

Thanks!
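
One low-risk way to answer the "does removing dd-trace solve this definitively" question is to gate tracer initialization behind a flag, so the same build can run with and without instrumentation. A minimal sketch, assuming dd-trace is initialized manually in application code (rather than via the Lambda layer) and using a hypothetical ENABLE_TRACING flag:

// Hypothetical ENABLE_TRACING flag: deploy the same code with and without
// dd-trace to see whether the "socket hang up" traces disappear.
// dd-trace must be required and initialized before any instrumented modules.
if (process.env.ENABLE_TRACING === 'true') {
  require('dd-trace').init();
}

// ...the rest of the handler/application code loads after this point.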

@viict

viict commented Feb 22, 2024

@astuyve oh, I understand that, of course. I'll see what I can do, and I'll share here as well if I'm able to answer any of these questions.

@Harmonickey

Harmonickey commented Feb 27, 2024

@Harmonickey do you know what led to the exception? For example, an outgoing request?

It was an outgoing request from the dd-trace library to DataDog sending an 'info' message.

Here is my initial configuration in case that helps.

const { createLogger, format, transports } = require('winston');

const httpTransportOptions = {
	host: 'http-intake.logs.datadoghq.com',
	path: `/v1/input/${environment.datadog.apiKey}?ddsource=nodejs&service=${service}`
		+ `&env=${environment.name}&envType=${isWorkerEnv ? 'work' : 'web'}`,
	ssl: true,
};

const logger = createLogger({
	level: 'info',
	exitOnError: false,
	format: format.json(),
	transports: [
		new transports.Http(httpTransportOptions),
	],
});

Then, during runtime, calling logger.info('some string message') is when it threw the exception. The message is a static string, and it does not always throw.

Because I haven't seen this error in a while, I suspect it was due to the Datadog intake servers being overloaded, so the connection wasn't responded to quickly enough and threw the socket hang up error. Perhaps Datadog has fixed it since then and improved their response times.
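
If the crash is a bigger problem than the dropped log line, winston documents attaching an 'error' handler to the logger so internal failures don't become unhandled exceptions. A minimal sketch along those lines, reusing httpTransportOptions from above (whether it catches this particular failure depends on whether the Http transport routes the network error through that event, which isn't confirmed here):

const { createLogger, format, transports } = require('winston');

const logger = createLogger({
	level: 'info',
	exitOnError: false,
	format: format.json(),
	transports: [new transports.Http(httpTransportOptions)],
});

// Swallow internal logger failures (e.g. intake timeouts / socket hang up)
// instead of letting them propagate; the affected log line is simply dropped.
logger.on('error', (err) => {
	console.warn('Datadog log intake error:', err.message);
});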

@atif-saddique-deel
Contributor

@tlhunter any updates here?
We are getting a lot of socket hang up errors recently; we are using version 4.34.0 of dd-trace.

[HPM] ECONNRESET: Error: socket hang up
    at connResetException (node:internal/errors:720:14)
    at Socket.socketCloseListener (node:_http_client:474:25)
    at Socket.emit (node:events:529:35)
    at Socket.emit (node:domain:552:15)
    at Socket.emit (/usr/src/app/node_modules/@letsdeel/init/node_modules/dd-trace/packages/datadog-instrumentations/src/net.js:69:25)
    at TCP.<anonymous> (node:net:350:12)
    at TCP.callbackTrampoline (node:internal/async_hooks:128:17) {
  code: 'ECONNRESET'
}

@antamb

antamb commented Apr 30, 2024

We are also experiencing this issue using:
"dd-trace": "^5.6.0"

Error: socket hang up
    at Socket.socketOnEnd (node:_http_client:524:23)
    at Socket.emit (node:events:530:35)
    at Socket.emit (/opt/nodejs/node_modules/dd-trace/packages/datadog-instrumentations/src/net.js:69:25)
    at endReadableNT (node:internal/streams/readable:1696:12)
    at process.processTicksAndRejections (node:internal/process/task_queues:82:21)

@atif-saddique-deel
Contributor

We are having the same issue with the latest version of dd-trace, v4.36.0.

@dvictory

Did you switch from Node 18 to Node 20? In Node 19 they changed the keep-alive default - https://nodejs.org/en/blog/announcements/v19-release-announce#https11-keepalive-by-default - leading to a number of issues, some outlined here: nodejs/node#47130

We see this around calls to AWS services (SNS, SQS, etc.); all self-heal with the SDK retry logic. What is unclear to me is whether this is an error from dd-trace, or whether dd-trace is just logging the issue from the AWS call.

Error: socket hang up
    at TLSSocket.socketOnEnd (node:_http_client:524:23)
    at TLSSocket.emit (node:events:530:35)
    at TLSSocket.emit (/opt/nodejs/node_modules/dd-trace/packages/datadog-instrumentations/src/net.js:69:25)
    at endReadableNT (node:internal/streams/readable:1696:12)
    at process.processTicksAndRejections (node:internal/process/task_queues:82:21)

(Screenshot: the info tab from this same raw error.)
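
Following up on the keep-alive angle: one way to rule it out is to disable keep-alive on the SDK's HTTP handler so each call opens a fresh connection. A minimal sketch for AWS SDK v3, assuming @smithy/node-http-handler is the handler package in use (older v3 releases shipped it as @aws-sdk/node-http-handler):

const https = require('https');
const { DynamoDBClient } = require('@aws-sdk/client-dynamodb');
const { NodeHttpHandler } = require('@smithy/node-http-handler');

// keepAlive: false trades a bit of latency per call for ruling out stale
// pooled sockets as the source of the "socket hang up" resets.
const dynamo = new DynamoDBClient({
  requestHandler: new NodeHttpHandler({
    httpsAgent: new https.Agent({ keepAlive: false }),
  }),
});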

@saibotsivad

@astuyve we are experiencing the same problem, but not related to an AWS SDK issue, and I've been able to track it down to a timeout on an API call.

We are using axios for requests, so the package.json file has:

{
  "dependencies": {
    // ...
    "axios": "^1.6.7",
    // ...
    "datadog-lambda-js": "^7.96.0",
    "dd-trace": "^4.26.0",
    // ...
    "serverless": "^3.38.0",
    // ...
  },
  "devDependencies": {
    // ...
    "serverless-plugin-datadog": "^5.56.0",
    // ...
  },
  // ...
}

We deploy with Serverless, which has:

# ...
frameworkVersion: '3'
plugins:
  - serverless-plugin-datadog
provider:
  name: aws
  architecture: arm64
  runtime: nodejs16.x
custom:
  version: '1'
  datadog:
    addExtension: true
    apiKey: ${env:DD_API_KEY, ''}
    service: public-charging-api
    env: ${opt:stage}
    version: ${env:DD_VERSION, ''}
    enableDDTracing: true
# ...

We have some API call that uses axios in a pretty normal way, like this:

const response: AxiosResponse = await axios.request({
  method: 'GET',
  url,
  headers: { authorization },
  timeout: 20000,
});

(That's wrapped in a try/catch, so we know exactly what we are logging in any case.)

Functionally: we have a Lambda that makes ~50 HTTP requests in a very short amount of time, and sometimes a dozen of them will take too long to resolve, so in that Lambda execution we are timing out those requests.

For every request that is aborted by axios due to timeout, we are getting this "Error: socket hang up" log.


The "third party frames" make me suspect that it's the Datadog layer adding these.
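
A minimal sketch of the timeout scenario described above, using a hypothetical slow upstream URL: the application catches the axios timeout, yet the aborted connection can still surface as "Error: socket hang up" on the instrumented span.

// dd-trace must be initialized before axios (or any HTTP client) is loaded.
require('dd-trace').init();
const axios = require('axios');

async function main() {
  try {
    // The 1s timeout is shorter than the upstream's response time, so axios
    // aborts the request and the underlying socket ends without a response.
    await axios.request({
      method: 'GET',
      url: 'https://slow-upstream.example.com/report', // hypothetical endpoint
      timeout: 1000,
    });
  } catch (err) {
    // Handled in application code, mirroring the try/catch described above.
    console.log('request failed:', err.code || err.message);
  }
}

main();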

@astuyve
Collaborator

astuyve commented May 23, 2024

Thanks Tobias!! That's a great clue. @tlhunter any thoughts here?

@rockymadden

I can confirm @saibotsivad's observations as well.

@chris-sidestep

We are getting this same issue with EventBridge calls on Node 18 Lambdas. The Lambdas execute with no issues, but dd-trace throws up the same 'socket hang up' error in our traces.

@gregoryorton-ws

We're getting this error running in EKS with dd-trace-js v5.12.0, and it is causing our health checks to fail because requests take > 3 seconds to finish. The root cause is a delay before this socket hang up.


@joonatanvanhala

joonatanvanhala commented Oct 23, 2024

Any updates on this? Facing the same issue.

tlhunter assigned tlhunter and unassigned astuyve Oct 24, 2024