Running Gateway on AWS Lambda hits socket errors when loading managed schema #949

sdemjanenko · 2021-08-07T18:33:28Z

I run @apollo/gateway@^0.36.0 on AWS Lambda, and it appears that the background Promise that fetches the managed schema from uplink.api.apollographql.com can fail, leaving the entire server in that Lambda instance in a bad state. The only way this resolves itself is when the Lambda instance goes cold and is destroyed.

The errors that I observe in logs are:

This data graph is missing a valid configuration. More details may be available in the server logs.

An error occurred during Apollo Server startup. All GraphQL requests will now fail. The startup error was: An error occurred while fetching your schema from Apollo: request to https://uplink.api.apollographql.com/ failed, reason: Client network socket disconnected before secure TLS connection was established

Note: I've seen the following failure reasons:

Client network socket disconnected before secure TLS connection was established
socket hang up

This mainly seems to happen when the Lambda experiences an error (e.g. throwing AuthenticationError in context, a query result validation error, or even a Lambda timeout).

I was able to partially mitigate this by providing a fetcher to ApolloGateway that forces the connection: close header to prevent connection reuse. Despite this change, the error still happens. My current working theory is that the background Promise that loads the managed schema is still running when the Lambda is frozen (because callbackWaitsForEmptyEventLoop is set to false).
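A minimal sketch of such a fetcher wrapper, assuming a node-fetch-style fetcher that accepts a plain headers object. The FetchInit and FetchLike types and the two function names are illustrative only, not part of @apollo/gateway. The wrapper drops any existing Connection header (header names are case-insensitive) and sets connection: close, so a pooled keep-alive socket is not reused by a later invocation.

```typescript
// Minimal structural types so the sketch does not depend on a specific fetch
// implementation (assumption: a node-fetch-style init with a plain headers object).
interface FetchInit {
  method?: string;
  headers?: Record<string, string>;
  body?: string;
}

type FetchLike<R> = (url: string, init?: FetchInit) => Promise<R>;

// Returns a copy of `init` whose headers ask the server to close the
// connection after the response instead of keeping the socket alive.
function forceConnectionClose(init: FetchInit = {}): FetchInit {
  const source: Record<string, string> = init.headers ?? {};
  const headers: Record<string, string> = {};
  for (const [name, value] of Object.entries(source)) {
    // Header names are case-insensitive: drop any existing Connection header.
    if (name.toLowerCase() !== "connection") {
      headers[name] = value;
    }
  }
  headers["connection"] = "close";
  return { ...init, headers };
}

// Wraps a fetcher so every request it sends carries `connection: close`.
function withConnectionClose<R>(fetcher: FetchLike<R>): FetchLike<R> {
  return (url, init) => fetcher(url, forceConnectionClose(init));
}
```

How the wrapped function is handed to ApolloGateway depends on the fetcher type your gateway version expects, so a cast may be needed. As the report says, this only partly helps: it avoids reusing a stale socket, but it cannot stop a request from being frozen mid-flight.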

callbackWaitsForEmptyEventLoop – Set to false to send the response right away when the callback runs, instead of waiting for the Node.js event loop to be empty. If this is false, any outstanding events continue to run during the next invocation.

https://docs.aws.amazon.com/lambda/latest/dg/nodejs-context.html

It appears that this may be a bug with AWS Lambda itself: aws/aws-sdk-js#3591

Given that Lambda can freeze when the client response is complete (and we shouldn't delay the client response to refresh the managed schema), does it make sense to turn off the refresh mechanism in Lambda or implement it as a CI/CD step?


Nicoowr · 2021-08-13T14:11:12Z

Same error here. We had to add schemaConfigDeliveryEndpoint: null to the gateway config to keep using the old schema delivery mechanism.

moraisp · 2021-09-01T11:50:51Z

I also have the same error.
I understand this is a common problem with Lambda.
Do we have a way to manually trigger the managed schema update in a way that allows us to wait for it to complete before exiting the Lambda?
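One way to get "trigger and wait" behavior, sketched under the assumption that background polling is turned off and that you can load the supergraph SDL yourself. The loader and the AwaitedSchemaCache class are hypothetical, not a gateway API. The idea: refresh lazily at the start of an invocation when the cached copy is older than a max age, and await that refresh inside the handler so the fetch never outlives the invocation that started it.

```typescript
// Hypothetical loader: anything that resolves to the supergraph SDL string.
type SchemaLoader = () => Promise<string>;

class AwaitedSchemaCache {
  private sdl: string | undefined;
  private fetchedAt = -Infinity;
  private inflight: Promise<string> | undefined;

  constructor(
    private readonly load: SchemaLoader,
    private readonly maxAgeMs: number,
  ) {}

  // Returns a schema no older than maxAgeMs, fetching (and awaiting) it if needed.
  // Concurrent callers share a single in-flight fetch.
  async get(nowMs: number): Promise<string> {
    if (this.sdl !== undefined && nowMs - this.fetchedAt < this.maxAgeMs) {
      return this.sdl;
    }
    if (this.inflight === undefined) {
      this.inflight = this.load().then(
        (sdl) => {
          this.sdl = sdl;
          this.fetchedAt = nowMs;
          this.inflight = undefined;
          return sdl;
        },
        (err: unknown) => {
          this.inflight = undefined;
          throw err;
        },
      );
    }
    try {
      return await this.inflight;
    } catch (err) {
      // Keep serving the last good schema if a refresh fails.
      if (this.sdl !== undefined) {
        return this.sdl;
      }
      throw err;
    }
  }
}
```

In a handler this would be called as await cache.get(Date.now()) before executing the request. The trade-off is the one raised in the original report: an invocation that finds a stale schema pays the fetch latency, in exchange for never leaving a schema request in flight when Lambda freezes the process.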

lassesteffen · 2021-09-21T13:01:40Z

Same error for me, running on Node 14.

MaLub · 2021-11-09T17:01:39Z

Same here: it works well for a while and then it breaks down, but afterwards it runs fine for days.
Trying to set experimental_pollInterval to a few minutes, hoping the Lambda finishes before the managed federation config is fetched again.

adikari · 2022-04-17T14:23:18Z

Any updates on this from the team? I am getting this error constantly:

UplinkFetcher failed to update supergraph with the following error: An error occurred while fetching your schema from Apollo: request to https://uplink.api.apollographql.com/ failed, reason: socket hang up

c0pp3rt0p · 2022-05-09T12:28:06Z

Just to add my findings to this as well. MFH (make-fetch-happen) will not retry POST requests, and POST is what is used when fetching the remote SDLs from the subgraphs with federation. Even with retries, you could still end up in the same broken state when the failure happens, so adding retries is not a silver bullet.
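For comparison, a sketch of a wrapper that does retry POST requests on transient connection errors, with exponential backoff. The names withRetry and isTransientNetworkError, and the matching on error messages, are assumptions for illustration. Retrying a POST blindly is only reasonable when the request is a read, such as a GraphQL query for schema data, and as noted above it cannot repair a socket that was frozen mid-request; it only replaces the failed attempt.

```typescript
interface RequestInitLike {
  method?: string;
  headers?: Record<string, string>;
  body?: string;
}

type FetchLike<R> = (url: string, init?: RequestInitLike) => Promise<R>;

interface RetryOptions {
  retries: number; // extra attempts after the first one
  baseDelayMs: number; // delay before the first retry; doubles on each retry
  isRetryable?: (err: unknown) => boolean;
}

// Hypothetical predicate: treat the connection-level failures seen in this
// thread as transient. Anything else (e.g. HTTP-level errors) is rethrown.
function isTransientNetworkError(err: unknown): boolean {
  const message = err instanceof Error ? err.message : String(err);
  return /socket hang up|ECONNRESET|socket disconnected before secure TLS/i.test(message);
}

function sleep(ms: number): Promise<void> {
  return new Promise((resolve) => setTimeout(resolve, ms));
}

// Retries regardless of HTTP method, unlike make-fetch-happen's default behavior.
function withRetry<R>(fetcher: FetchLike<R>, opts: RetryOptions): FetchLike<R> {
  const retryable = opts.isRetryable ?? isTransientNetworkError;
  return async (url, init) => {
    for (let attempt = 0; ; attempt++) {
      try {
        return await fetcher(url, init);
      } catch (err) {
        if (attempt >= opts.retries || !retryable(err)) {
          throw err;
        }
        await sleep(opts.baseDelayMs * 2 ** attempt);
      }
    }
  };
}
```

With retries: 2 and baseDelayMs: 100, a request that keeps failing is attempted three times, waiting 100 ms and then 200 ms between attempts, before the last error is rethrown.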

trevor-scheer added 2021-10 component/gateway runtime labels Oct 4, 2021

Nicoowr mentioned this issue Jan 19, 2022

Gateway fetching uplink indefinitely #1405

Closed

Nicoowr mentioned this issue Jan 26, 2022

Gateway fetching uplink indefinitely apollographql/apollo-server#6036

Closed

hwillson removed the 2021-10 label Mar 31, 2022
