
RDSDataService "Communications link failure" when unpausing #2914

Closed

thomasmichaelwallace opened this issue Oct 22, 2019 · 8 comments · Fixed by #2931

Labels
bug This issue is a bug.

Comments

@thomasmichaelwallace
Contributor

thomasmichaelwallace commented Oct 22, 2019


Describe the bug

Running AWS.RDSDataService.executeStatement against an auto-paused Serverless Aurora instance results in the following unhandled error:

BadRequestException: Communications link failure

The last packet sent successfully to the server was 0 milliseconds ago. The driver has not received any packets from the server.
    at Object.extractError (/Users/michael/Work/scratch/mysql/node_modules/aws-sdk/lib/protocol/json.js:51:27)
    at Request.extractError (/Users/michael/Work/scratch/mysql/node_modules/aws-sdk/lib/protocol/rest_json.js:55:8)
    at Request.callListeners (/Users/michael/Work/scratch/mysql/node_modules/aws-sdk/lib/sequential_executor.js:106:20)
    at Request.emit (/Users/michael/Work/scratch/mysql/node_modules/aws-sdk/lib/sequential_executor.js:78:10)
    at Request.emit (/Users/michael/Work/scratch/mysql/node_modules/aws-sdk/lib/request.js:683:14)
    at Request.transition (/Users/michael/Work/scratch/mysql/node_modules/aws-sdk/lib/request.js:22:10)
    at AcceptorStateMachine.runTo (/Users/michael/Work/scratch/mysql/node_modules/aws-sdk/lib/state_machine.js:14:12)
    at /Users/michael/Work/scratch/mysql/node_modules/aws-sdk/lib/state_machine.js:26:10
    at Request.<anonymous> (/Users/michael/Work/scratch/mysql/node_modules/aws-sdk/lib/request.js:38:9)
    at Request.<anonymous> (/Users/michael/Work/scratch/mysql/node_modules/aws-sdk/lib/request.js:685:12)

Apparently the Communications link failure is expected while compute is being automatically re-provisioned (see additional context), and the client is expected to retry with a delay.
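In the meantime, a client-side workaround is to catch this specific error and retry after a delay. A minimal sketch (the retryExecuteStatement helper and the attempt/delay values below are my own, not part of the SDK):

import AWS from 'aws-sdk';

const db = new AWS.RDSDataService({ region: 'eu-west-1' });

// Hypothetical helper: retries executeStatement while the paused cluster resumes.
async function retryExecuteStatement(params, attempts = 10, delayMs = 5000) {
  for (let attempt = 1; attempt <= attempts; attempt += 1) {
    try {
      return await db.executeStatement(params).promise();
    } catch (error) {
      const isResuming =
        error.code === 'BadRequestException' &&
        /Communications link failure/.test(error.message);
      if (!isResuming || attempt === attempts) throw error;
      // Give the compute time to be re-provisioned before the next attempt.
      await new Promise((resolve) => setTimeout(resolve, delayMs));
    }
  }
}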

Is the issue in the browser/Node.js?
Node.js

If on Node.js, are you running this on AWS Lambda?
Issue can be reproduced on local machine.

Details of the browser/Node.js version
v12.13.0

SDK version number
v2.553.0

To Reproduce (observed behavior)

  1. Create a Serverless Aurora cluster (following the AWS instructions),
  2. with "pause compute capacity after consecutive minutes of inactivity" set to the minimum of 5 minutes.
  3. Once commissioned, wait 5-10 minutes for Aurora to scale to 0 capacity units (i.e. paused).
  4. Run the minimal query script below (replacing the ARNs as required) - it will fail with the error reported above.
  5. Wait ~30-60 seconds and re-run the query script - it will return a result.
import AWS from 'aws-sdk';

const db = new AWS.RDSDataService({ region: 'eu-west-1' });

async function main() {
  const result = await db
    .executeStatement({
      sql: 'select * from information_schema.tables;',
      resourceArn: 'DATABASE_ARN',
      secretArn: 'CONNECTION_SECRET_ARN',
    })
    .promise();
  console.log(JSON.stringify(result, null, 2));
}
main();

Expected behavior

I expected the aws-sdk to retry with a delay after receiving the BadRequestException.
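In the meantime it is possible to opt into that behaviour from the calling side by flagging the error as retryable on the request's 'retry' event, so the SDK's normal retry/backoff machinery takes over. A sketch of the pattern I would expect to work (the maxRetries and retryDelay values are arbitrary; this is a caller-side workaround, not a change to the SDK itself):

import AWS from 'aws-sdk';

// maxRetries bounds how many times the SDK will re-attempt a retryable error.
const db = new AWS.RDSDataService({ region: 'eu-west-1', maxRetries: 8 });

const request = db.executeStatement({
  sql: 'select * from information_schema.tables;',
  resourceArn: 'DATABASE_ARN',
  secretArn: 'CONNECTION_SECRET_ARN',
});

// Mark the "Communications link failure" BadRequestException as retryable
// so the SDK retries it instead of surfacing it immediately.
request.on('retry', (response) => {
  if (
    response.error &&
    response.error.code === 'BadRequestException' &&
    /Communications link failure/.test(response.error.message)
  ) {
    response.error.retryable = true;
    response.error.retryDelay = 5000; // milliseconds
  }
});

request.promise().then((result) => console.log(JSON.stringify(result, null, 2)));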

Additional context

A search on StackOverflow found a response from AWS support confirming that the failure itself is expected behaviour (https://stackoverflow.com/questions/58192747/aws-aurora-serverless-communication-link-failure).

Clients are expected to wait and retry.

@thomasmichaelwallace thomasmichaelwallace added the bug This issue is a bug. label Oct 22, 2019
@thomasmichaelwallace
Contributor Author

N.B. I'm happy to look into implementing a fix for this, if others agree that the aws-sdk should be retrying.

@ajredniwja
Contributor

Hey @thomasmichaelwallace,

Thank you for reaching out. Can you please provide the HTTP response for this failure?

@thomasmichaelwallace
Contributor Author

thomasmichaelwallace commented Oct 22, 2019

Switching on logging, I can see that the log line is:

[AWS rdsdataservice 400 4.138s 0 retries].

The full request/response object is:

{
  request: {
    service: 'rdsdataservice',
    operation: 'executeStatement',
    params: {
      sql: '...',
      resourceArn: '...',
      secretArn: '...'
    }
  },
  success: false,
  response: null,
  error: {
    message: 'Communications link failure\n' +
      '\n' +
      'The last packet sent successfully to the server was 0 milliseconds ago. The driver has not received any packets from the server.',
    code: 'BadRequestException',
    time: '2019-10-22T21:00:00.000Z',
    requestId: '...',
    statusCode: 400,
    retryable: false,
    retryDelay: 100
  },
  attempt: 0,
  status: 400,
  latency: 100
}
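(For anyone reproducing this: the log line above comes from the SDK's built-in logger, which I switched on with something along these lines.)

import AWS from 'aws-sdk';

// Send the SDK's per-request log lines (service, status code, latency, retries) to stdout.
AWS.config.logger = console;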

@Martii

Martii commented Oct 23, 2019

Would this particular issue occur if:

$ ping s3.amazonaws.com
ping: s3.amazonaws.com: Name or service not known

... was happening intermittently instead of resolving? i.e. "has not received any packets from the server" could be a non-existent DNS record or routing issue?

@thomasmichaelwallace
Contributor Author

@Martii - I think it's an artefact of how Aurora Serverless works.

As far as I understand it, you get this error because the rdsdataservice endpoint for your database is always available, but the underlying compute needed to respond may be in a paused state, which can take ~30 seconds to resume.
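If it helps with diagnosis, the pause state can be observed directly: Aurora Serverless clusters report their current capacity through the regular RDS API, and a capacity of 0 means paused. A rough sketch (the cluster identifier is a placeholder, and I'm assuming the Capacity field that describeDBClusters reports for Serverless clusters):

import AWS from 'aws-sdk';

const rds = new AWS.RDS({ region: 'eu-west-1' });

// An Aurora Serverless cluster at 0 capacity units is paused.
async function isPaused(clusterIdentifier) {
  const { DBClusters } = await rds
    .describeDBClusters({ DBClusterIdentifier: clusterIdentifier })
    .promise();
  return DBClusters[0].Capacity === 0;
}

isPaused('MY_CLUSTER_ID').then((paused) => console.log({ paused }));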

AndrewKvalheim added a commit to AndrewKvalheim/typeorm-aurora-data-api-driver that referenced this issue Oct 27, 2019
@ajredniwja
Contributor

ajredniwja commented Oct 29, 2019

@thomasmichaelwallace you are welcome to open a PR for our team to review.

@thomasmichaelwallace
Contributor Author

Cool - @ajredniwja - I've given it a go. See #2931

@lock

lock bot commented Jan 16, 2020

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs and link to relevant comments in this thread.

@lock lock bot locked as resolved and limited conversation to collaborators Jan 16, 2020