
RDSDataService "Communications link failure" when unpausing #2914

Closed

thomasmichaelwallace opened this issue Oct 22, 2019 · 8 comments · Fixed by #2931

Labels
bug This issue is a bug.

Comments

@thomasmichaelwallace
Contributor

thomasmichaelwallace commented Oct 22, 2019


Describe the bug

Running AWS.RDSDataService.executeStatement against an auto-paused Serverless Aurora instance results in the following unhandled error:

BadRequestException: Communications link failure

The last packet sent successfully to the server was 0 milliseconds ago. The driver has not received any packets from the server.
    at Object.extractError (/Users/michael/Work/scratch/mysql/node_modules/aws-sdk/lib/protocol/json.js:51:27)
    at Request.extractError (/Users/michael/Work/scratch/mysql/node_modules/aws-sdk/lib/protocol/rest_json.js:55:8)
    at Request.callListeners (/Users/michael/Work/scratch/mysql/node_modules/aws-sdk/lib/sequential_executor.js:106:20)
    at Request.emit (/Users/michael/Work/scratch/mysql/node_modules/aws-sdk/lib/sequential_executor.js:78:10)
    at Request.emit (/Users/michael/Work/scratch/mysql/node_modules/aws-sdk/lib/request.js:683:14)
    at Request.transition (/Users/michael/Work/scratch/mysql/node_modules/aws-sdk/lib/request.js:22:10)
    at AcceptorStateMachine.runTo (/Users/michael/Work/scratch/mysql/node_modules/aws-sdk/lib/state_machine.js:14:12)
    at /Users/michael/Work/scratch/mysql/node_modules/aws-sdk/lib/state_machine.js:26:10
    at Request.<anonymous> (/Users/michael/Work/scratch/mysql/node_modules/aws-sdk/lib/request.js:38:9)
    at Request.<anonymous> (/Users/michael/Work/scratch/mysql/node_modules/aws-sdk/lib/request.js:685:12)

Apparently the Communications link failure is expected while compute is being automatically re-provisioned (see additional context), and the client is expected to retry with a delay.
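In the meantime, a client-side workaround is to catch this specific error and retry after a delay. A minimal sketch (the retryExecuteStatement helper and the attempt/delay values below are my own, not part of the SDK):

import AWS from 'aws-sdk';

const db = new AWS.RDSDataService({ region: 'eu-west-1' });

// Hypothetical helper: retries executeStatement while the paused cluster resumes.
async function retryExecuteStatement(params, attempts = 10, delayMs = 5000) {
  for (let attempt = 1; attempt <= attempts; attempt += 1) {
    try {
      return await db.executeStatement(params).promise();
    } catch (error) {
      const isResuming =
        error.code === 'BadRequestException' &&
        /Communications link failure/.test(error.message);
      if (!isResuming || attempt === attempts) throw error;
      // Give the compute time to be re-provisioned before the next attempt.
      await new Promise((resolve) => setTimeout(resolve, delayMs));
    }
  }
}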

Is the issue in the browser/Node.js?
Node.js

If on Node.js, are you running this on AWS Lambda?
Issue can be reproduced on local machine.

Details of the browser/Node.js version
v12.13.0

SDK version number
v2.553.0

To Reproduce (observed behavior)

  1. Create a Serverless Aurora cluster (following the AWS instructions),
  2. with "pause compute capacity after consecutive minutes of inactivity" set to the minimum of 5 minutes.
  3. Once commissioned, wait 5-10 minutes for Aurora to scale to 0 capacity units (i.e. paused).
  4. Run the minimal query script below (replacing the ARNs as required) - it will fail with the error reported above.
  5. Wait ~30-60 seconds and re-run the query script - it will return a result.
import AWS from 'aws-sdk';

const db = new AWS.RDSDataService({ region: 'eu-west-1' });

async function main() {
  const result = await db
    .executeStatement({
      sql: 'select * from information_schema.tables;',
      resourceArn: 'DATABASE_ARN',
      secretArn: 'CONNECTION_SECRET_ARN',
    })
    .promise();
  console.log(JSON.stringify(result, null, 2));
}
main();

Expected behavior

I expected the aws-sdk to retry with a delay after receiving the BadRequestException.
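In the meantime it is possible to opt into that behaviour from the calling side by flagging the error as retryable on the request's 'retry' event, so the SDK's normal retry/backoff machinery takes over. A sketch of the pattern I would expect to work (the maxRetries and retryDelay values are arbitrary; this is a caller-side workaround, not a change to the SDK itself):

import AWS from 'aws-sdk';

// maxRetries bounds how many times the SDK will re-attempt a retryable error.
const db = new AWS.RDSDataService({ region: 'eu-west-1', maxRetries: 8 });

const request = db.executeStatement({
  sql: 'select * from information_schema.tables;',
  resourceArn: 'DATABASE_ARN',
  secretArn: 'CONNECTION_SECRET_ARN',
});

// Mark the "Communications link failure" BadRequestException as retryable
// so the SDK retries it instead of surfacing it immediately.
request.on('retry', (response) => {
  if (
    response.error &&
    response.error.code === 'BadRequestException' &&
    /Communications link failure/.test(response.error.message)
  ) {
    response.error.retryable = true;
    response.error.retryDelay = 5000; // milliseconds
  }
});

request.promise().then((result) => console.log(JSON.stringify(result, null, 2)));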

Additional context

A search on StackOverflow found a response from AWS support confirming that the failure itself is expected behaviour (https://stackoverflow.com/questions/58192747/aws-aurora-serverless-communication-link-failure).

Clients are expected to wait and retry.

@thomasmichaelwallace thomasmichaelwallace added the bug This issue is a bug. label Oct 22, 2019
@thomasmichaelwallace
Contributor Author

N.B. I'm happy to look into implementing a fix for this, if others agree that the aws-sdk should be retrying.

@ajredniwja
Contributor

Hey @thomasmichaelwallace,

Thank you for reaching out. Can you please provide the HTTP response for this failure?

@thomasmichaelwallace
Contributor Author

thomasmichaelwallace commented Oct 22, 2019

Switching on logging, I can see that the log line is:

[AWS rdsdataservice 400 4.138s 0 retries].

The full request/response object is:

{
  request: {
    service: 'rdsdataservice',
    operation: 'executeStatement',
    params: {
      sql: '...',
      resourceArn: '...',
      secretArn: '...'
    }
  },
  success: false,
  response: null,
  error: {
    message: 'Communications link failure\n' +
      '\n' +
      'The last packet sent successfully to the server was 0 milliseconds ago. The driver has not received any packets from the server.',
    code: 'BadRequestException',
    time: '2019-10-22T21:00:00.000Z',
    requestId: '...',
    statusCode: 400,
    retryable: false,
    retryDelay: 100
  },
  attempt: 0,
  status: 400,
  latency: 100
}
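(For anyone reproducing this: the log line above comes from the SDK's built-in logger, which I switched on with something along these lines.)

import AWS from 'aws-sdk';

// Send the SDK's per-request log lines (service, status code, latency, retries) to stdout.
AWS.config.logger = console;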

@Martii

Martii commented Oct 23, 2019

Would this particular issue occur if:

$ ping s3.amazonaws.com
ping: s3.amazonaws.com: Name or service not known

... was happening intermittently instead of resolving? i.e. "has not received any packets from the server" could be a non-existent DNS record or routing issue?

@thomasmichaelwallace
Contributor Author

@Martii - I think it's an artefact of how Aurora Serverless works.

As far as I understand it, you get this error because the rdsdataservice endpoint for your database is always available, but the underlying compute needed to respond may be in a paused state, which can take ~30 seconds to resume.
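If it helps with diagnosis, the pause state can be observed directly: Aurora Serverless clusters report their current capacity through the regular RDS API, and a capacity of 0 means paused. A rough sketch (the cluster identifier is a placeholder, and I'm assuming the Capacity field that describeDBClusters reports for Serverless clusters):

import AWS from 'aws-sdk';

const rds = new AWS.RDS({ region: 'eu-west-1' });

// An Aurora Serverless cluster at 0 capacity units is paused.
async function isPaused(clusterIdentifier) {
  const { DBClusters } = await rds
    .describeDBClusters({ DBClusterIdentifier: clusterIdentifier })
    .promise();
  return DBClusters[0].Capacity === 0;
}

isPaused('MY_CLUSTER_ID').then((paused) => console.log({ paused }));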

AndrewKvalheim added a commit to AndrewKvalheim/typeorm-aurora-data-api-driver that referenced this issue Oct 27, 2019
@ajredniwja
Contributor

ajredniwja commented Oct 29, 2019

@thomasmichaelwallace you are welcome to open a PR for our team to review.

@thomasmichaelwallace
Contributor Author

Cool - @ajredniwja - I've given it a go. See #2931

@lock

lock bot commented Jan 16, 2020

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs and link to relevant comments in this thread.

@lock lock bot locked as resolved and limited conversation to collaborators Jan 16, 2020