Commit

Add retries to upload logic
When uploading large amounts of data to S3, we occasionally see failures where the AWS SDK tries to use a closed network connection. The upstream bug appears to be aws/aws-sdk-go#3406. I'm not sure why the error occurs, but it's causing us significant pain. Rather than retrying the entire base backup, we retry the individual failed upload, up to 3 attempts in total.

```
ERROR: 2020/08/06 08:39:52.198782 failed to upload 'basebackups_005/base_0000000100006F04000000E4/tar_partitions/part_19148.tar.br' to bucket 'S3_BUCKET':
    MultipartUpload: upload multipart failed
caused by: RequestError: send request failed
caused by: Put https://S3_BUCKET/basebackups_005/base_0000000100006F04000000E4/tar_partitions/part_19148.tar.br?partNumber=2:
    write tcp 10.64.18.161:42118->52.216.134.19:443: use of closed network connection
ERROR: 2020/08/06 08:39:52.198805 upload: could not upload 'base_0000000100006F04000000E4/tar_partitions/part_19148.tar.br'
ERROR: 2020/08/06 08:39:52.198818 failed to upload 'basebackups_005/base_0000000100006F04000000E4/tar_partitions/part_19148.tar.br' to bucket 'S3_BUCKET':
    MultipartUpload: upload multipart failed
caused by: RequestError: send request failed
caused by: Put https://S3_BUCKET/basebackups_005/base_0000000100006F04000000E4/tar_partitions/part_19148.tar.br?partNumber=2
    write tcp 10.64.18.161:42118->52.216.134.19:443: use of closed network connection
ERROR: 2020/08/06 08:39:52.198833 Unable to complete uploads
```
jschaf committed Aug 7, 2020
1 parent 3cba6b0 commit 23212f7
Showing 1 changed file with 10 additions and 4 deletions.
14 changes: 10 additions & 4 deletions internal/uploader.go
@@ -100,12 +100,18 @@ func (uploader *Uploader) Upload(path string, content io.Reader) error {
 	if uploader.tarSize != nil {
 		content = &WithSizeReader{content, uploader.tarSize}
 	}
-	err := uploader.UploadingFolder.PutObject(path, content)
-	if err == nil {
-		return nil
+	// Add retries to work around https://github.com/aws/aws-sdk-go/issues/3406
+	const retries = 3
+	var err error
+	for i := 0; i < retries; i++ {
+		err = uploader.UploadingFolder.PutObject(path, content)
+		if err == nil {
+			return nil
+		}
+		tracelog.ErrorLogger.Printf(tracelog.GetErrorFormatter()+"Retrying upload error:\n", err)
 	}
+	tracelog.ErrorLogger.Printf(tracelog.GetErrorFormatter()+"Exhausted upload retries:\n", err)
 	uploader.Failed.Store(true)
-	tracelog.ErrorLogger.Printf(tracelog.GetErrorFormatter()+"\n", err)
 	return err
 }
