Add backup/restore with optional automatic backups bucket #92

Draft: wants to merge 5 commits into base: main

mogul commented Apr 13, 2022

Relates to GSA/data.gov#3745

Not working yet... The broker user is going to need more permissions to be able to create an S3 bucket directly and generate credentials for accessing it.

Also need to

  • verify that the variables are in the right place for our various static/brokered workflows
    • I think another refactoring of the module layout is in order...
  • expose the parameters through the EKS service definition
  • add tests

Note this won't pass tests until the AWS user used by the broker has permissions to create S3 buckets and IAM users.

mogul commented Apr 18, 2022

Not quite working yet...

  • The IRSA role creation seems to require two apply operations; that should not be necessary AFAIK
  • Backups are ending up in state Failed, with little information about why:
$ velero backup create backup-test --ttl 24h --include-namespaces=default --wait
Backup request "backup-test" submitted successfully.
Waiting for backup to complete. You may safely press ctrl-c to stop waiting - your backup will continue in the background.
................................................
Backup completed with status: Failed. You may check for more information using the commands `velero backup describe backup-test` and `velero backup logs backup-test`.
bmogilefsky@rocinante-w10:~/Documents/Code/datagov-brokerpak-eks/terraform/modules/provision-aws$ velero backup describe backup-test
Name:         backup-test
Namespace:    velero
Labels:       velero.io/storage-location=default
Annotations:  velero.io/source-cluster-k8s-gitversion=v1.21.9-eks-0d102a7
              velero.io/source-cluster-k8s-major-version=1
              velero.io/source-cluster-k8s-minor-version=21+

Phase:  Failed (run `velero backup logs backup-test` for more information)

Errors:    0
Warnings:  2

Namespaces:
  Included:  default
  Excluded:  <none>

Resources:
  Included:        *
  Excluded:        <none>
  Cluster-scoped:  auto

Label selector:  <none>

Storage Location:  default

Velero-Native Snapshot PVs:  auto

TTL:  24h0m0s

Hooks:  <none>

Backup Format Version:  1.1.0

Started:    2022-04-17 23:08:33 -0700 PDT
Completed:  2022-04-17 23:09:21 -0700 PDT

Expiration:  2022-04-18 23:08:33 -0700 PDT

Total items to be backed up:  35
Items backed up:              35

Velero-Native Snapshots: <none included>

bmogilefsky@rocinante-w10:~/Documents/Code/datagov-brokerpak-eks/terraform/modules/provision-aws$ velero backup logs backup-test
An error occurred: file not found

Next step in the morning: check Velero logs.
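For that log check, here is a minimal sketch, not something from this PR. It assumes the Velero server runs as a Deployment named `velero` (the usual install default, not confirmed here) in the `velero` namespace shown in the describe output. The helper name `velero_server_logs_for` is made up for illustration.

```shell
#!/usr/bin/env bash
# Sketch: print Velero server log lines that mention a given backup.
# Assumptions (not confirmed in this PR): the server runs as a Deployment
# named "velero"; the "velero" namespace matches the describe output above.
velero_server_logs_for() {
  local backup="$1"
  # grep -F treats the backup name as a literal string, not a regex.
  kubectl logs deployment/velero --namespace velero | grep -F -- "$backup"
}

# Example (requires cluster access):
#   velero_server_logs_for backup-test
```

Unlike `velero backup logs`, this reads the server pod's own output, so it should still work when the per-backup log file cannot be fetched.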

mogul commented Apr 18, 2022

Attempting to delete a backup and then looking at its details sheds some light on what might be happening:

Deletion Attempts (1 failed):
  2022-04-17 23:04:54 -0700 PDT: Processed
  Errors:
    rpc error: code = Unknown desc = NoSuchBucket: The specified bucket does not exist
  status code: 404, request id: <redacted>, host id: <redacted>
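One way to follow up on the `NoSuchBucket` error is to check which bucket the backup storage location points at and whether that bucket exists. The sketch below is only a suggestion under stated assumptions: it assumes the storage location is the `default` one shown in the describe output, that it lives in the `velero` namespace, and that the local AWS credentials are allowed to call `head-bucket`. The function name `bsl_bucket_exists` is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: look up the bucket configured on a Velero BackupStorageLocation
# and ask S3 whether it exists.
# Assumptions: namespace "velero"; location name defaults to "default".
bsl_bucket_exists() {
  local bsl="${1:-default}"
  local bucket
  # Read the bucket name from the BackupStorageLocation custom resource.
  bucket="$(kubectl get backupstoragelocation "$bsl" --namespace velero \
    -o jsonpath='{.spec.objectStorage.bucket}')" || return 2
  echo "Storage location '$bsl' points at bucket: $bucket"
  # head-bucket exits non-zero if the bucket is missing or not accessible.
  aws s3api head-bucket --bucket "$bucket"
}

# Example (requires cluster and AWS access):
#   bsl_bucket_exists default
```

If the bucket named there was never created, that would also be consistent with `velero backup logs` reporting "file not found", since Velero keeps backup logs in the same object storage.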
