
Remove volume IOPS limit #483

Merged
merged 2 commits into from Jul 21, 2020
Conversation

jacobmarble
Contributor

@jacobmarble jacobmarble commented Apr 9, 2020

Is this a bug fix or adding new feature?

Bug fix.
Fixes: #306

What is this PR about? / Why do we need it?

Our project requires the full IOPS capacity of EBS volumes.

Max IOPS for SSD (io1) volumes was increased from 20,000 to 32,000:
https://aws.amazon.com/about-aws/whats-new/2017/12/amazon-ebs-provisioned-iops-ssd--io1--volumes-now-support-32-000-iops-and-500-mbs-per-volume/
and later to 64,000:
https://aws.amazon.com/about-aws/whats-new/2018/11/amazon-elastic-block-store-announces-double-the-performance-of-provisioned-iops-volumes/

Rather than chasing this limit, rely on the AWS SDK to surface an error when the IOPS value is out of range.
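The idea can be sketched as follows. This is an illustrative Go sketch, not the actual diff from this PR; `createVolume` and the simulated EC2 check are assumptions standing in for the real AWS SDK call:

```go
package main

import "fmt"

// Previously, a hypothetical driver-side constant capped IOPS at a
// hard-coded maximum (e.g. 20,000) that lagged behind the real EBS
// limit. Removing that client-side check forwards the requested IOPS
// to EC2, whose own validation rejects out-of-range values.

// createVolume stands in for the EC2 CreateVolume call; the error text
// mimics the InvalidParameterValue response AWS returns for io1 volumes.
func createVolume(iops int64) error {
	const ec2MaxIops = 64000 // enforced server-side by AWS, not by the driver
	if iops > ec2MaxIops {
		return fmt.Errorf("InvalidParameterValue: Volume iops of %d is too high; maximum is %d", iops, ec2MaxIops)
	}
	return nil
}

func main() {
	for _, iops := range []int64{32000, 70000} {
		if err := createVolume(iops); err != nil {
			fmt.Println("rejected:", err)
		} else {
			fmt.Printf("volume created with %d IOPS\n", iops)
		}
	}
}
```

The point of the change is that the driver no longer needs updating each time AWS raises the limit; only the server-side check in `createVolume` (here simulated) decides what is valid.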

What testing is done?

go test

@k8s-ci-robot
Contributor

Thanks for your pull request. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

📝 Please follow instructions at https://git.k8s.io/community/CLA.md#the-contributor-license-agreement to sign the CLA.

It may take a couple minutes for the CLA signature to be fully registered; after that, please reply here with a new comment and we'll verify. Thanks.


  • If you've already signed a CLA, it's possible we don't have your GitHub username or you're using a different email address. Check your existing CLA data and verify that your email is set on your git commits.
  • If you signed the CLA as a corporation, please sign in with your organization's credentials at https://identity.linuxfoundation.org/projects/cncf to be authorized.
  • If you have done the above and are still having issues with the CLA being reported as unsigned, please log a ticket with the Linux Foundation Helpdesk: https://support.linuxfoundation.org/
  • Should you encounter any issues with the Linux Foundation Helpdesk, send a message to the backup e-mail support address at: [email protected]

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@k8s-ci-robot k8s-ci-robot added cncf-cla: no Indicates the PR's author has not signed the CNCF CLA. needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Apr 9, 2020
@k8s-ci-robot
Contributor

Welcome @jacobmarble!

It looks like this is your first PR to kubernetes-sigs/aws-ebs-csi-driver 🎉. Please refer to our pull request process documentation to help your PR have a smooth ride to approval.

You will be prompted by a bot to use commands during the review process. Do not be afraid to follow the prompts! It is okay to experiment. Here is the bot commands documentation.

You can also check if kubernetes-sigs/aws-ebs-csi-driver has its own contribution guidelines.

You may want to refer to our testing guide if you run into trouble with your tests not passing.

If you are having difficulty getting your pull request seen, please follow the recommended escalation practices. Also, for tips and tricks in the contribution process you may want to read the Kubernetes contributor cheat sheet. We want to make sure your contribution gets all the attention it needs!

Thank you, and welcome to Kubernetes. 😃

@k8s-ci-robot
Contributor

Hi @jacobmarble. Thanks for your PR.

I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot added the size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. label Apr 9, 2020
@coveralls

coveralls commented Apr 9, 2020

Pull Request Test Coverage Report for Build 1119

  • 0 of 0 changed or added relevant lines in 0 files are covered.
  • No unchanged relevant lines lost coverage.
  • Overall coverage increased (+0.3%) to 79.136%

Totals (Coverage Status)
Change from base Build 1111: +0.3%
Covered Lines: 1411
Relevant Lines: 1783

💛 - Coveralls

pkg/cloud/cloud.go (review thread, outdated, resolved)
@k8s-ci-robot k8s-ci-robot added size/S Denotes a PR that changes 10-29 lines, ignoring generated files. and removed size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. labels Apr 9, 2020
@jacobmarble jacobmarble changed the title from "Increase EBS volume max IOPS" to "Remove volume IOPS limit" Apr 9, 2020
@msau42
Contributor

msau42 commented Apr 10, 2020

/ok-to-test
/assign @leakingtapan

@k8s-ci-robot k8s-ci-robot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. cncf-cla: no Indicates the PR's author has not signed the CNCF CLA. labels Apr 10, 2020
@leakingtapan
Contributor

Could you test the behavior when IOPS is either under or over the limit, with this change applied on a real cluster? And how is the error surfaced?

/retest

@jacobmarble
Contributor Author

@leakingtapan I have tested the PR against the legacy implementation. kubernetes/kubernetes#90014 (comment)

The errors exposed in logs and in kubectl describe are short, clear, and useful.

@leakingtapan
Contributor

The legacy in-tree plugin exercises different code paths than the CSI driver; please verify the behavior against the CSI driver as well, in case error propagation behaves unexpectedly. Thx

@BondAnthony
Contributor

@leakingtapan do you have any pointers on testing this using local-up-cluster.sh or another method? I'm curious what the normal development flow is.

@leakingtapan
Contributor

I use kops to test the driver. Let me know if you need any help on setting it up.

@jacobmarble
Contributor Author

@leakingtapan thanks for the kops tip. I had not used kops before.

Setup

  • created a k8s cluster with kops, looks like k8s version is 1.17.6
  • built the CSI image against this branch with make image
  • pushed the Docker image to a publicly-readable repo for easy access
  • modified deploy/overlays/dev/kustomization.yml to point to my image
  • applied the above
  • pods ebs-csi-controller-blah and ebs-csi-node-blah are "Running"
  • applied the StorageClass and PVCs described below
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: data-io1-1000g-50000iops
  namespace: default
spec:
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 1000000M
  storageClassName: io1-50
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: data-io1-1500g-75000iops
  namespace: default
spec:
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 1500000M
  storageClassName: io1-50
---
allowVolumeExpansion: true
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: io1-50
parameters:
  iopsPerGB: "50"
  type: io1
provisioner: kubernetes.io/aws-ebs
reclaimPolicy: Delete
volumeBindingMode: Immediate
allowedTopologies:
- matchLabelExpressions:
  - key: failure-domain.beta.kubernetes.io/zone
    values:
    - us-west-2a

Result - SUCCESS

PVC data-io1-1000g-50000iops was created and bound successfully. The AWS console shows the corresponding EBS volume is io1 with 46600 IOPS.

PVC data-io1-1500g-75000iops was not created successfully. The AWS console shows no corresponding EBS volume. Errors from kubectl describe pvc data-io1-1500g-75000iops are clear and helpful: "Could not create volume "pvc-ddc4aaed-a361-4475-b073-ee5a5e04ae2a": could not create volume in EC2: InvalidParameterValue: Volume iops of 69850 is too high; maximum is 64000."
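The observed IOPS values follow from the claim sizes: the requested capacity is rounded up to whole GiB and multiplied by the StorageClass's iopsPerGB. A minimal sketch of that arithmetic (`roundUpGiB` here is an assumption standing in for the driver's capacity rounding):

```go
package main

import "fmt"

const gib = int64(1) << 30 // 1 GiB in bytes

// roundUpGiB rounds a byte count up to whole GiB, as volume
// capacity requests are rounded up before provisioning.
func roundUpGiB(bytes int64) int64 {
	return (bytes + gib - 1) / gib
}

func main() {
	const iopsPerGB = 50 // from the io1-50 StorageClass above
	// The 1000000M and 1500000M claims, expressed in bytes.
	for _, sizeBytes := range []int64{1_000_000 * 1_000_000, 1_500_000 * 1_000_000} {
		g := roundUpGiB(sizeBytes)
		fmt.Printf("%d bytes -> %d GiB -> %d IOPS\n", sizeBytes, g, g*iopsPerGB)
	}
}
```

This yields 932 GiB * 50 = 46600 IOPS for the first claim (matching the AWS console) and 1397 GiB * 50 = 69850 IOPS for the second, which exceeds the 64000 EC2 maximum and triggers the error below.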

$ kubectl describe pvc data-io1-1500g-75000iops
Name:          data-io1-1500g-75000iops
Namespace:     default
StorageClass:  io1-50
Status:        Pending
Volume:
Labels:        <none>
Annotations:   kubectl.kubernetes.io/last-applied-configuration:
                 {"apiVersion":"v1","kind":"PersistentVolumeClaim","metadata":{"annotations":{},"name":"data-io1-1500g-75000iops","namespace":"default"},"s...
               volume.beta.kubernetes.io/storage-provisioner: ebs.csi.aws.com
Finalizers:    [kubernetes.io/pvc-protection]
Capacity:
Access Modes:
VolumeMode:    Filesystem
Mounted By:    <none>
Events:
  Type     Reason              Age    From                                                                                      Message
  ----     ------              ----   ----                                                                                      -------
  Warning  ProvisioningFailed  7m24s  persistentvolume-controller                                                               storageclass.storage.k8s.io "io1-50" not found
  Warning  ProvisioningFailed  7m17s  ebs.csi.aws.com_ebs-csi-controller-5c5f68cc5c-wxwsb_58832278-71bd-4bcc-959b-b6c57f389065  failed to provision volume with StorageClass "io1-50": rpc error: code = Internal desc = Could not create volume "pvc-ddc4aaed-a361-4475-b073-ee5a5e04ae2a": could not create volume in EC2: InvalidParameterValue: Volume iops of 69850 is too high; maximum is 64000.
           status code: 400, request id: 866fcb08-2f8e-4f9c-ac49-32714bc6ce1b
  Warning  ProvisioningFailed  7m16s  ebs.csi.aws.com_ebs-csi-controller-5c5f68cc5c-wxwsb_58832278-71bd-4bcc-959b-b6c57f389065  failed to provision volume with StorageClass "io1-50": rpc error: code = Internal desc = Could not create volume "pvc-ddc4aaed-a361-4475-b073-ee5a5e04ae2a": could not create volume in EC2: InvalidParameterValue: Volume iops of 69850 is too high; maximum is 64000.
           status code: 400, request id: 51dedfd0-4592-451d-b4c3-f8b80ad553cf
  Warning  ProvisioningFailed  7m14s  ebs.csi.aws.com_ebs-csi-controller-5c5f68cc5c-wxwsb_58832278-71bd-4bcc-959b-b6c57f389065  failed to provision volume with StorageClass "io1-50": rpc error: code = Internal desc = Could not create volume "pvc-ddc4aaed-a361-4475-b073-ee5a5e04ae2a": could not create volume in EC2: InvalidParameterValue: Volume iops of 69850 is too high; maximum is 64000.
           status code: 400, request id: e00e34ee-543c-493a-8778-d40383c0b86f
  Warning  ProvisioningFailed  7m9s  ebs.csi.aws.com_ebs-csi-controller-5c5f68cc5c-wxwsb_58832278-71bd-4bcc-959b-b6c57f389065  failed to provision volume with StorageClass "io1-50": rpc error: code = Internal desc = Could not create volume "pvc-ddc4aaed-a361-4475-b073-ee5a5e04ae2a": could not create volume in EC2: InvalidParameterValue: Volume iops of 69850 is too high; maximum is 64000.
           status code: 400, request id: fe1715ac-b4ca-4f43-b8c7-9a27e9dc8576
  Warning  ProvisioningFailed  7m1s  ebs.csi.aws.com_ebs-csi-controller-5c5f68cc5c-wxwsb_58832278-71bd-4bcc-959b-b6c57f389065  failed to provision volume with StorageClass "io1-50": rpc error: code = Internal desc = Could not create volume "pvc-ddc4aaed-a361-4475-b073-ee5a5e04ae2a": could not create volume in EC2: InvalidParameterValue: Volume iops of 69850 is too high; maximum is 64000.
           status code: 400, request id: 5c912971-537a-43da-a831-c77b960ee17b
  Warning  ProvisioningFailed  6m45s  ebs.csi.aws.com_ebs-csi-controller-5c5f68cc5c-wxwsb_58832278-71bd-4bcc-959b-b6c57f389065  failed to provision volume with StorageClass "io1-50": rpc error: code = Internal desc = Could not create volume "pvc-ddc4aaed-a361-4475-b073-ee5a5e04ae2a": could not create volume in EC2: InvalidParameterValue: Volume iops of 69850 is too high; maximum is 64000.
           status code: 400, request id: c6d77d96-1720-4194-8de9-7f6b929e0e8d
  Warning  ProvisioningFailed  6m13s  ebs.csi.aws.com_ebs-csi-controller-5c5f68cc5c-wxwsb_58832278-71bd-4bcc-959b-b6c57f389065  failed to provision volume with StorageClass "io1-50": rpc error: code = Internal desc = Could not create volume "pvc-ddc4aaed-a361-4475-b073-ee5a5e04ae2a": could not create volume in EC2: InvalidParameterValue: Volume iops of 69850 is too high; maximum is 64000.
           status code: 400, request id: ec383444-a89c-43cf-a93c-5fea830148fd
  Warning  ProvisioningFailed  5m8s  ebs.csi.aws.com_ebs-csi-controller-5c5f68cc5c-wxwsb_58832278-71bd-4bcc-959b-b6c57f389065  failed to provision volume with StorageClass "io1-50": rpc error: code = Internal desc = Could not create volume "pvc-ddc4aaed-a361-4475-b073-ee5a5e04ae2a": could not create volume in EC2: InvalidParameterValue: Volume iops of 69850 is too high; maximum is 64000.
           status code: 400, request id: b3062a7b-b068-4c20-b189-6a8c233b1b34
  Normal   Provisioning        3m (x9 over 7m21s)  ebs.csi.aws.com_ebs-csi-controller-5c5f68cc5c-wxwsb_58832278-71bd-4bcc-959b-b6c57f389065  External provisioner is provisioning volume for claim "default/data-io1-1500g-75000iops"
  Warning  ProvisioningFailed  3m                  ebs.csi.aws.com_ebs-csi-controller-5c5f68cc5c-wxwsb_58832278-71bd-4bcc-959b-b6c57f389065  failed to provision volume with StorageClass "io1-50": rpc error: code = Internal desc = Could not create volume "pvc-ddc4aaed-a361-4475-b073-ee5a5e04ae2a": could not create volume in EC2: InvalidParameterValue: Volume iops of 69850 is too high; maximum is 64000.
           status code: 400, request id: 95f79b06-3c05-455f-8971-0fa006a2b87e
  Normal   ExternalProvisioning  81s (x25 over 7m21s)  persistentvolume-controller  waiting for a volume to be created, either by external provisioner "ebs.csi.aws.com" or manually created by system administrator

@jacobmarble
Contributor Author

/assign @leakingtapan

@jacobmarble
Contributor Author

/retest

@jacobmarble
Contributor Author

Friendly note here. I believe that this and kubernetes/kubernetes#90014 are both ready to be considered for merging.

@leakingtapan
Contributor

Thx for the detailed testing! It looks great
/lgtm
/approve

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jun 26, 2020
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: jacobmarble, leakingtapan

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jun 26, 2020
@leakingtapan
Contributor

/retest

6 similar comments
@leakingtapan
Contributor

/retest

@leakingtapan
Contributor

/retest

@leakingtapan
Contributor

/retest

@philjb

philjb commented Jul 20, 2020

/retest

@philjb

philjb commented Jul 20, 2020

/retest

@philjb

philjb commented Jul 21, 2020

/retest

@k8s-ci-robot k8s-ci-robot merged commit 43bad88 into kubernetes-sigs:master Jul 21, 2020
@jacobmarble jacobmarble deleted the max-total-iops branch July 21, 2020 17:57
Successfully merging this pull request may close these issues:

Make IOPS max cap dynamic per instance

7 participants