fix: clone volume content to requested volume #1504

Merged
merged 1 commit into from
Jul 26, 2024

Conversation

Contributor

@acortelyou acortelyou commented Jul 19, 2024

/kind bug

What this PR does / why we need it:

CreateVolume() now follows the established volume provisioning logic when cloning volume content.

This ensures the created volume has the expected configuration, metadata and content.

Which issue(s) this PR fixes:

Requirements:

Special notes for your reviewer:

Release note:

  • Volumes are now provisioned correctly when cloning

@k8s-ci-robot k8s-ci-robot added the kind/bug Categorizes issue or PR as related to a bug. label Jul 19, 2024

linux-foundation-easycla bot commented Jul 19, 2024

CLA Signed

The committers listed above are authorized under a signed CLA.

  • ✅ login: acortelyou / name: Alex Cortelyou (89a10e7)

@k8s-ci-robot k8s-ci-robot added the cncf-cla: no Indicates the PR's author has not signed the CNCF CLA. label Jul 19, 2024
@k8s-ci-robot
Contributor

Welcome @acortelyou!

It looks like this is your first PR to kubernetes-sigs/blob-csi-driver 🎉. Please refer to our pull request process documentation to help your PR have a smooth ride to approval.

You will be prompted by a bot to use commands during the review process. Do not be afraid to follow the prompts! It is okay to experiment. Here is the bot commands documentation.

You can also check if kubernetes-sigs/blob-csi-driver has its own contribution guidelines.

You may want to refer to our testing guide if you run into trouble with your tests not passing.

If you are having difficulty getting your pull request seen, please follow the recommended escalation practices. Also, for tips and tricks in the contribution process you may want to read the Kubernetes contributor cheat sheet. We want to make sure your contribution gets all the attention it needs!

Thank you, and welcome to Kubernetes. 😃

@k8s-ci-robot
Contributor

Hi @acortelyou. Thanks for your PR.

I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@k8s-ci-robot k8s-ci-robot added needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. size/M Denotes a PR that changes 30-99 lines, ignoring generated files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. and removed cncf-cla: no Indicates the PR's author has not signed the CNCF CLA. labels Jul 19, 2024
@andyzhangx
Member

/ok-to-test

@k8s-ci-robot k8s-ci-robot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Jul 20, 2024
@acortelyou
Contributor Author

/retest

@andyzhangx andyzhangx changed the title from "fix: separate srcAccount and dstAccount for copyBlobContainer" to "fix: set correct destination account name if specified in copy volume" Jul 22, 2024
@andyzhangx
Member

@acortelyou I have pushed a new commit (b40c5f9) that only copies the volume to a different account if storageAccount is specified.

@acortelyou
Contributor Author

Is some code missing from the commit? Or did you want me to implement?

@andyzhangx
Member

Is some code missing from the commit? Or did you want me to implement?

@acortelyou I have made the changes already, thx

@andyzhangx andyzhangx changed the title from "fix: set correct destination account name if specified in copy volume" to "fix: set the correct destination account name for copy volume if storageAccount is specified in storage class" Jul 22, 2024
@acortelyou
Contributor Author

I have some concerns about these changes, will review more thoroughly tomorrow to make sure I'm not missing something. In short, if I don't specify an account in the volume request storage class I would expect the destination to be dynamically provisioned the exact same way it would be if there was not a data source specified. I have no expectation for specifying a data source to influence the data destination.

@andyzhangx
Member

I have some concerns about these changes, will review more thoroughly tomorrow to make sure I'm not missing something. In short, if I don't specify an account in the volume request storage class I would expect the destination to be dynamically provisioned the exact same way it would be if there was not a data source specified. I have no expectation for specifying a data source to influence the data destination.

@acortelyou the original behavior is copying the volume into the same account as source volume, we should keep the same behavior unless user specifies storageAccount in storage class. Also, if we copy the volume to an account dynamically matched by the driver, it's difficult for user to find out which destination account has the copied volume, since the driver has its own logic to search for a matching account. That's the reason why we only supported copying the volume to the source account in the beginning.
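
The rule described in this comment can be sketched in a few lines of Go. This is only an illustration of the described behavior, not code from the driver: `cloneDestinationAccount` and both of its parameters are hypothetical names.

```go
package main

import "fmt"

// cloneDestinationAccount is a hypothetical helper, not driver code.
// It returns the account that would receive the cloned container under the
// behavior described above: the source volume's account by default, or the
// storage class's storageAccount when one is set.
func cloneDestinationAccount(srcAccount, storageClassAccount string) string {
	if storageClassAccount != "" {
		return storageClassAccount
	}
	return srcAccount
}

func main() {
	// No storageAccount in the storage class: the clone stays with the source.
	fmt.Println(cloneDestinationAccount("srcacct", ""))
	// storageAccount set: the clone goes to the named account.
	fmt.Println(cloneDestinationAccount("srcacct", "dstacct"))
}
```

Under this rule the data source decides the destination whenever storageAccount is empty, which is the point the rest of the thread debates.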

@k8s-ci-robot k8s-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Jul 23, 2024
@acortelyou
Contributor Author

acortelyou commented Jul 23, 2024

Thanks @andyzhangx, I appreciate your quick responses and willingness to collaborate on a solution.

I believe the user expects their volume to be provisioned as configured whether or not they provide a datasource.

the original behavior is copying the volume into the same account as source volume

Correct, this is a bug and contradicts the documentation.

we should keep the same behavior unless user specifies storageAccount in storage class

I disagree.

We should have the volume provisioning behavior always match the documentation and user intent.

Arbitrarily preventing the use of volume cloning and dynamic provisioning at the same time would break many useful scenarios, mine included.

it's difficult for user to find out which destination account has the copied volume

The documentation provides clear guidance on where/how volumes are to be provisioned and also provides the means to locate and co-locate the volumes if desired.

parameters:
  subscriptionID: >
    Specify Azure subscription ID where blob storage directory will be created.
  resourceGroup: >
    If empty, driver will use the same resource group name as current cluster.
  storageAccount: >
    When a specific storage account name is not provided, the driver will look for a suitable storage account that matches the account settings within the same resource group. 
    If it fails to find a matching storage account, it will create a new one. 
    However, if a storage account name is specified, the storage account must already exist.
  containerName: >
    If empty, driver creates a new container name, starting with pvc-fuse for blobfuse or pvc-nfs for NFS v3.
  containerNamePrefix: >
    Specify Azure storage directory prefix created by driver.
  tags: >
    Specify tags for storage account created by the driver.
  matchTags: >
    Match tags when driver tries to find a suitable storage account.
  skuName: >
    Specify skuName for storage account.
  location: >
    Specify location for storage account.

https://learn.microsoft.com/en-us/azure/aks/azure-csi-blob-storage-provision#storage-class-parameters-for-dynamic-persistent-volumes
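
As a rough illustration of the storageAccount rules quoted above, the selection could be sketched as follows. Everything here is an assumption made for the example: `resourceGroupState`, `selectAccount`, `errAccountNotFound` and `generatedName` are hypothetical and do not come from the driver, which queries Azure rather than an in-memory struct.

```go
package main

import (
	"errors"
	"fmt"
)

// resourceGroupState is a hypothetical stand-in for what the driver would
// learn by querying Azure; it is not a type from the driver.
type resourceGroupState struct {
	existing map[string]bool // account names that exist
	matching []string        // existing accounts whose settings match the storage class
}

var errAccountNotFound = errors.New("specified storage account does not exist")

// selectAccount follows the documented storageAccount rules. It returns the
// account name and whether that account still has to be created.
func selectAccount(requested string, rg resourceGroupState, generatedName string) (string, bool, error) {
	if requested != "" {
		// A named account must already exist; it is never created.
		if !rg.existing[requested] {
			return "", false, errAccountNotFound
		}
		return requested, false, nil
	}
	if len(rg.matching) > 0 {
		// Reuse a suitable account found in the resource group.
		return rg.matching[0], false, nil
	}
	// No match: a new account is created.
	return generatedName, true, nil
}

func main() {
	rg := resourceGroupState{
		existing: map[string]bool{"teamacct": true},
		matching: []string{"teamacct"},
	}
	name, create, err := selectAccount("", rg, "generatedacct")
	fmt.Println(name, create, err)
}
```

Nothing in this selection depends on whether the request carries a data source, which is the expectation argued for in this comment.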

I suspect we'll want to refactor some of these changes so that we can fail faster before potentially creating an account or container, but it would require additional modifications to the tests.

Thanks!

FYI I pulled the logs for the last month and it looks like no one other than my team and the E2EAKS runners is even trying to use blob volume cloning. I suspect it's because the current behavior is so incredibly limiting. After fixing this bug, blob volume cloning will be extremely useful for a bunch of scenarios. Blobfuse2 can be way more performant than the other CSI options we've benchmarked and we want to see it succeed.

pkg/blob/controllerserver.go (2 review threads, outdated and resolved)
@acortelyou acortelyou changed the title from "fix: set the correct destination account name for copy volume if storageAccount is specified in storage class" to "fix: copy volume content source to requested destination" Jul 23, 2024
@acortelyou
Contributor Author

@umagnus I have removed the extraneous funcs and added a check before regenerating the azcopy auth env.
@andyzhangx I have added if volContentSource != nil around all appropriate code blocks and comments to show the logical code flow.

One now-redundant test has been removed.
I have reverted most non-essential changes and scoped this PR tightly to the issue at hand.
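
A minimal sketch of the flow described here, assuming cloning is an extra step layered on the normal provisioning path. Only the `volContentSource != nil` check comes from the comment above; the request types and `provisioningSteps` are simplified, hypothetical stand-ins rather than the CSI or driver types.

```go
package main

import "fmt"

// volumeContentSource and createVolumeRequest are simplified, hypothetical
// stand-ins for the CSI request types.
type volumeContentSource struct {
	sourceContainer string
}

type createVolumeRequest struct {
	name             string
	volContentSource *volumeContentSource // nil when the volume is not a clone
}

// provisioningSteps lists the steps a request would go through. The first two
// steps are the same for every request; the copy runs only for clones.
func provisioningSteps(req createVolumeRequest) []string {
	steps := []string{
		"select or create storage account",
		"create container " + req.name,
	}
	if req.volContentSource != nil {
		steps = append(steps, "copy content from "+req.volContentSource.sourceContainer)
	}
	return steps
}

func main() {
	plain := createVolumeRequest{name: "pvc-a"}
	clone := createVolumeRequest{
		name:             "pvc-b",
		volContentSource: &volumeContentSource{sourceContainer: "pvc-a"},
	}
	fmt.Println(provisioningSteps(plain))
	fmt.Println(provisioningSteps(clone))
}
```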

@acortelyou
Contributor Author

I think I spotted a volume cloning bug where GetAzcopyJob was grepping for the request volName instead of the actual destination validContainerName, which could result in duplicate azcopy jobs being started.

I left that fix as a separate commit so that it can be easily reverted if the existing behavior was intentional.
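
To illustrate why the lookup key matters, here is a small sketch. It assumes job descriptions are plain strings that mention the destination container; `hasCopyJob`, the sample names and the sample URLs are hypothetical, and this is not the signature or output format of GetAzcopyJob.

```go
package main

import (
	"fmt"
	"strings"
)

// hasCopyJob is a hypothetical helper: it reports whether any job
// description mentions key.
func hasCopyJob(jobDescriptions []string, key string) bool {
	for _, d := range jobDescriptions {
		if strings.Contains(d, key) {
			return true
		}
	}
	return false
}

func main() {
	// Illustrative values: the requested volume name and the container name
	// actually used as the copy destination can differ.
	volName := "my-clone"
	validContainerName := "custom-container"

	// One copy job is already running, and it targets the destination container.
	jobs := []string{
		"copy https://srcacct.example/source-container to https://dstacct.example/" + validContainerName,
	}

	// Searching by the request name misses the running job, so a caller
	// would conclude that a new job has to be started.
	fmt.Println("found by volName:", hasCopyJob(jobs, volName))
	// Searching by the destination container name finds it.
	fmt.Println("found by validContainerName:", hasCopyJob(jobs, validContainerName))
}
```

If a repeated request sees no existing job, it starts another copy to the same destination, which is how the duplicates described above would arise.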

pkg/blob/controllerserver.go (7 review threads, outdated and resolved)
@acortelyou acortelyou changed the title from "fix: copy volume content source to requested destination" to "fix: clone volume content to requested volume" Jul 26, 2024
@acortelyou
Contributor Author

/retest

3 similar comments
@acortelyou
Contributor Author

/retest

@acortelyou
Contributor Author

/retest

@andyzhangx
Member

/retest

Member

@andyzhangx andyzhangx left a comment

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jul 26, 2024
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: acortelyou, andyzhangx

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jul 26, 2024
@andyzhangx andyzhangx merged commit fe95621 into kubernetes-sigs:master Jul 26, 2024
22 checks passed
@andyzhangx
Member

/cherrypick release-1.24

@k8s-infra-cherrypick-robot

@andyzhangx: new pull request created: #1509

In response to this:

/cherrypick release-1.24


@andyzhangx
Member

/cherrypick release-1.23

@k8s-infra-cherrypick-robot

@andyzhangx: #1504 failed to apply on top of branch "release-1.23":

Applying: fix: clone volume content to requested volume
Using index info to reconstruct a base tree...
M	pkg/blob/controllerserver.go
M	pkg/blob/controllerserver_test.go
Falling back to patching base and 3-way merge...
Auto-merging pkg/blob/controllerserver_test.go
CONFLICT (content): Merge conflict in pkg/blob/controllerserver_test.go
Auto-merging pkg/blob/controllerserver.go
CONFLICT (content): Merge conflict in pkg/blob/controllerserver.go
error: Failed to merge in the changes.
hint: Use 'git am --show-current-patch=diff' to see the failed patch
Patch failed at 0001 fix: clone volume content to requested volume
When you have resolved this problem, run "git am --continue".
If you prefer to skip this patch, run "git am --skip" instead.
To restore the original branch and stop patching, run "git am --abort".

In response to this:

/cherrypick release-1.23


Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/bug Categorizes issue or PR as related to a bug. lgtm "Looks good to me", indicates that a PR is ready to be merged. ok-to-test Indicates a non-member PR verified by an org member that is safe to test. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
5 participants