Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make reading Kubeconfig from DCP resilient #3132

Merged
merged 12 commits into from
Mar 26, 2024

Conversation

mitchdenny
Copy link
Member

@mitchdenny mitchdenny commented Mar 25, 2024

This PR adds some retry logic to EnsureKubernetes. As part of the process the method has become async and I'm making use of a semaphore to stop the stampede to read the file it'll only read once whilst the other threads slowly spin if they get caught up in the semaphore.

@dotnet-issue-labeler dotnet-issue-labeler bot added the area-integrations Issues pertaining to Aspire Integrations packages label Mar 25, 2024
@mitchdenny mitchdenny force-pushed the mitchdenny/retry-read-kubeconfig branch from d099b79 to 4b8a994 Compare March 25, 2024 05:28
@mitchdenny mitchdenny marked this pull request as ready for review March 25, 2024 10:43
@mitchdenny mitchdenny self-assigned this Mar 25, 2024
@mitchdenny mitchdenny added the area-app-model Issues pertaining to the APIs in Aspire.Hosting, e.g. DistributedApplication label Mar 25, 2024
@mitchdenny mitchdenny added this to the preview 5 (Apr) milestone Mar 25, 2024
@karolz-ms
Copy link
Member

I think a better fix would be to address this on the DCP side. Namely, produce the kubeconfig file with the following sequence:

  1. Create a temporary file
  2. Write the content and close the file
  3. Rename the file so that it matches the name the readers expect.

This should be a pretty easy fix on DCP side

@karolz-ms
Copy link
Member

Had a brief chat with @davidfowl , working on a DCP fix for kubeconfig access conflict.

@mitchdenny
Copy link
Member Author

I think a better fix would be to address this on the DCP side. Namely, produce the kubeconfig file with the following sequence:

  1. Create a temporary file
  2. Write the content and close the file
  3. Rename the file so that it matches the name the readers expect.

This should be a pretty easy fix on DCP side

We have to do this both on the DCP side and here. We have to add the retry logic here because we are looking for it before DCP has put the file down on disk. What you are proposing for DCP will improve things as well.

Copy link
Member

@karolz-ms karolz-ms left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall I like the direction, my main concern is super low frequency of probing for the file in this attempt.

src/Aspire.Hosting/Dcp/KubernetesService.cs Outdated Show resolved Hide resolved
src/Aspire.Hosting/Dcp/KubernetesService.cs Outdated Show resolved Hide resolved
src/Aspire.Hosting/Dcp/KubernetesService.cs Outdated Show resolved Hide resolved
src/Aspire.Hosting/Dcp/KubernetesService.cs Show resolved Hide resolved
@dotnet-policy-service dotnet-policy-service bot added the needs-author-action An issue or pull request that requires more info or actions from the author. label Mar 25, 2024
@karolz-ms
Copy link
Member

We have to do this both on the DCP side and here. We have to add the retry logic here because we are looking for it before DCP has put the file down on disk. What you are proposing for DCP will improve things as well.

Makes sense. I gave you some suggestions, HTH!

@dotnet-policy-service dotnet-policy-service bot removed the needs-author-action An issue or pull request that requires more info or actions from the author. label Mar 26, 2024
@mitchdenny
Copy link
Member Author

@karolz-ms could you please reset your vote?

@mitchdenny
Copy link
Member Author

/backport to release/8.0-preview5

Copy link
Contributor

Started backporting to release/8.0-preview5: https://github.com/dotnet/aspire/actions/runs/8431360102

Copy link
Member

@davidfowl davidfowl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Small nit

@mitchdenny
Copy link
Member Author

/backport to release/8.0-preview5

Copy link
Contributor

Started backporting to release/8.0-preview5: https://github.com/dotnet/aspire/actions/runs/8433947069

@mitchdenny
Copy link
Member Author

/backport to release/8.0-preview5

Copy link
Contributor

Started backporting to release/8.0-preview5: https://github.com/dotnet/aspire/actions/runs/8434031136

@mitchdenny mitchdenny dismissed karolz-ms’s stale review March 26, 2024 13:34

Discussed with Karol who is out

@mitchdenny mitchdenny merged commit 47d5440 into main Mar 26, 2024
8 checks passed
@mitchdenny mitchdenny deleted the mitchdenny/retry-read-kubeconfig branch March 26, 2024 13:34
@github-actions github-actions bot locked and limited conversation to collaborators Apr 26, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area-app-model Issues pertaining to the APIs in Aspire.Hosting, e.g. DistributedApplication area-integrations Issues pertaining to Aspire Integrations packages
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants