Endpoint allocation can get stuck sometimes #3265

Closed
radical opened this issue Mar 28, 2024 · 1 comment · Fixed by #3294
Labels: area-app-model (Issues pertaining to the APIs in Aspire.Hosting, e.g. DistributedApplication), area-orchestrator

radical commented Mar 28, 2024

This code in ApplicationExecutor.CreateServicesAsync should have a timeout to either fail or at least log a message that it is still waiting on address allocation for some services, and list those services.

// We do not specify the initial list version, so the watcher will give us all updates to Service objects.
IAsyncEnumerable<(WatchEventType, Service)> serviceChangeEnumerator = kubernetesService.WatchAsync<Service>(cancellationToken: cancellationToken);
await foreach (var (evt, updated) in serviceChangeEnumerator)
{
    if (evt == WatchEventType.Bookmark) { continue; } // Bookmarks do not contain any data.

    var srvResource = needAddressAllocated.Where(sr => sr.Service.Metadata.Name == updated.Metadata.Name).FirstOrDefault();
    if (srvResource == null) { continue; } // This service most likely already has full address information, so it is not on needAddressAllocated list.

    if (updated.HasCompleteAddress || updated.Spec.AddressAllocationMode == AddressAllocationModes.Proxyless)
    {
        srvResource.Service.ApplyAddressInfoFrom(updated);
        needAddressAllocated.Remove(srvResource);
    }

    if (needAddressAllocated.Count == 0)
    {
        return; // We are done
    }
}
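
For illustration, a minimal sketch of the kind of timeout being asked for, not the actual fix. It assumes a simplified update stream of `(Name, HasCompleteAddress)` tuples in place of the real `(WatchEventType, Service)` pairs, and the names `AddressAllocationTimeoutSketch` and `WaitForAddressesAsync` are hypothetical. It also assumes the underlying watcher honors the cancellation token passed to its enumerator; otherwise the timeout cannot interrupt a stalled watch.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper, not part of Aspire.Hosting.
public static class AddressAllocationTimeoutSketch
{
    public static async Task WaitForAddressesAsync(
        IAsyncEnumerable<(string Name, bool HasCompleteAddress)> updates,
        ISet<string> needAddressAllocated,
        TimeSpan timeout,
        CancellationToken cancellationToken)
    {
        if (needAddressAllocated.Count == 0) { return; }

        // Linked token: fires on caller cancellation or when the timeout elapses.
        using var timeoutCts = CancellationTokenSource.CreateLinkedTokenSource(cancellationToken);
        timeoutCts.CancelAfter(timeout);

        try
        {
            await foreach (var (name, hasCompleteAddress) in updates.WithCancellation(timeoutCts.Token))
            {
                if (hasCompleteAddress)
                {
                    needAddressAllocated.Remove(name);
                }

                if (needAddressAllocated.Count == 0)
                {
                    return; // We are done
                }
            }
        }
        catch (OperationCanceledException) when (!cancellationToken.IsCancellationRequested)
        {
            // Only the timeout fired: fail loudly and list what is still pending.
            throw new TimeoutException(
                "Timed out waiting for address allocation for: " + string.Join(", ", needAddressAllocated));
        }

        // The watch ended on its own while services were still pending.
        throw new InvalidOperationException(
            "Watch ended before addresses were allocated for: " + string.Join(", ", needAddressAllocated));
    }
}
```

Logging the pending names instead of throwing would be a one-line change in the `catch` block, which would cover the "at least log a message" option.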

This was hit while running Aspire.EndToEnd.Tests, where 1 in 20 runs fails because it gets stuck waiting for a service to get its address allocated.

[Screenshot 2024-03-28 at 15 25 23]

In the above run it is stuck waiting for `redis` until the test infrastructure times out after 5 minutes.

Full log. The corresponding dcp logs are here in logs/dcp-diag-logs, along with the corresponding build.

Note that this is not specific to redis.

cc @karolz-ms @eerhardt

dotnet-issue-labeler bot added the area-app-model (Issues pertaining to the APIs in Aspire.Hosting, e.g. DistributedApplication) label Mar 28, 2024

karolz-ms commented

CC @dbreshears @danegsta

Similar to what @mitchdenny has done, we should probably retry this watch a few times before timing out. An alternative (or a last-ditch effort; we could do both) would be to explicitly query the remaining resources and see whether they have the address information we are looking for.
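
A rough sketch of how the two ideas could combine, under stated assumptions rather than as the actual change: `startWatch` and `hasCompleteAddressAsync` are hypothetical delegates standing in for the real watch and get calls, updates are simplified to `(Name, HasCompleteAddress)` tuples, and the watcher is assumed to honor the cancellation token passed to its enumerator.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper, not part of Aspire.Hosting.
public static class AddressAllocationRetrySketch
{
    // Returns the names of services that still lack an address after all attempts.
    public static async Task<IReadOnlyCollection<string>> ResolveAsync(
        Func<IAsyncEnumerable<(string Name, bool HasCompleteAddress)>> startWatch,
        Func<string, CancellationToken, Task<bool>> hasCompleteAddressAsync,
        ISet<string> needAddressAllocated,
        int maxWatchAttempts,
        TimeSpan perAttemptTimeout,
        CancellationToken cancellationToken)
    {
        for (var attempt = 1; attempt <= maxWatchAttempts && needAddressAllocated.Count > 0; attempt++)
        {
            // Each attempt gets its own timeout, linked to the caller's token.
            using var attemptCts = CancellationTokenSource.CreateLinkedTokenSource(cancellationToken);
            attemptCts.CancelAfter(perAttemptTimeout);

            try
            {
                await foreach (var (name, hasAddress) in startWatch().WithCancellation(attemptCts.Token))
                {
                    if (hasAddress) { needAddressAllocated.Remove(name); }
                    if (needAddressAllocated.Count == 0) { break; }
                }
            }
            catch (OperationCanceledException) when (!cancellationToken.IsCancellationRequested)
            {
                // This attempt timed out; loop around and start a fresh watch.
            }
        }

        // Last-ditch effort: explicitly query whatever is still pending.
        foreach (var name in needAddressAllocated.ToList())
        {
            if (await hasCompleteAddressAsync(name, cancellationToken))
            {
                needAddressAllocated.Remove(name);
            }
        }

        return needAddressAllocated.ToList();
    }
}
```

A caller would treat a non-empty result as the point at which to fail or log the remaining service names.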

karolz-ms self-assigned this Mar 29, 2024
karolz-ms added a commit that referenced this issue Mar 30, 2024
dbreshears added this to the preview 6 (Apr) milestone Apr 1, 2024
karolz-ms added a commit that referenced this issue Apr 1, 2024
karolz-ms added a commit that referenced this issue Apr 2, 2024
karolz-ms added a commit that referenced this issue Apr 3, 2024
* Improve service address allocation
Should fix #3265
radical pushed a commit to radical/aspire that referenced this issue Apr 3, 2024
* Improve service address allocation
Should fix dotnet#3265
davidfowl pushed a commit that referenced this issue Apr 10, 2024
* Improve service address allocation
Should fix #3265
joperezr pushed a commit that referenced this issue Apr 10, 2024
* Improve service address allocation (#3294)

* Improve service address allocation
Should fix #3265

* Make the dashboard an appmodel resource (#3453)

* Make the dashboard an appmodel resource

- Moved dashboard resource into a lifecycle hook instead of making it a dcp resource.
This removes the specialized code from ApplicationExecutor from knowing about the dashboard.
As a result of this change I also cleaned up how we configure and validate dcp options to use IConfigureOptions and IValidateOptions.
- Added tests for the dashboard resource
- Made a change to ApplicationExecutor to allow resources that start as
hidden to remain hidden.
- Added hidden to a new known resource states class
- Added more test cases

* Only add dashboard services if the dashboard is enabled (#3489)

* Only add dashboard services if the dashboard is enabled

* Don't wait until after we've started the entire app to print the token (#3472)

- Print it right after we print the dashboard url
- Refactored the dashboard resource to use DashboardOptions instead of DcpOptions

---------

Co-authored-by: Karol Zadora-Przylecki <[email protected]>
Co-authored-by: Reuben Bond <[email protected]>
github-actions bot locked and limited conversation to collaborators May 3, 2024