Orchestration instance started stuck in pending status (Functions v2 breaking change) #342
After looking over the execution history a little more, it appears that the last time the orchestration was successfully started, one of its sub-orchestrations never returned, or perhaps the parent never resumed after the sub completed. The parent orchestration's instance ID was 29a352c4-aec5-4154-add8-fc6ba4b977db. It calls two sub-orchestrations. The execution history and output indicate that the first sub's final activity completed successfully (and finished its work correctly). After that there is no returning invocation of the sub-orchestration and no execution of the second sub within the parent.
Thanks. Two questions:
I meant a misbehaving durable task function, mentioned here. I manually cleared a control queue in order to kill the fail/retry cycle. I do not know the internals of the framework and wondered whether this might have created a bad state with ripple effects into what I am seeing now. The instances are not directly related; they reference the same orchestration, and the one with the non-completing sub-orchestration was the last time the job was triggered and did not sit in pending status indefinitely.
After restarting the app service, the most recently started orchestration began executing and ran to completion.
Clearing control queues is generally fine. It will cause all pending instances to stop processing, but shouldn't otherwise put anything into a bad state. Glad to know you were able to restart the function app to mitigate the stuck instance. I would still like to investigate the cause; it seems something must have gotten stuck in memory. Is this the same instance ID you mentioned above?
Yes, that was the ID of the instance that hung during execution. All of the steps that ran completed successfully, but somewhere in between it hung and did not run all of the steps. It was in a running status when the next timer trigger fired and was terminated at that time. From then on is when the indefinite pending began.
I've hit this issue too. The only thing I can add, as far as what I've seen, is the runtime version. I moved the app to Australia East yesterday (12 Jun) and it was working fine; I can't recall the runtime version, but it was != 2.0.11857.0 (beta). Now it's failing on Aus East again, and the Aus East runtime is showing as 2.0.11857.0 (beta).
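For anyone else trying to confirm the exact runtime build their app is on, one option (a sketch; the app name and key below are placeholders you'd substitute with your own) is the host's admin status endpoint, which reports the running version in its JSON response:

```shell
# Placeholder app name and master key. The response JSON includes a
# "version" field with the exact Functions host build the app is running.
curl "https://<your-function-app>.azurewebsites.net/admin/host/status?code=<master-key>"
```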
Ah, I feel dumb for not asking this right away but are both of you running Functions v2? All of my testing trying to (unsuccessfully) reproduce this were using v1. |
For me, yes.
It does not work currently in East US or East US 2. Durable Azure Functions stopped working: records in the DurableFunctionsHubInstances table show that the runtime status for activities is always Pending, messages in the durablefunctionshubcontrol queues cannot be read properly, and the dequeue count keeps increasing.
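The climbing dequeue count can be observed directly by peeking the control queue with the Azure CLI (a sketch; the storage account name is a placeholder, and the exact control queue name/suffix depends on your task hub, since the framework creates several partitioned control queues):

```shell
# Placeholder account name. Peeked messages include a dequeueCount field,
# which keeps climbing when the host repeatedly fails to process a message.
az storage message peek \
  --queue-name durablefunctionshubcontrol-00 \
  --num-messages 5 \
  --account-name <storage-account> \
  --output json
```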
I suspect I have run into this problem too. I've deleted the entire resource group with the function app and the associated storage account and redeployed, with no luck. The logs reveal that a new instance of the orchestrator function gets scheduled, but it is never picked up from the control queue.

.csproj:
<Project Sdk="Microsoft.NET.Sdk">
  <PropertyGroup>
    <TargetFramework>netstandard2.0</TargetFramework>
    <AzureFunctionsVersion>v2</AzureFunctionsVersion>
  </PropertyGroup>
  <ItemGroup>
    <PackageReference Include="Microsoft.NET.Sdk.Functions" Version="1.0.13" />
    <PackageReference Include="Microsoft.Azure.Management.Sql" Version="1.16.0-preview" />
    <PackageReference Include="Microsoft.Azure.Services.AppAuthentication" Version="1.1.0-preview" />
    <PackageReference Include="Microsoft.Azure.WebJobs.Extensions.DurableTask" Version="1.4.1" />
    <PackageReference Include="WindowsAzure.Storage" Version="8.6.0" />
  </ItemGroup>
</Project>
@vludax Exactly the same issue.
Thanks for confirming. This may have been caused by the recent update of the Azure Functions v2 runtime in Azure mentioned here. Can you please try setting the FUNCTIONS_EXTENSION_VERSION app setting to pin your app to an earlier runtime build and see whether that resolves it?
@cgillum Thank you, the FUNCTIONS_EXTENSION_VERSION setting change seems to have helped.
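For reference, the workaround can be applied from the Azure CLI as well as the portal. This is a sketch: the app and resource group names are placeholders, and the build number to pin to should be whichever runtime build last worked for your app (not shown here, since it varies):

```shell
# Placeholder names. Pinning FUNCTIONS_EXTENSION_VERSION to a specific build
# stops the app from floating on "beta", which always tracks the latest v2
# runtime release and is the build suspected of the regression.
az functionapp config appsettings set \
  --name <your-function-app> \
  --resource-group <your-resource-group> \
  --settings FUNCTIONS_EXTENSION_VERSION=<known-good-build>
```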
Yep, worked for me too, cheers!
Trying to reproduce this to get to the root cause. @nagoh , @OTedikov , @pluralmonad , can you provide the following information about the environment you experienced this issue in?
Thanks!
Same issue here. Environment information shown below:

Using Microsoft.Azure.WebJobs.Extensions.DurableTask 1.4.1
Thanks for the information! I was able to reproduce this: the Functions runtime host was recently updated to use Newtonsoft.Json 11.0.2, while one of the Durable Functions extension's dependencies was still on 10.0.3. There is a breaking change between the two versions that caused an exception to be thrown whenever the Durable Task Framework tried to process a work item, which is why orchestrations were stuck in the Pending state. I've submitted a fix, which will go out in a release. In the meantime, please use Chris's workaround.
@cgillum @kashimiz Just realized one of my apps was affected too; it went fine once I dropped back to the previous version. I wonder if there's another task here we could track which would enable a
We actually used to wait for the orchestrator to start before returning.

Were you able to observe the problem if you checked your App Insights function logs? It ideally should have been visible in the host logs (Katy saw it in Kusto; I'm not sure that necessarily means it surfaced in AI). If so, I wonder if we could add some portal UX to make it clear that the orchestrator and activity trigger functions were unhealthy (I thought I saw an email thread discussing this possibility).
Yes, unfortunately I see the request go in over HTTP and the execution succeed, but nothing in the "exceptions" table in the last 24 hours, so at least I don't see anything in my AI. Agree on the load piece. If you're interested in seeing whether exceptions were thrown at a different layer, the function app is named
Please note that after many tests, fixing Just an FYI for everyone.
Getting caught up on this discussion... The The specific
@pluralmonad My issue was with .NET Core, which is only supported in the beta v2 runtime.
@pluralmonad Unfortunately, I think your original issue may have been overtaken by a similar issue which only impacted Functions v2. We've fixed the v2 issue, and several people are watching this issue to track that fix. It will be available in the 1.5 release, which is on nuget.org now (release announcement pending). For that reason, I'd like to close this issue. Can you open a new issue with your latest information and specify that it is impacting Functions v1? Sorry for the mixup...
I have a timer that fires every 6 hours. It triggers an orchestration starter function that checks for an existing instance and starts/terminates/restarts the orchestration.
Recently, the trigger log output for each trigger event has indicated that an existing instance was found in pending status with a last-updated time equal to its start time (6 hours before the last trigger event).
Here is a representative trigger function output.
Any ideas why it might never leave pending status?
I recently had a feral orchestration persist in a retry-and-fail state for an extended period of time. Might I need to do some cleanup in the DTFx control mechanisms in storage?
This application is running in the Central US region.
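For context, the starter logic described above is roughly the following. This is only a sketch against the Durable Functions 1.x API (matching the 1.4.1 package in the .csproj earlier in the thread); the function and orchestrator names are made up for illustration:

```csharp
using System.Threading.Tasks;
using Microsoft.Azure.WebJobs;
using Microsoft.Extensions.Logging;

public static class JobStarter
{
    // Sketch of a timer-triggered singleton starter. "JobStarter" and
    // "JobOrchestrator" are hypothetical names, not from the real app.
    [FunctionName("JobStarter")]
    public static async Task Run(
        [TimerTrigger("0 0 */6 * * *")] TimerInfo timer,          // every 6 hours
        [OrchestrationClient] DurableOrchestrationClient client,
        ILogger log)
    {
        const string instanceId = "JobOrchestratorSingleton";

        var status = await client.GetStatusAsync(instanceId);
        if (status != null &&
            (status.RuntimeStatus == OrchestrationRuntimeStatus.Running ||
             status.RuntimeStatus == OrchestrationRuntimeStatus.Pending))
        {
            // An earlier run is still (or appears to be) in flight;
            // terminate it before starting a fresh one.
            log.LogWarning($"Terminating existing instance in {status.RuntimeStatus} state.");
            await client.TerminateAsync(instanceId, "Superseded by new timer run");
        }

        await client.StartNewAsync("JobOrchestrator", instanceId, null);
        log.LogInformation($"Started orchestration with ID = {instanceId}.");
    }
}
```

Note that in the stuck scenario described in this thread, a starter like this keeps finding the previous instance in Pending and the cycle never makes progress.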