Issue Description
We are seeing duplicate sub_workflows being triggered for a workflow.
On further investigation, we identified a race condition between the sweeper thread and the system task worker thread that leads to a duplicate SUB_WORKFLOW system task. The race condition is not limited to the SUB_WORKFLOW system task; it can be reproduced with any async system task.
Explanation:
System task worker thread-1 polls and acknowledges the task from the task's corresponding queue, then invokes AsyncSystemTaskExecutor::execute. Because the execute method has not yet completed, the task status is still SCHEDULED.
Meanwhile, the sweeper sweeps this workflow and finds that its task is in a repairable state. Since the task is no longer in the queue, the sweeper adds it back to the queue.
Thread-1 now marks the status of the sub_workflow task as IN_PROGRESS.
System task worker thread-2 polls the same task (re-added by the sweeper) from the queue and executes it, starting a new sub_workflow.
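The interleaving above can be replayed in a small single-threaded model. This is only a sketch of the described sequence, not Conductor code: the class `SweeperRaceSketch`, the in-memory queue, the `Status` enum and the `replay` method are all hypothetical stand-ins, and the model assumes (as the report describes) that the second poll executes the task without re-checking its status.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

public class SweeperRaceSketch {
    enum Status { SCHEDULED, IN_PROGRESS }

    // Hypothetical stand-ins for the task store and the task queue.
    static Status status = Status.SCHEDULED;
    static final Deque<String> queue = new ArrayDeque<>();
    static final List<String> startedSubWorkflows = new ArrayList<>();

    /** Replays the four steps in order and returns how many sub-workflows were started. */
    static int replay() {
        status = Status.SCHEDULED;
        queue.clear();
        startedSubWorkflows.clear();
        String taskId = "task-1";
        queue.add(taskId);

        // Step 1: worker thread-1 polls and acks; the task leaves the queue but stays SCHEDULED.
        String polledByWorker1 = queue.poll();

        // Step 2: the sweeper sees a SCHEDULED task that is missing from the queue and re-adds it.
        if (status == Status.SCHEDULED && !queue.contains(taskId)) {
            queue.add(taskId);
        }

        // Step 3: worker thread-1 finishes execute(): starts the sub-workflow, marks IN_PROGRESS.
        startedSubWorkflows.add("started by thread-1 for " + polledByWorker1);
        status = Status.IN_PROGRESS;

        // Step 4: worker thread-2 polls the re-queued copy and executes it again.
        String polledByWorker2 = queue.poll();
        if (polledByWorker2 != null) {
            startedSubWorkflows.add("started by thread-2 for " + polledByWorker2);
        }
        return startedSubWorkflows.size();
    }

    public static void main(String[] args) {
        System.out.println("sub-workflows started: " + replay());
    }
}
```

In this model the window between step 1 and step 3 is what the sweeper falls into: the task is neither in the queue nor marked IN_PROGRESS, so it looks abandoned even though a worker owns it.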
Task definition: We are not setting any task definition
Event handler definition: N/A in this case.
To Reproduce
Steps to reproduce the behavior: Race conditions are not straightforward to reproduce. However, to mimic the behaviour of this race condition:
Add a delay in the AsyncSystemTaskExecutor::execute method before executionDAOFacade.updateTask(task).
The delay should be long enough for the sweeper to run its sweep logic for this workflow, find the sub_workflow task in the SCHEDULED state with its taskId absent from the queue, and put the task back in the queue.
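The effect of the injected delay can also be mimicked outside Conductor with two threads. The sketch below is an assumption-laden model, not the actual executor or sweeper: `DelayedUpdateRepro`, its queue, the status reference and the execution counter are hypothetical, and a `CountDownLatch` takes the place of the delay so that the sweeper always runs inside the window.

```java
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicReference;

public class DelayedUpdateRepro {
    /** Runs the modelled scenario once and returns how many times the task was executed. */
    static int run() throws InterruptedException {
        ConcurrentLinkedQueue<String> queue = new ConcurrentLinkedQueue<>();
        AtomicReference<String> status = new AtomicReference<>("SCHEDULED");
        AtomicInteger executions = new AtomicInteger();
        CountDownLatch polled = new CountDownLatch(1); // worker has taken the task off the queue
        CountDownLatch swept = new CountDownLatch(1);  // sweeper has finished its pass

        queue.add("task-1");

        Thread worker1 = new Thread(() -> {
            String id = queue.poll();
            polled.countDown();
            try {
                swept.await(); // stands in for the delay added before the status update
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return;
            }
            if (id != null) {
                executions.incrementAndGet();
                status.set("IN_PROGRESS"); // models the delayed status update
            }
        });

        Thread sweeper = new Thread(() -> {
            try {
                polled.await();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return;
            }
            // The task is still SCHEDULED and no longer queued, so it looks repairable.
            if ("SCHEDULED".equals(status.get()) && !queue.contains("task-1")) {
                queue.add("task-1");
            }
            swept.countDown();
        });

        worker1.start();
        sweeper.start();
        worker1.join();
        sweeper.join();

        // A second worker polls the copy the sweeper re-queued and executes it again.
        if (queue.poll() != null) {
            executions.incrementAndGet();
        }
        return executions.get();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("executions: " + run());
    }
}
```

Using a latch instead of a sleep makes the model deterministic; in a real deployment the delay only needs to outlast one sweep interval for the same ordering to occur.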
Expected behavior
Tasks shouldn't be duplicated
Details
Conductor version: 3.20.0
Persistence implementation: Postgres
Queue implementation: Dynoqueues
Lock: Redis
Workflow definition: This definition is for reference only; it shows that we are using a SUB_WORKFLOW task, which is the task being duplicated.
{
  "name": "parent_workflow",
  "description": "Parent Workflow",
  "version": 1,
  "schemaVersion": 2,
  "inputParameters": [],
  "tasks": [
    {
      "name": "loopTask",
      "taskReferenceName": "loopTask",
      "type": "DO_WHILE",
      "inputParameters": {
        "batch": "${workflow.input.batch}",
        "batchSize": "${workflow.input.batch.length()}"
      },
      "loopCondition": "$.loopTask['iteration'] < $.batchSize",
      "loopOver": [
        {
          "name": "loopTask_prepare",
          "taskReferenceName": "loopTask_prepare",
          "type": "INLINE",
          "inputParameters": {
            "evaluatorType": "javascript",
            "expression": "function sFun(){ if($.batch === null) {return null} else { return $.batch.get($.iteration-1) } } sFun();",
            "batch": "${loopTask.input.batch}",
            "iteration": "${loopTask.output.iteration}"
          },
          "asyncComplete": false
        },
        {
          "name": "sub_workflow_task",
          "taskReferenceName": "sub_workflow_task",
          "type": "SUB_WORKFLOW",
          "inputParameters": {
            "item": "${loopTask_prepare.output.result}",
            "index": "${loopTask.output.iteration}"
          },
          "asyncComplete": false,
          "isOptional": true,
          "subWorkflowParam": {
            "name": "test_sub_workflow",
            "version": 1
          }
        }
      ],
      "asyncComplete": false
    }
  ],
  "ownerEmail": "[email protected]",
  "outputParameters": {},
  "inputTemplate": {},
  "timeoutSeconds": 86400,
  "timeoutPolicy": "ALERT_ONLY",
  "restartable": true
}