[Core feature] Create or document a fast way to query the status of large workflows #4056
Comments
Thank you for opening your first issue here! 🛠
cc @pingsutw: when we added system tags, did we add a query method for this in list executions?
Thanks for the replies. I gave the tags filtering a try but I don't really understand how the tags can help. I want to see the status of all the nodes in a particular workflow quickly, but I can't find any API that can list all the nodes in a workflow. All the APIs seem to list only immediate child nodes, rather than the whole graph. I don't really think filtering is the problem; all the APIs I've tested already have too much filtering.
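To make the "immediate children only" limitation concrete, here is a minimal sketch of how a whole graph can be collected from an API that only returns direct children. The `list_children` callable and the in-memory `FAKE_GRAPH` are hypothetical stand-ins for illustration; they are not Flyte or flytekit functions.

```python
from typing import Callable, Dict, Iterable, List, Optional


def walk_all_nodes(
    list_children: Callable[[Optional[str]], Iterable[str]],
    root: Optional[str] = None,
) -> List[str]:
    """Collect every node ID below `root` using a children-only lookup."""
    found: List[str] = []
    stack = list(list_children(root))
    while stack:
        node_id = stack.pop()
        found.append(node_id)
        # Each visited node costs one more lookup for its own children.
        stack.extend(list_children(node_id))
    return found


# Hypothetical graph: the key None holds the top-level nodes.
FAKE_GRAPH: Dict[Optional[str], List[str]] = {
    None: ["n0", "n1"],
    "n0": ["n0-sub0", "n0-sub1"],
    "n0-sub1": ["n0-sub1-task"],
}


def fake_list_children(parent: Optional[str]) -> List[str]:
    """Stand-in for an API call that returns only immediate children."""
    return FAKE_GRAPH.get(parent, [])


all_nodes = walk_all_nodes(fake_list_children)
print(sorted(all_nodes))
```

The walk issues one lookup per node plus one for the root, which is why a sequential version of this pattern gets slow once a workflow has thousands of nodes.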
I would still be keen to know if there is a better way but I ended up creating a version of
This is awesome, @Tom-Newton...
What I did was a fairly naive approach that could definitely be optimised. The best way though would be to add native support for
If anyone is interested in the code I'm using personally
I just saw #2193, which seems to be about an async interface to flyte admin, which is one possible solution to this.
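As a rough illustration of why a non-blocking interface could help, the sketch below fetches one level of the graph at a time and issues the lookups for that level in parallel threads. This is not the code referred to in this thread. `list_children` and `FAKE_GRAPH` are hypothetical stand-ins, and whether a real client can safely be shared across threads is an assumption that would need checking.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Iterable, List, Optional


def walk_concurrently(
    list_children: Callable[[Optional[str]], Iterable[str]],
    root: Optional[str] = None,
    max_workers: int = 16,
) -> List[str]:
    """Breadth-first walk that fetches each level's children in parallel."""
    found: List[str] = []
    frontier: List[Optional[str]] = [root]
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        while frontier:
            # One lookup per parent in this level, all in flight together.
            child_lists = list(pool.map(list_children, frontier))
            frontier = [child for children in child_lists for child in children]
            found.extend(frontier)
    return found


# Hypothetical graph: the key None holds the top-level nodes.
FAKE_GRAPH: Dict[Optional[str], List[str]] = {
    None: ["n0", "n1"],
    "n0": ["n0-sub0", "n0-sub1"],
    "n0-sub1": ["n0-sub1-task"],
}


def fake_list_children(parent: Optional[str]) -> List[str]:
    """Stand-in for a network call that returns only immediate children."""
    return FAKE_GRAPH.get(parent, [])


level_order = walk_concurrently(fake_list_children)
print(level_order)
```

With this shape, the number of sequential round trips is bounded by the depth of the graph rather than by the number of nodes, as long as the server tolerates the concurrent requests.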
🤖 Hey @Tom-Newton! Great to see you diving into the depths of Flyte workflows again. Looks like we're on another interesting adventure. 🚀
Based on your requirements, I would recommend using the Here is a sample usage of the
    nodeExecutionListRequest := admin.NodeExecutionListRequest{
        WorkflowExecutionId: &core.WorkflowExecutionIdentifier{
            Project: "my_project",
            Domain:  "my_domain",
            Name:    "my_workflow_execution_name",
        },
        Filters: "eq(phase, SUCCEEDED)",
        Limit:   100,
        Token:   "",
        SortBy:  &admin.Sort{Key: "created_at", Direction: admin.Sort_DESCENDING},
    }
    nodeExecutionList, err := nodeExecutionManager.ListNodeExecutions(context.Background(), nodeExecutionListRequest)
    if err != nil {
        log.Fatalf("Failed to list node executions: %v", err)
    }
    for _, nodeExecution := range nodeExecutionList.NodeExecutions {
        fmt.Printf("Node execution: %v\n", nodeExecution)
    }
In this example, the Please note that you need to replace The You can find more details about the
Regarding the async interface to Flyte admin discussed in issue #2193, it could potentially improve the performance of querying the status of large workflows. However, it would require significant changes to the Flyte admin and might not be available in the near future. For now, I would recommend using the
I hope this helps! If you have any further questions, feel free to ask.
This response is meant to be useful and save you time. It is not meant to be a precise solution, but rather a starting point for your own research. Help me be more useful! Please leave a 👍 if this is helpful and 👎 if it is irrelevant. If you want to continue the conversation, start your reply with @dosu-bot.
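The `Limit` and `Token` fields in the request above suggest that results come back one page at a time. A minimal sketch of draining such a paged listing follows, assuming each response carries a token for the next page and that an empty token means there are no more results. `fetch_page` and `DATA` are hypothetical stand-ins, not flyteadmin or flytekit calls.

```python
from typing import Callable, List, Tuple

# Hypothetical data set standing in for 250 node executions.
DATA: List[int] = list(range(250))
calls = 0


def fake_fetch_page(token: str, limit: int) -> Tuple[List[int], str]:
    """Stand-in for one list request: returns (items, next_token)."""
    global calls
    calls += 1
    start = int(token) if token else 0
    end = start + limit
    next_token = str(end) if end < len(DATA) else ""
    return DATA[start:end], next_token


def list_all(fetch_page: Callable[[str, int], Tuple[List[int], str]], limit: int = 100) -> List[int]:
    """Keep requesting pages until the service returns an empty token."""
    items: List[int] = []
    token = ""
    while True:
        page, token = fetch_page(token, limit)
        items.extend(page)
        if not token:
            return items


everything = list_all(fake_fetch_page)
print(len(everything), calls)
```

Note that a filter such as `eq(phase, SUCCEEDED)` narrows the result to one phase, so counting every status this way would need either one listing per phase or an unfiltered listing.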
Motivation: Why do you think this is important?
For large workflows with ~1000s of nodes, it's difficult to monitor the overall progress. At this scale we need to be able to get things like counts of how many nodes or sub-workflows are in each status, e.g. 900 success, 200 unknown, 2 failed, 98 in progress.
I think supporting large workflows like this would be a valuable feature and it's critical to what I want to do with Flyte.
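As an illustration of the kind of summary described above, here is a minimal sketch that turns a flat list of per-node phase labels into counts. The phase names are illustrative labels only, and how the list of phases is obtained is left open.

```python
from collections import Counter
from typing import Iterable


def summarize_phases(phases: Iterable[str]) -> str:
    """Format per-phase counts, largest group first."""
    counts = Counter(phases)
    return ", ".join(f"{count} {phase}" for phase, count in counts.most_common())


# Illustrative input matching the example counts in the text.
example_phases = (
    ["SUCCEEDED"] * 900 + ["UNKNOWN"] * 200 + ["FAILED"] * 2 + ["RUNNING"] * 98
)
summary = summarize_phases(example_phases)
print(summary)
```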
Goal: What should the final outcome look like, ideally?
I think the ideal would be an interface similar to remote.sync_execution(sync_nodes=True) but fast for large workflows. This would be very flexible. Other interfaces would also be fine - I'm mostly just interested in it being fast.
Describe alternatives you've considered
The Flyte UI: It displays lists and graphs, but at the scale of 1000s of nodes these are impossible to parse by eye. Additionally, it tends to crash my browser.
flytectl get execution: Can get information about nodes when using --details, but it seems to be incomplete. Writing to a .yaml file and searching, I find quite a lot of nodes are missing.
flytekit remote.sync_execution(sync_nodes=True): This does fetch all the important information and could certainly be parsed by some Python code to extract whatever metrics are needed. The problem is that it takes about 12 minutes to run on a workflow with 3000 nodes.
EDIT: It actually doesn't fetch information about nodes that haven't started processing yet, so nodes that would show with unknown status on the UI are missed.
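One possible way to account for the nodes that have not started is sketched below, assuming the full list of node IDs declared in the workflow definition is available separately. Any declared node with no observed execution is reported under a placeholder phase. The names `declared` and `observed` and the phase labels are hypothetical.

```python
from typing import Dict, Iterable


def fill_unknown(
    declared_ids: Iterable[str],
    observed: Dict[str, str],
    placeholder: str = "UNKNOWN",
) -> Dict[str, str]:
    """Map every declared node to its observed phase, or to a placeholder."""
    return {node_id: observed.get(node_id, placeholder) for node_id in declared_ids}


# Hypothetical inputs: four declared nodes, two of which have started.
declared = ["n0", "n1", "n2", "n3"]
observed = {"n0": "SUCCEEDED", "n1": "RUNNING"}

phases = fill_unknown(declared, observed)
print(phases)
```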