Migrate flyteconsole from Travis -> Github Actions #423
Labels: ui (Admin console user interface)
Comments
eapolinario pushed a commit to eapolinario/flyte that referenced this issue on Dec 6, 2022:
* Moving all plugins to a common package, for easy loading
  Signed-off-by: Ketan Umare <[email protected]>
* updated
  Signed-off-by: Ketan Umare <[email protected]>
eapolinario pushed a commit to eapolinario/flyte that referenced this issue on Dec 20, 2022:
…flyteorg#423)
Signed-off-by: Daniel Rammer <[email protected]>
eapolinario pushed a commit to eapolinario/flyte that referenced this issue on Dec 20, 2022:
Signed-off-by: Nastya Rusina <[email protected]>
eapolinario pushed a commit to eapolinario/flyte that referenced this issue on Aug 9, 2023:
* Moving all plugins to a common package, for easy loading
  Signed-off-by: Ketan Umare <[email protected]>
* updated
  Signed-off-by: Ketan Umare <[email protected]>
eapolinario pushed a commit to eapolinario/flyte that referenced this issue on Aug 21, 2023:
…flyteorg#423)
Signed-off-by: Daniel Rammer <[email protected]>
eapolinario pushed a commit to eapolinario/flyte that referenced this issue on Apr 30, 2024:
…rg#423)
* WIP. Marked places where an acknowledgement before an update is needed.
  Signed-off-by: Kamal Eybov <[email protected]>
* (1) Added error handling for methods fetching matchable attributes when attributes do not exist. (2) Added fetching data that is needed for diffing during updates.
  Signed-off-by: Kamal Eybov <[email protected]>
* Diff and ask for ack.
  Signed-off-by: Kamal Eybov <[email protected]>
* Fixed some of the TODOs.
  Signed-off-by: Kamal Eybov <[email protected]>
* Cleaned up error handling.
  Signed-off-by: Kamal Eybov <[email protected]>
* Updated tests.
  Signed-off-by: Kamal Eybov <[email protected]>
* More tests.
  Signed-off-by: Kamal Eybov <[email protected]>
* Fix case for No string to no (flyteorg#419)
  Signed-off-by: Future Outlier <[email protected]>
  Co-authored-by: Future Outlier <[email protected]>
  Signed-off-by: Kamal Eybov <[email protected]>
* Replaced diffing implementation.
  Signed-off-by: Kamal Eybov <[email protected]>
* Addressed pull request comments.
  Signed-off-by: Kamal Eybov <[email protected]>
* Fixed linter errors.
  Signed-off-by: Kamal Eybov <[email protected]>
---------
Signed-off-by: Kamal Eybov <[email protected]>
Signed-off-by: Future Outlier <[email protected]>
Co-authored-by: Future-Outlier <[email protected]>
Co-authored-by: Future Outlier <[email protected]>
austin362667 pushed a commit to austin362667/flyte that referenced this issue on May 7, 2024:
…rg#423)
* WIP. Marked places where an acknowledgement before an update is needed.
  Signed-off-by: Kamal Eybov <[email protected]>
* (1) Added error handling for methods fetching matchable attributes when attributes do not exist. (2) Added fetching data that is needed for diffing during updates.
  Signed-off-by: Kamal Eybov <[email protected]>
* Diff and ask for ack.
  Signed-off-by: Kamal Eybov <[email protected]>
* Fixed some of the TODOs.
  Signed-off-by: Kamal Eybov <[email protected]>
* Cleaned up error handling.
  Signed-off-by: Kamal Eybov <[email protected]>
* Updated tests.
  Signed-off-by: Kamal Eybov <[email protected]>
* More tests.
  Signed-off-by: Kamal Eybov <[email protected]>
* Fix case for No string to no (flyteorg#419)
  Signed-off-by: Future Outlier <[email protected]>
  Co-authored-by: Future Outlier <[email protected]>
  Signed-off-by: Kamal Eybov <[email protected]>
* Replaced diffing implementation.
  Signed-off-by: Kamal Eybov <[email protected]>
* Addressed pull request comments.
  Signed-off-by: Kamal Eybov <[email protected]>
* Fixed linter errors.
  Signed-off-by: Kamal Eybov <[email protected]>
---------
Signed-off-by: Kamal Eybov <[email protected]>
Signed-off-by: Future Outlier <[email protected]>
Co-authored-by: Future-Outlier <[email protected]>
Co-authored-by: Future Outlier <[email protected]>
robert-ulbrich-mercedes-benz pushed a commit to robert-ulbrich-mercedes-benz/flyte that referenced this issue on Jul 2, 2024:
…rg#423)
* WIP. Marked places where an acknowledgement before an update is needed.
  Signed-off-by: Kamal Eybov <[email protected]>
* (1) Added error handling for methods fetching matchable attributes when attributes do not exist. (2) Added fetching data that is needed for diffing during updates.
  Signed-off-by: Kamal Eybov <[email protected]>
* Diff and ask for ack.
  Signed-off-by: Kamal Eybov <[email protected]>
* Fixed some of the TODOs.
  Signed-off-by: Kamal Eybov <[email protected]>
* Cleaned up error handling.
  Signed-off-by: Kamal Eybov <[email protected]>
* Updated tests.
  Signed-off-by: Kamal Eybov <[email protected]>
* More tests.
  Signed-off-by: Kamal Eybov <[email protected]>
* Fix case for No string to no (flyteorg#419)
  Signed-off-by: Future Outlier <[email protected]>
  Co-authored-by: Future Outlier <[email protected]>
  Signed-off-by: Kamal Eybov <[email protected]>
* Replaced diffing implementation.
  Signed-off-by: Kamal Eybov <[email protected]>
* Addressed pull request comments.
  Signed-off-by: Kamal Eybov <[email protected]>
* Fixed linter errors.
  Signed-off-by: Kamal Eybov <[email protected]>
---------
Signed-off-by: Kamal Eybov <[email protected]>
Signed-off-by: Future Outlier <[email protected]>
Co-authored-by: Future-Outlier <[email protected]>
Co-authored-by: Future Outlier <[email protected]>
pvditt added a commit that referenced this issue on Aug 20, 2024:
## Overview
when [informer cache has stale values](https://github.com/unionai/flyte/blob/1e82352dd95f89630e333fe6105d5fdb5487a24e/flytepropeller/pkg/controller/nodes/task/k8s/plugin_manager.go#L478), we cannot update the k8s resource when [clearing finalizers](https://github.com/unionai/flyte/blob/1e82352dd95f89630e333fe6105d5fdb5487a24e/flytepropeller/pkg/controller/nodes/task/k8s/plugin_manager.go#L450) and get `Error: Operation cannot be fulfilled on pods.` The current implementation bubbles up the error resulting in a system retry. By the next loop, the informer cache is up to date and the update is able to be applied. However, in an ArrayNode with many subnodes getting executed in parallel, the execution can easily run out of retries. This update adds a basic retry with exponential backoff to wait for the informer cache to get up to date.
## Test Plan
Ran in dogfood-gcp - https://buildkite.com/unionai/managed-cluster-staging-sync/builds/4622 + manually updated configmap to enabled finalizers
- Run without change (https://dogfood-gcp.cloud-staging.union.ai/console/projects/flytesnacks/domains/development/executions/fd16ac81fd7b5480fb6f/nodes)
- Run with change (https://dogfood-gcp.cloud-staging.union.ai/console/projects/flytesnacks/domains/development/executions/f016a3be7fa304db5a77/nodeId/n0/nodes)
confirmed in logs that conflict errors:
```
{"json":{"exec_id":"f016a3be7fa304db5a77","node":"n0/n42","ns":"development","res_ver":"146129599","routine":"worker-66","src":"plugin_manager.go:455","wf":"flytesnacks:development:tests.flytekit.integration.map_task_issue.wf8"},"level":"warning","msg":"Failed to clear finalizers for Resource with name: development/f016a3be7fa304db5a77-n0-0-n42-0.
Error: Operation cannot be fulfilled on pods \"f016a3be7fa304db5a77-n0-0-n42-0\": the object has been modified; please apply your changes to the latest version and try again","ts":"2024-08-17T02:02:48Z"}
```
did not bubble up + confirmed finalizers were removed:
```
➜ ~ k get pods -n development f016a3be7fa304db5a77-n0-0-n42-0 -o json | grep -i final
INFO[0000] [0] Couldn't find a config file []. Relying on env vars and pflags.
➜ ~
```
## Rollout Plan (if applicable)
- revert changes to customer's config that disabled finalizers
## Upstream Changes
Should this change be upstreamed to OSS (flyteorg/flyte)? If not, please uncheck this box, which is used for auditing. Note, it is the responsibility of each developer to actually upstream their changes. See [this guide](https://unionai.atlassian.net/wiki/spaces/ENG/pages/447610883/Flyte+-+Union+Cloud+Development+Runbook/#When-are-versions-updated%3F).
- [x] To be upstreamed to OSS
## Issue
fixes: https://linear.app/unionai/issue/COR-1558/investigate-why-finalizers-consume-system-retries-in-map-tasks
## Checklist
* [ ] Added tests
* [x] Ran a deploy dry run and shared the terraform plan
* [ ] Added logging and metrics
* [ ] Updated [dashboards](https://unionai.grafana.net/dashboards) and [alerts](https://unionai.grafana.net/alerting/list)
* [ ] Updated documentation
Signed-off-by: Paul Dittamo <[email protected]>
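The retry with exponential backoff described in the Overview of the commit message above could look roughly like the sketch below. This is not the code from the referenced commit: `errConflict`, `backoff`, `defaultBackoff`, and `retryOnConflict` are hypothetical names, the schedule values are arbitrary, and a real implementation would more likely detect the Kubernetes conflict error through the API machinery than through a sentinel error.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// errConflict is a hypothetical stand-in for the Kubernetes
// "the object has been modified" conflict error.
var errConflict = errors.New("conflict: the object has been modified")

// backoff describes an exponential backoff schedule (illustrative fields).
type backoff struct {
	initial time.Duration // delay after the first failed attempt
	factor  int           // multiplier applied to the delay after each failure
	steps   int           // maximum number of attempts
}

// defaultBackoff is an arbitrary example schedule: 1ms, 2ms, 4ms, 8ms.
var defaultBackoff = backoff{initial: time.Millisecond, factor: 2, steps: 5}

// retryOnConflict calls op until it succeeds, returns a non-conflict error,
// or the attempts are exhausted. Only conflict errors are retried, which
// gives a stale cache time to catch up instead of failing immediately.
func retryOnConflict(b backoff, op func() error) error {
	delay := b.initial
	var err error
	for attempt := 0; attempt < b.steps; attempt++ {
		if err = op(); err == nil {
			return nil
		}
		if !errors.Is(err, errConflict) {
			return err // not a conflict: surface it right away
		}
		if attempt < b.steps-1 {
			time.Sleep(delay)
			delay *= time.Duration(b.factor)
		}
	}
	return fmt.Errorf("gave up after %d attempts: %w", b.steps, err)
}

func main() {
	calls := 0
	// Simulate an update that conflicts twice and then succeeds.
	err := retryOnConflict(defaultBackoff, func() error {
		calls++
		if calls < 3 {
			return errConflict
		}
		return nil
	})
	fmt.Println(calls, err) // prints "3 <nil>"
}
```

Retrying only on the conflict error keeps other failures visible to the caller, so only the stale-cache case is absorbed locally rather than consuming a system retry.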
eapolinario pushed a commit that referenced this issue on Aug 21, 2024:
## Overview
when [informer cache has stale values](https://github.com/unionai/flyte/blob/1e82352dd95f89630e333fe6105d5fdb5487a24e/flytepropeller/pkg/controller/nodes/task/k8s/plugin_manager.go#L478), we cannot update the k8s resource when [clearing finalizers](https://github.com/unionai/flyte/blob/1e82352dd95f89630e333fe6105d5fdb5487a24e/flytepropeller/pkg/controller/nodes/task/k8s/plugin_manager.go#L450) and get `Error: Operation cannot be fulfilled on pods.` The current implementation bubbles up the error resulting in a system retry. By the next loop, the informer cache is up to date and the update is able to be applied. However, in an ArrayNode with many subnodes getting executed in parallel, the execution can easily run out of retries. This update adds a basic retry with exponential backoff to wait for the informer cache to get up to date.
## Test Plan
Ran in dogfood-gcp - https://buildkite.com/unionai/managed-cluster-staging-sync/builds/4622 + manually updated configmap to enabled finalizers
- Run without change (https://dogfood-gcp.cloud-staging.union.ai/console/projects/flytesnacks/domains/development/executions/fd16ac81fd7b5480fb6f/nodes)
- Run with change (https://dogfood-gcp.cloud-staging.union.ai/console/projects/flytesnacks/domains/development/executions/f016a3be7fa304db5a77/nodeId/n0/nodes)
confirmed in logs that conflict errors:
```
{"json":{"exec_id":"f016a3be7fa304db5a77","node":"n0/n42","ns":"development","res_ver":"146129599","routine":"worker-66","src":"plugin_manager.go:455","wf":"flytesnacks:development:tests.flytekit.integration.map_task_issue.wf8"},"level":"warning","msg":"Failed to clear finalizers for Resource with name: development/f016a3be7fa304db5a77-n0-0-n42-0.
Error: Operation cannot be fulfilled on pods \"f016a3be7fa304db5a77-n0-0-n42-0\": the object has been modified; please apply your changes to the latest version and try again","ts":"2024-08-17T02:02:48Z"}
```
did not bubble up + confirmed finalizers were removed:
```
➜ ~ k get pods -n development f016a3be7fa304db5a77-n0-0-n42-0 -o json | grep -i final
INFO[0000] [0] Couldn't find a config file []. Relying on env vars and pflags.
➜ ~
```
## Rollout Plan (if applicable)
- revert changes to customer's config that disabled finalizers
## Upstream Changes
Should this change be upstreamed to OSS (flyteorg/flyte)? If not, please uncheck this box, which is used for auditing. Note, it is the responsibility of each developer to actually upstream their changes. See [this guide](https://unionai.atlassian.net/wiki/spaces/ENG/pages/447610883/Flyte+-+Union+Cloud+Development+Runbook/#When-are-versions-updated%3F).
- [x] To be upstreamed to OSS
## Issue
fixes: https://linear.app/unionai/issue/COR-1558/investigate-why-finalizers-consume-system-retries-in-map-tasks
## Checklist
* [ ] Added tests
* [x] Ran a deploy dry run and shared the terraform plan
* [ ] Added logging and metrics
* [ ] Updated [dashboards](https://unionai.grafana.net/dashboards) and [alerts](https://unionai.grafana.net/alerting/list)
* [ ] Updated documentation
Signed-off-by: Paul Dittamo <[email protected]>
pmahindrakar-oss pushed a commit that referenced this issue on Sep 9, 2024:
## Overview
when [informer cache has stale values](https://github.com/unionai/flyte/blob/1e82352dd95f89630e333fe6105d5fdb5487a24e/flytepropeller/pkg/controller/nodes/task/k8s/plugin_manager.go#L478), we cannot update the k8s resource when [clearing finalizers](https://github.com/unionai/flyte/blob/1e82352dd95f89630e333fe6105d5fdb5487a24e/flytepropeller/pkg/controller/nodes/task/k8s/plugin_manager.go#L450) and get `Error: Operation cannot be fulfilled on pods.` The current implementation bubbles up the error resulting in a system retry. By the next loop, the informer cache is up to date and the update is able to be applied. However, in an ArrayNode with many subnodes getting executed in parallel, the execution can easily run out of retries. This update adds a basic retry with exponential backoff to wait for the informer cache to get up to date.
## Test Plan
Ran in dogfood-gcp - https://buildkite.com/unionai/managed-cluster-staging-sync/builds/4622 + manually updated configmap to enabled finalizers
- Run without change (https://dogfood-gcp.cloud-staging.union.ai/console/projects/flytesnacks/domains/development/executions/fd16ac81fd7b5480fb6f/nodes)
- Run with change (https://dogfood-gcp.cloud-staging.union.ai/console/projects/flytesnacks/domains/development/executions/f016a3be7fa304db5a77/nodeId/n0/nodes)
confirmed in logs that conflict errors:
```
{"json":{"exec_id":"f016a3be7fa304db5a77","node":"n0/n42","ns":"development","res_ver":"146129599","routine":"worker-66","src":"plugin_manager.go:455","wf":"flytesnacks:development:tests.flytekit.integration.map_task_issue.wf8"},"level":"warning","msg":"Failed to clear finalizers for Resource with name: development/f016a3be7fa304db5a77-n0-0-n42-0.
Error: Operation cannot be fulfilled on pods \"f016a3be7fa304db5a77-n0-0-n42-0\": the object has been modified; please apply your changes to the latest version and try again","ts":"2024-08-17T02:02:48Z"}
```
did not bubble up + confirmed finalizers were removed:
```
➜ ~ k get pods -n development f016a3be7fa304db5a77-n0-0-n42-0 -o json | grep -i final
INFO[0000] [0] Couldn't find a config file []. Relying on env vars and pflags.
➜ ~
```
## Rollout Plan (if applicable)
- revert changes to customer's config that disabled finalizers
## Upstream Changes
Should this change be upstreamed to OSS (flyteorg/flyte)? If not, please uncheck this box, which is used for auditing. Note, it is the responsibility of each developer to actually upstream their changes. See [this guide](https://unionai.atlassian.net/wiki/spaces/ENG/pages/447610883/Flyte+-+Union+Cloud+Development+Runbook/#When-are-versions-updated%3F).
- [x] To be upstreamed to OSS
## Issue
fixes: https://linear.app/unionai/issue/COR-1558/investigate-why-finalizers-consume-system-retries-in-map-tasks
## Checklist
* [ ] Added tests
* [x] Ran a deploy dry run and shared the terraform plan
* [ ] Added logging and metrics
* [ ] Updated [dashboards](https://unionai.grafana.net/dashboards) and [alerts](https://unionai.grafana.net/alerting/list)
* [ ] Updated documentation
Signed-off-by: Paul Dittamo <[email protected]>
Signed-off-by: pmahindrakar-oss <[email protected]>
bgedik pushed a commit to bgedik/flyte that referenced this issue on Sep 12, 2024:
…eorg#5673)
## Overview
when [informer cache has stale values](https://github.com/unionai/flyte/blob/1e82352dd95f89630e333fe6105d5fdb5487a24e/flytepropeller/pkg/controller/nodes/task/k8s/plugin_manager.go#L478), we cannot update the k8s resource when [clearing finalizers](https://github.com/unionai/flyte/blob/1e82352dd95f89630e333fe6105d5fdb5487a24e/flytepropeller/pkg/controller/nodes/task/k8s/plugin_manager.go#L450) and get `Error: Operation cannot be fulfilled on pods.` The current implementation bubbles up the error resulting in a system retry. By the next loop, the informer cache is up to date and the update is able to be applied. However, in an ArrayNode with many subnodes getting executed in parallel, the execution can easily run out of retries. This update adds a basic retry with exponential backoff to wait for the informer cache to get up to date.
## Test Plan
Ran in dogfood-gcp - https://buildkite.com/unionai/managed-cluster-staging-sync/builds/4622 + manually updated configmap to enabled finalizers
- Run without change (https://dogfood-gcp.cloud-staging.union.ai/console/projects/flytesnacks/domains/development/executions/fd16ac81fd7b5480fb6f/nodes)
- Run with change (https://dogfood-gcp.cloud-staging.union.ai/console/projects/flytesnacks/domains/development/executions/f016a3be7fa304db5a77/nodeId/n0/nodes)
confirmed in logs that conflict errors:
```
{"json":{"exec_id":"f016a3be7fa304db5a77","node":"n0/n42","ns":"development","res_ver":"146129599","routine":"worker-66","src":"plugin_manager.go:455","wf":"flytesnacks:development:tests.flytekit.integration.map_task_issue.wf8"},"level":"warning","msg":"Failed to clear finalizers for Resource with name: development/f016a3be7fa304db5a77-n0-0-n42-0.
Error: Operation cannot be fulfilled on pods \"f016a3be7fa304db5a77-n0-0-n42-0\": the object has been modified; please apply your changes to the latest version and try again","ts":"2024-08-17T02:02:48Z"}
```
did not bubble up + confirmed finalizers were removed:
```
➜ ~ k get pods -n development f016a3be7fa304db5a77-n0-0-n42-0 -o json | grep -i final
INFO[0000] [0] Couldn't find a config file []. Relying on env vars and pflags.
➜ ~
```
## Rollout Plan (if applicable)
- revert changes to customer's config that disabled finalizers
## Upstream Changes
Should this change be upstreamed to OSS (flyteorg/flyte)? If not, please uncheck this box, which is used for auditing. Note, it is the responsibility of each developer to actually upstream their changes. See [this guide](https://unionai.atlassian.net/wiki/spaces/ENG/pages/447610883/Flyte+-+Union+Cloud+Development+Runbook/#When-are-versions-updated%3F).
- [x] To be upstreamed to OSS
## Issue
fixes: https://linear.app/unionai/issue/COR-1558/investigate-why-finalizers-consume-system-retries-in-map-tasks
## Checklist
* [ ] Added tests
* [x] Ran a deploy dry run and shared the terraform plan
* [ ] Added logging and metrics
* [ ] Updated [dashboards](https://unionai.grafana.net/dashboards) and [alerts](https://unionai.grafana.net/alerting/list)
* [ ] Updated documentation
Signed-off-by: Paul Dittamo <[email protected]>
Signed-off-by: Bugra Gedik <[email protected]>
We're currently running a hybrid setup with Travis and GH Actions. We should remove the Travis CI bits that are duplicated (the PR image build) and consider migrating any other checks (I believe just the unit tests and linting) over to GH Actions.