Make the kubeflow-m2m-oidc-configurator a CronJob #2667

Conversation

kromanow94
Contributor

Which issue is resolved by this Pull Request:
Resolves #2646

Description of your changes:
Changing the Job to a CronJob improves the robustness of the setup in case the JWKS changes or the user accidentally overwrites the RequestAuthentication.
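
For illustration, a minimal sketch of what such a CronJob could look like (the container image, service account, and ConfigMap name below are assumptions for the sake of the example, not the exact manifest from this PR):

```bash
# Sketch only: image, service account, and ConfigMap name are assumed values.
kubectl apply -f - <<'EOF'
apiVersion: batch/v1
kind: CronJob
metadata:
  name: kubeflow-m2m-oidc-configurator
  namespace: istio-system
spec:
  schedule: '* * * * *'          # every minute; a 5-minute schedule is discussed below
  concurrencyPolicy: Forbid      # never start a new run while one is still active
  jobTemplate:
    spec:
      backoffLimit: 3            # retry a failed Pod up to three times
      template:
        spec:
          restartPolicy: Never
          serviceAccountName: kubeflow-m2m-oidc-configurator   # assumed name
          containers:
            - name: configurator
              image: bitnami/kubectl:latest                    # assumed image
              command: ["/bin/sh", "/scripts/script.sh"]
              volumeMounts:
                - name: script
                  mountPath: /scripts
          volumes:
            - name: script
              configMap:
                name: kubeflow-m2m-oidc-configurator-script    # assumed name
                defaultMode: 0777
                items:
                  - key: script.sh
                    path: script.sh
EOF
```

The script mounted from the ConfigMap is what patches the RequestAuthentication with the current JWKS, so the CronJob simply re-runs it periodically.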

Checklist:

  • Tested on kind and on vcluster.

@kromanow94
Contributor Author

kromanow94 commented Apr 4, 2024

@juliusvonkohout or @kimwnasptd, can we restart the tests? Both of them failed because of an unrelated issue:

timed out waiting for the condition on pods/kubeflow-m2m-oidc-configurator-28537425-s8kzm
timed out waiting for the condition on pods/activator-bd5fdc585-rrnqf
timed out waiting for the condition on pods/autoscaler-5655dd9df5-4knpj
timed out waiting for the condition on pods/controller-5447f77dc5-ljx5r
timed out waiting for the condition on pods/domain-mapping-757799d898-knf69
timed out waiting for the condition on pods/domainmapping-webhook-5d875ccb7d-z2qjv
timed out waiting for the condition on pods/net-istio-controller-5f89595bcb-dv7h2
timed out waiting for the condition on pods/net-istio-webhook-dc448cfc4-rws5f
timed out waiting for the condition on pods/webhook-578c5cf66f-25sf9
timed out waiting for the condition on pods/coredns-5dd5756b68-hpg77
timed out waiting for the condition on pods/coredns-5dd5756b68-vv66m
timed out waiting for the condition on pods/etcd-kind-control-plane
timed out waiting for the condition on pods/kindnet-9l886
timed out waiting for the condition on pods/kindnet-pftsz
timed out waiting for the condition on pods/kindnet-z5qpl
timed out waiting for the condition on pods/kube-apiserver-kind-control-plane
timed out waiting for the condition on pods/kube-controller-manager-kind-control-plane
timed out waiting for the condition on pods/kube-proxy-64vj7
timed out waiting for the condition on pods/kube-proxy-vk4lr
timed out waiting for the condition on pods/kube-proxy-xwm8d
timed out waiting for the condition on pods/kube-scheduler-kind-control-plane
timed out waiting for the condition on pods/local-path-provisioner-7577fdbbfb-7zv5k
timed out waiting for the condition on pods/oauth2-proxy-86d8c97455-hvjl8
timed out waiting for the condition on pods/oauth2-proxy-86d8c97455-z9vjw
Error: Process completed with exit code 1.

@juliusvonkohout
Member

@KRomanov, I restarted the tests. If they fail again we might have to increase the timeouts in this PR.

@kromanow94 kromanow94 force-pushed the make-the-oidc-configurator-a-cronjob branch from b98a24d to 4abca40 Compare April 11, 2024 13:41
@kromanow94
Contributor Author

@juliusvonkohout this is super weird. I limited the CronJob with concurrencyPolicy: Forbid. I don't know if this should be handled by increasing the timeout or by increasing the resources for the CICD Jobs... I can also try to split the installation steps to limit how many pods are created at the same time...

@juliusvonkohout
Member

juliusvonkohout commented Apr 15, 2024

I restarted the tests. Yeah, our CICD is a bit problematic at the moment. If we can specify more resources in this public repository, yes; otherwise we have to increase the timeouts. https://docs.github.com/en/actions/using-github-hosted-runners/about-github-hosted-runners/about-github-hosted-runners#standard-github-hosted-runners-for-public-repositories

@kromanow94
Contributor Author

@juliusvonkohout maybe the issue is with CICD resource sharing? If the memory and CPU are shared between multiple workflows, it may be problematic. I see one of the failing tests completed successfully. Can you restart the last test workflow?

Also, is this something I could do myself, for example with the GitHub bot using commands in a comment?

  name: kubeflow-m2m-oidc-configurator
  namespace: istio-system
spec:
  schedule: '* * * * *'
Member

Should we not go with every 5 minutes instead of every minute?

Contributor Author

I can change it to every 5 minutes. There is also configuration that prevents adding more Jobs until the last one has completed, and the latest logs from the CICD workflows show that no more than one Job is created at a time.
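
For example, moving to a five-minute schedule while keeping runs non-overlapping could look roughly like this (a sketch, not necessarily the exact change made here):

```bash
# Sketch: run every 5 minutes and forbid concurrent Jobs.
kubectl -n istio-system patch cronjob kubeflow-m2m-oidc-configurator \
  --type merge \
  -p '{"spec": {"schedule": "*/5 * * * *", "concurrencyPolicy": "Forbid"}}'
```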

defaultMode: 0777
items:
  - key: script.sh
    path: script.sh
Member

Are you sure that script.sh is idempotent?

Contributor Author

Huh, well, it doesn't verify whether the JWKS is present and always performs the patch regardless, so this might be an improvement. I think the JWKS value should also be compared and only patched if it differs.

Contributor Author

I made changes so the script first checks the JWKS present in the RequestAuthentication and only patches if it is not equal to the desired JWKS.
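
Roughly, such an idempotent check could look like the sketch below; the RequestAuthentication name, the use of jq, and the jsonpath are assumptions for illustration, not necessarily what script.sh actually does:

```bash
#!/bin/sh
# Sketch of an idempotent configurator: only patch the RequestAuthentication
# when the in-cluster JWKS differs from what is currently configured.
set -eu

NAMESPACE=istio-system
RA_NAME=m2m-token-issuer   # assumed resource name

# JWKS served by the in-cluster OIDC issuer (the kube-apiserver).
DESIRED_JWKS="$(kubectl get --raw /openid/v1/jwks)"

# JWKS currently set on the RequestAuthentication, if any.
CURRENT_JWKS="$(kubectl -n "$NAMESPACE" get requestauthentication "$RA_NAME" \
  -o jsonpath='{.spec.jwtRules[0].jwks}')"

if [ "$CURRENT_JWKS" = "$DESIRED_JWKS" ]; then
  echo "JWKS already matches the desired value; nothing to patch."
  exit 0
fi

# Values differ, so apply the patch ("add" also replaces an existing member).
kubectl -n "$NAMESPACE" patch requestauthentication "$RA_NAME" --type json \
  -p "$(jq -cn --arg jwks "$DESIRED_JWKS" \
        '[{"op": "add", "path": "/spec/jwtRules/0/jwks", "value": $jwks}]')"
```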

@juliusvonkohout
Member

> @juliusvonkohout maybe the issue is with CICD resource sharing? If the memory and CPU are shared between multiple workflows, it may be problematic. I see one of the failing tests completed successfully. Can you restart the last test workflow?
>
> Also, is this something I could do myself, for example with the GitHub bot using commands in a comment?

I did restart it and it failed again. In the KFP repository that was possible with /retest or /retest-failed or something similar. Probably something I can investigate in the next weeks when I am less busy.

@kromanow94
Contributor Author

@juliusvonkohout maybe we could add more verbosity to the logs in the CICD GH Workflows? We currently know that the pods aren't ready, but what is the actual reason? DockerHub pull rate limits? Not enough resources? A failing Pod?
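
For example, the failing wait step could dump extra diagnostics along these lines (a sketch of the idea, not an existing step in the workflows):

```bash
# Sketch: extra diagnostics after a timed-out wait, to tell apart image-pull
# throttling, scheduling pressure, and crashing containers.
kubectl get pods --all-namespaces -o wide
kubectl get events --all-namespaces --sort-by=.metadata.creationTimestamp | tail -n 50
kubectl describe pods -n istio-system
# The label selector below is an assumed example; adjust to the real Pod labels.
kubectl logs -n istio-system -l app=kubeflow-m2m-oidc-configurator --tail=100 || true
```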

@juliusvonkohout
Member

> @juliusvonkohout maybe we could add more verbosity to the logs in the CICD GH Workflows? We currently know that the pods aren't ready, but what is the actual reason? DockerHub pull rate limits? Not enough resources? A failing Pod?

Yes, let's do that in a separate PR with @codablock as well.

@juliusvonkohout
Member

The tests in #2696 were successful, so I reran the test and hope that the CICD is happy now. If not, please rebase the PR against the master branch.

@juliusvonkohout
Member

So we need a rebase and step-by-step debugging with minimal changes.

@juliusvonkohout
Member

/hold

@juliusvonkohout
Member

/retest

@kromanow94 kromanow94 force-pushed the make-the-oidc-configurator-a-cronjob branch from 4abca40 to 0a707ba Compare June 12, 2024 07:08
@kromanow94 kromanow94 force-pushed the make-the-oidc-configurator-a-cronjob branch from 0a707ba to e791d53 Compare June 12, 2024 07:12
kromanow94 and others added 5 commits June 13, 2024 15:21
Signed-off-by: Krzysztof Romanowski <[email protected]>
Signed-off-by: Krzysztof Romanowski <[email protected]>
It was tested with a self-hosted runner using custom dockerconfig credentials for debugging.

Signed-off-by: Krzysztof Romanowski <[email protected]>
Signed-off-by: Krzysztof Romanowski <[email protected]>
@kromanow94 kromanow94 force-pushed the make-the-oidc-configurator-a-cronjob branch from 4eb270b to 8d54066 Compare June 13, 2024 15:23
@kromanow94
Contributor Author

@diegolovison this is the PR we've discussed on the Manifests WG Call.

@juliusvonkohout
Member

juliusvonkohout commented Jun 13, 2024

@kimwnasptd @rimolive please also review

@juliusvonkohout juliusvonkohout self-assigned this Jun 13, 2024
@juliusvonkohout
Member

/lgtm

@google-oss-prow google-oss-prow bot added the lgtm label Jun 13, 2024
@juliusvonkohout
Member

/hold

@kromanow94
Contributor Author

Huh, the CICD failed again with:

Reason: Unauthorized
HTTP response headers: HTTPHeaderDict({'x-powered-by': 'Express', 'www-authenticate': '***"http://10.96.103.33:8888/pipeline/apis/v2beta1/experiments?filter=%7B%22predicates%22%3A+%5B%7B%22operation%22%3A+1%2C+%22key%22%3A+%22display_name%22%2C+%22stringValue%22%3A+%22m2m-test%22%7D%5D%7D&namespace=kubeflow-user-example-com", error="invalid_token"', 'content-length': '22', 'content-type': 'text/plain', 'date': 'Thu, 13 Jun 2024 15:35:11 GMT', 'server': 'istio-envoy', 'x-envoy-upstream-service-time': '3'})

https://github.com/kubeflow/manifests/actions/runs/9502404627/job/26190294755?pr=2667

But only 1 of the 3 m2m CICD workflows failed... I'll have another look and try to pinpoint why this is happening. I guess this should be a rather small change.

@diegolovison
Contributor

Going to wait for the feedback

@google-oss-prow google-oss-prow bot removed the lgtm label Jun 13, 2024
@kromanow94 kromanow94 force-pushed the make-the-oidc-configurator-a-cronjob branch from aa466e7 to 6972652 Compare June 13, 2024 19:19
@kromanow94 kromanow94 force-pushed the make-the-oidc-configurator-a-cronjob branch from 6972652 to c830a7a Compare June 13, 2024 19:22
@kromanow94
Contributor Author

I made changes to the script so it patches and then verifies whether the patch with the JWKS persisted. If it did not persist, the Pod finishes with a failure. The CronJob is configured to restart a failed Job Pod 3 times. If this also fails, ./tests/gh-actions/wait_for_kubeflow_m2m_oidc_configurator.sh will also fail.
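
Conceptually, that patch-and-verify step could look like the sketch below (same illustrative naming assumptions as in the earlier sketch; the real script.sh may differ):

```bash
#!/bin/sh
# Sketch: patch, then re-read and fail the Pod if the JWKS did not persist,
# so the Job retries (up to backoffLimit: 3) and the wait script can detect it.
set -eu

NAMESPACE=istio-system
RA_NAME=m2m-token-issuer   # assumed resource name

DESIRED_JWKS="$(kubectl get --raw /openid/v1/jwks)"

kubectl -n "$NAMESPACE" patch requestauthentication "$RA_NAME" --type json \
  -p "$(jq -cn --arg jwks "$DESIRED_JWKS" \
        '[{"op": "add", "path": "/spec/jwtRules/0/jwks", "value": $jwks}]')"

PERSISTED_JWKS="$(kubectl -n "$NAMESPACE" get requestauthentication "$RA_NAME" \
  -o jsonpath='{.spec.jwtRules[0].jwks}')"

if [ "$PERSISTED_JWKS" != "$DESIRED_JWKS" ]; then
  echo "JWKS patch did not persist; failing so the CronJob retries." >&2
  exit 1
fi
echo "JWKS patch applied and verified."
```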

@diegolovison , @juliusvonkohout , @kimwnasptd , please review.

@juliusvonkohout
Member

@diegolovison can you test now?

@rimolive we have to merge this for rc.2

@juliusvonkohout
Member

/lgtm
/approve

There was no feedback for a week and we need this in the next RC, @rimolive.

@google-oss-prow google-oss-prow bot added the lgtm label Jun 21, 2024

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: juliusvonkohout

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@juliusvonkohout
Member

/unhold

@google-oss-prow google-oss-prow bot merged commit a1dbf47 into kubeflow:master Jun 21, 2024
8 checks passed
@kromanow94 kromanow94 deleted the make-the-oidc-configurator-a-cronjob branch June 21, 2024 08:18
Development

Successfully merging this pull request may close these issues.

Make the oidc-issuer configurator a CronJob to ensure correct JWKS for the in-cluster self-signed OIDC Issuer