Collect logging for exited containers #70

adriansuarez · 2024-10-03T02:05:29Z

For any containers with non-0 restart counts, collect logging for exited containers using `kubectl logs ... --previous`.
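
A minimal sketch of the idea, not the actual contents of `deploy/k8s/logs.sh`: the function name `collect_previous_logs`, the output directory argument, and the log file naming are all hypothetical. It lists each container's restart count and calls `kubectl logs ... --previous` only for containers that have restarted.

```shell
#!/usr/bin/env bash
# Sketch only: the function name, output directory, and file naming are
# hypothetical and not taken from deploy/k8s/logs.sh.

collect_previous_logs() {
    local out_dir="$1" ns pod container restarts
    mkdir -p "$out_dir"

    # List every pod in every namespace as "<namespace> <pod>".
    kubectl get pods --all-namespaces \
        -o jsonpath='{range .items[*]}{.metadata.namespace}{" "}{.metadata.name}{"\n"}{end}' |
    while read -r ns pod; do
        [ -n "$pod" ] || continue
        # List the pod's containers as "<container> <restartCount>".
        kubectl get pod -n "$ns" "$pod" \
            -o jsonpath='{range .status.containerStatuses[*]}{.name}{" "}{.restartCount}{"\n"}{end}' < /dev/null |
        while read -r container restarts; do
            # Only restarted containers have a previous instance to collect.
            [ "${restarts:-0}" -gt 0 ] || continue
            kubectl logs -n "$ns" "$pod" -c "$container" --previous \
                > "$out_dir/${ns}_${pod}_${container}.previous.log" < /dev/null ||
                echo "no previous logs for $ns/$pod/$container" >&2
        done
    done
}
```

Calling `collect_previous_logs ./logs` would then write one `*.previous.log` file per restarted container, across all namespaces.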

adriansuarez · 2024-10-03T02:10:46Z

FYI @sivanov-nuodb, I created the branch asuarez/test-minikube-2.5.0 which includes this change and also reverts the Control Plane used in Minikube to 2.5.0. You should be able to run the pipeline on that branch to reproduce the webhook issue.

The root cause seems to be that the container exits unexpectedly and takes a long time to acquire the lease on restart. Not sure what is causing the crashes on 2.5.0 or whether it is still relevant in newer versions.

deploy/k8s/logs.sh

sivanov-nuodb

Thanks for adding this!

sivanov-nuodb · 2024-10-03T10:10:07Z

> FYI @sivanov-nuodb, I created the branch asuarez/test-minikube-2.5.0 which includes this change and also reverts the Control Plane used in Minikube to 2.5.0. You should be able to run the pipeline on that branch to reproduce the webhook issue.
>
> The root cause seems to be that the container exits unexpectedly and takes a long time to acquire the lease on restart. Not sure what is causing the crashes on 2.5.0 or whether it is still relevant in newer versions.

Thanks for creating the branch so that we can investigate further!

The webhook server is started immediately and does not wait for the leader assignment. I agree that the webhook error (connection refused) is due to the operator container being restarted.

The operator will fail if the VolumeSnapshot CRD is not installed in the cluster but the embedded backup plugin is enabled. In CP 2.6.0, the operator Helm chart will automatically disable the backup plugin (fixed by you in https://github.com/nuodb/nuodb-control-plane/commit/9759ef911e4163ee2d48a3cd376d1a3a86487107) which is why the issue is not exposed in CP 2.7.0.
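
To check whether a cluster is affected, one option is to test for the CRD directly. This is a rough sketch: `has_volumesnapshot_crd` is a hypothetical helper name, and it assumes the CRD is registered under the standard name `volumesnapshots.snapshot.storage.k8s.io`. How the operator itself detects the CRD may differ.

```shell
#!/usr/bin/env bash
# Sketch only: has_volumesnapshot_crd is a hypothetical helper name.
# Succeeds if the VolumeSnapshot CRD is installed in the current cluster.
has_volumesnapshot_crd() {
    kubectl get crd volumesnapshots.snapshot.storage.k8s.io > /dev/null 2>&1
}
```

For example, `has_volumesnapshot_crd || echo "VolumeSnapshot CRD not installed"` flags a cluster where the backup plugin would need to be disabled.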

adriansuarez · 2024-10-03T12:04:57Z

> Thanks for creating the branch so that we can investigate further!
>
> The webhook server is started immediately and does not wait for the leader assignment. I agree that the webhook error (connection refused) is due to the operator container being restarted.
>
> The operator will fail if the VolumeSnapshot CRD is not installed in the cluster but the embedded backup plugin is enabled. In CP 2.6.0, the operator Helm chart will automatically disable the backup plugin (fixed by you in nuodb/nuodb-control-plane@9759ef9) which is why the issue is not exposed in CP 2.7.0.

Okay, so it is no longer an issue.

I will keep using 2.6.1 rather than explicitly disabling the backup manager, since it does give us more coverage of past product versions. Currently we have coverage of 2.5.0 via KWOK, and both the KWOK and Minikube variants exercise 2.7.0 (latest).

Collect logging for exited containers

05100e9

For any containers with non-0 restart counts, collect logging for exited containers using `kubectl logs ... --previous`.

adriansuarez requested a review from sivanov-nuodb October 3, 2024 02:05

sivanov-nuodb reviewed Oct 3, 2024

sivanov-nuodb approved these changes Oct 3, 2024

Actually collect previous container logs, for all namespaces

5237ba5

adriansuarez merged commit 0809af3 into main Oct 3, 2024
8 of 9 checks passed

adriansuarez deleted the asuarez/collect-previous branch October 3, 2024 12:27
