k8s runtime: force deletion to avoid hung function worker during connector restart #12504
// https://amalgjose.com/2021/07/28/how-to-delete-a-kubernetes-pod-which-is-stuck-in-terminating-state/
// https://www.ibm.com/support/pages/kubernetes-pods-are-stuck-terminating-state
// https://github.com/kubernetes-client/java/issues/770
options.setGracePeriodSeconds(0L);
Can we make the grace period configurable as part of the k8s runtime factory config:
@jerrypeng fixed.
Lgtm
Co-authored-by: Anonymitaet <[email protected]>
@jerrypeng @Anonymitaet Please take another look; I addressed your suggestions.
@dlg99 LGTM from a tech writing perspective.
* up/master: (55 commits)
  [broker] remove useless method "PersistentTopic#getPersistentTopic" (apache#12655)
  [Python Schema] Python schema support custom Avro configurations for Enum type (apache#12642)
  Allow to configure different implementations for Pulsar functions state store (apache#12646)
  Remove replicator global test from the quarantine group (apache#12648)
  [Java Client] Remove invalid call to Thread.currentThread().interrupt(); (apache#12652)
  k8s runtime: force deletion to avoid hung function worker during connector restart (apache#12504)
  [Broker] Optimize exception information for schemas (apache#12647)
  Close Zk database on unit tests (apache#12649)
  Fix call sync method in an async callback when enabling geo replicator. (apache#12590)
  [pulsar-broker] Add git branch information for PulsarVersion (apache#12541)
  PulsarAdmin: Fix last exit code storage (apache#12581)
  Add @test annotation to test methods (apache#12640)
  Upgrade debezium to 1.7.1 (apache#12644)
  [ML] Avoid passing OpAddEntry across a thread boundary in asyncAddEntry (apache#12606)
  [Functions] Prevent NPE while stopping a non started Pulsar LogAppender (apache#12643)
  Update io-debezium-source.md (apache#12638)
  Add missing cmds on pulsar-admin document page (apache#12634)
  Clean up the metadata of the non-persistent partitioned topics. (apache#12550)
  modify check waitingForPingResponse with volatile (apache#12615)
  [pulsar-admin] Check backlog quota policy for namespace (apache#12512)
  ...
…ector restart (apache#12504) (cherry picked from commit a3f6aba)
…ector restart (apache#12504) (cherry picked from commit a3f6aba) (cherry picked from commit 82c01bc)
Motivation
Restart of a connector via `pulsar-admin source restart` (Debezium Postgres, but reproduced with another connector too) failed, and the function worker became non-responsive while logging repeatedly.
Modifications
The root cause was tracked to the k8s client call timing out.
It looks like V1DeleteOptions weren't passed to the corresponding calls, and the `Foreground` policy was not passed properly, AFAICT from the k8s-client GitHub issues.
I also moved the grace period for deleteNamespacedStatefulSetCall into the config.
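The change described above can be sketched roughly as follows. This is a minimal, dependency-free illustration and not the code from this PR: `DeleteOptions` is a hypothetical stand-in for the Kubernetes client's `V1DeleteOptions`, mirroring only the grace period and propagation policy fields, and the config key `gracePeriodSeconds`, the helper names, and the default of 5 seconds are assumptions chosen for the example. The idea it shows is reading the grace period from the runtime config instead of hardcoding it, then building one options object that carries both the grace period and the `Foreground` policy so it can be handed to the delete calls.

```java
import java.util.Map;

final class DeleteOptionsSketch {

    /** Hypothetical stand-in for the k8s client's V1DeleteOptions (two fields only). */
    static final class DeleteOptions {
        private Long gracePeriodSeconds;
        private String propagationPolicy;

        void setGracePeriodSeconds(Long seconds) { this.gracePeriodSeconds = seconds; }
        void setPropagationPolicy(String policy) { this.propagationPolicy = policy; }
        Long getGracePeriodSeconds() { return gracePeriodSeconds; }
        String getPropagationPolicy() { return propagationPolicy; }
    }

    // Placeholder default for this sketch; not taken from the PR.
    static final long DEFAULT_GRACE_PERIOD_SECONDS = 5L;

    /** Reads the grace period from a config map, falling back to the default. */
    static long gracePeriodFrom(Map<String, ?> runtimeConfig) {
        Object value = runtimeConfig.get("gracePeriodSeconds"); // assumed key name
        if (value == null) {
            return DEFAULT_GRACE_PERIOD_SECONDS;
        }
        long seconds = ((Number) value).longValue();
        if (seconds < 0) {
            throw new IllegalArgumentException("gracePeriodSeconds must be >= 0");
        }
        return seconds;
    }

    /** Builds the options that would be passed to each delete call. */
    static DeleteOptions buildDeleteOptions(Map<String, ?> runtimeConfig) {
        DeleteOptions options = new DeleteOptions();
        options.setGracePeriodSeconds(gracePeriodFrom(runtimeConfig));
        options.setPropagationPolicy("Foreground");
        return options;
    }

    public static void main(String[] args) {
        // A grace period of 0 requests immediate (forced) deletion.
        DeleteOptions forced = buildDeleteOptions(Map.of("gracePeriodSeconds", 0));
        System.out.println(forced.getGracePeriodSeconds() + " " + forced.getPropagationPolicy());
    }
}
```

Keeping the options in a single builder means every delete call gets the same grace period and policy, which is the gap the description above points at.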
Verifying this change
Tested on the env.
Don't know how to unit test this.
Does this pull request potentially affect one of the following parts:
NO, AFAIK.
A new config parameter is added; it keeps the same value as the hardcoded one it replaced.
If `yes` was chosen, please highlight the changes
Documentation
doc