[KRaft] KafkaRoller is struggling to transition controller-only nodes to mixed nodes #9434

Closed
scholzj opened this issue Dec 5, 2023 · 1 comment · Fixed by #9686

@scholzj
Member

scholzj commented Dec 5, 2023

When a controller-only node pool is changed to also have the broker role, the operator / KafkaRoller seems to struggle to roll the nodes to apply the new configuration. It appears to treat the nodes as mixed nodes already, but it fails to connect to them on port 9091 because they are still running as controller-only nodes.

It looks like they eventually roll the nodes after the timeout expires. But it is not clear whether they properly ensure quorum availability while doing so. At a minimum, we have to check that this does not break the quorum, but if possible, we should try to improve the process.
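To make the quorum concern concrete, here is a minimal sketch of the kind of check a roller could apply before restarting a controller. It assumes only the general Raft rule that a strict majority of the voter set must stay available; the class and method names are hypothetical and not the actual KafkaRoller code.

```java
import java.util.Set;

public class QuorumCheckSketch {

    /**
     * Hypothetical helper: returns true if restarting {@code candidate} still leaves
     * a strict majority of the controller voters running and caught up.
     */
    static boolean canRollController(Set<Integer> voters, Set<Integer> caughtUp, int candidate) {
        long remaining = voters.stream()
                .filter(id -> id != candidate)   // the node we want to restart will be down
                .filter(caughtUp::contains)      // only count voters that are in sync
                .count();
        // Raft needs a strict majority of the voter set to elect a leader and commit records
        return remaining > voters.size() / 2;
    }

    public static void main(String[] args) {
        Set<Integer> voters = Set.of(0, 1, 2);
        System.out.println(canRollController(voters, Set.of(0, 1, 2), 0)); // true
        System.out.println(canRollController(voters, Set.of(0, 1), 0));    // false
    }
}
```

With three controllers, one node can be rolled only while both of the others are caught up. This is why force-rolling controllers after a timeout, as in the log below, deserves scrutiny.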

2023-12-05 15:05:47 INFO  KafkaAssemblyOperator:844 - Reconciliation #735(watch) Kafka(myproject/my-cluster): KafkaNodePool controllers in namespace myproject was MODIFIED
2023-12-05 15:05:47 INFO  AbstractOperator:265 - Reconciliation #735(watch) Kafka(myproject/my-cluster): Kafka my-cluster will be checked for creation or modification
2023-12-05 15:05:47 INFO  CrdOperator:122 - Reconciliation #735(watch) Kafka(myproject/my-cluster): Status of KafkaNodePool controllers in namespace myproject has been updated
2023-12-05 15:05:47 WARN  NetworkClient:814 - [AdminClient clientId=adminclient-226] Connection to node -7 (my-cluster-controllers-0.my-cluster-kafka-brokers.myproject.svc.cluster.local/172.16.14.234:9091) could not be established. Broker may not be available.
2023-12-05 15:06:17 INFO  KafkaRoller:396 - Reconciliation #735(watch) Kafka(myproject/my-cluster): Will temporarily skip verifying pod my-cluster-controllers-0/0 is up-to-date due to ForceableProblem: Error getting broker config, retrying after at least 250ms
2023-12-05 15:06:47 INFO  AbstractOperator:400 - Reconciliation #735(watch) Kafka(myproject/my-cluster): Reconciliation is in progress
2023-12-05 15:06:47 INFO  KafkaRoller:396 - Reconciliation #735(watch) Kafka(myproject/my-cluster): Will temporarily skip verifying pod my-cluster-controllers-1/1 is up-to-date due to ForceableProblem: Error getting broker config, retrying after at least 250ms
2023-12-05 15:07:17 INFO  KafkaRoller:396 - Reconciliation #735(watch) Kafka(myproject/my-cluster): Will temporarily skip verifying pod my-cluster-controllers-2/2 is up-to-date due to ForceableProblem: Error getting broker config, retrying after at least 250ms
2023-12-05 15:07:47 INFO  AbstractOperator:400 - Reconciliation #735(watch) Kafka(myproject/my-cluster): Reconciliation is in progress
2023-12-05 15:07:47 INFO  KafkaRoller:396 - Reconciliation #735(watch) Kafka(myproject/my-cluster): Will temporarily skip verifying pod my-cluster-controllers-0/0 is up-to-date due to ForceableProblem: Error getting broker config, retrying after at least 500ms
2023-12-05 15:08:06 INFO  ClusterOperator:142 - Triggering periodic reconciliation for namespace myproject
2023-12-05 15:08:17 INFO  KafkaRoller:396 - Reconciliation #735(watch) Kafka(myproject/my-cluster): Will temporarily skip verifying pod my-cluster-controllers-1/1 is up-to-date due to ForceableProblem: Error getting broker config, retrying after at least 500ms
2023-12-05 15:08:47 INFO  AbstractOperator:400 - Reconciliation #735(watch) Kafka(myproject/my-cluster): Reconciliation is in progress
2023-12-05 15:08:48 INFO  KafkaRoller:396 - Reconciliation #735(watch) Kafka(myproject/my-cluster): Will temporarily skip verifying pod my-cluster-controllers-2/2 is up-to-date due to ForceableProblem: Error getting broker config, retrying after at least 500ms
2023-12-05 15:09:18 INFO  KafkaRoller:396 - Reconciliation #735(watch) Kafka(myproject/my-cluster): Will temporarily skip verifying pod my-cluster-controllers-0/0 is up-to-date due to ForceableProblem: Error getting broker config, retrying after at least 1000ms
2023-12-05 15:09:47 INFO  AbstractOperator:400 - Reconciliation #735(watch) Kafka(myproject/my-cluster): Reconciliation is in progress
2023-12-05 15:09:48 INFO  KafkaRoller:396 - Reconciliation #735(watch) Kafka(myproject/my-cluster): Will temporarily skip verifying pod my-cluster-controllers-1/1 is up-to-date due to ForceableProblem: Error getting broker config, retrying after at least 1000ms
2023-12-05 15:10:18 INFO  KafkaRoller:396 - Reconciliation #735(watch) Kafka(myproject/my-cluster): Will temporarily skip verifying pod my-cluster-controllers-2/2 is up-to-date due to ForceableProblem: Error getting broker config, retrying after at least 1000ms
2023-12-05 15:10:47 INFO  AbstractOperator:400 - Reconciliation #735(watch) Kafka(myproject/my-cluster): Reconciliation is in progress
2023-12-05 15:10:48 INFO  KafkaRoller:396 - Reconciliation #735(watch) Kafka(myproject/my-cluster): Will temporarily skip verifying pod my-cluster-controllers-0/0 is up-to-date due to ForceableProblem: Error getting broker config, retrying after at least 2000ms
2023-12-05 15:11:18 INFO  KafkaRoller:396 - Reconciliation #735(watch) Kafka(myproject/my-cluster): Will temporarily skip verifying pod my-cluster-controllers-1/1 is up-to-date due to ForceableProblem: Error getting broker config, retrying after at least 2000ms
2023-12-05 15:11:47 INFO  AbstractOperator:400 - Reconciliation #735(watch) Kafka(myproject/my-cluster): Reconciliation is in progress
2023-12-05 15:11:48 INFO  KafkaRoller:396 - Reconciliation #735(watch) Kafka(myproject/my-cluster): Will temporarily skip verifying pod my-cluster-controllers-2/2 is up-to-date due to ForceableProblem: Error getting broker config, retrying after at least 2000ms
2023-12-05 15:12:06 INFO  ClusterOperator:142 - Triggering periodic reconciliation for namespace myproject
2023-12-05 15:12:18 INFO  KafkaRoller:396 - Reconciliation #735(watch) Kafka(myproject/my-cluster): Will temporarily skip verifying pod my-cluster-controllers-0/0 is up-to-date due to ForceableProblem: Error getting broker config, retrying after at least 4000ms
2023-12-05 15:12:47 INFO  AbstractOperator:400 - Reconciliation #735(watch) Kafka(myproject/my-cluster): Reconciliation is in progress
2023-12-05 15:12:48 INFO  KafkaRoller:396 - Reconciliation #735(watch) Kafka(myproject/my-cluster): Will temporarily skip verifying pod my-cluster-controllers-1/1 is up-to-date due to ForceableProblem: Error getting broker config, retrying after at least 4000ms
2023-12-05 15:13:18 INFO  KafkaRoller:396 - Reconciliation #735(watch) Kafka(myproject/my-cluster): Will temporarily skip verifying pod my-cluster-controllers-2/2 is up-to-date due to ForceableProblem: Error getting broker config, retrying after at least 4000ms
2023-12-05 15:13:47 INFO  AbstractOperator:400 - Reconciliation #735(watch) Kafka(myproject/my-cluster): Reconciliation is in progress
2023-12-05 15:13:48 INFO  KafkaRoller:396 - Reconciliation #735(watch) Kafka(myproject/my-cluster): Will temporarily skip verifying pod my-cluster-controllers-0/0 is up-to-date due to ForceableProblem: Error getting broker config, retrying after at least 8000ms
2023-12-05 15:14:18 INFO  KafkaRoller:396 - Reconciliation #735(watch) Kafka(myproject/my-cluster): Will temporarily skip verifying pod my-cluster-controllers-1/1 is up-to-date due to ForceableProblem: Error getting broker config, retrying after at least 8000ms
2023-12-05 15:14:47 INFO  AbstractOperator:400 - Reconciliation #735(watch) Kafka(myproject/my-cluster): Reconciliation is in progress
2023-12-05 15:14:48 INFO  KafkaRoller:396 - Reconciliation #735(watch) Kafka(myproject/my-cluster): Will temporarily skip verifying pod my-cluster-controllers-2/2 is up-to-date due to ForceableProblem: Error getting broker config, retrying after at least 8000ms
2023-12-05 15:15:18 INFO  KafkaRoller:396 - Reconciliation #735(watch) Kafka(myproject/my-cluster): Will temporarily skip verifying pod my-cluster-controllers-0/0 is up-to-date due to ForceableProblem: Error getting broker config, retrying after at least 16000ms
2023-12-05 15:15:47 INFO  AbstractOperator:400 - Reconciliation #735(watch) Kafka(myproject/my-cluster): Reconciliation is in progress
2023-12-05 15:15:48 INFO  KafkaRoller:396 - Reconciliation #735(watch) Kafka(myproject/my-cluster): Will temporarily skip verifying pod my-cluster-controllers-1/1 is up-to-date due to ForceableProblem: Error getting broker config, retrying after at least 16000ms
2023-12-05 15:16:18 INFO  KafkaRoller:396 - Reconciliation #735(watch) Kafka(myproject/my-cluster): Will temporarily skip verifying pod my-cluster-controllers-2/2 is up-to-date due to ForceableProblem: Error getting broker config, retrying after at least 16000ms
2023-12-05 15:16:47 INFO  AbstractOperator:400 - Reconciliation #735(watch) Kafka(myproject/my-cluster): Reconciliation is in progress
2023-12-05 15:16:48 INFO  KafkaRoller:396 - Reconciliation #735(watch) Kafka(myproject/my-cluster): Will temporarily skip verifying pod my-cluster-controllers-0/0 is up-to-date due to ForceableProblem: Error getting broker config, retrying after at least 32000ms
2023-12-05 15:17:18 INFO  KafkaRoller:396 - Reconciliation #735(watch) Kafka(myproject/my-cluster): Will temporarily skip verifying pod my-cluster-controllers-1/1 is up-to-date due to ForceableProblem: Error getting broker config, retrying after at least 32000ms
2023-12-05 15:17:47 INFO  AbstractOperator:400 - Reconciliation #735(watch) Kafka(myproject/my-cluster): Reconciliation is in progress
2023-12-05 15:17:48 INFO  KafkaRoller:396 - Reconciliation #735(watch) Kafka(myproject/my-cluster): Will temporarily skip verifying pod my-cluster-controllers-2/2 is up-to-date due to ForceableProblem: Error getting broker config, retrying after at least 32000ms
2023-12-05 15:18:18 INFO  KafkaRoller:396 - Reconciliation #735(watch) Kafka(myproject/my-cluster): Will temporarily skip verifying pod my-cluster-controllers-0/0 is up-to-date due to ForceableProblem: Error getting broker config, retrying after at least 64000ms
2023-12-05 15:18:47 INFO  AbstractOperator:400 - Reconciliation #735(watch) Kafka(myproject/my-cluster): Reconciliation is in progress
2023-12-05 15:18:48 INFO  KafkaRoller:396 - Reconciliation #735(watch) Kafka(myproject/my-cluster): Will temporarily skip verifying pod my-cluster-controllers-1/1 is up-to-date due to ForceableProblem: Error getting broker config, retrying after at least 64000ms
2023-12-05 15:19:18 INFO  KafkaRoller:396 - Reconciliation #735(watch) Kafka(myproject/my-cluster): Will temporarily skip verifying pod my-cluster-controllers-2/2 is up-to-date due to ForceableProblem: Error getting broker config, retrying after at least 64000ms
2023-12-05 15:19:47 INFO  AbstractOperator:400 - Reconciliation #735(watch) Kafka(myproject/my-cluster): Reconciliation is in progress
2023-12-05 15:19:52 WARN  KafkaRoller:501 - Reconciliation #735(watch) Kafka(myproject/my-cluster): Pod my-cluster-controllers-0/0 will be force-rolled, due to error: Pod my-cluster-controllers-0 is the active controller and there are other pods to verify first
2023-12-05 15:19:52 INFO  PodOperator:54 - Reconciliation #735(watch) Kafka(myproject/my-cluster): Rolling pod my-cluster-controllers-0
2023-12-05 15:20:44 WARN  KafkaRoller:501 - Reconciliation #735(watch) Kafka(myproject/my-cluster): Pod my-cluster-controllers-1/1 will be force-rolled, due to error: Pod my-cluster-controllers-1 is the active controller and there are other pods to verify first
2023-12-05 15:20:44 INFO  PodOperator:54 - Reconciliation #735(watch) Kafka(myproject/my-cluster): Rolling pod my-cluster-controllers-1
2023-12-05 15:20:47 INFO  AbstractOperator:400 - Reconciliation #735(watch) Kafka(myproject/my-cluster): Reconciliation is in progress
2023-12-05 15:21:37 INFO  KafkaRoller:476 - Reconciliation #735(watch) Kafka(myproject/my-cluster): Rolling Pod my-cluster-controllers-2/2 due to [Pod has old revision]
2023-12-05 15:21:37 INFO  PodOperator:54 - Reconciliation #735(watch) Kafka(myproject/my-cluster): Rolling pod my-cluster-controllers-2
2023-12-05 15:21:47 INFO  AbstractOperator:400 - Reconciliation #735(watch) Kafka(myproject/my-cluster): Reconciliation is in progress
2023-12-05 15:22:06 INFO  ClusterOperator:142 - Triggering periodic reconciliation for namespace myproject
2023-12-05 15:22:21 INFO  AbstractOperator:537 - Reconciliation #735(watch) Kafka(myproject/my-cluster): reconciled
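The retry delays in the log above double on every attempt, from 250ms up to 64000ms, before the pods are force-rolled. The following sketch only reproduces that schedule as printed in the log. It does not claim to be how KafkaRoller computes it.

```java
import java.util.ArrayList;
import java.util.List;

public class BackoffScheduleSketch {

    // Delays as printed in the log: start at 250 ms and double after each failed attempt
    static List<Long> schedule(long initialMs, int attempts) {
        List<Long> delays = new ArrayList<>();
        long delay = initialMs;
        for (int i = 0; i < attempts; i++) {
            delays.add(delay);
            delay *= 2;
        }
        return delays;
    }

    public static void main(String[] args) {
        List<Long> delays = schedule(250, 9);
        System.out.println(delays); // [250, 500, 1000, 2000, 4000, 8000, 16000, 32000, 64000]
        long total = delays.stream().mapToLong(Long::longValue).sum();
        System.out.println(total);  // 127750
    }
}
```

The backoff alone adds up to roughly two minutes per pod. However, the log lines are about 30 seconds apart, and the roll only starts around 14 minutes after the change. This suggests that most of the time is spent waiting for the failed broker-config requests to time out rather than in the backoff itself.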
@scholzj
Member Author

scholzj commented Dec 14, 2023

Discussed in the community call on 14 December: this should be looked into and improved. There seem to be several options for finding out the current role of the node, rather than the desired role from the NodeRef objects:

  • Use the labels on the Pods in KafkaRoller to determine the current roles
  • Use the Admin API at the cluster level to determine them
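The first option could look roughly like the sketch below. It assumes that a running Pod's labels reflect the roles it was started with. It also assumes the label names `strimzi.io/controller-role` and `strimzi.io/broker-role`, which should be checked against what the operator actually sets. `NodeRoles`, `currentRoles`, and `shouldCheckBrokerConfig` are hypothetical names, and plain maps stand in for the Kubernetes client types.

```java
import java.util.Map;

public class CurrentRolesSketch {

    // Hypothetical holder for the roles a node is running with right now
    record NodeRoles(boolean controller, boolean broker) { }

    // Assumed label names; verify against the labels the operator puts on the Pods
    static final String CONTROLLER_ROLE_LABEL = "strimzi.io/controller-role";
    static final String BROKER_ROLE_LABEL = "strimzi.io/broker-role";

    // Derive the current roles from the Pod labels instead of the desired roles
    static NodeRoles currentRoles(Map<String, String> podLabels) {
        return new NodeRoles(
                "true".equals(podLabels.get(CONTROLLER_ROLE_LABEL)),
                "true".equals(podLabels.get(BROKER_ROLE_LABEL)));
    }

    // Only nodes already running as brokers can answer broker config requests on port 9091
    static boolean shouldCheckBrokerConfig(Map<String, String> podLabels) {
        return currentRoles(podLabels).broker();
    }

    public static void main(String[] args) {
        Map<String, String> controllerOnly = Map.of(
                CONTROLLER_ROLE_LABEL, "true",
                BROKER_ROLE_LABEL, "false");
        System.out.println(currentRoles(controllerOnly));            // NodeRoles[controller=true, broker=false]
        System.out.println(shouldCheckBrokerConfig(controllerOnly)); // false
    }
}
```

With a check like this, a controller-only node that is about to gain the broker role would be rolled using the controller checks only. The roller would not wait on broker config requests that cannot succeed until the node is restarted with its new role.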

scholzj added a commit to scholzj/strimzi-kafka-operator that referenced this issue Feb 14, 2024