
[Bug]: Wrong error message on resource conflicts due to same topicName #9270

Closed
fvaleri opened this issue Oct 19, 2023 · 5 comments

fvaleri (Contributor) commented Oct 19, 2023

Bug Description

This is related to the Unidirectional Topic Operator (UTO).

It may happen that two KafkaTopic resources with the same topicName are created at the exact same creationTimestamp. In that case only one of them is reconciled, while the other reports a ResourceConflict in its status (single-resource principle: 1 kt -> 1 topic -> 1 cluster).

The problem is that, when we create the condition message for the failing KafkaTopic status, we use creationTime to determine the name of the reconciled KafkaTopic, so we can end up reporting the wrong name (i.e. the KubeRef of the failing KafkaTopic itself).

https://github.com/strimzi/strimzi-kafka-operator/blob/main/topic-operator/src/main/java/io/strimzi/operator/topic/v2/BatchingTopicController.java#L209
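
To illustrate the ambiguity, here is a minimal, self-contained sketch (TopicRef and TieDemo are invented for this example, not operator code): when two resources tie on creationTimestamp, a comparison on the timestamp alone returns 0 for both, so the "oldest" resource, and therefore the name put into the condition message, depends purely on encounter order.

    import java.time.Instant;
    import java.util.Comparator;
    import java.util.List;

    // Hypothetical stand-in for the bits of KafkaTopic metadata that matter here.
    record TopicRef(String name, Instant creationTimestamp) {}

    public class TieDemo {
        public static void main(String[] args) {
            Instant now = Instant.parse("2023-10-19T14:46:47Z");
            // Two KafkaTopics claiming the same topicName, created in the same instant.
            List<TopicRef> managing = List.of(
                    new TopicRef("bar", now),
                    new TopicRef("foo", now));

            // min() compares only creationTimestamp, which is equal for both,
            // so the winner is whichever element happens to be encountered
            // first: the "Managed by ..." message can name either resource.
            TopicRef oldest = managing.stream()
                    .min(Comparator.comparing(TopicRef::creationTimestamp))
                    .orElseThrow();
            System.out.println("Reported owner: " + oldest.name());
        }
    }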

Example:

$ kubectl get kt -o yaml -n test 
apiVersion: v1
items:
- apiVersion: kafka.strimzi.io/v1beta2
  kind: KafkaTopic
  metadata:
    creationTimestamp: "2023-10-19T14:46:47Z"
    generation: 1
    labels:
      bar: BAR
      foo: FOO
    name: bar
    namespace: test
    resourceVersion: "56920"
    uid: 89947e54-f9b8-4bf7-a287-1e1b3427fe2b
  spec:
    config:
      compression.type: snappy
    partitions: 1
    replicas: 1
    topicName: foo
  status:
    conditions:
    - lastTransitionTime: "2023-10-19T14:46:47.311487120Z"
      message: Managed by Ref{namespace='test', name='bar'}
      reason: ResourceConflict
      status: "False"
      type: Ready
    observedGeneration: 1
    topicName: foo
- apiVersion: kafka.strimzi.io/v1beta2
  kind: KafkaTopic
  metadata:
    creationTimestamp: "2023-10-19T14:46:47Z"
    finalizers:
    - strimzi.io/topic-operator
    generation: 1
    labels:
      bar: BAR
      foo: FOO
    name: foo
    namespace: test
    resourceVersion: "56918"
    uid: f7e7bae1-3bd6-4afa-aa2e-45349e926190
  spec:
    partitions: 1
    replicas: 1
  status:
    conditions:
    - lastTransitionTime: "2023-10-19T14:46:47.230152899Z"
      status: "True"
      type: Ready
    observedGeneration: 1
    topicName: foo
kind: List
metadata:
  resourceVersion: ""

Steps to reproduce

Run the flaky test TopicControllerIT.shouldFailIfNumPartitionsDivergedWithConfigChange multiple times.

Expected behavior

No response

Strimzi version

0.38.0

Kubernetes version

1.27.3

Installation method

No response

Infrastructure

No response

Configuration files and logs

No response

Additional context

No response

fvaleri (Contributor, Author) commented Oct 19, 2023

@tombentley fyi

tombentley (Member) commented

Can we sort first by creationTimestamp and then by (e.g.) uid and change the logic around lines 214 to 230?
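
A sketch of what such a comparator could look like (KafkaTopic here is Strimzi's CRD class; its import path varies across versions, so it is omitted, and this is not the controller's actual code):

    import java.util.Comparator;

    // Sketch only. Kubernetes creationTimestamps are RFC 3339 strings at a
    // fixed precision, so lexicographic comparison orders them correctly;
    // metadata.uid is immutable and breaks any remaining ties.
    Comparator<KafkaTopic> byAgeThenUid =
            Comparator.comparing((KafkaTopic kt) -> kt.getMetadata().getCreationTimestamp())
                      .thenComparing(kt -> kt.getMetadata().getUid());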

ppatierno (Member) commented

Discussed on the Community call on 02.11.2023: @fvaleri is already working on it.

tombentley (Member) commented

I've looked at this and #9303 in more detail. The current algorithm is certainly wrong. The question is really how best to fix it. Fundamentally the problem revolves around the fact that Kubernetes doesn't provide us with an efficient way to query the KafkaTopics to find those trying to manage the same topic in Kafka. It's for this reason that we maintain the BatchingTopicController#topics map.

That map isn't being maintained correctly. In particular, it is added to only as a side effect of a reconciliation. This means that when the UTO starts up (when there are pre-existing KafkaTopics), it has partial information for a time. But the UTO will still act on this partial information as if it were complete.

I think the algorithm we want would prioritise the existing KafkaTopic which "owns" the topic in Kafka, even if it's not as old as some other KafkaTopic for the same topic. In other words, consistency of ownership is more important than age. It's only when there is no existing owner that we should consider the creation time. And when we do use the creation time it's only really best effort (i.e. we shouldn't assume that the modifications to #topics are driven by observations where creation time is monotonic). It looks something like this (a code sketch follows the list):

  1. Find any existing owner, defined as a ready KafkaTopic resource managing the topic in Kafka.
  2. If there is an existing owner:
    1. If this resource is the existing owner, then return right(true) (i.e. reconcile it).
    2. If this resource is not the existing owner, then return left(ResourceConflict) (i.e. mark it as ready=false, reason=ResourceConflict).
  3. Otherwise there is no existing owner. Find all the resources managing this topic and order them first by increasing creation date and then by something that cannot vary over time (metadata.uid will do).
    1. If the first resource in this list is this resource, then return right(true).
    2. Otherwise return left(ResourceConflict).
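
A compact sketch of that decision procedure (hypothetical: Either, TopicOperatorException, isReady, sameResource, resourceConflict, and byAgeThenUid, the comparator sketched above, are all assumed helpers, not the controller's actual API):

    // Sketch of the owner-selection logic described above; all helper names
    // are assumptions for illustration only.
    Either<TopicOperatorException, Boolean> validateSingleManagingResource(
            KafkaTopic candidate, Set<KafkaTopic> claimants) {
        // 1. An existing owner is a ready KafkaTopic already managing the topic.
        Optional<KafkaTopic> owner = claimants.stream()
                .filter(kt -> isReady(kt))
                .findFirst();
        if (owner.isPresent()) {
            // 2. Ownership is sticky: the current owner wins regardless of age.
            return sameResource(owner.get(), candidate)
                    ? Either.right(true)
                    : Either.left(resourceConflict(owner.get()));
        }
        // 3. No owner yet: choose deterministically by (creationTimestamp, uid).
        KafkaTopic first = claimants.stream().min(byAgeThenUid).orElseThrow();
        return sameResource(first, candidate)
                ? Either.right(true)
                : Either.left(resourceConflict(first));
    }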

Important properties of this algorithm are:

  1. Only a single KafkaTopic will be reconciled with ready=true (i.e. the owner is unique)
  2. The owner cannot change, even if the operator doesn't encounter KafkaTopic resources in order of increasing creation time, and even when several resources are tied for oldest.
  3. The algorithm works consistently no matter how KafkaTopics get batched (a consequence of 2).

One necessary precondition of this algorithm is that we build BatchingTopicController#topics before processing any events, so that when we do process events we're using complete information. Once built, it's enough to maintain the field as a side effect of reconciliations.

This is of sufficient complexity that it's probably worth pulling the management of #topics into a separate class (OwnerIndex?) that can easily be understood in isolation and unit tested independently.
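
A rough shape such a class could take (a sketch under the assumptions above, not a definitive design; KubeRef is the reference type already used in the status message, while topicNameOf and refOf are assumed helpers). The bootstrap method covers the precondition of priming the index from a full listing before any events are processed:

    import java.util.List;
    import java.util.Map;
    import java.util.Set;
    import java.util.concurrent.ConcurrentHashMap;

    // Sketch: an index from Kafka topic name to the KafkaTopics claiming it,
    // primed once at startup and thereafter maintained by reconciliations.
    class OwnerIndex {
        private final Map<String, Set<KubeRef>> claimants = new ConcurrentHashMap<>();

        // Prime from a full listing of pre-existing KafkaTopics before any
        // event is processed, so decisions never use partial information.
        void bootstrap(List<KafkaTopic> existing) {
            existing.forEach(this::add);
        }

        void add(KafkaTopic kt) {
            claimants.computeIfAbsent(topicNameOf(kt), k -> ConcurrentHashMap.newKeySet())
                     .add(refOf(kt));
        }

        void remove(KafkaTopic kt) {
            claimants.computeIfPresent(topicNameOf(kt), (k, refs) -> {
                refs.remove(refOf(kt));
                return refs.isEmpty() ? null : refs;
            });
        }

        Set<KubeRef> claimantsOf(String topicName) {
            return claimants.getOrDefault(topicName, Set.of());
        }
    }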

Thoughts @fvaleri ?

fvaleri (Contributor, Author) commented Nov 24, 2023

TBH, I don't see any risk of ownership change. The problem is limited to the error message on the failed KafkaTopic CR, which sometimes reports the wrong name for the KafkaTopic that is managing the Kafka topic. This is a corner case that can happen only when two KafkaTopics with the same topicName and creationTime are processed in the same batch.

As an example, let's say we have t1 and t2 with the same topicName that are created at about the same time (t1.topicName == t2.topicName, t1.creationTime == t2.creationTime). If the UTO processes them in the same batch, then it can happen that t1 is reconciled while t2 fails with an error saying it is already managed by t2, when it should say t1. This randomness comes from the unstable ordering in validateSingleManagingResource.

My last commit in #9303 addresses exactly this corner case by introducing the concept of processing order. Do you see anything wrong with that logic?
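
One way to read "processing order", sketched here as a paraphrase of the idea rather than the PR's actual code: record a monotonic sequence number the first time each resource is observed and use it as the tie-breaker when creationTimestamps are identical.

    import java.util.Map;
    import java.util.concurrent.ConcurrentHashMap;
    import java.util.concurrent.atomic.AtomicLong;

    // Hypothetical sketch: a sequence number assigned on first sight of each
    // resource (keyed by its immutable uid) gives a stable processing order
    // even when creationTimestamps are identical.
    class ProcessingOrder {
        private final AtomicLong counter = new AtomicLong();
        private final Map<String, Long> firstSeen = new ConcurrentHashMap<>();

        long orderOf(KafkaTopic kt) {
            return firstSeen.computeIfAbsent(
                    kt.getMetadata().getUid(), uid -> counter.getAndIncrement());
        }
    }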
