
Number of alive brokers '1' does not meet the required replication factor '3' for the transactions state topic #1147

Closed
ebbnflow opened this issue Jan 15, 2021 · 10 comments
Labels: Impacts Documentation, type/question (Further information is requested)

Comments

ebbnflow commented Jan 15, 2021

Hi all,

I am using Docker to spin up a Kafka broker and ZooKeeper in a single-node configuration for local development.

When I spin up the KStreams flavor of the registry, it brings down my broker.

    broker:
      image: docker.csx.com/confluentinc/cp-kafka
      hostname: broker
      container_name: broker
      depends_on:
        - zookeeper
      ports:
        - "9092:9092"
        - "29092:29092"
      environment:
        KAFKA_BROKER_ID: 1
        KAFKA_ZOOKEEPER_CONNECT: 'zookeeper:2181'
        KAFKA_LISTENER_SECURITY_PROTOCOL_MAP: PLAINTEXT:PLAINTEXT,PLAINTEXT_HOST:PLAINTEXT
        KAFKA_ADVERTISED_LISTENERS: PLAINTEXT://broker:9092,PLAINTEXT_HOST://localhost:29092
        KAFKA_OFFSETS_TOPIC_REPLICATION_FACTOR: 1
      networks:
        - testharnessnetwork
        
    zookeeper:
      image: docker.csx.com/confluentinc/cp-zookeeper
      hostname: zookeeper
      container_name: zookeeper
      ports:
        - "2181:2181"
      environment:
        ZOOKEEPER_CLIENT_PORT: 2181
        ZOOKEEPER_TICK_TIME: 2000
      networks:
        - testharnessnetwork

    schemaregistry:
      container_name: schemaregistry
      image: apicurio/apicurio-registry-streams:1.3.2.Final
      ports:
        - 8081:8080
        - 9000
      environment:
        KAFKA_BOOTSTRAP_SERVERS: broker:9092
        QUARKUS_PROFILE: prod
        APPLICATION_ID: registry_id
        APPLICATION_SERVER: localhost:9000
        REGISTRY_STREAMS_TOPOLOGY_REPLICATION_FACTOR: 1
      networks:
        - testharnessnetwork

The broker logs when spinning up the registry:

[2021-01-14 18:16:57,081] DEBUG [Controller id=1] Topics not in preferred replica for broker 1 Map() (kafka.controller.KafkaController)
[2021-01-14 18:16:57,081] TRACE [Controller id=1] Leader imbalance ratio for broker 1 is 0.0 (kafka.controller.KafkaController)
[2021-01-14 18:20:07,326] INFO [GroupCoordinator 1]: Preparing to rebalance group registry_id in state PreparingRebalance with old generation 0 (__consumer_offsets-21) (reason: Adding new member registry_id-d31ed249-8229-41ad-a619-49fcad96a55d-StreamThread-2-consumer-9d1dc7fa-6ae1-41fd-8cae-b208b7d85bdb with group instanceid None) (kafka.coordinator.group.GroupCoordinator)
[2021-01-14 18:20:13,333] INFO [GroupCoordinator 1]: Stabilized group registry_id generation 1 (__consumer_offsets-21) (kafka.coordinator.group.GroupCoordinator)
[2021-01-14 18:20:13,454] INFO Creating topic registry_id-global-id-store-changelog with configuration {message.timestamp.type=CreateTime, segment.bytes=67108864, min.compaction.lag.ms=0, cleanup.policy=compact} and initial partition assignment Map(0 -> ArrayBuffer(1)) (kafka.zk.AdminZkClient)
[2021-01-14 18:20:13,536] INFO Creating topic registry_id-storage-store-changelog with configuration {message.timestamp.type=CreateTime, segment.bytes=67108864, min.compaction.lag.ms=0, cleanup.policy=compact} and initial partition assignment Map(0 -> ArrayBuffer(1)) (kafka.zk.AdminZkClient)
[2021-01-14 18:20:13,586] INFO [Controller id=1] New topics: [Set(registry_id-global-id-store-changelog)], deleted topics: [Set()], new partition replica assignment [Map(registry_id-global-id-store-changelog-0 -> ReplicaAssignment(replicas=1, addingReplicas=, removingReplicas=))] (kafka.controller.KafkaController)
[2021-01-14 18:20:13,590] INFO [Controller id=1] New partition creation callback for registry_id-global-id-store-changelog-0 (kafka.controller.KafkaController)
[2021-01-14 18:20:13,596] TRACE [Controller id=1 epoch=1] Changed partition registry_id-global-id-store-changelog-0 state from NonExistentPartition to NewPartition with assigned replicas 1 (state.change.logger)
[2021-01-14 18:20:13,610] TRACE [Controller id=1 epoch=1] Changed state of replica 1 for partition registry_id-global-id-store-changelog-0 from NonExistentReplica to NewReplica (state.change.logger)
[2021-01-14 18:20:13,645] TRACE [Controller id=1 epoch=1] Changed partition registry_id-global-id-store-changelog-0 from NewPartition to OnlinePartition with state LeaderAndIsr(leader=1, leaderEpoch=0, isr=List(1), zkVersion=0) (state.change.logger)
[2021-01-14 18:20:13,646] TRACE [Controller id=1 epoch=1] Sending become-leader LeaderAndIsr request LeaderAndIsrPartitionState(topicName='registry_id-global-id-store-changelog', partitionIndex=0, controllerEpoch=1, leader=1, leaderEpoch=0, isr=[1], zkVersion=0, replicas=[1], addingReplicas=[], removingReplicas=[], isNew=true) to broker 1 for partition registry_id-global-id-store-changelog-0 (state.change.logger)
[2021-01-14 18:20:13,652] TRACE [Controller id=1 epoch=1] Sending UpdateMetadata request UpdateMetadataPartitionState(topicName='registry_id-global-id-store-changelog', partitionIndex=0, controllerEpoch=1, leader=1, leaderEpoch=0, isr=[1], zkVersion=0, replicas=[1], offlineReplicas=[]) to brokers Set(1) for partition registry_id-global-id-store-changelog-0 (state.change.logger)
[2021-01-14 18:20:13,653] TRACE [Controller id=1 epoch=1] Changed state of replica 1 for partition registry_id-global-id-store-changelog-0 from NewReplica to OnlineReplica (state.change.logger)
[2021-01-14 18:20:13,664] INFO [Controller id=1] New topics: [Set(registry_id-storage-store-changelog)], deleted topics: [Set()], new partition replica assignment [Map(registry_id-storage-store-changelog-0 -> ReplicaAssignment(replicas=1, addingReplicas=, removingReplicas=))] (kafka.controller.KafkaController)
[2021-01-14 18:20:13,664] INFO [Controller id=1] New partition creation callback for registry_id-storage-store-changelog-0 (kafka.controller.KafkaController)
[2021-01-14 18:20:13,664] TRACE [Controller id=1 epoch=1] Changed partition registry_id-storage-store-changelog-0 state from NonExistentPartition to NewPartition with assigned replicas 1 (state.change.logger)
[2021-01-14 18:20:13,665] TRACE [Controller id=1 epoch=1] Changed state of replica 1 for partition registry_id-storage-store-changelog-0 from NonExistentReplica to NewReplica (state.change.logger)
[2021-01-14 18:20:13,681] TRACE [Controller id=1 epoch=1] Changed partition registry_id-storage-store-changelog-0 from NewPartition to OnlinePartition with state LeaderAndIsr(leader=1, leaderEpoch=0, isr=List(1), zkVersion=0) (state.change.logger)
[2021-01-14 18:20:13,681] TRACE [Controller id=1 epoch=1] Sending become-leader LeaderAndIsr request LeaderAndIsrPartitionState(topicName='registry_id-storage-store-changelog', partitionIndex=0, controllerEpoch=1, leader=1, leaderEpoch=0, isr=[1], zkVersion=0, replicas=[1], addingReplicas=[], removingReplicas=[], isNew=true) to broker 1 for partition registry_id-storage-store-changelog-0 (state.change.logger)
[2021-01-14 18:20:13,681] TRACE [Controller id=1 epoch=1] Sending UpdateMetadata request UpdateMetadataPartitionState(topicName='registry_id-storage-store-changelog', partitionIndex=0, controllerEpoch=1, leader=1, leaderEpoch=0, isr=[1], zkVersion=0, replicas=[1], offlineReplicas=[]) to brokers Set(1) for partition registry_id-storage-store-changelog-0 (state.change.logger)
[2021-01-14 18:20:13,682] TRACE [Controller id=1 epoch=1] Changed state of replica 1 for partition registry_id-storage-store-changelog-0 from NewReplica to OnlineReplica (state.change.logger)
[2021-01-14 18:20:13,705] INFO [RequestSendThread controllerId=1] Controller 1 connected to broker:9092 (id: 1 rack: null) for sending state change requests (kafka.controller.RequestSendThread)
[2021-01-14 18:20:13,710] TRACE [Broker id=1] Received LeaderAndIsr request LeaderAndIsrPartitionState(topicName='registry_id-global-id-store-changelog', partitionIndex=0, controllerEpoch=1, leader=1, leaderEpoch=0, isr=[1], zkVersion=0, replicas=[1], addingReplicas=[], removingReplicas=[], isNew=true) correlation id 21 from controller 1 epoch 1 (state.change.logger)
[2021-01-14 18:20:13,719] TRACE [Broker id=1] Handling LeaderAndIsr request correlationId 21 from controller 1 epoch 1 starting the become-leader transition for partition registry_id-global-id-store-changelog-0 (state.change.logger)
[2021-01-14 18:20:13,722] INFO [ReplicaFetcherManager on broker 1] Removed fetcher for partitions Set(registry_id-global-id-store-changelog-0) (kafka.server.ReplicaFetcherManager)
[2021-01-14 18:20:13,754] INFO [Log partition=registry_id-global-id-store-changelog-0, dir=/var/lib/kafka/data] Loading producer state till offset 0 with message format version 2 (kafka.log.Log)
[2021-01-14 18:20:13,755] INFO [Log partition=registry_id-global-id-store-changelog-0, dir=/var/lib/kafka/data] Completed load of log with 1 segments, log start offset 0 and log end offset 0 in 18 ms (kafka.log.Log)
[2021-01-14 18:20:13,758] INFO Created log for partition registry_id-global-id-store-changelog-0 in /var/lib/kafka/data/registry_id-global-id-store-changelog-0 with properties {compression.type -> producer, message.downconversion.enable -> true, min.insync.replicas -> 1, segment.jitter.ms -> 0, cleanup.policy -> compact, flush.ms -> 9223372036854775807, segment.bytes -> 67108864, retention.ms -> 604800000, flush.messages -> 9223372036854775807, message.format.version -> 2.4-IV1, file.delete.delay.ms -> 60000, max.compaction.lag.ms -> 9223372036854775807, max.message.bytes -> 1000012, min.compaction.lag.ms -> 0, message.timestamp.type -> CreateTime, preallocate -> false, min.cleanable.dirty.ratio -> 0.5, index.interval.bytes -> 4096, unclean.leader.election.enable -> false, retention.bytes -> -1, delete.retention.ms -> 86400000, segment.ms -> 604800000, message.timestamp.difference.max.ms -> 9223372036854775807, segment.index.bytes -> 10485760}. (kafka.log.LogManager)
[2021-01-14 18:20:13,760] INFO [Partition registry_id-global-id-store-changelog-0 broker=1] No checkpointed highwatermark is found for partition registry_id-global-id-store-changelog-0 (kafka.cluster.Partition)
[2021-01-14 18:20:13,761] INFO [Partition registry_id-global-id-store-changelog-0 broker=1] Log loaded for partition registry_id-global-id-store-changelog-0 with initial high watermark 0 (kafka.cluster.Partition)
[2021-01-14 18:20:13,761] INFO [Partition registry_id-global-id-store-changelog-0 broker=1] registry_id-global-id-store-changelog-0 starts at leader epoch 0 from offset 0 with high watermark 0. Previous leader epoch was -1. (kafka.cluster.Partition)
[2021-01-14 18:20:13,766] TRACE [Broker id=1] Stopped fetchers as part of become-leader request from controller 1 epoch 1 with correlation id 21 for partition registry_id-global-id-store-changelog-0 (last update controller epoch 1) (state.change.logger)
[2021-01-14 18:20:13,767] TRACE [Broker id=1] Completed LeaderAndIsr request correlationId 21 from controller 1 epoch 1 for the become-leader transition for partition registry_id-global-id-store-changelog-0 (state.change.logger)
[2021-01-14 18:20:13,776] TRACE [Controller id=1 epoch=1] Received response {error_code=0,partition_errors=[{topic_name=registry_id-global-id-store-changelog,partition_index=0,error_code=0,_tagged_fields={}}],_tagged_fields={}} for request LEADER_AND_ISR with correlation id 21 sent to broker broker:9092 (id: 1 rack: null) (state.change.logger)
[2021-01-14 18:20:13,782] TRACE [Broker id=1] Cached leader info UpdateMetadataPartitionState(topicName='registry_id-global-id-store-changelog', partitionIndex=0, controllerEpoch=1, leader=1, leaderEpoch=0, isr=[1], zkVersion=0, replicas=[1], offlineReplicas=[]) for partition registry_id-global-id-store-changelog-0 in response to UpdateMetadata request sent by controller 1 epoch 1 with correlation id 22 (state.change.logger)
[2021-01-14 18:20:13,785] TRACE [Controller id=1 epoch=1] Received response {error_code=0,_tagged_fields={}} for request UPDATE_METADATA with correlation id 22 sent to broker broker:9092 (id: 1 rack: null) (state.change.logger)
[2021-01-14 18:20:13,786] TRACE [Broker id=1] Received LeaderAndIsr request LeaderAndIsrPartitionState(topicName='registry_id-storage-store-changelog', partitionIndex=0, controllerEpoch=1, leader=1, leaderEpoch=0, isr=[1], zkVersion=0, replicas=[1], addingReplicas=[], removingReplicas=[], isNew=true) correlation id 23 from controller 1 epoch 1 (state.change.logger)
[2021-01-14 18:20:13,790] TRACE [Broker id=1] Handling LeaderAndIsr request correlationId 23 from controller 1 epoch 1 starting the become-leader transition for partition registry_id-storage-store-changelog-0 (state.change.logger)
[2021-01-14 18:20:13,790] INFO [ReplicaFetcherManager on broker 1] Removed fetcher for partitions Set(registry_id-storage-store-changelog-0) (kafka.server.ReplicaFetcherManager)
[2021-01-14 18:20:13,798] INFO [Log partition=registry_id-storage-store-changelog-0, dir=/var/lib/kafka/data] Loading producer state till offset 0 with message format version 2 (kafka.log.Log)
[2021-01-14 18:20:13,799] INFO [Log partition=registry_id-storage-store-changelog-0, dir=/var/lib/kafka/data] Completed load of log with 1 segments, log start offset 0 and log end offset 0 in 4 ms (kafka.log.Log)
[2021-01-14 18:20:13,803] INFO Created log for partition registry_id-storage-store-changelog-0 in /var/lib/kafka/data/registry_id-storage-store-changelog-0 with properties {compression.type -> producer, message.downconversion.enable -> true, min.insync.replicas -> 1, segment.jitter.ms -> 0, cleanup.policy -> compact, flush.ms -> 9223372036854775807, segment.bytes -> 67108864, retention.ms -> 604800000, flush.messages -> 9223372036854775807, message.format.version -> 2.4-IV1, file.delete.delay.ms -> 60000, max.compaction.lag.ms -> 9223372036854775807, max.message.bytes -> 1000012, min.compaction.lag.ms -> 0, message.timestamp.type -> CreateTime, preallocate -> false, min.cleanable.dirty.ratio -> 0.5, index.interval.bytes -> 4096, unclean.leader.election.enable -> false, retention.bytes -> -1, delete.retention.ms -> 86400000, segment.ms -> 604800000, message.timestamp.difference.max.ms -> 9223372036854775807, segment.index.bytes -> 10485760}. (kafka.log.LogManager)
[2021-01-14 18:20:13,805] INFO [Partition registry_id-storage-store-changelog-0 broker=1] No checkpointed highwatermark is found for partition registry_id-storage-store-changelog-0 (kafka.cluster.Partition)
[2021-01-14 18:20:13,805] INFO [Partition registry_id-storage-store-changelog-0 broker=1] Log loaded for partition registry_id-storage-store-changelog-0 with initial high watermark 0 (kafka.cluster.Partition)
[2021-01-14 18:20:13,805] INFO [Partition registry_id-storage-store-changelog-0 broker=1] registry_id-storage-store-changelog-0 starts at leader epoch 0 from offset 0 with high watermark 0. Previous leader epoch was -1. (kafka.cluster.Partition)
[2021-01-14 18:20:13,818] TRACE [Broker id=1] Stopped fetchers as part of become-leader request from controller 1 epoch 1 with correlation id 23 for partition registry_id-storage-store-changelog-0 (last update controller epoch 1) (state.change.logger)
[2021-01-14 18:20:13,818] TRACE [Broker id=1] Completed LeaderAndIsr request correlationId 23 from controller 1 epoch 1 for the become-leader transition for partition registry_id-storage-store-changelog-0 (state.change.logger)
[2021-01-14 18:20:13,820] TRACE [Controller id=1 epoch=1] Received response {error_code=0,partition_errors=[{topic_name=registry_id-storage-store-changelog,partition_index=0,error_code=0,_tagged_fields={}}],_tagged_fields={}} for request LEADER_AND_ISR with correlation id 23 sent to broker broker:9092 (id: 1 rack: null) (state.change.logger)
[2021-01-14 18:20:13,825] TRACE [Broker id=1] Cached leader info UpdateMetadataPartitionState(topicName='registry_id-storage-store-changelog', partitionIndex=0, controllerEpoch=1, leader=1, leaderEpoch=0, isr=[1], zkVersion=0, replicas=[1], offlineReplicas=[]) for partition registry_id-storage-store-changelog-0 in response to UpdateMetadata request sent by controller 1 epoch 1 with correlation id 24 (state.change.logger)
[2021-01-14 18:20:13,830] TRACE [Controller id=1 epoch=1] Received response {error_code=0,_tagged_fields={}} for request UPDATE_METADATA with correlation id 24 sent to broker broker:9092 (id: 1 rack: null) (state.change.logger)
[2021-01-14 18:20:13,886] INFO [GroupCoordinator 1]: Assignment received from leader for group registry_id for generation 1 (kafka.coordinator.group.GroupCoordinator)
[2021-01-14 18:20:14,068] ERROR [KafkaApi-1] Number of alive brokers '1' does not meet the required replication factor '3' for the transactions state topic (configured via 'transaction.state.log.replication.factor'). This error can be ignored if the cluster is starting up and not all brokers are up yet. (kafka.server.KafkaApis)
[2021-01-14 18:20:14,070] ERROR [KafkaApi-1] Number of alive brokers '1' does not meet the required replication factor '3' for the transactions state topic (configured via 'transaction.state.log.replication.factor'). This error can be ignored if the cluster is starting up and not all brokers are up yet. (kafka.server.KafkaApis)
[2021-01-14 18:20:14,172] ERROR [KafkaApi-1] Number of alive brokers '1' does not meet the required replication factor '3' for the transactions state topic (configured via 'transaction.state.log.replication.factor'). This error can be ignored if the cluster is starting up and not all brokers are up yet. (kafka.server.KafkaApis)
[2021-01-14 18:20:14,274] ERROR [KafkaApi-1] Number of alive brokers '1' does not meet the required replication factor '3' for the transactions state topic (configured via 'transaction.state.log.replication.factor'). This error can be ignored if the cluster is starting up and not all brokers are up yet. (kafka.server.KafkaApis)
[2021-01-14 18:20:14,276] ERROR [KafkaApi-1] Number of alive brokers '1' does not meet the required replication factor '3' for the transactions state topic (configured via 'transaction.state.log.replication.factor'). This error can be ignored if the cluster is starting up and not all brokers are up yet. (kafka.server.KafkaApis)
[2021-01-14 18:20:14,376] ERROR [KafkaApi-1] Number of alive brokers '1' does not meet the required replication factor '3' for the transactions state topic (configured via 'transaction.state.log.replication.factor'). This error can be ignored if the cluster is starting up and not all brokers are up yet. (kafka.server.KafkaApis)

The last message repeats over and over in a tight loop, and the broker becomes unresponsive.

This is the message I get in the web console:

org.apache.kafka.streams.errors.InvalidStateStoreException: StreamsMetadata is currently unavailable. This can occur during rebalance operations. Store-name: storage-store
	at io.apicurio.registry.utils.streams.diservice.DistributedService.allServicesForStore(DistributedService.java:129)
	at io.apicurio.registry.utils.streams.diservice.DistributedService.allServicesForStoreStream(DistributedService.java:146)
	at io.apicurio.registry.utils.streams.distore.DistributedReadOnlyKeyValueStore.filter(DistributedReadOnlyKeyValueStore.java:88)
	at io.apicurio.registry.streams.StreamsRegistryConfiguration_ProducerMethod_storageKeyValueStore_8b765b799b96c8c4c3fcb5f5ac812de669c05315_ClientProxy.filter(StreamsRegistryConfiguration_ProducerMethod_storageKeyValueStore_8b765b799b96c8c4c3fcb5f5ac812de669c05315_ClientProxy.zig:249)
	at io.apicurio.registry.streams.StreamsRegistryStorage.searchArtifacts(StreamsRegistryStorage.java:467)
	at io.apicurio.registry.streams.StreamsRegistryStorage_Subclass.searchArtifacts$$superaccessor34(StreamsRegistryStorage_Subclass.zig:6802)
	at io.apicurio.registry.streams.StreamsRegistryStorage_Subclass$$function$$34.apply(StreamsRegistryStorage_Subclass$$function$$34.zig:63)
	at io.quarkus.arc.impl.AroundInvokeInvocationContext.proceed(AroundInvokeInvocationContext.java:54)
EricWittmann added the type/question label on Jan 19, 2021
EricWittmann (Member) commented:

@alesj Thoughts on this?

alesj (Contributor) commented Jan 19, 2021

Who/what creates the topics required for the Streams topology?
(since the newly added auto-creation is not part of 1.3.2, right?)

tombentley (Collaborator) commented:

If it's a single-node (or < 3 node) cluster, you should configure transaction.state.log.replication.factor on the broker.
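
With the Confluent cp-kafka image, broker settings are supplied as environment variables, so a minimal sketch of that fix in the compose file (assuming the image's standard KAFKA_ prefix mapping, and also lowering transaction.state.log.min.isr, which defaults to 2) would be:

    broker:
      image: docker.csx.com/confluentinc/cp-kafka
      environment:
        # A single broker can never satisfy the defaults for the internal
        # __transaction_state topic (replication factor 3, min ISR 2).
        KAFKA_TRANSACTION_STATE_LOG_REPLICATION_FACTOR: 1
        KAFKA_TRANSACTION_STATE_LOG_MIN_ISR: 1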

ebbnflow (Author) commented:

If it's a single-node (or < 3 node) cluster, you should configure transaction.state.log.replication.factor on the broker.

I tried that; it didn't work, so I commented it out:

  broker:
    image: docker.csx.com/confluentinc/cp-kafka
    hostname: broker
    container_name: broker
    depends_on:
      - zookeeper
    ports:
      - "9092:9092"
      - "29092:29092"
    environment:
      KAFKA_BROKER_ID: 1
      KAFKA_ZOOKEEPER_CONNECT: 'zookeeper:2181'
      KAFKA_LISTENER_SECURITY_PROTOCOL_MAP: PLAINTEXT:PLAINTEXT,PLAINTEXT_HOST:PLAINTEXT
      KAFKA_ADVERTISED_LISTENERS: PLAINTEXT://broker:9092,PLAINTEXT_HOST://localhost:29092
      KAFKA_OFFSETS_TOPIC_REPLICATION_FACTOR: 1
#      KAFKA_TRANSACTION_STATE_LOG_REPLICATION_FACTOR: 1
#      KAFKA_TRANSACTION_STATE_LOG_MIN_ISR: 1
    networks:
      - testharnessnetwork

tombentley (Collaborator) commented:

The log message Number of alive brokers '1' does not meet the required replication factor '3' for the transactions state topic (configured via 'transaction.state.log.replication.factor'). This error can be ignored if the cluster is starting up and not all brokers are up yet. (kafka.server.KafkaApis) only occurs in one place in KafkaApis, and that's where the internal topics (__transaction_state and __consumer_offsets) get created. If you think you set transaction.state.log.replication.factor and you still saw that exact same message in the logs (complete with the required replication factor '3' part), then that config wasn't actually set on that broker.
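
One way to verify which value actually took effect (a sketch; the broker prints its effective KafkaConfig values at startup, like the dump at the end of this thread, so you can grep the container logs for the transaction settings):

    docker logs broker 2>&1 | grep -E 'transaction\.state\.log\.(replication\.factor|min\.isr)'

If that still prints 3 (and min.isr = 2), the environment variable never reached the broker process.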

ebbnflow (Author) commented Jan 20, 2021

It is set. See my broker config at the bottom. I'm getting a different error now:

The broker logs (these appear once the schema registry connects):

[2021-01-20 20:41:35,356] INFO [Transaction State Manager 1]: TransactionalId registry_id-1_0 append transaction log for TxnTransitMetadata(producerId=3000, producerEpoch=0, txnTimeoutMs=60000, txnState=Empty, topicPartitions=Set(), txnStartTimestamp=-1, txnLastUpdateTimestamp=1611175295356) transition failed due to COORDINATOR_NOT_AVAILABLE, resetting pending state from Some(Empty), aborting state transition and returning COORDINATOR_NOT_AVAILABLE in the callback (kafka.coordinator.transaction.TransactionStateManager)
[2021-01-20 20:41:35,356] INFO [TransactionCoordinator id=1] Returning COORDINATOR_NOT_AVAILABLE error code to client for registry_id-1_0's InitProducerId request (kafka.coordinator.transaction.TransactionCoordinator)
[2021-01-20 20:41:35,359] ERROR [ReplicaManager broker=1] Error processing append operation on partition __transaction_state-47 (kafka.server.ReplicaManager)
org.apache.kafka.common.errors.NotEnoughReplicasException: The size of the current ISR Set(1) is insufficient to satisfy the min.isr requirement of 2 for partition __transaction_state-47

I start up my broker and create the topics using the commands below. The broker starts up and runs fine. Once I spin up the streams registry, this appears in the logs.

I think the issue has something to do with the __transaction_state-XX partitions that are auto-created. The logs show them being created with min.insync.replicas -> 2: [2021-01-20 20:28:28,722] INFO Created log for partition __transaction_state-25 in /var/lib/kafka/data/__transaction_state-25 with properties {compression.type -> uncompressed, message.downconversion.enable -> true, min.insync.replicas -> 2, ... (as shown below)
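
To confirm what the auto-created topic actually got, a quick check with the same tooling as the script below (output format varies by Kafka version):

    kafka-topics --describe --zookeeper zookeeper:2181 --topic __transaction_state

The Configs: line should show the min.insync.replicas=2 override inherited from transaction.state.log.min.isr.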

My topic creation script:

kafka-topics --create --zookeeper zookeeper:2181 --partitions 1 --replication-factor 1 --topic storage-topic   --config cleanup.policy=compact
kafka-topics --create --zookeeper zookeeper:2181 --partitions 1 --replication-factor 1 --topic global-id-topic --config cleanup.policy=compact
kafka-topics --create --zookeeper zookeeper:2181 --partitions 1 --replication-factor 1 --topic search-configs  --config cleanup.policy=compact
kafka-topics --create --zookeeper zookeeper:2181 --partitions 1 --replication-factor 1 --topic search-offsets  --config cleanup.policy=compact
kafka-topics --create --zookeeper zookeeper:2181 --partitions 1 --replication-factor 1 --topic search-status   --config cleanup.policy=compact
kafka-topics --create --zookeeper zookeeper:2181 --partitions 1 --replication-factor 1 --topic search-topic

The broker logs:

[2021-01-20 20:28:28,318] INFO [KafkaApi-1] Auto creation of topic __transaction_state with 50 partitions and replication factor 1 is successful (kafka.server.KafkaApis)
[2021-01-20 20:28:28,325] INFO [Controller id=1] New topics: [Set(__transaction_state)], deleted topics: [Set()], new partition replica assignment [Map(__transaction_state-42 -> ReplicaAssignment(replicas=1, addingReplicas=, removingReplicas=), __transaction_state-31 -> ReplicaAssignment(replicas=1, addingReplicas=, removingReplicas=), __transaction_state-45 -> ReplicaAssignment(replicas=1, addingReplicas=, removingReplicas=), __transaction_state-15 -> ReplicaAssignment(replicas=1, addingReplicas=, removingReplicas=), __transaction_state-12 -> ReplicaAssignment(replicas=1, addingReplicas=, removingReplicas=), __transaction_state-7 -> ReplicaAssignment(replicas=1, addingReplicas=, removingReplicas=), __transaction_state-46 -> ReplicaAssignment(replicas=1, addingReplicas=, removingReplicas=), __transaction_state-48 -> ReplicaAssignment(replicas=1, addingReplicas=, removingReplicas=), __transaction_state-49 -> ReplicaAssignment(replicas=1, addingReplicas=, removingReplicas=), __transaction_state-28 -> ReplicaAssignment(replicas=1, addingReplicas=, removingReplicas=), __transaction_state-2 -> ReplicaAssignment(replicas=1, addingReplicas=, removingReplicas=), __transaction_state-20 -> ReplicaAssignment(replicas=1, addingReplicas=, removingReplicas=), __transaction_state-24 -> ReplicaAssignment(replicas=1, addingReplicas=, removingReplicas=), __transaction_state-13 -> ReplicaAssignment(replicas=1, addingReplicas=, removingReplicas=), __transaction_state-0 -> ReplicaAssignment(replicas=1, addingReplicas=, removingReplicas=), __transaction_state-37 -> ReplicaAssignment(replicas=1, addingReplicas=, removingReplicas=), __transaction_state-3 -> ReplicaAssignment(replicas=1, addingReplicas=, removingReplicas=), __transaction_state-21 -> ReplicaAssignment(replicas=1, addingReplicas=, removingReplicas=), __transaction_state-29 -> ReplicaAssignment(replicas=1, addingReplicas=, removingReplicas=), __transaction_state-39 -> ReplicaAssignment(replicas=1, addingReplicas=, removingReplicas=), __transaction_state-38 -> ReplicaAssignment(replicas=1, addingReplicas=, removingReplicas=), __transaction_state-6 -> ReplicaAssignment(replicas=1, addingReplicas=, removingReplicas=), __transaction_state-14 -> ReplicaAssignment(replicas=1, addingReplicas=, removingReplicas=), __transaction_state-10 -> ReplicaAssignment(replicas=1, addingReplicas=, removingReplicas=), __transaction_state-44 -> ReplicaAssignment(replicas=1, addingReplicas=, removingReplicas=), __transaction_state-9 -> ReplicaAssignment(replicas=1, addingReplicas=, removingReplicas=), __transaction_state-22 -> ReplicaAssignment(replicas=1, addingReplicas=, removingReplicas=), __transaction_state-43 -> ReplicaAssignment(replicas=1, addingReplicas=, removingReplicas=), __transaction_state-4 -> ReplicaAssignment(replicas=1, addingReplicas=, removingReplicas=), __transaction_state-30 -> ReplicaAssignment(replicas=1, addingReplicas=, removingReplicas=), __transaction_state-33 -> ReplicaAssignment(replicas=1, addingReplicas=, removingReplicas=), __transaction_state-32 -> ReplicaAssignment(replicas=1, addingReplicas=, removingReplicas=), __transaction_state-25 -> ReplicaAssignment(replicas=1, addingReplicas=, removingReplicas=), __transaction_state-17 -> ReplicaAssignment(replicas=1, addingReplicas=, removingReplicas=), __transaction_state-23 -> ReplicaAssignment(replicas=1, addingReplicas=, removingReplicas=), __transaction_state-47 -> ReplicaAssignment(replicas=1, addingReplicas=, removingReplicas=), __transaction_state-18 -> ReplicaAssignment(replicas=1, 
addingReplicas=, removingReplicas=), __transaction_state-26 -> ReplicaAssignment(replicas=1, addingReplicas=, removingReplicas=), __transaction_state-36 -> ReplicaAssignment(replicas=1, addingReplicas=, removingReplicas=), __transaction_state-5 -> ReplicaAssignment(replicas=1, addingReplicas=, removingReplicas=), __transaction_state-8 -> ReplicaAssignment(replicas=1, addingReplicas=, removingReplicas=), __transaction_state-16 -> ReplicaAssignment(replicas=1, addingReplicas=, removingReplicas=), __transaction_state-11 -> ReplicaAssignment(replicas=1, addingReplicas=, removingReplicas=), __transaction_state-40 -> ReplicaAssignment(replicas=1, addingReplicas=, removingReplicas=), __transaction_state-19 -> ReplicaAssignment(replicas=1, addingReplicas=, removingReplicas=), __transaction_state-27 -> ReplicaAssignment(replicas=1, addingReplicas=, removingReplicas=), __transaction_state-41 -> ReplicaAssignment(replicas=1, addingReplicas=, removingReplicas=), __transaction_state-1 -> ReplicaAssignment(replicas=1, addingReplicas=, removingReplicas=), __transaction_state-34 -> ReplicaAssignment(replicas=1, addingReplicas=, removingReplicas=), __transaction_state-35 -> ReplicaAssignment(replicas=1, addingReplicas=, removingReplicas=))] (kafka.controller.KafkaController)
...
[2021-01-20 20:28:28,333] INFO [Controller id=1] New partition creation callback for __transaction_state-42,__transaction_state-31,__transaction_state-45,__transaction_state-15,__transaction_state-12,__transaction_state-7,__transaction_state-46,__transaction_state-48,__transaction_state-49,__transaction_state-28,__transaction_state-2,__transaction_state-20,__transaction_state-24,__transaction_state-13,__transaction_state-0,__transaction_state-37,__transaction_state-3,__transaction_state-21,__transaction_state-29,__transaction_state-39,__transaction_state-38,__transaction_state-6,__transaction_state-14,__transaction_state-10,__transaction_state-44,__transaction_state-9,__transaction_state-22,__transaction_state-43,__transaction_state-4,__transaction_state-30,__transaction_state-33,__transaction_state-32,__transaction_state-25,__transaction_state-17,__transaction_state-23,__transaction_state-47,__transaction_state-18,__transaction_state-26,__transaction_state-36,__transaction_state-5,__transaction_state-8,__transaction_state-16,__transaction_state-11,__transaction_state-40,__transaction_state-19,__transaction_state-27,__transaction_state-41,__transaction_state-1,__transaction_state-34,__transaction_state-35 (kafka.controller.KafkaController)
...
[2021-01-20 20:28:28,337] TRACE [Controller id=1 epoch=4] Changed partition __transaction_state-42 state from NonExistentPartition to NewPartition with assigned replicas 1 (state.change.logger)
[2021-01-20 20:28:28,337] TRACE [Controller id=1 epoch=4] Changed partition __transaction_state-31 state from NonExistentPartition to NewPartition with assigned replicas 1 (state.change.logger)
[2021-01-20 20:28:28,337] TRACE [Controller id=1 epoch=4] Changed partition __transaction_state-45 state from NonExistentPartition to NewPartition with assigned replicas 1 (state.change.logger)
[2021-01-20 20:28:28,337] TRACE [Controller id=1 epoch=4] Changed partition __transaction_state-15 state from NonExistentPartition to NewPartition with assigned replicas 1 (state.change.logger)
[2021-01-20 20:28:28,339] TRACE [Controller id=1 epoch=4] Changed partition __transaction_state-12 state from NonExistentPartition to NewPartition with assigned replicas 1 (state.change.logger)
[2021-01-20 20:28:28,339] TRACE [Controller id=1 epoch=4] Changed partition __transaction_state-7 state from NonExistentPartition to NewPartition with assigned replicas 1 (state.change.logger)
[2021-01-20 20:28:28,340] TRACE [Controller id=1 epoch=4] Changed partition __transaction_state-46 state from NonExistentPartition to NewPartition with assigned replicas 1 (state.change.logger)
[2021-01-20 20:28:28,340] TRACE [Controller id=1 epoch=4] Changed partition __transaction_state-48 state from NonExistentPartition to NewPartition with assigned replicas 1 (state.change.logger)
[2021-01-20 20:28:28,340] TRACE [Controller id=1 epoch=4] Changed partition __transaction_state-49 state from NonExistentPartition to NewPartition with assigned replicas 1 (state.change.logger)
[2021-01-20 20:28:28,340] TRACE [Controller id=1 epoch=4] Changed partition __transaction_state-28 state from NonExistentPartition to NewPartition with assigned replicas 1 (state.change.logger)
[2021-01-20 20:28:28,340] TRACE [Controller id=1 epoch=4] Changed partition __transaction_state-2 state from NonExistentPartition to NewPartition with assigned replicas 1 (state.change.logger)
[2021-01-20 20:28:28,340] TRACE [Controller id=1 epoch=4] Changed partition __transaction_state-20 state from NonExistentPartition to NewPartition with assigned replicas 1 (state.change.logger)
[2021-01-20 20:28:28,340] TRACE [Controller id=1 epoch=4] Changed partition __transaction_state-24 state from NonExistentPartition to NewPartition with assigned replicas 1 (state.change.logger)
[2021-01-20 20:28:28,341] TRACE [Controller id=1 epoch=4] Changed partition __transaction_state-13 state from NonExistentPartition to NewPartition with assigned replicas 1 (state.change.logger)
[2021-01-20 20:28:28,342] TRACE [Controller id=1 epoch=4] Changed partition __transaction_state-0 state from NonExistentPartition to NewPartition with assigned replicas 1 (state.change.logger)
[2021-01-20 20:28:28,342] TRACE [Controller id=1 epoch=4] Changed partition __transaction_state-37 state from NonExistentPartition to NewPartition with assigned replicas 1 (state.change.logger)
[2021-01-20 20:28:28,342] TRACE [Controller id=1 epoch=4] Changed partition __transaction_state-3 state from NonExistentPartition to NewPartition with assigned replicas 1 (state.change.logger)
[2021-01-20 20:28:28,342] TRACE [Controller id=1 epoch=4] Changed partition __transaction_state-21 state from NonExistentPartition to NewPartition with assigned replicas 1 (state.change.logger)
[2021-01-20 20:28:28,343] TRACE [Controller id=1 epoch=4] Changed partition __transaction_state-29 state from NonExistentPartition to NewPartition with assigned replicas 1 (state.change.logger)
[2021-01-20 20:28:28,343] TRACE [Controller id=1 epoch=4] Changed partition __transaction_state-39 state from NonExistentPartition to NewPartition with assigned replicas 1 (state.change.logger)
[2021-01-20 20:28:28,344] TRACE [Controller id=1 epoch=4] Changed partition __transaction_state-38 state from NonExistentPartition to NewPartition with assigned replicas 1 (state.change.logger)
[2021-01-20 20:28:28,344] TRACE [Controller id=1 epoch=4] Changed partition __transaction_state-6 state from NonExistentPartition to NewPartition with assigned replicas 1 (state.change.logger)
[2021-01-20 20:28:28,344] TRACE [Controller id=1 epoch=4] Changed partition __transaction_state-14 state from NonExistentPartition to NewPartition with assigned replicas 1 (state.change.logger)
[2021-01-20 20:28:28,344] TRACE [Controller id=1 epoch=4] Changed partition __transaction_state-10 state from NonExistentPartition to NewPartition with assigned replicas 1 (state.change.logger)
[2021-01-20 20:28:28,344] TRACE [Controller id=1 epoch=4] Changed partition __transaction_state-44 state from NonExistentPartition to NewPartition with assigned replicas 1 (state.change.logger)
[2021-01-20 20:28:28,344] TRACE [Controller id=1 epoch=4] Changed partition __transaction_state-9 state from NonExistentPartition to NewPartition with assigned replicas 1 (state.change.logger)
[2021-01-20 20:28:28,344] TRACE [Controller id=1 epoch=4] Changed partition __transaction_state-22 state from NonExistentPartition to NewPartition with assigned replicas 1 (state.change.logger)
[2021-01-20 20:28:28,345] TRACE [Controller id=1 epoch=4] Changed partition __transaction_state-43 state from NonExistentPartition to NewPartition with assigned replicas 1 (state.change.logger)
[2021-01-20 20:28:28,346] TRACE [Controller id=1 epoch=4] Changed partition __transaction_state-4 state from NonExistentPartition to NewPartition with assigned replicas 1 (state.change.logger)
[2021-01-20 20:28:28,346] TRACE [Controller id=1 epoch=4] Changed partition __transaction_state-30 state from NonExistentPartition to NewPartition with assigned replicas 1 (state.change.logger)
[2021-01-20 20:28:28,346] TRACE [Controller id=1 epoch=4] Changed partition __transaction_state-33 state from NonExistentPartition to NewPartition with assigned replicas 1 (state.change.logger)
[2021-01-20 20:28:28,347] TRACE [Controller id=1 epoch=4] Changed partition __transaction_state-32 state from NonExistentPartition to NewPartition with assigned replicas 1 (state.change.logger)
[2021-01-20 20:28:28,347] TRACE [Controller id=1 epoch=4] Changed partition __transaction_state-25 state from NonExistentPartition to NewPartition with assigned replicas 1 (state.change.logger)
[2021-01-20 20:28:28,347] TRACE [Controller id=1 epoch=4] Changed partition __transaction_state-17 state from NonExistentPartition to NewPartition with assigned replicas 1 (state.change.logger)
[2021-01-20 20:28:28,348] TRACE [Controller id=1 epoch=4] Changed partition __transaction_state-23 state from NonExistentPartition to NewPartition with assigned replicas 1 (state.change.logger)
[2021-01-20 20:28:28,348] TRACE [Controller id=1 epoch=4] Changed partition __transaction_state-47 state from NonExistentPartition to NewPartition with assigned replicas 1 (state.change.logger)
[2021-01-20 20:28:28,349] TRACE [Controller id=1 epoch=4] Changed partition __transaction_state-18 state from NonExistentPartition to NewPartition with assigned replicas 1 (state.change.logger)
[2021-01-20 20:28:28,349] TRACE [Controller id=1 epoch=4] Changed partition __transaction_state-26 state from NonExistentPartition to NewPartition with assigned replicas 1 (state.change.logger)
[2021-01-20 20:28:28,350] TRACE [Controller id=1 epoch=4] Changed partition __transaction_state-36 state from NonExistentPartition to NewPartition with assigned replicas 1 (state.change.logger)
[2021-01-20 20:28:28,350] TRACE [Controller id=1 epoch=4] Changed partition __transaction_state-5 state from NonExistentPartition to NewPartition with assigned replicas 1 (state.change.logger)
[2021-01-20 20:28:28,350] TRACE [Controller id=1 epoch=4] Changed partition __transaction_state-8 state from NonExistentPartition to NewPartition with assigned replicas 1 (state.change.logger)
[2021-01-20 20:28:28,350] TRACE [Controller id=1 epoch=4] Changed partition __transaction_state-16 state from NonExistentPartition to NewPartition with assigned replicas 1 (state.change.logger)
[2021-01-20 20:28:28,350] TRACE [Controller id=1 epoch=4] Changed partition __transaction_state-11 state from NonExistentPartition to NewPartition with assigned replicas 1 (state.change.logger)
[2021-01-20 20:28:28,350] TRACE [Controller id=1 epoch=4] Changed partition __transaction_state-40 state from NonExistentPartition to NewPartition with assigned replicas 1 (state.change.logger)
[2021-01-20 20:28:28,350] TRACE [Controller id=1 epoch=4] Changed partition __transaction_state-19 state from NonExistentPartition to NewPartition with assigned replicas 1 (state.change.logger)
[2021-01-20 20:28:28,350] TRACE [Controller id=1 epoch=4] Changed partition __transaction_state-27 state from NonExistentPartition to NewPartition with assigned replicas 1 (state.change.logger)
[2021-01-20 20:28:28,350] TRACE [Controller id=1 epoch=4] Changed partition __transaction_state-41 state from NonExistentPartition to NewPartition with assigned replicas 1 (state.change.logger)
[2021-01-20 20:28:28,350] TRACE [Controller id=1 epoch=4] Changed partition __transaction_state-1 state from NonExistentPartition to NewPartition with assigned replicas 1 (state.change.logger)
[2021-01-20 20:28:28,350] TRACE [Controller id=1 epoch=4] Changed partition __transaction_state-34 state from NonExistentPartition to NewPartition with assigned replicas 1 (state.change.logger)
[2021-01-20 20:28:28,350] TRACE [Controller id=1 epoch=4] Changed partition __transaction_state-35 state from NonExistentPartition to NewPartition with assigned replicas 1 (state.change.logger)
[2021-01-20 20:28:28,353] TRACE [Controller id=1 epoch=4] Changed state of replica 1 for partition __transaction_state-29 from NonExistentReplica to NewReplica (state.change.logger)
[2021-01-20 20:28:28,353] TRACE [Controller id=1 epoch=4] Changed state of replica 1 for partition __transaction_state-38 from NonExistentReplica to NewReplica (state.change.logger)
[2021-01-20 20:28:28,353] TRACE [Controller id=1 epoch=4] Changed state of replica 1 for partition __transaction_state-34 from NonExistentReplica to NewReplica (state.change.logger)
[2021-01-20 20:28:28,354] TRACE [Controller id=1 epoch=4] Changed state of replica 1 for partition __transaction_state-17 from NonExistentReplica to NewReplica (state.change.logger)
[2021-01-20 20:28:28,609] TRACE [Controller id=1 epoch=4] Changed partition __transaction_state-42 from NewPartition to OnlinePartition with state LeaderAndIsr(leader=1, leaderEpoch=0, isr=List(1), zkVersion=0) (state.change.logger)
[2021-01-20 20:28:28,609] TRACE [Controller id=1 epoch=4] Changed partition __transaction_state-31 from NewPartition to OnlinePartition with state LeaderAndIsr(leader=1, leaderEpoch=0, isr=List(1), zkVersion=0) (state.change.logger)
[2021-01-20 20:28:28,609] TRACE [Controller id=1 epoch=4] Changed partition __transaction_state-45 from NewPartition to OnlinePartition with state LeaderAndIsr(leader=1, leaderEpoch=0, isr=List(1), zkVersion=0) (state.change.logger)
[2021-01-20 20:28:28,610] TRACE [Controller id=1 epoch=4] Changed partition __transaction_state-15 from NewPartition to OnlinePartition with state LeaderAndIsr(leader=1, leaderEpoch=0, isr=List(1), zkVersion=0) (state.change.logger)
[2021-01-20 20:28:28,610] TRACE [Controller id=1 epoch=4] Changed partition __transaction_state-12 from NewPartition to OnlinePartition with state LeaderAndIsr(leader=1, leaderEpoch=0, isr=List(1), zkVersion=0) (state.change.logger)
[2021-01-20 20:28:28,610] TRACE [Controller id=1 epoch=4] Changed partition __transaction_state-7 from NewPartition to OnlinePartition with state LeaderAndIsr(leader=1, leaderEpoch=0, isr=List(1), zkVersion=0) (state.change.logger)
[2021-01-20 20:28:28,610] TRACE [Controller id=1 epoch=4] Changed partition __transaction_state-46 from NewPartition to OnlinePartition with state LeaderAndIsr(leader=1, leaderEpoch=0, isr=List(1), zkVersion=0) (state.change.logger)
[2021-01-20 20:28:28,610] TRACE [Controller id=1 epoch=4] Changed partition __transaction_state-48 from NewPartition to OnlinePartition with state LeaderAndIsr(leader=1, leaderEpoch=0, isr=List(1), zkVersion=0) (state.change.logger)
[2021-01-20 20:28:28,610] TRACE [Controller id=1 epoch=4] Changed partition __transaction_state-49 from NewPartition to OnlinePartition with state LeaderAndIsr(leader=1, leaderEpoch=0, isr=List(1), zkVersion=0) (state.change.logger)
[2021-01-20 20:28:28,610] TRACE [Controller id=1 epoch=4] Changed partition __transaction_state-28 from NewPartition to OnlinePartition with state LeaderAndIsr(leader=1, leaderEpoch=0, isr=List(1), zkVersion=0) (state.change.logger)
[2021-01-20 20:28:28,610] TRACE [Controller id=1 epoch=4] Changed partition __transaction_state-2 from NewPartition to OnlinePartition with state LeaderAndIsr(leader=1, leaderEpoch=0, isr=List(1), zkVersion=0) (state.change.logger)
[2021-01-20 20:28:28,610] TRACE [Controller id=1 epoch=4] Changed partition __transaction_state-20 from NewPartition to OnlinePartition with state LeaderAndIsr(leader=1, leaderEpoch=0, isr=List(1), zkVersion=0) (state.change.logger)
[2021-01-20 20:28:28,610] TRACE [Controller id=1 epoch=4] Changed partition __transaction_state-24 from NewPartition to OnlinePartition with state LeaderAndIsr(leader=1, leaderEpoch=0, isr=List(1), zkVersion=0) (state.change.logger)
[2021-01-20 20:28:28,610] TRACE [Controller id=1 epoch=4] Changed partition __transaction_state-13 from NewPartition to OnlinePartition with state LeaderAndIsr(leader=1, leaderEpoch=0, isr=List(1), zkVersion=0) (state.change.logger)
[2021-01-20 20:28:28,611] TRACE [Controller id=1 epoch=4] Sending become-leader LeaderAndIsr request LeaderAndIsrPartitionState(topicName='__transaction_state', partitionIndex=6, controllerEpoch=4, leader=1, leaderEpoch=0, isr=[1], zkVersion=0, replicas=[1], addingReplicas=[], removingReplicas=[], isNew=true) to broker 1 for partition __transaction_state-6 (state.change.logger)
[2021-01-20 20:28:28,611] TRACE [Controller id=1 epoch=4] Sending become-leader LeaderAndIsr request LeaderAndIsrPartitionState(topicName='__transaction_state', partitionIndex=39, controllerEpoch=4, leader=1, leaderEpoch=0, isr=[1], zkVersion=0, replicas=[1], addingReplicas=[], removingReplicas=[], isNew=true) to broker 1 for partition __transaction_state-39 (state.change.logger)
[2021-01-20 20:28:28,612] TRACE [Controller id=1 epoch=4] Sending become-leader LeaderAndIsr request LeaderAndIsrPartitionState(topicName='__transaction_state', partitionIndex=28, controllerEpoch=4, leader=1, leaderEpoch=0, isr=[1], zkVersion=0, replicas=[1], addingReplicas=[], removingReplicas=[], isNew=true) to broker 1 for partition __transaction_state-28 (state.change.logger)
[2021-01-20 20:28:28,654] TRACE [Controller id=1 epoch=4] Changed state of replica 1 for partition __transaction_state-29 from NewReplica to OnlineReplica (state.change.logger)
[2021-01-20 20:28:28,654] TRACE [Controller id=1 epoch=4] Changed state of replica 1 for partition __transaction_state-38 from NewReplica to OnlineReplica (state.change.logger)
[2021-01-20 20:28:28,655] TRACE [Controller id=1 epoch=4] Changed state of replica 1 for partition __transaction_state-34 from NewReplica to OnlineReplica (state.change.logger)
[2021-01-20 20:28:28,656] TRACE [Controller id=1 epoch=4] Changed state of replica 1 for partition __transaction_state-17 from NewReplica to OnlineReplica (state.change.logger)
...
[2021-01-20 20:28:28,707] TRACE [Broker id=1] Handling LeaderAndIsr request correlationId 5 from controller 1 epoch 4 starting the become-leader transition for partition __transaction_state-25 (state.change.logger)
[2021-01-20 20:28:28,707] TRACE [Broker id=1] Handling LeaderAndIsr request correlationId 5 from controller 1 epoch 4 starting the become-leader transition for partition __transaction_state-6 (state.change.logger)
[2021-01-20 20:28:28,707] TRACE [Broker id=1] Handling LeaderAndIsr request correlationId 5 from controller 1 epoch 4 starting the become-leader transition for partition __transaction_state-44 (state.change.logger)
[2021-01-20 20:28:28,707] TRACE [Broker id=1] Handling LeaderAndIsr request correlationId 5 from controller 1 epoch 4 starting the become-leader transition for partition __transaction_state-41 (state.change.logger)
[2021-01-20 20:28:28,707] TRACE [Broker id=1] Handling LeaderAndIsr request correlationId 5 from controller 1 epoch 4 starting the become-leader transition for partition __transaction_state-3 (state.change.logger)
[2021-01-20 20:28:28,707] TRACE [Broker id=1] Handling LeaderAndIsr request correlationId 5 from controller 1 epoch 4 starting the become-leader transition for partition __transaction_state-22 (state.change.logger)
...
[2021-01-20 20:28:28,709] INFO [ReplicaFetcherManager on broker 1] Removed fetcher for partitions Set(__transaction_state-42, __transaction_state-31, __transaction_state-45, __transaction_state-15, __transaction_state-12, __transaction_state-7, __transaction_state-46, __transaction_state-48, __transaction_state-49, __transaction_state-28, __transaction_state-2, __transaction_state-20, __transaction_state-24, __transaction_state-13, __transaction_state-0, __transaction_state-37, __transaction_state-3, __transaction_state-21, __transaction_state-29, __transaction_state-39, __transaction_state-38, __transaction_state-6, __transaction_state-14, __transaction_state-10, __transaction_state-44, __transaction_state-9, __transaction_state-22, __transaction_state-43, __transaction_state-4, __transaction_state-30, __transaction_state-33, __transaction_state-32, __transaction_state-25, __transaction_state-17, __transaction_state-23, __transaction_state-47, __transaction_state-18, __transaction_state-26, __transaction_state-36, __transaction_state-5, __transaction_state-8, __transaction_state-16, __transaction_state-11, __transaction_state-40, __transaction_state-19, __transaction_state-27, __transaction_state-41, __transaction_state-1, __transaction_state-34, __transaction_state-35) (kafka.server.ReplicaFetcherManager)
[2021-01-20 20:28:28,720] INFO [Log partition=__transaction_state-25, dir=/var/lib/kafka/data] Loading producer state till offset 0 with message format version 2 (kafka.log.Log)
[2021-01-20 20:28:28,721] INFO [Log partition=__transaction_state-25, dir=/var/lib/kafka/data] Completed load of log with 1 segments, log start offset 0 and log end offset 0 in 4 ms (kafka.log.Log)
[2021-01-20 20:28:28,722] INFO Created log for partition __transaction_state-25 in /var/lib/kafka/data/__transaction_state-25 with properties {compression.type -> uncompressed, message.downconversion.enable -> true, min.insync.replicas -> 2, segment.jitter.ms -> 0, cleanup.policy -> compact, flush.ms -> 9223372036854775807, segment.bytes -> 104857600, retention.ms -> 604800000, flush.messages -> 9223372036854775807, message.format.version -> 2.4-IV1, file.delete.delay.ms -> 60000, max.compaction.lag.ms -> 9223372036854775807, max.message.bytes -> 1000012, min.compaction.lag.ms -> 0, message.timestamp.type -> CreateTime, preallocate -> false, min.cleanable.dirty.ratio -> 0.5, index.interval.bytes -> 4096, unclean.leader.election.enable -> false, retention.bytes -> -1, delete.retention.ms -> 86400000, segment.ms -> 604800000, message.timestamp.difference.max.ms -> 9223372036854775807, segment.index.bytes -> 10485760}. (kafka.log.LogManager)
...
[2021-01-20 20:28:29,434] TRACE [Broker id=1] Stopped fetchers as part of become-leader request from controller 1 epoch 4 with correlation id 5 for partition __transaction_state-9 (last update controller epoch 4) (state.change.logger)
[2021-01-20 20:28:29,434] TRACE [Broker id=1] Completed LeaderAndIsr request correlationId 5 from controller 1 epoch 4 for the become-leader transition for partition __transaction_state-25 (state.change.logger)
[2021-01-20 20:28:29,434] TRACE [Broker id=1] Completed LeaderAndIsr request correlationId 5 from controller 1 epoch 4 for the become-leader transition for partition __transaction_state-6 (state.change.logger)
[2021-01-20 20:28:29,434] TRACE [Broker id=1] Completed LeaderAndIsr request correlationId 5 from controller 1 epoch 4 for the become-leader transition for partition __transaction_state-44 (state.change.logger)
[2021-01-20 20:28:29,434] TRACE [Broker id=1] Completed LeaderAndIsr request correlationId 5 from controller 1 epoch 4 for the become-leader transition for partition __transaction_state-41 (state.change.logger)
...

The error below happens over and over again in the broker logs, and the registry web console won't load, saying KStreams isn't running.

   [2021-01-20 20:28:44,149] ERROR [ReplicaManager broker=1] Error processing append operation on partition __transaction_state-47 (kafka.server.ReplicaManager)
org.apache.kafka.common.errors.NotEnoughReplicasException: The size of the current ISR Set(1) is insufficient to satisfy the min.isr requirement of 2 for partition __transaction_state-47
[2021-01-20 20:28:44,150] INFO [Transaction State Manager 1]: TransactionalId registry_id-0_0 append transaction log for TxnTransitMetadata(producerId=3001, producerEpoch=0, txnTimeoutMs=60000, txnState=Empty, topicPartitions=Set(), txnStartTimestamp=-1, txnLastUpdateTimestamp=1611174524149) transition failed due to COORDINATOR_NOT_AVAILABLE, resetting pending state from Some(Empty), aborting state transition and returning COORDINATOR_NOT_AVAILABLE in the callback (kafka.coordinator.transaction.TransactionStateManager)
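
(Note: once __transaction_state has been auto-created with the old settings, fixing the broker config alone isn't enough, because the existing topic keeps its min.insync.replicas=2 override, as the Created log lines above show. A possible remediation for a disposable dev cluster, not confirmed in this thread, is to delete the topic so the broker recreates it with the corrected settings:

    kafka-topics --delete --zookeeper zookeeper:2181 --topic __transaction_state

or simply wipe the Kafka data volume and start fresh.)

The KafkaConfig dump from broker startup: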
[2021-01-20 20:19:26,382] INFO KafkaConfig values:  
advertised.host.name = null 
advertised.listeners = PLAINTEXT://broker:9092,PLAINTEXT_HOST://localhost:29092 
advertised.port = null 
alter.config.policy.class.name = null 
alter.log.dirs.replication.quota.window.num = 11 
alter.log.dirs.replication.quota.window.size.seconds = 1 
authorizer.class.name =  
auto.create.topics.enable = true 
auto.leader.rebalance.enable = true 
background.threads = 10 
broker.id = 1 
broker.id.generation.enable = true 
broker.rack = null 
client.quota.callback.class = null 
compression.type = producer 
connection.failed.authentication.delay.ms = 100 
connections.max.idle.ms = 600000 
connections.max.reauth.ms = 0 
control.plane.listener.name = null 
controlled.shutdown.enable = true 
controlled.shutdown.max.retries = 3 
controlled.shutdown.retry.backoff.ms = 5000 
controller.socket.timeout.ms = 30000 
create.topic.policy.class.name = null 
default.replication.factor = 1 
delegation.token.expiry.check.interval.ms = 3600000 
delegation.token.expiry.time.ms = 86400000 
delegation.token.master.key = null 
delegation.token.max.lifetime.ms = 604800000 
delete.records.purgatory.purge.interval.requests = 1 
delete.topic.enable = true 
fetch.purgatory.purge.interval.requests = 1000 
group.initial.rebalance.delay.ms = 3000 
group.max.session.timeout.ms = 1800000 
group.max.size = 2147483647
group.min.session.timeout.ms = 6000 
host.name =  
inter.broker.listener.name = null 
inter.broker.protocol.version = 2.4-IV1 
kafka.metrics.polling.interval.secs = 10 
kafka.metrics.reporters = [] 
leader.imbalance.check.interval.seconds = 300 
leader.imbalance.per.broker.percentage = 10 
listener.security.protocol.map = PLAINTEXT:PLAINTEXT,PLAINTEXT_HOST:PLAINTEXT 
listeners = PLAINTEXT://0.0.0.0:9092,PLAINTEXT_HOST://0.0.0.0:29092 
log.cleaner.backoff.ms = 15000 
log.cleaner.dedupe.buffer.size = 134217728 
log.cleaner.delete.retention.ms = 86400000 
log.cleaner.enable = true 
log.cleaner.io.buffer.load.factor = 0.9 
log.cleaner.io.buffer.size = 524288 
log.cleaner.io.max.bytes.per.second = 1.7976931348623157E308 
log.cleaner.max.compaction.lag.ms = 9223372036854775807 
log.cleaner.min.cleanable.ratio = 0.5 
log.cleaner.min.compaction.lag.ms = 0 
log.cleaner.threads = 1 
log.cleanup.policy = [delete] 
log.dir = /tmp/kafka-logs 
log.dirs = /var/lib/kafka/data 
log.flush.interval.messages = 9223372036854775807 
log.flush.interval.ms = null 
log.flush.offset.checkpoint.interval.ms = 60000 
log.flush.scheduler.interval.ms = 9223372036854775807 
log.flush.start.offset.checkpoint.interval.ms = 60000 
log.index.interval.bytes = 4096 
log.index.size.max.bytes = 10485760 
log.message.downconversion.enable = true 
log.message.format.version = 2.4-IV1 
log.message.timestamp.difference.max.ms = 9223372036854775807 
log.message.timestamp.type = CreateTime 
log.preallocate = false 
log.retention.bytes = -1 
log.retention.check.interval.ms = 300000 
log.retention.hours = 168 
log.retention.minutes = null 
log.retention.ms = null 
log.roll.hours = 168 
log.roll.jitter.hours = 0 
log.roll.jitter.ms = null 
log.roll.ms = null 
log.segment.bytes = 1073741824 
log.segment.delete.delay.ms = 60000 
max.connections = 2147483647 
max.connections.per.ip = 2147483647 
max.connections.per.ip.overrides =  
max.incremental.fetch.session.cache.slots = 1000 
message.max.bytes = 1000012 
metric.reporters = [] 
metrics.num.samples = 2 
metrics.recording.level = INFO 
metrics.sample.window.ms = 30000 
min.insync.replicas = 1 
num.io.threads = 8 
num.network.threads = 3 
num.partitions = 1
num.recovery.threads.per.data.dir = 1 
num.replica.alter.log.dirs.threads = null 
num.replica.fetchers = 1 
offset.metadata.max.bytes = 4096 
offsets.commit.required.acks = -1 
offsets.commit.timeout.ms = 5000 
offsets.load.buffer.size = 5242880 
offsets.retention.check.interval.ms = 600000 
offsets.retention.minutes = 10080 
offsets.topic.compression.codec = 0 
offsets.topic.num.partitions = 50 
offsets.topic.replication.factor = 1 
offsets.topic.segment.bytes = 104857600 
password.encoder.cipher.algorithm = AES/CBC/PKCS5Padding 
password.encoder.iterations = 4096 
password.encoder.key.length = 128 
password.encoder.keyfactory.algorithm = null 
password.encoder.old.secret = null 
password.encoder.secret = null 
port = 9092 
principal.builder.class = null 
producer.purgatory.purge.interval.requests = 1000 
queued.max.request.bytes = -1 
queued.max.requests = 500 
quota.consumer.default = 9223372036854775807 
quota.producer.default = 9223372036854775807 
quota.window.num = 11 
quota.window.size.seconds = 1 
replica.fetch.backoff.ms = 1000 
replica.fetch.max.bytes = 1048576 
replica.fetch.min.bytes = 1 
replica.fetch.response.max.bytes = 10485760 
replica.fetch.wait.max.ms = 500 
replica.high.watermark.checkpoint.interval.ms = 5000 
replica.lag.time.max.ms = 10000 
replica.selector.class = null 
replica.socket.receive.buffer.bytes = 65536 
replica.socket.timeout.ms = 30000 
replication.quota.window.num = 11 
replication.quota.window.size.seconds = 1 
request.timeout.ms = 30000 
reserved.broker.max.id = 1000 
sasl.client.callback.handler.class = null 
sasl.enabled.mechanisms = [GSSAPI] 
sasl.jaas.config = null 
sasl.kerberos.kinit.cmd = /usr/bin/kinit 
sasl.kerberos.min.time.before.relogin = 60000 
sasl.kerberos.principal.to.local.rules = [DEFAULT] 
sasl.kerberos.service.name = null 
sasl.kerberos.ticket.renew.jitter = 0.05 
sasl.kerberos.ticket.renew.window.factor = 0.8 
sasl.login.callback.handler.class = null 
sasl.login.class = null 
sasl.login.refresh.buffer.seconds = 300 
sasl.login.refresh.min.period.seconds = 60 
sasl.login.refresh.window.factor = 0.8 
sasl.login.refresh.window.jitter = 0.05 
sasl.mechanism.inter.broker.protocol = GSSAPI 
sasl.server.callback.handler.class = null 
security.inter.broker.protocol = PLAINTEXT
security.providers = null 
socket.receive.buffer.bytes = 102400 
socket.request.max.bytes = 104857600 
socket.send.buffer.bytes = 102400 
ssl.cipher.suites = [] 
ssl.client.auth = none 
ssl.enabled.protocols = [TLSv1.2, TLSv1.1, TLSv1] 
ssl.endpoint.identification.algorithm = https 
ssl.key.password = null 
ssl.keymanager.algorithm = SunX509 
ssl.keystore.location = null 
ssl.keystore.password = null 
ssl.keystore.type = JKS 
ssl.principal.mapping.rules = DEFAULT 
ssl.protocol = TLS 
ssl.provider = null 
ssl.secure.random.implementation = null 
ssl.trustmanager.algorithm = PKIX 
ssl.truststore.location = null 
ssl.truststore.password = null 
ssl.truststore.type = JKS 
transaction.abort.timed.out.transaction.cleanup.interval.ms = 60000 
transaction.max.timeout.ms = 900000 
transaction.remove.expired.transaction.cleanup.interval.ms = 3600000 
transaction.state.log.load.buffer.size = 5242880 
transaction.state.log.min.isr = 2 
transaction.state.log.num.partitions = 50 
transaction.state.log.replication.factor = 1   <------ here
transaction.state.log.segment.bytes = 104857600 
transactional.id.expiration.ms = 604800000 
unclean.leader.election.enable = false 
zookeeper.connect = zookeeper:2181 
zookeeper.connection.timeout.ms = null 
zookeeper.max.in.flight.requests = 10 
zookeeper.session.timeout.ms = 6000 
zookeeper.set.acl = false 
zookeeper.sync.time.ms = 2000

@ebbnflow
Author

I solved this by adding these settings to my broker's environment. Sorry for opening this issue.

  KAFKA_TRANSACTION_STATE_LOG_REPLICATION_FACTOR: 1
  KAFKA_TRANSACTION_STATE_LOG_MIN_ISR: 1
  KAFKA_MIN_INSYNC_REPLICAS: 1
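
For anyone wondering how these map through: the cp-kafka image converts each KAFKA_* environment variable to a server property by stripping the KAFKA_ prefix, lowercasing it, and turning underscores into dots, so the three variables above should end up as the entries below (a sketch of the resulting server.properties lines, not copied from a live broker):

    # transaction.state.log.replication.factor: default is 3
    transaction.state.log.replication.factor=1
    # transaction.state.log.min.isr: default is 2, which a single broker can never satisfy
    transaction.state.log.min.isr=1
    # min.insync.replicas: default is already 1, set explicitly for clarity
    min.insync.replicas=1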

@EricWittmann
Member

Glad you figured this out! I'm happy this issue exists; hopefully anyone who hits the same problem in the future will find the info you've provided here helpful.

@kitd

kitd commented Feb 23, 2021

FWIW, this came up for me too. Thanks for the guidance @ebbnflow

One thing to add for anyone coming to this late and skipping to the end: the global-id-topic and storage-topic topics must also be configured with replicas: 1.
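
In case it helps, here's roughly how pre-creating them by hand would look (a sketch using the stock kafka-topics CLI; the topic names are the streams storage defaults mentioned above, and compaction is assumed to match the registry's other storage topics):

    # Pre-create the registry storage topics with a single replica for a one-broker dev setup
    docker exec broker kafka-topics --bootstrap-server broker:9092 --create \
      --topic global-id-topic --partitions 1 --replication-factor 1 \
      --config cleanup.policy=compact
    docker exec broker kafka-topics --bootstrap-server broker:9092 --create \
      --topic storage-topic --partitions 1 --replication-factor 1 \
      --config cleanup.policy=compact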

@EricWittmann added the Impacts Documentation label on Feb 24, 2021
@EricWittmann
Member

@smccarthy-ie something to add to our documentation?
