Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Bug] PopProcessQueue's lastPopTimestamp defaults to 0, resulting in rebalance failed #7570

Closed
3 tasks done
HScarb opened this issue Nov 17, 2023 · 0 comments · Fixed by #7571
Closed
3 tasks done

Comments

@HScarb
Copy link
Contributor

HScarb commented Nov 17, 2023

Before Creating the Bug Report

  • I found a bug, not just asking a question, which should be created in GitHub Discussions.

  • I have searched the GitHub Issues and GitHub Discussions of this repository and believe that this is not a duplicate.

  • I have confirmed that this bug belongs to the current repository, not other repositories of RocketMQ.

Runtime platform environment

Ubuntu 22.04.1 LTS

RocketMQ version

RocketMQ 5.1.4

JDK Version

JDK 8

Describe the Bug

The field lastPopTimestamp in PopProcessQueue is not explicitly assigned a default value, it will default to 0, resulting in immediate expiration when pop consuming.

Steps to Reproduce

  1. Create a topic TopicTest in a cluster with 2 brokers.
  2. Run PopConsumer

What Did You Expect to See?

Rebalance success with no error log.

What Did You See Instead?

Rebalance failed with error logs:

2023-11-17 10:20:33.270 ERROR [46212] [RebalanceService] [o.a.r.c.i.c.RebalanceImpl#?:?] - [BUG]doRebalance, CID_JODIE_1, try remove unnecessary pop mq, MessageQueue [topic=TopicTest, brokerName=broker-b, queueId=-1], because pop is pause, so try to fixed it
2023-11-17 10:20:33.682 ERROR [46212] [RebalanceService] [o.a.r.c.i.c.RebalanceImpl#?:?] - [BUG]doRebalance, CID_JODIE_1, try remove unnecessary pop mq, MessageQueue [topic=TopicTest, brokerName=broker-a, queueId=-1], because pop is pause, so try to fixed it

full log:

2023-11-17 10:20:30.814 INFO  [46212] [NettyClientPublicExecutor_5] [o.a.r.c.i.ClientRemotingProcessor#?:?] - receive broker's notification[x.x.x.x:10931], the consumer group: CID_JODIE_1 changed, rebalance immediately
2023-11-17 10:20:30.846 INFO  [46212] [RebalanceService] [RocketmqRemoting#?:?] - createChannel: connect remote host[x.x.x.x:10911] success, AbstractBootstrap$PendingRegistrationPromise@738740e1(success)
2023-11-17 10:20:30.846 INFO  [46212] [main] [RocketmqRemoting#?:?] - createChannel: connect remote host[x.x.x.x:10911] success, AbstractBootstrap$PendingRegistrationPromise@738740e1(success)
2023-11-17 10:20:30.846 INFO  [46212] [NettyClientWorkerThread_3] [RocketmqRemoting#?:?] - NETTY CLIENT PIPELINE: ACTIVE, x.x.x.x:10911
2023-11-17 10:20:30.846 WARN  [46212] [NettyEventExecutor] [o.a.r.c.i.f.MQClientInstance#?:?] - sendHeartbeatToAllBrokerWithLockV2 lock heartBeat, but failed.
2023-11-17 10:20:30.881 INFO  [46212] [NettyClientPublicExecutor_7] [o.a.r.c.i.ClientRemotingProcessor#?:?] - receive broker's notification[100.95.190.210:10911], the consumer group: CID_JODIE_1 changed, rebalance immediately
2023-11-17 10:20:30.881 INFO  [46212] [main] [o.a.r.c.i.f.MQClientInstance#?:?] - send heart beat to broker[broker-a 0 100.95.190.210:10911] success
2023-11-17 10:20:30.881 INFO  [46212] [main] [o.a.r.c.i.f.MQClientInstance#?:?] - HeartbeatData [clientID=10.173.92.229@46212#2054135766774700, producerDataSet=[ProducerData [groupName=CLIENT_INNER_PRODUCER]], consumerDataSet=[ConsumerData [groupName=CID_JODIE_1, consumeType=CONSUME_PASSIVELY, messageModel=CLUSTERING, consumeFromWhere=CONSUME_FROM_FIRST_OFFSET, unitMode=false, subscriptionDataSet=[SubscriptionData [classFilterMode=false, topic=TopicTest, subString=*, tagsSet=[], codeSet=[], subVersion=1700187629600, expressionType=TAG], SubscriptionData [classFilterMode=false, topic=%RETRY%CID_JODIE_1, subString=*, tagsSet=[], codeSet=[], subVersion=1700187629606, expressionType=TAG]]]]]
2023-11-17 10:20:30.923 INFO  [46212] [RebalanceService] [o.a.r.c.c.s.RemoteBrokerOffsetStore#?:?] - remove unnecessary messageQueue offset. group=CID_JODIE_1, mq=MessageQueue [topic=%RETRY%CID_JODIE_1, brokerName=broker-b, queueId=0], offsetTableSize=0
2023-11-17 10:20:30.965 INFO  [46212] [RebalanceService] [o.a.r.c.i.c.RebalanceImpl#?:?] - doRebalance, CID_JODIE_1, add a new mq, MessageQueue [topic=%RETRY%CID_JODIE_1, brokerName=broker-b, queueId=0]
2023-11-17 10:20:30.966 INFO  [46212] [RebalanceService] [o.a.r.c.c.s.RemoteBrokerOffsetStore#?:?] - remove unnecessary messageQueue offset. group=CID_JODIE_1, mq=MessageQueue [topic=%RETRY%CID_JODIE_1, brokerName=broker-a, queueId=0], offsetTableSize=1
2023-11-17 10:20:31.000 INFO  [46212] [RebalanceService] [o.a.r.c.i.c.RebalanceImpl#?:?] - doRebalance, CID_JODIE_1, add a new mq, MessageQueue [topic=%RETRY%CID_JODIE_1, brokerName=broker-a, queueId=0]
2023-11-17 10:20:31.001 INFO  [46212] [RebalanceService] [o.a.r.c.i.c.RebalanceImpl#?:?] - broker rebalanced result changed. allocateMessageQueueStrategyName=AVG, group=CID_JODIE_1, topic=%RETRY%CID_JODIE_1, clientId=10.173.92.229@46212#2054135766774700, assignmentSet=[MessageQueueAssignment [MessageQueue=MessageQueue [topic=%RETRY%CID_JODIE_1, brokerName=broker-a, queueId=0], Mode=PULL], MessageQueueAssignment [MessageQueue=MessageQueue [topic=%RETRY%CID_JODIE_1, brokerName=broker-b, queueId=0], Mode=PULL]]
2023-11-17 10:20:31.001 INFO  [46212] [RebalanceService] [o.a.r.c.i.c.RebalanceImpl#?:?] - %RETRY%CID_JODIE_1 Rebalance changed, also update version: 1700187629606, 1700187631001
2023-11-17 10:20:31.142 INFO  [46212] [RebalanceService] [o.a.r.c.i.c.RebalanceImpl#?:?] - doRebalance, CID_JODIE_1, add a new pop mq, MessageQueue [topic=TopicTest, brokerName=broker-a, queueId=-1]
2023-11-17 10:20:31.143 INFO  [46212] [RebalanceService] [o.a.r.c.i.c.RebalanceImpl#?:?] - doRebalance, CID_JODIE_1, add a new pop mq, MessageQueue [topic=TopicTest, brokerName=broker-b, queueId=-1]
2023-11-17 10:20:31.144 INFO  [46212] [RebalanceService] [o.a.r.c.i.c.RebalanceImpl#?:?] - broker rebalanced result changed. allocateMessageQueueStrategyName=AVG, group=CID_JODIE_1, topic=TopicTest, clientId=10.173.92.229@46212#2054135766774700, assignmentSet=[MessageQueueAssignment [MessageQueue=MessageQueue [topic=TopicTest, brokerName=broker-b, queueId=-1], Mode=POP], MessageQueueAssignment [MessageQueue=MessageQueue [topic=TopicTest, brokerName=broker-a, queueId=-1], Mode=POP]]
2023-11-17 10:20:31.144 INFO  [46212] [RebalanceService] [o.a.r.c.i.c.RebalanceImpl#?:?] - TopicTest Rebalance changed, also update version: 1700187629600, 1700187631144
2023-11-17 10:20:33.270 ERROR [46212] [RebalanceService] [o.a.r.c.i.c.RebalanceImpl#?:?] - [BUG]doRebalance, CID_JODIE_1, try remove unnecessary pop mq, MessageQueue [topic=TopicTest, brokerName=broker-b, queueId=-1], because pop is pause, so try to fixed it
2023-11-17 10:20:33.682 ERROR [46212] [RebalanceService] [o.a.r.c.i.c.RebalanceImpl#?:?] - [BUG]doRebalance, CID_JODIE_1, try remove unnecessary pop mq, MessageQueue [topic=TopicTest, brokerName=broker-a, queueId=-1], because pop is pause, so try to fixed it
2023-11-17 10:20:33.682 INFO  [46212] [RebalanceService] [o.a.r.c.i.c.RebalanceImpl#?:?] - doRebalance, CID_JODIE_1, remove unnecessary pop mq, MessageQueue [topic=TopicTest, brokerName=broker-a, queueId=-1]
2023-11-17 10:20:33.682 INFO  [46212] [RebalanceService] [o.a.r.c.i.c.RebalanceImpl#?:?] - doRebalance, CID_JODIE_1, remove unnecessary pop mq, MessageQueue [topic=TopicTest, brokerName=broker-b, queueId=-1]
2023-11-17 10:20:33.682 INFO  [46212] [RebalanceService] [o.a.r.c.i.c.RebalanceImpl#?:?] - doRebalance, CID_JODIE_1, add a new pop mq, MessageQueue [topic=TopicTest, brokerName=broker-a, queueId=-1]
2023-11-17 10:20:33.682 INFO  [46212] [RebalanceService] [o.a.r.c.i.c.RebalanceImpl#?:?] - doRebalance, CID_JODIE_1, add a new pop mq, MessageQueue [topic=TopicTest, brokerName=broker-b, queueId=-1]
2023-11-17 10:20:33.682 INFO  [46212] [RebalanceService] [o.a.r.c.i.c.RebalanceImpl#?:?] - broker rebalanced result changed. allocateMessageQueueStrategyName=AVG, group=CID_JODIE_1, topic=TopicTest, clientId=10.173.92.229@46212#2054135766774700, assignmentSet=[MessageQueueAssignment [MessageQueue=MessageQueue [topic=TopicTest, brokerName=broker-b, queueId=-1], Mode=POP], MessageQueueAssignment [MessageQueue=MessageQueue [topic=TopicTest, brokerName=broker-a, queueId=-1], Mode=POP]]
2023-11-17 10:20:33.682 INFO  [46212] [RebalanceService] [o.a.r.c.i.c.RebalanceImpl#?:?] - TopicTest Rebalance changed, also update version: 1700187631144, 1700187633682
2023-11-17 10:20:33.682 WARN  [46212] [RebalanceService] [o.a.r.c.i.f.MQClientInstance#?:?] - sendHeartbeatToAllBrokerWithLockV2 lock heartBeat, but failed.

Additional Context

Ref:

HashMap<MessageQueue, PopProcessQueue> removeQueueMap = new HashMap<>(this.popProcessQueueTable.size());
Iterator<Entry<MessageQueue, PopProcessQueue>> it = this.popProcessQueueTable.entrySet().iterator();
while (it.hasNext()) {
Entry<MessageQueue, PopProcessQueue> next = it.next();
MessageQueue mq = next.getKey();
PopProcessQueue pq = next.getValue();
if (mq.getTopic().equals(topic)) {
if (!mq2PopAssignment.containsKey(mq)) {
//the queue is no longer your assignment
pq.setDropped(true);
removeQueueMap.put(mq, pq);
} else if (pq.isPullExpired() && this.consumeType() == ConsumeType.CONSUME_PASSIVELY) {
pq.setDropped(true);
removeQueueMap.put(mq, pq);
log.error("[BUG]doRebalance, {}, try remove unnecessary pop mq, {}, because pop is pause, so try to fixed it",
consumerGroup, mq);
}
}
}

public boolean isPullExpired() {
return (System.currentTimeMillis() - this.lastPopTimestamp) > PULL_MAX_IDLE_TIME;
}

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

1 participant