System busy exception when transientStorePoolEnable=true in controller mode #5714

Closed
riki-wang opened this issue Dec 16, 2022 · 4 comments · Fixed by #5722

riki-wang commented Dec 16, 2022

  1. Problem
    When the broker runs in automatic failover (controller) mode with both transientStorePoolEnable=true and brokerRole=SLAVE set, clients receive a "system busy" exception.

  2. Environment

    • CentOS 7.9
    • CPU: 4 cores
    • Memory: 16GB
    • RocketMQ Version: 5.0
    • Broker config
      • broker-c-1.conf
          brokerClusterName=DC
          brokerName=broker-c
          brokerId=-1
          deleteWhen=04
          fileReservedTime=48
          brokerRole=SLAVE
          flushDiskType=ASYNC_FLUSH
        
          enableControllerMode = true
          controllerAddr = 127.0.0.1:59878
          #allAckInSyncStateSet = true
          namesrvAddr=127.0.0.1:9876
          
          listenPort=58922
          storePathRootDir=/data/rocketmq/master/store
          storePathIndex=/data/rocketmq/master/store/index
          storeCheckpoint=/data/rocketmq/master/store/checkpoint
          abortFile=/data/rocketmq/master/store/abort
          storePathCommitLog=/data/rocketmq/master/store/commitlog
          storePathConsumerQueue=/data/rocketmq/master/store/consumequeue
          storePathEpochFile=/data/rocketmq/master/store/storePathEpochFile
          
          transientStorePoolEnable=true
          transientStorePoolSize=2
        
      • broker-c-2.conf
        brokerClusterName=DC
        brokerName=broker-c
        brokerId=-1
        deleteWhen=04
        fileReservedTime=48
        brokerRole=SLAVE
        flushDiskType=ASYNC_FLUSH
        
        enableControllerMode = true
        controllerAddr = 127.0.0.1:59878
        allAckInSyncStateSet = true
        namesrvAddr=127.0.0.1:9876
        
        listenPort=59922
        storePathRootDir=/data/rocketmq/slave/store
        storePathIndex=/data/rocketmq/slave/store/index
        storeCheckpoint=/data/rocketmq/slave/store/checkpoint
        abortFile=/data/rocketmq/slave/store/abort
        storePathCommitLog=/data/rocketmq/slave/store/commitlog
        storePathConsumerQueue=/data/rocketmq/slave/store/consumequeue
        storePathEpochFile=/data/rocketmq/slave/store/storePathEpochFile
        
        transientStorePoolEnable=true
        transientStorePoolSize=2
        
      • controller.conf
        controllerDLegerGroup = group2
        controllerDLegerPeers = n0-10.110.150.69:59878
        controllerDLegerSelfId = n0
        controllerStorePath=/data/rocketmq/controller
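
A quick way to check whether a broker config file contains the combination reported here is a small script like the one below. This is only a sketch: the function names `parse_properties` and `hits_reported_combination` are made up for illustration, and the parser handles only plain `key=value` lines with `#` comments, not the full Java properties syntax.

```python
import sys
from pathlib import Path


def parse_properties(text):
    """Parse simple key=value lines, skipping blank lines and '#' comments."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


def hits_reported_combination(props):
    """Return True when all three settings from this report are present."""
    return (
        props.get("enableControllerMode", "false").lower() == "true"
        and props.get("transientStorePoolEnable", "false").lower() == "true"
        and props.get("brokerRole", "").upper() == "SLAVE"
    )


if __name__ == "__main__":
    # Usage (script name is hypothetical):
    #   python check_broker_conf.py conf/broker-c-1.conf conf/broker-c-2.conf
    for path in sys.argv[1:]:
        props = parse_properties(Path(path).read_text(encoding="utf-8"))
        verdict = "matches" if hits_reported_combination(props) else "does not match"
        print(f"{path}: {verdict} the reported combination")
```

Both broker files listed above set all three properties, so both would be reported as matching.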
        
  3. Steps to reproduce

    • start namesrv
      nohup bin/mqnamesrv &
    • start controller
      nohup bin/mqcontroller -c conf/controller.conf &
    • start broker
      • nohup bin/mqbroker -c conf/broker-c-1.conf &
      • nohup bin/mqbroker -c conf/broker-c-2.conf &

RongtongJin changed the title from "Enabling transientStorePool on a broker in DledgerController mode causes a system busy exception" (originally in Chinese) to "System busy exception when transientStorePoolEnable=true in controller mode" on Dec 16, 2022

@RongtongJin
Contributor

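Hi @riki-wang, I enabled transientStorePoolEnable in controller mode. Under normal conditions, sending and receiving did not produce System busy, and the cluster recovered promptly after a switchover. To confirm: when the System busy exception occurs, does clusterList show that a Master has been elected?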
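
For anyone who wants to script that check, here is a rough sketch that runs `mqadmin clusterList` and looks for a master row. It assumes the first three whitespace-separated columns of each data row are cluster name, broker name, and broker ID, and that the elected master is reported with broker ID 0; neither assumption is confirmed in this thread, and the helper names are hypothetical.

```python
import subprocess


def cluster_list_command(namesrv_addr, mqadmin="bin/mqadmin"):
    """Build the argument list for `mqadmin clusterList -n <namesrv>`."""
    return [mqadmin, "clusterList", "-n", namesrv_addr]


def brokers_with_master(output):
    """Return the broker names that have at least one row with broker ID 0.

    Assumption: columns are cluster name, broker name, broker ID, ...
    Header lines (starting with '#') and short lines are skipped.
    """
    masters = set()
    for line in output.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue
        fields = stripped.split()
        if len(fields) >= 3 and fields[2] == "0":
            masters.add(fields[1])
    return masters


if __name__ == "__main__":
    # 127.0.0.1:9876 is the namesrvAddr from the broker configs in this issue.
    result = subprocess.run(
        cluster_list_command("127.0.0.1:9876"),
        capture_output=True,
        text=True,
        check=True,
    )
    if "broker-c" in brokers_with_master(result.stdout):
        print("broker-c has an elected master")
    else:
        print("no master row found for broker-c")
```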

@riki-wang
Author

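> Hi @riki-wang, I enabled transientStorePoolEnable in controller mode. Under normal conditions, sending and receiving did not produce System busy, and the cluster recovered promptly after a switchover. To confirm: when the System busy exception occurs, does clusterList show that a Master has been elected?

brokerRole also has to be set to SLAVE for this exception to occur.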

@riki-wang
Author

riki-wang commented Dec 16, 2022

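> Hi @riki-wang, I enabled transientStorePoolEnable in controller mode. Under normal conditions, sending and receiving did not produce System busy, and the cluster recovered promptly after a switchover. To confirm: when the System busy exception occurs, does clusterList show that a Master has been elected?

When transientStorePoolEnable=true and brokerRole=SLAVE are both set, the slave node never seems to be able to sync with the elected master.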

@RongtongJin
Contributor

> SLAVE

Good catch! I have reproduced it and will look into the cause.

odbozhou pushed a commit that referenced this issue Jan 1, 2023
…sientStorePool=true in controller mode (#5722)

* Fix the issue that the slave role does not initialize the transientPool in controller mode

* Format the checkstyle

* Remove the useless import

* Fix the HA transmission disconnection issue when transientStorePoolEnable is true

* just test

* just test

* just test

* just test

* just test

* just test

* just test

* just test

* just test

* Format the check style

* Format the check style
Langkeren pushed a commit to Langkeren/rocketmq that referenced this issue Jan 3, 2023
…n transientStorePool=true in controller mode (apache#5722)

RongtongJin added a commit to RongtongJin/rocketmq that referenced this issue Jan 10, 2023
…n transientStorePool=true in controller mode (apache#5722)

drpmma pushed a commit that referenced this issue Feb 21, 2023
…sientStorePool=true in controller mode (#5722)
