[Bug] Proxy in cluster mode fails on startup when using namesrvDomain config #7068

Closed
3 tasks done
chndzcl opened this issue Jul 24, 2023 · 1 comment · Fixed by #7076


chndzcl commented Jul 24, 2023

Before Creating the Bug Report

  • I found a bug, not just asking a question, which should be created in GitHub Discussions.

  • I have searched the GitHub Issues and GitHub Discussions of this repository and believe that this is not a duplicate.

  • I have confirmed that this bug belongs to the current repository, not other repositories of RocketMQ.

Runtime platform environment

OS: Debian 11

RocketMQ version

5.1.1

JDK Version

openjdk version "11.0.13" 2021-10-19

Describe the Bug

I start the proxy in 'Cluster' mode with the following config file:

{
    "rocketMQClusterName": "rocketmq-cluster",
    "proxyClusterName": "rocketmq-cluster",
    "enablePrintJstack": false,
    "namesrvDomain": "rocketmq-ns-endpoint-server",
    "namesrvDomainSubgroup": "nameservers",
    "grpcServerPort": 8081
}

The proxy process fails with the following exception:

org.apache.rocketmq.proxy.common.ProxyException: create system broadcast topic DefaultHeartBeatSyncerTopic failed on cluster rocketmq-cluster
	at org.apache.rocketmq.proxy.service.sysmessage.AbstractSystemMessageSyncer.createSysTopic(AbstractSystemMessageSyncer.java:174)
	at org.apache.rocketmq.proxy.service.sysmessage.AbstractSystemMessageSyncer.start(AbstractSystemMessageSyncer.java:140)
	at org.apache.rocketmq.proxy.service.client.ClusterConsumerManager.start(ClusterConsumerManager.java:67)
	at org.apache.rocketmq.common.utils.AbstractStartAndShutdown.start(AbstractStartAndShutdown.java:33)
	at org.apache.rocketmq.common.utils.AbstractStartAndShutdown.start(AbstractStartAndShutdown.java:33)
	at org.apache.rocketmq.common.utils.AbstractStartAndShutdown.start(AbstractStartAndShutdown.java:33)
	at org.apache.rocketmq.proxy.ProxyStartup.main(ProxyStartup.java:91)

Steps to Reproduce

Start the RocketMQ proxy in cluster mode with namesrvDomain configured instead of namesrvAddr.

What Did You Expect to See?

The proxy starts up successfully.

What Did You See Instead?

The proxy fails on startup with the ProxyException shown above.

Additional Context

By analyzing the source code and debugging, I found the cause:

When namesrvDomain is used, the class org.apache.rocketmq.client.impl.mqclient.MQClientAPIFactory starts an asynchronous task to fetch the name server addresses:

        if (!mqClientAPIExt.updateNameServerAddressList()) {
            this.scheduledExecutorService.scheduleAtFixedRate(
                mqClientAPIExt::fetchNameServerAddr,
                Duration.ofSeconds(10).toMillis(),
                Duration.ofMinutes(2).toMillis(),
                TimeUnit.MILLISECONDS
            );
        }

However, the initialDelay of that task is 10 seconds, so the first fetch does not run until 10 seconds after the task is scheduled.

Meanwhile, org.apache.rocketmq.proxy.ProxyStartup calls the start method of org.apache.rocketmq.proxy.service.sysmessage.HeartbeatSyncer. At that point the name server address is usually not yet available, so the createSysTopic method throws an exception:

        boolean createSuccess = this.adminService.createTopicOnTopicBrokerIfNotExist(
            this.getBroadcastTopicName(),
            clusterName,
            this.getBroadcastTopicQueueNum(),
            this.getBroadcastTopicQueueNum(),
            true,
            3
        );
        if (!createSuccess) {
            throw new ProxyException(ProxyExceptionCode.INTERNAL_SERVER_ERROR, "create system broadcast topic " + this.getBroadcastTopicName() + " failed on cluster " + clusterName);
        }
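
The timing problem can be reproduced in isolation, without RocketMQ. The following is a minimal sketch, not proxy code: the class name, the method name, and the address "127.0.0.1:9876" are made up for illustration. It schedules a "fetch" with a given initial delay and then checks, shortly afterwards, whether the address has been set.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical demo class; it only mimics the scheduling pattern quoted above.
class InitialDelayRace {

    // Schedules a fake address fetch with the given initial delay, waits
    // waitMillis, and reports whether the address was set by then.
    static boolean addressReadyAfter(long initialDelayMillis, long waitMillis) {
        AtomicReference<String> nameServerAddr = new AtomicReference<>();
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        try {
            scheduler.scheduleAtFixedRate(
                () -> nameServerAddr.set("127.0.0.1:9876"), // made-up address
                initialDelayMillis,
                TimeUnit.MINUTES.toMillis(2),
                TimeUnit.MILLISECONDS
            );
            // Stand-in for the rest of startup, which runs right after scheduling.
            Thread.sleep(waitMillis);
            return nameServerAddr.get() != null;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        } finally {
            scheduler.shutdownNow();
        }
    }

    public static void main(String[] args) {
        System.out.println("initialDelay=10s, ready: " + addressReadyAfter(10_000, 300));
        System.out.println("initialDelay=0s,  ready: " + addressReadyAfter(0, 300));
    }
}
```

With a 10 second initial delay, any component that needs the address within the first 10 seconds finds none, which matches the failure in createSysTopic.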

Can we either retry creating the system topic instead of throwing the exception, or fetch the name server address directly in org.apache.rocketmq.client.impl.mqclient.MQClientAPIFactory?
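
The first option, retrying instead of failing immediately, could look roughly like the sketch below. It is only an illustration under assumptions: RetrySketch and retryUntilTrue are hypothetical names, not part of RocketMQ, and the topic creation call is represented by a plain BooleanSupplier.

```java
import java.util.function.BooleanSupplier;

// Hypothetical helper; not an existing RocketMQ class.
class RetrySketch {

    // Runs the action up to maxAttempts times, sleeping delayMillis between
    // failed attempts. Returns true as soon as one attempt succeeds.
    static boolean retryUntilTrue(BooleanSupplier action, int maxAttempts, long delayMillis) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            if (action.getAsBoolean()) {
                return true;
            }
            if (attempt < maxAttempts) {
                try {
                    Thread.sleep(delayMillis);
                } catch (InterruptedException e) {
                    // Preserve the interrupt flag and give up.
                    Thread.currentThread().interrupt();
                    return false;
                }
            }
        }
        return false;
    }

    public static void main(String[] args) {
        int[] calls = {0};
        // Simulated topic creation that only succeeds on the third attempt,
        // for example because the name server address arrives late.
        boolean created = retryUntilTrue(() -> ++calls[0] >= 3, 5, 100);
        System.out.println("created=" + created + " after " + calls[0] + " attempts");
    }
}
```

In createSysTopic, the supplier would wrap the createTopicOnTopicBrokerIfNotExist call, and the ProxyException would be thrown only after all attempts fail. The total retry window would need to exceed the 10 second initial delay for this to help.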

Contributor

gaoyf commented Jul 25, 2023

I have fixed the bug by fetching the nameserver address before MQClientAPIExt starts.
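
The idea of fetching once before starting can be sketched as follows. This is not the code merged in #7076, which is not quoted in this thread; EagerFetchSketch, its methods, and the address value are hypothetical and only show the ordering: one synchronous fetch first, then the periodic refresh.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

// Hypothetical illustration of "fetch eagerly, then refresh periodically".
class EagerFetchSketch {
    private final Supplier<String> fetcher; // stand-in for the domain lookup
    private final ScheduledExecutorService scheduler =
        Executors.newSingleThreadScheduledExecutor();
    private volatile String nameServerAddr;

    EagerFetchSketch(Supplier<String> fetcher) {
        this.fetcher = fetcher;
    }

    void start() {
        // Fetch synchronously first, so components started afterwards
        // already see an address.
        nameServerAddr = fetcher.get();
        // Then keep refreshing in the background, as before.
        scheduler.scheduleAtFixedRate(
            () -> nameServerAddr = fetcher.get(),
            10, 120, TimeUnit.SECONDS
        );
    }

    String currentAddr() {
        return nameServerAddr;
    }

    void shutdown() {
        scheduler.shutdownNow();
    }

    public static void main(String[] args) {
        EagerFetchSketch sketch = new EagerFetchSketch(() -> "127.0.0.1:9876"); // made-up address
        sketch.start();
        System.out.println("address right after start: " + sketch.currentAddr());
        sketch.shutdown();
    }
}
```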
