TiFlash can't start with the error of duplicated store address: 0.0.0.0:3930 #5635

Closed
lilinghai opened this issue Aug 16, 2022 · 3 comments · Fixed by #5640
Labels
feature/developing severity/major type/bug The issue is confirmed as a bug.

Comments

@lilinghai

Bug Report

Please answer these questions before submitting your issue. Thanks!

1. Minimal reproduce step (Required)

Deploy a nightly TiDB cluster with TiDB Operator. TiFlash then fails to start with the following log:

[2022/08/16 22:06:40.241 +08:00] [INFO] [node.rs:231] ["put store to PD"] [store="id: 8 address: \"0.0.0.0:3930\" labels { key: \"engine\" value: \"tiflash\" } version: \"v6.2.0-alpha\" peer_address: \"tc-tiflash-0.tc-tiflash-peer.endless-htap-ch-full-ap-query-tps-1170596-1-840.svc:20170\" status_address: \"tc-tiflash-0.tc-tiflash-peer.endless-htap-ch-full-ap-query-tps-1170596-1-840.svc:20292\" git_hash: \"49d8050c2a57f0b905917ad5c7e23e14ff0b5b24\" start_timestamp: 1660658800 deploy_path: \"/tiflash\""]
[2022/08/16 22:06:40.242 +08:00] [FATAL] [run.rs:1255] ["failed to start node: Other(\"[components/pd_client/src/util.rs:809]: duplicated store address: id:8 address:\\\"0.0.0.0:3930\\\" labels:<key:\\\"engine\\\" value:\\\"tiflash\\\" > version:\\\"v6.2.0-alpha\\\" peer_address:\\\"tc-tiflash-0.tc-tiflash-peer.endless-htap-ch-full-ap-query-tps-1170596-1-840.svc:20170\\\" status_address:\\\"tc-tiflash-0.tc-tiflash-peer.endless-htap-ch-full-ap-query-tps-1170596-1-840.svc:20292\\\" git_hash:\\\"49d8050c2a57f0b905917ad5c7e23e14ff0b5b24\\\" start_timestamp:1660658800 deploy_path:\\\"/tiflash\\\" , already registered by id:4 address:\\\"0.0.0.0:3930\\\" labels:<key:\\\"engine\\\" value:\\\"tiflash\\\" > version:\\\"v6.2.0-alpha\\\" peer_address:\\\"tc-tiflash-1.tc-tiflash-peer.endless-htap-ch-full-ap-query-tps-1170596-1-840.svc:20170\\\" status_address:\\\"tc-tiflash-1.tc-tiflash-peer.endless-htap-ch-full-ap-query-tps-1170596-1-840.svc:20292\\\" git_hash:\\\"49d8050c2a57f0b905917ad5c7e23e14ff0b5b24\\\" start_timestamp:1660657792 deploy_path:\\\"/tiflash\\\" last_heartbeat:1660658792971788227 node_state:Serving \")"]
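The FATAL line is PD refusing to register store 8 because store 4 already advertises the same address, 0.0.0.0:3930. A minimal sketch of that rule, assuming a simplified PD that only keeps a map from address to store ID (StoreRegistry and put_store are hypothetical names; real PD is written in Go and also handles tombstone stores and address changes):

```rust
use std::collections::HashMap;

// Hypothetical, simplified model of PD's duplicate-address check on PutStore.
// It ignores tombstone stores and address changes of an existing store.
#[derive(Default)]
struct StoreRegistry {
    // address -> ID of the store that registered it
    by_address: HashMap<String, u64>,
}

impl StoreRegistry {
    fn put_store(&mut self, id: u64, address: &str) -> Result<(), String> {
        // Reject the address if a different store already owns it.
        if let Some(&owner) = self.by_address.get(address) {
            if owner != id {
                return Err(format!(
                    "duplicated store address: id:{id} address:{address:?}, already registered by id:{owner}"
                ));
            }
        }
        // Same store re-registering, or a new address: accept it.
        self.by_address.insert(address.to_string(), id);
        Ok(())
    }
}

fn main() {
    let mut pd = StoreRegistry::default();
    // Store 4 (tc-tiflash-1) registers the wildcard address first...
    pd.put_store(4, "0.0.0.0:3930").unwrap();
    // ...so store 8 (tc-tiflash-0) is rejected for advertising the same one.
    if let Err(e) = pd.put_store(8, "0.0.0.0:3930") {
        println!("{e}");
    }
}
```

A wildcard address such as 0.0.0.0:3930 is identical on every pod, so under this rule only the first TiFlash store to register can succeed.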

2. What did you expect to see? (Required)

3. What did you see instead (Required)

4. What is your TiFlash version? (Required)

master

@CalvinNeo
Member

CalvinNeo commented Aug 17, 2022

Config of the scenario.
We found that engine-addr is configured in the Proxy config (proxy.toml).

/ # cat /data0/proxy.toml
log-file = "/data0/logs/tiflash_tikv.log"
log-level = "info"

[server]
  advertise-status-addr = "tc-tiflash-0.tc-tiflash-peer.endless-htap-ch-full-ap-query-tps-1170596-1-840.svc:20292"
  engine-addr = "tc-tiflash-0.tc-tiflash-peer.endless-htap-ch-full-ap-query-tps-1170596-1-840.svc:3930"
  status-addr = "0.0.0.0:20292"
/ # cat /data0/logs/tiflash_tikv.log

/ # cat /data0/config.toml
default_profile = "default"
display_name = "TiFlash"
http_port = 8123
interserver_http_port = 9009
listen_host = "0.0.0.0"
mark_cache_size = 5368709120
minmax_index_cache_size = 5368709120
path = "/data0/db"
path_realtime_mode = false
tcp_port = 9000
tmp_path = "/data0/tmp"

[application]
  runAsDaemon = true

[flash]
  compact_log_min_period = 200
  overlap_threshold = 0.6
  service_addr = "0.0.0.0:3930"
  tidb_status_addr = "tc-tidb.endless-htap-ch-full-ap-query-tps-1170596-1-840.svc:10080"
  [flash.flash_cluster]
    cluster_manager_path = "/tiflash/flash_cluster_manager"
    log = "/data0/logs/flash_cluster_manager.log"
    master_ttl = 60
    refresh_interval = 20
    update_rule_interval = 10
  [flash.proxy]
    addr = "0.0.0.0:20170"
    advertise-addr = "tc-tiflash-0.tc-tiflash-peer.endless-htap-ch-full-ap-query-tps-1170596-1-840.svc:20170"
    config = "/data0/proxy.toml"
    data-dir = "/data0/proxy"

[logger]
  count = 100
  errorlog = "/data0/logs/error.log"
  level = "debug"
  log = "/data0/logs/server.log"
  size = "1000M"

[profiles]
  [profiles.default]
    load_balancing = "random"
    max_memory_usage = 0
    max_memory_usage_for_all_queries = 0
    use_uncompressed_cache = 0
  [profiles.readonly]
    readonly = 1

[quotas]
  [quotas.default]
    [quotas.default.interval]
      duration = 3600
      errors = 0
      execution_time = 0
      queries = 0
      read_rows = 0
      result_rows = 0

[raft]
  kvstore_path = "/data0/kvstore"
  pd_addr = "tc-pd.endless-htap-ch-full-ap-query-tps-1170596-1-840.svc:2379"
  storage_engine = "dt"

[status]
  metrics_port = 8234

[users]
  [users.default]
    password = ""
    profile = "default"
    quota = "default"
    [users.default.networks]
      ip = "::/0"
  [users.readonly]
    password = ""
    profile = "readonly"
    quota = "default"
    [users.readonly.networks]
      ip = "::/0"

How TiFlash handles it:

  1. If engine-addr is not set in flash.proxy, we set engine-addr to flash.service_addr. This is the normal case, and also the case above.
  2. Otherwise, we set advertise-engine-addr to engine-addr.
...
            if (!args_map.count(engine_store_address))
                args_map[engine_store_address] = config.getString("flash.service_addr");
            else
                args_map[engine_store_advertise_address] = args_map[engine_store_address];
            args_map[engine_label] = engine_label_value;
...
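The branch above can be read as a small function over the keys found under [flash.proxy]. A Rust sketch, assuming the arguments are a plain string map; build_proxy_args and the constants ENGINE_ADDR and ADVERTISE_ENGINE_ADDR are hypothetical stand-ins for the C++ variables, and the engine label is left out:

```rust
use std::collections::HashMap;

// Hypothetical key names standing in for engine_store_address and
// engine_store_advertise_address in the C++ excerpt.
const ENGINE_ADDR: &str = "engine-addr";
const ADVERTISE_ENGINE_ADDR: &str = "advertise-engine-addr";

/// Sketch of how TiFlash fills the arguments it passes to the proxy.
/// `flash_proxy` holds the keys under [flash.proxy]; `service_addr` is flash.service_addr.
fn build_proxy_args(
    mut flash_proxy: HashMap<String, String>,
    service_addr: &str,
) -> HashMap<String, String> {
    match flash_proxy.get(ENGINE_ADDR).cloned() {
        // Case 1: no engine-addr under [flash.proxy], fall back to service_addr.
        None => {
            flash_proxy.insert(ENGINE_ADDR.to_string(), service_addr.to_string());
        }
        // Case 2: engine-addr is given, so also pass it as advertise-engine-addr.
        Some(addr) => {
            flash_proxy.insert(ADVERTISE_ENGINE_ADDR.to_string(), addr);
        }
    }
    flash_proxy
}

fn main() {
    // [flash.proxy] from the config above has addr, advertise-addr, config and
    // data-dir, but no engine-addr.
    let mut flash_proxy = HashMap::new();
    flash_proxy.insert("addr".to_string(), "0.0.0.0:20170".to_string());
    let args = build_proxy_args(flash_proxy, "0.0.0.0:3930");
    // engine-addr falls back to the wildcard service_addr.
    println!("{:?}", args.get(ENGINE_ADDR));
}
```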

How Proxy handles it:
If engine-addr is not set in Proxy's own config, we use TiFlash's. If TiFlash passes advertise-engine-addr, it overrides engine-addr in either case.

...
    if config.server.engine_addr.is_empty() {
        if let Some(engine_addr) = matches.value_of("engine-addr") {
            config.server.engine_addr = engine_addr.to_owned();
        }
    }

    if let Some(engine_addr) = matches.value_of("advertise-engine-addr") {
        config.server.engine_addr = engine_addr.to_owned();
    }
...
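The proxy-side precedence can be sketched as a pure function. This is an illustration under assumptions, not the proxy's code: resolve_engine_addr is a hypothetical name, `own` stands for config.server.engine_addr after proxy.toml is parsed (empty when absent or, because of the bug below, not parsed), and the two options stand for the command-line values from TiFlash:

```rust
/// Hypothetical sketch of the precedence in the Rust excerpt above.
fn resolve_engine_addr(own: &str, cli_engine: Option<&str>, cli_advertise: Option<&str>) -> String {
    let mut engine_addr = own.to_string();
    // Fall back to TiFlash's engine-addr only when the proxy config left it empty.
    if engine_addr.is_empty() {
        if let Some(addr) = cli_engine {
            engine_addr = addr.to_string();
        }
    }
    // advertise-engine-addr, when TiFlash passes it, always wins.
    if let Some(addr) = cli_advertise {
        engine_addr = addr.to_string();
    }
    engine_addr
}

fn main() {
    // Shortened hostname, for illustration only.
    let tiflash_0 = "tc-tiflash-0.tc-tiflash-peer.svc:3930";
    // Intended: the engine-addr from proxy.toml is honored.
    println!("{}", resolve_engine_addr(tiflash_0, Some("0.0.0.0:3930"), None));
    // Observed: proxy.toml's value is lost, so every pod falls back to
    // TiFlash's wildcard service_addr and advertises the same address.
    println!("{}", resolve_engine_addr("", Some("0.0.0.0:3930"), None));
}
```

Combined with TiFlash's case 1, an empty `own` on every pod yields 0.0.0.0:3930 everywhere, which matches the duplicated store address reported by PD.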

@CalvinNeo
Member

CalvinNeo commented Aug 17, 2022

This is a bug.
The current ProxyConfig can only parse

engine-addr = "127.0.0.1:3930"

However, the real config is as follows (note the [server] table):

[server]
engine-addr = "127.0.0.1:3930"

As a result, ProxyConfig parses engine-addr as an empty string rather than 127.0.0.1:3930, so the proxy falls back to the value passed from TiFlash's side.

Although ProxyConfig was introduced in 6.2, it was not actually used except for config checks. After pingcap/tidb-engine-ext#139, the flaw revealed itself.
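The difference between the two layouts comes down to which TOML table the key is looked up in. A minimal sketch, assuming a toy line-based reader that only understands [name] headers and key = "value" lines (lookup, engine_addr_top_level and engine_addr_in_server are hypothetical helpers, not ProxyConfig's real parser):

```rust
/// Returns the value of `key` inside `[section]`, or at top level when `section`
/// is None. Handles only `[name]` headers and `key = "value"` lines; not real TOML.
fn lookup(toml: &str, section: Option<&str>, key: &str) -> Option<String> {
    let mut current: Option<String> = None; // None means top level
    for raw in toml.lines() {
        let line = raw.trim();
        if line.is_empty() || line.starts_with('#') {
            continue;
        }
        if line.starts_with('[') && line.ends_with(']') {
            current = Some(line[1..line.len() - 1].trim().to_string());
            continue;
        }
        if let Some((k, v)) = line.split_once('=') {
            if k.trim() == key && current.as_deref() == section {
                return Some(v.trim().trim_matches('"').to_string());
            }
        }
    }
    None
}

/// Mirrors the bug: engine-addr is expected at top level, so [server] is ignored.
fn engine_addr_top_level(toml: &str) -> String {
    lookup(toml, None, "engine-addr").unwrap_or_default()
}

/// Reads engine-addr from the [server] table, where proxy.toml actually puts it.
fn engine_addr_in_server(toml: &str) -> String {
    lookup(toml, Some("server"), "engine-addr").unwrap_or_default()
}

fn main() {
    let proxy_toml = "[server]\nengine-addr = \"127.0.0.1:3930\"\n";
    println!("top level: {:?}", engine_addr_top_level(proxy_toml));
    println!("[server]:  {:?}", engine_addr_in_server(proxy_toml));
}
```

The empty string from the top-level lookup is exactly the value that makes the proxy fall back to TiFlash's engine-addr.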

@CalvinNeo
Member

However, we still need to keep in mind that mismatched ports between TiFlash and Proxy can cause problems.
Taking v5.3.0 as an example, if we have

[server]
addr = "0.0.0.0:20170"
advertise-addr = "127.0.0.1:20170"
engine-addr = "127.0.0.1:3932"
status-addr = "0.0.0.0:20292"
advertise-status-addr = "127.0.0.1:20292"

and

[flash]
service_addr = "127.0.0.1:3931"

The service cannot start.
This is because TiFlash listens on service_addr, which must be consistent with engine-addr.
/cc @lilinghai
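A hypothetical pre-start check for this mismatch, sketched under the assumption that comparing ports is enough: service_addr is often a wildcard such as 0.0.0.0 while engine-addr is a routable hostname (as in the config at the top of this issue), so the hosts legitimately differ. port_of and check_engine_port are illustrative names, not existing TiFlash or Proxy functions:

```rust
/// Extracts the port from "host:port" (also works for "[::1]:3930").
fn port_of(addr: &str) -> Option<u16> {
    addr.rsplit_once(':')?.1.parse().ok()
}

/// Hypothetical check: flash.service_addr and server.engine-addr must share a port.
fn check_engine_port(service_addr: &str, engine_addr: &str) -> Result<(), String> {
    match (port_of(service_addr), port_of(engine_addr)) {
        (Some(a), Some(b)) if a == b => Ok(()),
        (Some(a), Some(b)) => Err(format!(
            "flash.service_addr port {a} does not match server.engine-addr port {b}"
        )),
        _ => Err("cannot parse port from service_addr or engine-addr".to_string()),
    }
}

fn main() {
    // The v5.3.0 example above: TiFlash listens on 3931, the proxy advertises 3932.
    match check_engine_port("127.0.0.1:3931", "127.0.0.1:3932") {
        Ok(()) => println!("ok"),
        Err(e) => println!("config error: {e}"),
    }
}
```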


3 participants