CA replication not working 1.6beta2 #6192

tristan-weil · 2019-07-22T23:04:03Z

Overview of the Issue

Consul 1.6beta2

The CA replication from the primary DC to the secondary DC does not work.

ACLs and intentions are replicated, but the Consul cluster in the secondary DC is not able to replicate the CA.

Reproduction Steps

Two clusters of three nodes each, in two different regions (tested in AWS: eu-west-1 and eu-west-3).
ACLs, Connect, and TLS are enabled.
The replication, agent, default, and agent_master tokens are set with appropriate policies.

Consul info for both Client and Server

Replication policy:

    acl = "write"

    operator = "write"

    service_prefix "" {
      policy = "read"
      intentions = "read"
    }

Part of the configuration in the secondary DC:

{
    "datacenter": "eu-west-1",
    "primary_datacenter": "eu-west-3",
    "connect": {
        "enabled": true
    },
    "acl": {
        "default_policy": "deny",
        "down_policy": "extend-cache",
        "enable_key_list_policy": true,
        "enable_token_persistence": true,
        "enable_token_replication": true,
        "enabled": true
    }
}

Part of the configuration in the primary DC:

{
    "datacenter": "eu-west-3",
    "primary_datacenter": "eu-west-3",
    "connect": {
        "enabled": true
    },
    "acl": {
        "default_policy": "deny",
        "down_policy": "extend-cache",
        "enable_key_list_policy": true,
        "enable_token_persistence": true,
        "enabled": true
    }
}
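
As a quick sanity check on the two configuration fragments above, the role of each datacenter can be derived by comparing `datacenter` with `primary_datacenter`. This is only a minimal sketch: the helper name `datacenter_role` is hypothetical, and the fallback for a missing `primary_datacenter` is an assumption, not something stated in this report.

```python
import json


def datacenter_role(config: dict) -> str:
    """Return "primary" or "secondary" for one agent configuration."""
    # Assumption: when primary_datacenter is absent, the agent's own
    # datacenter is treated as the primary.
    primary = config.get("primary_datacenter") or config["datacenter"]
    return "primary" if primary == config["datacenter"] else "secondary"


# Only the two relevant keys from the fragments above are reproduced here.
secondary_cfg = json.loads('{"datacenter": "eu-west-1", "primary_datacenter": "eu-west-3"}')
primary_cfg = json.loads('{"datacenter": "eu-west-3", "primary_datacenter": "eu-west-3"}')

print(datacenter_role(secondary_cfg))
print(datacenter_role(primary_cfg))
```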

Operating system and Environment details

debian 9 on t2.micro

Log Fragments

In the primary DC:

22:05:49 root@ip-10-3-0-13 ~ [0]
> consul members 
Node             Address            Status  Type    Build       Protocol  DC         Segment
ip-10-3-0-13     10.3.0.13:8301     alive   server  1.6.0beta2  2         eu-west-3  <all>
ip-10-3-0-29     10.3.0.29:8301     alive   server  1.6.0beta2  2         eu-west-3  <all>
ip-10-3-0-37     10.3.0.37:8301     alive   server  1.6.0beta2  2         eu-west-3  <all>
ip-10-3-0-4      10.3.0.4:8301      alive   client  1.6.0beta2  2         eu-west-3  <default>
ip-10-3-0-7      10.3.0.7:8301      alive   client  1.6.0beta2  2         eu-west-3  <default>

22:06:03 root@ip-10-3-0-13 ~ [0]
> consul members -wan
Node                    Address         Status  Type    Build       Protocol  DC         Segment
ip-10-1-0-10.eu-west-1  10.1.0.10:8302  alive   server  1.6.0beta2  2         eu-west-1  <all>
ip-10-1-0-22.eu-west-1  10.1.0.22:8302  alive   server  1.6.0beta2  2         eu-west-1  <all>
ip-10-1-0-41.eu-west-1  10.1.0.41:8302  alive   server  1.6.0beta2  2         eu-west-1  <all>
ip-10-3-0-13.eu-west-3  10.3.0.13:8302  alive   server  1.6.0beta2  2         eu-west-3  <all>
ip-10-3-0-29.eu-west-3  10.3.0.29:8302  alive   server  1.6.0beta2  2         eu-west-3  <all>
ip-10-3-0-37.eu-west-3  10.3.0.37:8302  alive   server  1.6.0beta2  2         eu-west-3  <all>

In the secondary DC:

22:07:37 root@ip-10-1-0-22 ~ [0]
> consul members -wan
Node                    Address         Status  Type    Build       Protocol  DC         Segment
ip-10-1-0-10.eu-west-1  10.1.0.10:8302  alive   server  1.6.0beta2  2         eu-west-1  <all>
ip-10-1-0-22.eu-west-1  10.1.0.22:8302  alive   server  1.6.0beta2  2         eu-west-1  <all>
ip-10-1-0-41.eu-west-1  10.1.0.41:8302  alive   server  1.6.0beta2  2         eu-west-1  <all>
ip-10-3-0-13.eu-west-3  10.3.0.13:8302  alive   server  1.6.0beta2  2         eu-west-3  <all>
ip-10-3-0-29.eu-west-3  10.3.0.29:8302  alive   server  1.6.0beta2  2         eu-west-3  <all>
ip-10-3-0-37.eu-west-3  10.3.0.37:8302  alive   server  1.6.0beta2  2         eu-west-3  <all>

Here is the auto-generated CA in the primary DC:

22:07:12 root@ip-10-3-0-13 ~ [0]
> curl -sS http://127.0.0.1:8500/v1/connect/ca/roots | jq .
{
  "ActiveRootID": "9a:21:7a:e0:16:ae:19:58:bc:ca:b5:c4:97:3e:fe:d3:0c:8f:af:8e",
  "TrustDomain": "594d66d8-240c-260c-33d0-71589d845d99.consul",
  "Roots": [
    {
      "ID": "9a:21:7a:e0:16:ae:19:58:bc:ca:b5:c4:97:3e:fe:d3:0c:8f:af:8e",
      "Name": "Consul CA Root Cert",
      "SerialNumber": 7,
      "SigningKeyID": "37:63:3a:33:65:3a:39:38:3a:36:34:3a:30:36:3a:30:36:3a:30:61:3a:34:30:3a:33:61:3a:65:30:3a:64:31:3a:36:39:3a:31:61:3a:31:65:3a:31:38:3a:66:35:3a:37:34:3a:38:32:3a:65:63:3a:38:64:3a:33:65:3a:31:66:3a:32:66:3a:35:62:3a:38:31:3a:30:64:3a:65:33:3a:32:65:3a:38:64:3a:64:32:3a:33:39:3a:66:36",
      "ExternalTrustDomain": "594d66d8-240c-260c-33d0-71589d845d99",
      "NotBefore": "2019-07-22T21:47:11Z",
      "NotAfter": "2029-07-22T21:47:11Z",
      "RootCert": "-----BEGIN CERTIFICATE-----\nMIICWjCCAf+gAwIBAgIBBzAKBggqhkjOPQQDAjAWMRQwEgYDVQQDEwtDb25zdWwg\n...\neWXKTgt8Mvzb5sQlgCeJvekBQk6in29TqJHD/ovf\n-----END CERTIFICATE-----\n",
      "IntermediateCerts": null,
      "Active": true,
      "CreateIndex": 8,
      "ModifyIndex": 8
    }
  ]
}

Here is the auto-generated CA in the secondary DC:

22:08:44 root@ip-10-1-0-22 ~ [0]
> curl -sS http://127.0.0.1:8500/v1/connect/ca/roots | jq .
{
  "ActiveRootID": "ee:75:44:b2:f8:92:06:37:06:cf:55:be:7f:87:d9:93:ee:ad:c8:6e",
  "TrustDomain": "2b56ec58-87f8-dda6-bbf5-b3f3f24fa3ed.consul",
  "Roots": [
    {
      "ID": "ee:75:44:b2:f8:92:06:37:06:cf:55:be:7f:87:d9:93:ee:ad:c8:6e",
      "Name": "Consul CA Root Cert",
      "SerialNumber": 7,
      "SigningKeyID": "39:66:3a:30:38:3a:32:31:3a:64:65:3a:65:30:3a:38:66:3a:61:66:3a:33:66:3a:62:65:3a:63:32:3a:65:64:3a:32:37:3a:64:64:3a:64:64:3a:65:63:3a:36:30:3a:30:65:3a:64:61:3a:33:38:3a:39:30:3a:64:39:3a:64:62:3a:30:64:3a:31:39:3a:35:39:3a:39:62:3a:62:31:3a:33:62:3a:33:39:3a:38:65:3a:65:39:3a:65:34",
      "ExternalTrustDomain": "2b56ec58-87f8-dda6-bbf5-b3f3f24fa3ed",
      "NotBefore": "2019-07-22T21:47:09Z",
      "NotAfter": "2029-07-22T21:47:09Z",
      "RootCert": "-----BEGIN CERTIFICATE-----\nMIICWDCCAf+gAwIBAgIBBzAKBggqhkjOPQQDAjAWMRQwEgYDVQQDEwtDb25zdWwg\n...\nqJRf+hLfFc1SdWq8eiMuyt422i/PSpby05pMnw==\n-----END CERTIFICATE-----\n",
      "IntermediateCerts": null,
      "Active": true,
      "CreateIndex": 8,
      "ModifyIndex": 8
    }
  ]
}

=> As we can see, the CA has not been replicated to the secondary DC: its active root ID and trust domain differ from those of the primary DC.
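
The comparison made by eye above could be scripted roughly as follows. This is a sketch only: the function name `shares_primary_root` is hypothetical, and the dictionaries hold just the two fields taken from the `/v1/connect/ca/roots` responses shown in this report, not the full payloads.

```python
def shares_primary_root(primary: dict, secondary: dict) -> bool:
    """True when the secondary reports the primary's active root and trust domain."""
    return (
        primary["ActiveRootID"] == secondary["ActiveRootID"]
        and primary["TrustDomain"] == secondary["TrustDomain"]
    )


# Values copied from the two responses above.
primary_roots = {
    "ActiveRootID": "9a:21:7a:e0:16:ae:19:58:bc:ca:b5:c4:97:3e:fe:d3:0c:8f:af:8e",
    "TrustDomain": "594d66d8-240c-260c-33d0-71589d845d99.consul",
}
secondary_roots = {
    "ActiveRootID": "ee:75:44:b2:f8:92:06:37:06:cf:55:be:7f:87:d9:93:ee:ad:c8:6e",
    "TrustDomain": "2b56ec58-87f8-dda6-bbf5-b3f3f24fa3ed.consul",
}

print(shares_primary_root(primary_roots, secondary_roots))
```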

Here is the state of ACL replication in the secondary DC (it is healthy):

22:11:39 root@ip-10-1-0-22 /var/log [0]
> curl -sS http://127.0.0.1:8500/v1/acl/replication | jq .
{
  "Enabled": true,
  "Running": true,
  "SourceDatacenter": "eu-west-3",
  "ReplicationType": "tokens",
  "ReplicatedIndex": 1090,
  "ReplicatedRoleIndex": 1,
  "ReplicatedTokenIndex": 1131,
  "LastSuccess": "2019-07-22T22:10:00Z",
  "LastError": "0001-01-01T00:00:00Z"
}
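
For reference, one way to read that status programmatically is sketched below. The helper name `replication_looks_healthy` is hypothetical, and the health criteria are an assumption made for illustration. `0001-01-01T00:00:00Z` is Go's zero time, which here indicates that no error has been recorded.

```python
ZERO_TIME = "0001-01-01T00:00:00Z"  # Go's zero time value: "never happened"


def replication_looks_healthy(status: dict) -> bool:
    """Assumed criteria: enabled, running, at least one success, no recorded error."""
    return (
        status["Enabled"]
        and status["Running"]
        and status["LastSuccess"] != ZERO_TIME
        and status["LastError"] == ZERO_TIME
    )


# Fields copied from the /v1/acl/replication response above.
status = {
    "Enabled": True,
    "Running": True,
    "SourceDatacenter": "eu-west-3",
    "ReplicationType": "tokens",
    "LastSuccess": "2019-07-22T22:10:00Z",
    "LastError": "0001-01-01T00:00:00Z",
}

print(replication_looks_healthy(status))
```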

In the log of the leader in the secondary DC, I see:

Jul 22 22:11:39 ip-10-1-0-22 consul[9365]:     2019/07/22 22:11:39 [ERR] consul: RPC failed to server 10.3.0.13:8300 in DC "eu-west-3": rpc error making call: rpc error making call: Permission denied
Jul 22 22:11:39 ip-10-1-0-22 consul[9365]:     2019/07/22 22:11:39 [ERR] connect: error watching primary datacenter roots: rpc error making call: rpc error making call: Permission denied
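
To spot these failures in a longer log, a small filter along the following lines may help. It is a sketch that assumes the line layout seen in the two entries above; the function name `permission_denied_errors` and the regular expression are illustrative, not part of Consul.

```python
import re

# Matches "[ERR] <component>: <message ending in Permission denied>".
PATTERN = re.compile(r"\[ERR\] (\w+): (.*Permission denied)\s*$")


def permission_denied_errors(lines):
    """Return (component, message) pairs for ERR lines ending in 'Permission denied'."""
    found = []
    for line in lines:
        match = PATTERN.search(line)
        if match:
            found.append((match.group(1), match.group(2)))
    return found


# The two log lines from the secondary DC leader, copied from above.
log_lines = [
    'Jul 22 22:11:39 ip-10-1-0-22 consul[9365]:     2019/07/22 22:11:39 [ERR] consul: RPC failed to server 10.3.0.13:8300 in DC "eu-west-3": rpc error making call: rpc error making call: Permission denied',
    'Jul 22 22:11:39 ip-10-1-0-22 consul[9365]:     2019/07/22 22:11:39 [ERR] connect: error watching primary datacenter roots: rpc error making call: rpc error making call: Permission denied',
]

for component, message in permission_denied_errors(log_lines):
    print(component, "->", message)
```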

I tried replacing all the tokens with a global-management token: same error.
I also tried restarting, disabling and re-enabling Connect, etc.
I think the problem lies in the RPC message sent by the leader of the secondary DC: it does not include the replication token when checking the health of the primary cluster.

See PR #6193.

With this PR, the leader in the secondary cluster immediately replicates the CA:

Jul 22 22:47:53 ip-10-1-0-22 consul[12366]:     2019/07/22 22:47:53 [INFO] consul: New leader elected: ip-10-1-0-22
Jul 22 22:47:53 ip-10-1-0-22 consul[12366]:     2019/07/22 22:47:53 [INFO] acl: started ACL policy replication
Jul 22 22:47:53 ip-10-1-0-22 consul[12366]:     2019/07/22 22:47:53 [INFO] acl: started ACL role replication
Jul 22 22:47:53 ip-10-1-0-22 consul[12366]:     2019/07/22 22:47:53 [INFO] acl: started ACL token replication
Jul 22 22:47:53 ip-10-1-0-22 consul[12366]:     2019/07/22 22:47:53 [INFO]  raft: pipelining replication to peer {Nonvoter 9db86766-a5f7-ad14-bf13-a8399a7df6c1 10.1.0.10:8300}
Jul 22 22:47:53 ip-10-1-0-22 consul[12366]:     2019/07/22 22:47:53 [INFO] connect: updated root certificates from primary datacenter
Jul 22 22:47:53 ip-10-1-0-22 consul[12366]:     2019/07/22 22:47:53 [INFO] connect: received new intermediate certificate from primary datacenter
Jul 22 22:47:53 ip-10-1-0-22 consul[12366]:     2019/07/22 22:47:53 [INFO] connect: initialized secondary datacenter CA with provider "consul"
Jul 22 22:47:53 ip-10-1-0-22 consul[12366]:     2019/07/22 22:47:53 [INFO] replication: started Config Entry replication

And the CA from the primary DC becomes the active root in the secondary DC, while the old secondary root is kept but marked inactive:

22:49:36 root@ip-10-1-0-22 /var/log [130]
> curl -sS http://127.0.0.1:8500/v1/connect/ca/roots | jq .
{
  "ActiveRootID": "9a:21:7a:e0:16:ae:19:58:bc:ca:b5:c4:97:3e:fe:d3:0c:8f:af:8e",
  "TrustDomain": "594d66d8-240c-260c-33d0-71589d845d99.consul",
  "Roots": [
    {
      "ID": "9a:21:7a:e0:16:ae:19:58:bc:ca:b5:c4:97:3e:fe:d3:0c:8f:af:8e",
      "Name": "Consul CA Root Cert",
      "SerialNumber": 7,
      "SigningKeyID": "37:63:3a:33:65:3a:39:38:3a:36:34:3a:30:36:3a:30:36:3a:30:61:3a:34:30:3a:33:61:3a:65:30:3a:64:31:3a:36:39:3a:31:61:3a:31:65:3a:31:38:3a:66:35:3a:37:34:3a:38:32:3a:65:63:3a:38:64:3a:33:65:3a:31:66:3a:32:66:3a:35:62:3a:38:31:3a:30:64:3a:65:33:3a:32:65:3a:38:64:3a:64:32:3a:33:39:3a:66:36",
      "ExternalTrustDomain": "594d66d8-240c-260c-33d0-71589d845d99",
      "NotBefore": "2019-07-22T21:47:11Z",
      "NotAfter": "2029-07-22T21:47:11Z",
      "RootCert": "-----BEGIN CERTIFICATE-----\nMIICWjCCAf+gAwIBAgIBBzAKBggqhkjOPQQDAjAWMRQwEgYDVQQDEwtDb25zdWwg\n...\neWXKTgt8Mvzb5sQlgCeJvekBQk6in29TqJHD/ovf\n-----END CERTIFICATE-----\n",
      "IntermediateCerts": null,
      "Active": true,
      "CreateIndex": 691,
      "ModifyIndex": 691
    },
    {
      "ID": "ee:75:44:b2:f8:92:06:37:06:cf:55:be:7f:87:d9:93:ee:ad:c8:6e",
      "Name": "Consul CA Root Cert",
      "SerialNumber": 7,
      "SigningKeyID": "39:66:3a:30:38:3a:32:31:3a:64:65:3a:65:30:3a:38:66:3a:61:66:3a:33:66:3a:62:65:3a:63:32:3a:65:64:3a:32:37:3a:64:64:3a:64:64:3a:65:63:3a:36:30:3a:30:65:3a:64:61:3a:33:38:3a:39:30:3a:64:39:3a:64:62:3a:30:64:3a:31:39:3a:35:39:3a:39:62:3a:62:31:3a:33:62:3a:33:39:3a:38:65:3a:65:39:3a:65:34",
      "ExternalTrustDomain": "2b56ec58-87f8-dda6-bbf5-b3f3f24fa3ed",
      "NotBefore": "2019-07-22T21:47:09Z",
      "NotAfter": "2029-07-22T21:47:09Z",
      "RootCert": "-----BEGIN CERTIFICATE-----\nMIICWDCCAf+gAwIBAgIBBzAKBggqhkjOPQQDAjAWMRQwEgYDVQQDEwtDb25zdWwg\n...\nqJRf+hLfFc1SdWq8eiMuyt422i/PSpby05pMnw==\n-----END CERTIFICATE-----\n",
      "IntermediateCerts": null,
      "Active": false,
      "CreateIndex": 8,
      "ModifyIndex": 691
    }
  ]
}
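
The post-fix state can be checked in the same spirit. The sketch below uses a hypothetical helper, `summarize_roots`, and only the `ID` and `Active` fields of each root from the response above. It verifies that exactly one root is active and that it matches `ActiveRootID`, and it lists the roots that are now inactive.

```python
def summarize_roots(response: dict):
    """Return (consistent, inactive_ids) for a CA roots response."""
    active = [root["ID"] for root in response["Roots"] if root["Active"]]
    inactive = [root["ID"] for root in response["Roots"] if not root["Active"]]
    # Consistent means: exactly one active root, equal to ActiveRootID.
    consistent = active == [response["ActiveRootID"]]
    return consistent, inactive


# Trimmed copy of the secondary DC response after applying the PR.
after_fix = {
    "ActiveRootID": "9a:21:7a:e0:16:ae:19:58:bc:ca:b5:c4:97:3e:fe:d3:0c:8f:af:8e",
    "Roots": [
        {"ID": "9a:21:7a:e0:16:ae:19:58:bc:ca:b5:c4:97:3e:fe:d3:0c:8f:af:8e", "Active": True},
        {"ID": "ee:75:44:b2:f8:92:06:37:06:cf:55:be:7f:87:d9:93:ee:ad:c8:6e", "Active": False},
    ],
}

consistent, inactive = summarize_roots(after_fix)
print(consistent, inactive)
```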


mkeeler · 2019-07-22T23:14:09Z

I think you are correct that the ServerHealth RPC would need a token in order to succeed.

I think the better solution, however, might be to use the information advertised via Serf instead of making RPC requests to all the servers to figure this out. That's what we do to determine the legacy/new ACL mode. I will be looking into this more tomorrow morning.

Also, thank you for the extremely detailed and clear bug report.

tristan-weil mentioned this issue Jul 22, 2019

Use the replication token when checking the health of a primary DC during CA replication #6193

Closed

mkeeler added this to the 1.6.0-beta3 milestone Jul 22, 2019

mkeeler self-assigned this Jul 22, 2019

mkeeler mentioned this issue Jul 23, 2019

Fix CA Replication when ACLs are enabled #6201

Merged

mkeeler closed this as completed Jul 26, 2019
