
One of three nodes didn't delete an object that had been deleted by Kubernetes #9251

Closed
clouduol opened this issue Jan 30, 2018 · 11 comments

@clouduol

clouduol commented Jan 30, 2018

I have a three-node etcd cluster that serves as the data store for my Kubernetes cluster. Today I found that the cluster is inconsistent. The state of my etcd cluster is:

https://192.168.139.50:2379, 6c66db9dc17b0eb5, 3.2.7, 136 MB, false, 54529, 17002106
https://192.168.139.51:2379, 5227f18ffa089aec, 3.2.7, 136 MB, true, 54529, 17002949
https://192.168.139.52:2379, e448a37828aa0925, 3.2.7, 240 MB, false, 54529, 17003472
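
The status output above was presumably produced with a command along these lines (a sketch; the endpoint list and certificate flags are assumptions copied from the get commands shown below):

ETCDCTL_API=3 etcdctl --cacert="/etc/kubernetes/ssl/ca.pem" --cert="/etc/flanneld/ssl/flanneld.pem" --key="/etc/flanneld/ssl/flanneld-key.pem" \
  --endpoints=https://192.168.139.50:2379,https://192.168.139.51:2379,https://192.168.139.52:2379 endpoint status
# columns: endpoint, member ID, version, db size, is-leader, raft term, raft index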

I fetched all keys from each of the three nodes: node 192.168.139.50 and node 192.168.139.51 are identical, but node 192.168.139.52 still contains objects that I had already deleted with kubectl.

To investigate the inconsistency, I ran kubectl create ns etcd-test and then fetched the object from each of the three nodes. The commands and results are:

ETCDCTL_API=3 etcdctl --cacert="/etc/kubernetes/ssl/ca.pem" --cert="/etc/flanneld/ssl/flanneld.pem" --key="/etc/flanneld/ssl/flanneld-key.pem" --endpoints=https://192.168.139.50:2379 get /registry/namespaces/etcd-test -w json | jq
{
  "header": {
    "cluster_id": 17573681554831794000,
    "member_id": 7811172074323120000,
    "revision": 7183747,
    "raft_term": 54529
  },
  "kvs": [
    {
      "key": "L3JlZ2lzdHJ5L25hbWVzcGFjZXMvZXRjZC10ZXN0",
      "create_revision": 7177967,
      "mod_revision": 7177967,
      "version": 1,
      "value": "azhzAAoPCgJ2MRIJTmFtZXNwYWNlEmEKRwoJZXRjZC10ZXN0EgAaACIAKiRjODQ4Y2EzMy0wNWJmLTExZTgtOWVhYy0wMDBjMjlhZWNjMTYyADgAQggIreDB0wUQAHoAEgwKCmt1YmVybmV0ZXMaCAoGQWN0aXZlGgAiAA=="
    }
  ],
  "count": 1
}
ETCDCTL_API=3 etcdctl --cacert="/etc/kubernetes/ssl/ca.pem" --cert="/etc/flanneld/ssl/flanneld.pem" --key="/etc/flanneld/ssl/flanneld-key.pem" --endpoints=https://192.168.139.51:2379 get /registry/namespaces/etcd-test -w json | jq
{
  "header": {
    "cluster_id": 17573681554831794000,
    "member_id": 5919965835879292000,
    "revision": 7183284,
    "raft_term": 54529
  },
  "kvs": [
    {
      "key": "L3JlZ2lzdHJ5L25hbWVzcGFjZXMvZXRjZC10ZXN0",
      "create_revision": 7177967,
      "mod_revision": 7177967,
      "version": 1,
      "value": "azhzAAoPCgJ2MRIJTmFtZXNwYWNlEmEKRwoJZXRjZC10ZXN0EgAaACIAKiRjODQ4Y2EzMy0wNWJmLTExZTgtOWVhYy0wMDBjMjlhZWNjMTYyADgAQggIreDB0wUQAHoAEgwKCmt1YmVybmV0ZXMaCAoGQWN0aXZlGgAiAA=="
    }
  ],
  "count": 1
}
 ETCDCTL_API=3 etcdctl --cacert="/etc/kubernetes/ssl/ca.pem" --cert="/etc/flanneld/ssl/flanneld.pem" --key="/etc/flanneld/ssl/flanneld-key.pem" --endpoints=https://192.168.139.52:2379 get /registry/namespaces/etcd-test -w json | jq
{
  "header": {
    "cluster_id": 17573681554831794000,
    "member_id": 16449577375444371000,
    "revision": 2613905,
    "raft_term": 54529
  },
  "kvs": [
    {
      "key": "L3JlZ2lzdHJ5L25hbWVzcGFjZXMvZXRjZC10ZXN0",
      "create_revision": 2611610,
      "mod_revision": 2611610,
      "version": 1,
      "value": "azhzAAoPCgJ2MRIJTmFtZXNwYWNlEmEKRwoJZXRjZC10ZXN0EgAaACIAKiRjODQ4Y2EzMy0wNWJmLTExZTgtOWVhYy0wMDBjMjlhZWNjMTYyADgAQggIreDB0wUQAHoAEgwKCmt1YmVybmV0ZXMaCAoGQWN0aXZlGgAiAA=="
    }
  ],
  "count": 1
}

Later I deleted this namespace with kubectl delete ns etcd-test and ran the same get commands. The results are:

{
  "header": {
    "cluster_id": 17573681554831794000,
    "member_id": 7811172074323120000,
    "revision": 7208175,
    "raft_term": 54529
  }
}
{
  "header": {
    "cluster_id": 17573681554831794000,
    "member_id": 5919965835879292000,
    "revision": 7212676,
    "raft_term": 54529
  }
}
{
  "header": {
    "cluster_id": 17573681554831794000,
    "member_id": 16449577375444371000,
    "revision": 2630145,
    "raft_term": 54529
  },
  "kvs": [
    {
      "key": "L3JlZ2lzdHJ5L25hbWVzcGFjZXMvZXRjZC10ZXN0",
      "create_revision": 2611610,
      "mod_revision": 2611610,
      "version": 1,
      "value": "azhzAAoPCgJ2MRIJTmFtZXNwYWNlEmEKRwoJZXRjZC10ZXN0EgAaACIAKiRjODQ4Y2EzMy0wNWJmLTExZTgtOWVhYy0wMDBjMjlhZWNjMTYyADgAQggIreDB0wUQAHoAEgwKCmt1YmVybmV0ZXMaCAoGQWN0aXZlGgAiAA=="
    }
  ],
  "count": 1
}

It is obvious that node 192.168.139.52 didn't delete the object at all. I know etcd uses a quorum mechanism, but I am not sure whether this behavior is expected.

What's more, according to https://coreos.com/etcd/docs/latest/faq.html, all requests to etcd cluster followers are forwarded to the leader. Why did I get different results from different endpoints?
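
As a point of comparison, etcdctl's get command also has a serializable read mode (--consistency=s) that is answered from the queried member's local store without a quorum check, while the default --consistency=l is linearizable. Checking both modes against each endpoint is one way to surface member-local divergence (a sketch reusing the certificate and key paths from the commands above):

ETCDCTL_API=3 etcdctl --cacert="/etc/kubernetes/ssl/ca.pem" --cert="/etc/flanneld/ssl/flanneld.pem" --key="/etc/flanneld/ssl/flanneld-key.pem" \
  --endpoints=https://192.168.139.52:2379 get /registry/namespaces/etcd-test --consistency=s -w json | jq '.count'
# repeat with --consistency=l (the default) and against each endpoint, then compare counts and revisions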

[configuration]
node 192.168.139.50

ExecStart=/root/local/bin/etcd \
  --name=etcd-host0 \
  --cert-file=/etc/etcd/ssl/etcd.pem \
  --key-file=/etc/etcd/ssl/etcd-key.pem \
  --peer-cert-file=/etc/etcd/ssl/etcd.pem \
  --peer-key-file=/etc/etcd/ssl/etcd-key.pem \
  --trusted-ca-file=/etc/kubernetes/ssl/ca.pem \
  --peer-trusted-ca-file=/etc/kubernetes/ssl/ca.pem \
  --initial-advertise-peer-urls=https://192.168.139.50:2380 \
  --listen-peer-urls=https://192.168.139.50:2380 \
  --listen-client-urls=https://192.168.139.50:2379,http://127.0.0.1:2379 \
  --advertise-client-urls=https://192.168.139.50:2379 \
  --initial-cluster-token=etcd-cluster-0 \
  --initial-cluster=etcd-host0=https://192.168.139.50:2380,etcd-host1=https://192.168.139.51:2380,etcd-host2=https://192.168.139.52:2380 \
  --initial-cluster-state=new \
  --data-dir=/var/lib/etcd

node 192.168.139.51

ExecStart=/root/local/bin/etcd \
  --name=etcd-host1 \
  --cert-file=/etc/etcd/ssl/etcd.pem \
  --key-file=/etc/etcd/ssl/etcd-key.pem \
  --peer-cert-file=/etc/etcd/ssl/etcd.pem \
  --peer-key-file=/etc/etcd/ssl/etcd-key.pem \
  --trusted-ca-file=/etc/kubernetes/ssl/ca.pem \
  --peer-trusted-ca-file=/etc/kubernetes/ssl/ca.pem \
  --initial-advertise-peer-urls=https://192.168.139.51:2380 \
  --listen-peer-urls=https://192.168.139.51:2380 \
  --listen-client-urls=https://192.168.139.51:2379,http://127.0.0.1:2379 \
  --advertise-client-urls=https://192.168.139.51:2379 \
  --initial-cluster-token=etcd-cluster-0 \
  --initial-cluster=etcd-host0=https://192.168.139.50:2380,etcd-host1=https://192.168.139.51:2380,etcd-host2=https://192.168.139.52:2380 \
  --initial-cluster-state=new \
  --data-dir=/var/lib/etcd

node 192.168.139.52

ExecStart=/root/local/bin/etcd \
  --name=etcd-host2 \
  --cert-file=/etc/etcd/ssl/etcd.pem \
  --key-file=/etc/etcd/ssl/etcd-key.pem \
  --peer-cert-file=/etc/etcd/ssl/etcd.pem \
  --peer-key-file=/etc/etcd/ssl/etcd-key.pem \
  --trusted-ca-file=/etc/kubernetes/ssl/ca.pem \
  --peer-trusted-ca-file=/etc/kubernetes/ssl/ca.pem \
  --initial-advertise-peer-urls=https://192.168.139.52:2380 \
  --listen-peer-urls=https://192.168.139.52:2380 \
  --listen-client-urls=https://192.168.139.52:2379,http://127.0.0.1:2379 \
  --advertise-client-urls=https://192.168.139.52:2379 \
  --initial-cluster-token=etcd-cluster-0 \
  --initial-cluster=etcd-host0=https://192.168.139.50:2380,etcd-host1=https://192.168.139.51:2380,etcd-host2=https://192.168.139.52:2380 \
  --initial-cluster-state=new \
  --data-dir=/var/lib/etcd

[version]
etcd

etcd Version: 3.2.7
Git SHA: bb66589
Go Version: go1.8.3
Go OS/Arch: linux/amd64

kubernetes

Client Version: version.Info{Major:"1", Minor:"8", GitVersion:"v1.8.5", GitCommit:"cce11c6a185279d037023e02ac5249e14daa22bf", GitTreeState:"clean", BuildDate:"2017-12-07T16:16:03Z", GoVersion:"go1.8.3", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"8", GitVersion:"v1.8.5", GitCommit:"cce11c6a185279d037023e02ac5249e14daa22bf", GitTreeState:"clean", BuildDate:"2017-12-07T16:05:18Z", GoVersion:"go1.8.3", Compiler:"gc", Platform:"linux/amd64"}
@hexfusion
Contributor

https://192.168.139.50:2379, 6c66db9dc17b0eb5, 3.2.7, 136 MB, false, 54529, 17002106
https://192.168.139.51:2379, 5227f18ffa089aec, 3.2.7, 136 MB, true, 54529, 17002949
https://192.168.139.52:2379, e448a37828aa0925, 3.2.7, 240 MB, false, 54529, 17003472

@learnMachining, thanks for the details. The last node has a larger DB; I believe you are hitting #8009 (ref: #7116).

I will try to reproduce this. Could you post the log file for e448a37828aa0925?

@clouduol
Author

@hexfusion Thanks for the reply. The log of e448a37828aa0925 is:

Jan 31 08:41:16 fedora24_k8s_node2 etcd[1084]: ready to serve client requests
Jan 31 08:41:16 fedora24_k8s_node2 etcd[1084]: serving client requests on 192.168.139.52:2379
Jan 31 08:41:16 fedora24_k8s_node2 etcd[1084]: ready to serve client requests
Jan 31 08:41:16 fedora24_k8s_node2 etcd[1084]: serving insecure client requests on 127.0.0.1:2379, this is strongly discouraged!
Jan 31 09:30:14 fedora24_k8s_node2 etcd[1084]: apply entries took too long [4.996481033s for 1 entries]
Jan 31 09:30:14 fedora24_k8s_node2 etcd[1084]: avoid queries with large range/delete range!
Jan 31 09:30:32 fedora24_k8s_node2 etcd[1084]: apply entries took too long [418.491146ms for 1 entries]
Jan 31 09:30:32 fedora24_k8s_node2 etcd[1084]: avoid queries with large range/delete range!
Jan 31 09:37:42 fedora24_k8s_node2 etcd[1084]: apply entries took too long [22.425247918s for 2 entries]
Jan 31 09:37:42 fedora24_k8s_node2 etcd[1084]: avoid queries with large range/delete range!
Jan 31 09:42:30 fedora24_k8s_node2 etcd[1084]: start to snapshot (applied: 21700269, lastsnap: 21600266)
Jan 31 09:42:30 fedora24_k8s_node2 etcd[1084]: saved snapshot at index 21700269
Jan 31 09:42:30 fedora24_k8s_node2 etcd[1084]: compacted raft log at 21695269
Jan 31 09:42:46 fedora24_k8s_node2 etcd[1084]: purged file /var/lib/etcd/member/snap/000000000000d567-0000000001437d86.snap successfully
Jan 31 09:44:43 fedora24_k8s_node2 etcd[1084]: start to snapshot (applied: 21800271, lastsnap: 21700269)
Jan 31 09:44:43 fedora24_k8s_node2 etcd[1084]: saved snapshot at index 21800271
Jan 31 09:44:43 fedora24_k8s_node2 etcd[1084]: compacted raft log at 21795271
Jan 31 09:44:46 fedora24_k8s_node2 etcd[1084]: purged file /var/lib/etcd/member/snap/000000000000d578-0000000001450427.snap successfully
Jan 31 09:46:56 fedora24_k8s_node2 etcd[1084]: start to snapshot (applied: 21900272, lastsnap: 21800271)
Jan 31 09:46:56 fedora24_k8s_node2 etcd[1084]: saved snapshot at index 21900272
Jan 31 09:46:56 fedora24_k8s_node2 etcd[1084]: compacted raft log at 21895272
Jan 31 09:47:16 fedora24_k8s_node2 etcd[1084]: purged file /var/lib/etcd/member/snap/000000000000d594-0000000001468ac8.snap successfully
Jan 31 10:04:04 fedora24_k8s_node2 etcd[1084]: store.index: compact 4043767
Jan 31 10:04:05 fedora24_k8s_node2 etcd[1084]: apply entries took too long [1.532524171s for 1 entries]
Jan 31 10:04:05 fedora24_k8s_node2 etcd[1084]: avoid queries with large range/delete range!
Jan 31 10:04:06 fedora24_k8s_node2 etcd[1084]: apply entries took too long [126.742086ms for 1 entries]
Jan 31 10:04:06 fedora24_k8s_node2 etcd[1084]: avoid queries with large range/delete range!
Jan 31 10:04:08 fedora24_k8s_node2 etcd[1084]: apply entries took too long [199.255851ms for 1 entries]
Jan 31 10:04:08 fedora24_k8s_node2 etcd[1084]: avoid queries with large range/delete range!
Jan 31 10:04:10 fedora24_k8s_node2 etcd[1084]: apply entries took too long [110.257772ms for 1 entries]
Jan 31 10:04:10 fedora24_k8s_node2 etcd[1084]: avoid queries with large range/delete range!
Jan 31 10:04:12 fedora24_k8s_node2 etcd[1084]: apply entries took too long [141.880276ms for 1 entries]
Jan 31 10:04:12 fedora24_k8s_node2 etcd[1084]: avoid queries with large range/delete range!
Jan 31 10:04:14 fedora24_k8s_node2 etcd[1084]: starting to defragment the storage backend...
Jan 31 10:04:25 fedora24_k8s_node2 etcd[1084]: starting to defragment the storage backend...
Jan 31 10:04:54 fedora24_k8s_node2 etcd[1084]: finished defragmenting the storage backend
Jan 31 10:05:24 fedora24_k8s_node2 etcd[1084]: finished defragmenting the storage backend
Jan 31 10:05:24 fedora24_k8s_node2 etcd[1084]: apply entries took too long [1m8.202936241s for 1 entries]
Jan 31 10:05:24 fedora24_k8s_node2 etcd[1084]: avoid queries with large range/delete range!
Jan 31 10:05:38 fedora24_k8s_node2 etcd[1084]: apply entries took too long [153.017874ms for 1 entries]
Jan 31 10:05:38 fedora24_k8s_node2 etcd[1084]: avoid queries with large range/delete range!
Jan 31 10:05:47 fedora24_k8s_node2 etcd[1084]: finished scheduled compaction at 4043767 (took 1m42.185732708s)
Jan 31 10:07:28 fedora24_k8s_node2 etcd[1084]: store.index: compact 4043768
Jan 31 10:07:28 fedora24_k8s_node2 etcd[1084]: apply entries took too long [197.173551ms for 1 entries]
Jan 31 10:07:28 fedora24_k8s_node2 etcd[1084]: avoid queries with large range/delete range!
Jan 31 10:07:30 fedora24_k8s_node2 etcd[1084]: finished scheduled compaction at 4043768 (took 2.389642113s)
Jan 31 10:07:35 fedora24_k8s_node2 etcd[1084]: starting to defragment the storage backend...
Jan 31 10:07:37 fedora24_k8s_node2 etcd[1084]: finished defragmenting the storage backend
Jan 31 10:07:37 fedora24_k8s_node2 etcd[1084]: apply entries took too long [2.414080868s for 1 entries]
Jan 31 10:07:37 fedora24_k8s_node2 etcd[1084]: avoid queries with large range/delete range!
Jan 31 10:10:16 fedora24_k8s_node2 etcd[1084]: apply entries took too long [400.616855ms for 1 entries]
Jan 31 10:10:16 fedora24_k8s_node2 etcd[1084]: avoid queries with large range/delete range!
Jan 31 10:20:49 fedora24_k8s_node2 etcd[1084]: apply entries took too long [1.660932211s for 1 entries]
Jan 31 10:20:49 fedora24_k8s_node2 etcd[1084]: avoid queries with large range/delete range!
Jan 31 10:21:09 fedora24_k8s_node2 etcd[1084]: apply entries took too long [658.732222ms for 1 entries]
Jan 31 10:21:09 fedora24_k8s_node2 etcd[1084]: avoid queries with large range/delete range!
Jan 31 10:24:41 fedora24_k8s_node2 etcd[1084]: start to snapshot (applied: 22000273, lastsnap: 21900272)
Jan 31 10:24:42 fedora24_k8s_node2 etcd[1084]: saved snapshot at index 22000273
Jan 31 10:24:42 fedora24_k8s_node2 etcd[1084]: compacted raft log at 21995273
Jan 31 10:24:46 fedora24_k8s_node2 etcd[1084]: purged file /var/lib/etcd/member/snap/000000000000d59d-0000000001481169.snap successfully
Jan 31 10:25:14 fedora24_k8s_node2 etcd[1084]: apply entries took too long [524.967678ms for 2 entries]
Jan 31 10:25:14 fedora24_k8s_node2 etcd[1084]: avoid queries with large range/delete range!

@clouduol
Author

clouduol commented Jan 31, 2018

I have read issues #8009 and #7116, and I don't think the problem is the same. Node e448a37828aa0925 is larger than the other two because it didn't delete any of the objects Kubernetes had deleted.

To gather more information, I created an object with

ETCDCTL_API=3 etcdctl --cacert="/etc/kubernetes/ssl/ca.pem" --cert="/etc/flanneld/ssl/flanneld.pem" --key="/etc/flanneld/ssl/flanneld-key.pem" --endpoints=https://192.168.139.50:2379 put /registry/atest

and then fetched this object from each of the three nodes; that worked normally.

Then I deleted this object with etcdctl and fetched it from all three nodes again. This time no node returned a value, so all of them had deleted the object (a sketch of this check is below).
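
A minimal sketch of that delete-and-verify round trip, reusing the certificate paths and key name from the commands above (the loop itself is illustrative):

ETCDCTL_API=3 etcdctl --cacert="/etc/kubernetes/ssl/ca.pem" --cert="/etc/flanneld/ssl/flanneld.pem" --key="/etc/flanneld/ssl/flanneld-key.pem" \
  --endpoints=https://192.168.139.50:2379 del /registry/atest
for ep in https://192.168.139.50:2379 https://192.168.139.51:2379 https://192.168.139.52:2379; do
  # a count of 0 (no "kvs" in the response) means this member has applied the delete
  ETCDCTL_API=3 etcdctl --cacert="/etc/kubernetes/ssl/ca.pem" --cert="/etc/flanneld/ssl/flanneld.pem" --key="/etc/flanneld/ssl/flanneld-key.pem" \
    --endpoints=$ep get /registry/atest -w json | jq '.count // 0'
done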

What's more, I investigated the logs of the three nodes and found something strange:

  1. Nodes 6c66db9dc17b0eb5 and 5227f18ffa089aec compact every five minutes, but node e448a37828aa0925 does not (see the log-grep sketch below)
  2. Node e448a37828aa0925 takes snapshots from time to time
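
One way to compare compaction activity across the members is to grep each node's journal for the compaction messages that appear in the log above (a sketch; the systemd unit name etcd is an assumption, since the unit file shown earlier doesn't include its name):

# run on each node and compare how often scheduled compactions show up
journalctl -u etcd --since "2 hours ago" | grep -E "store.index: compact|finished scheduled compaction"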

From the above, I suspect the problem is related to Kubernetes. Maybe this is normal behavior for an etcd cluster backing Kubernetes? But how could Kubernetes delete objects from only two of the three nodes? I am very confused.

@clouduol
Author

clouduol commented Jan 31, 2018

Now my etcd cluster runs normally.

I stopped all nodes and backed up /var/lib/etcd on node e448a37828aa0925, then deleted all data in /var/lib/etcd on that node and started all three nodes again.
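
Roughly, a sketch of those steps on node e448a37828aa0925 (192.168.139.52), assuming etcd runs as a systemd unit named etcd and with an illustrative backup destination:

systemctl stop etcd                      # stop etcd (done on all three nodes)
cp -a /var/lib/etcd /var/lib/etcd.bak    # keep a copy of the inconsistent data (backup path is illustrative)
rm -rf /var/lib/etcd/*                   # wipe this member's data directory
systemctl start etcd                     # start all three nodes again; per the report above, this member then re-synced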

After the node synced, I redid the create-namespace test above, and this time all three nodes deleted the namespace created by kubectl.

All three nodes now compact every five minutes. It seems normal.

I don't actually know what happened, but it has been working well so far.

@gyuho
Contributor

gyuho commented Jan 31, 2018

@learnMachining If you look at the header in the get responses, the revision on the 3rd node is quite different, which we need to figure out. How did you operate this etcd cluster? Any maintenance work?

@clouduol
Author

clouduol commented Feb 1, 2018

@gyuho I did nothing to this cluster; everything is default. K8s uses the etcd v3 API, while Calico and Flannel use the v2 API. After installation, etcd, k8s, Calico, and Flannel each worked in their own way, and I found the inconsistency yesterday.

@gyuho
Contributor

gyuho commented Feb 1, 2018

@learnMachining Do you have all the server logs from the beginning (from revision 1) that you can share?

@clouduol
Author

clouduol commented Feb 2, 2018

@gyuho I'm afraid not. I have cleared the logs a few times. But this is my own experimental environment, so I can give you all the logs I still have. Hope they are helpful.
etcd-node0-6c66db9dc17b0eb5.log.tar.gz
etcd-node1-5227f18ffa089aec.log.tar.gz
etcd-node2-e448a37828aa0925.log.tar.gz

@gyuho
Contributor

gyuho commented Feb 2, 2018

Could you also share the db files from all 3 nodes?

@clouduol
Author

clouduol commented Feb 3, 2018

@gyuho I backed up the db files of node e448a37828aa0925 to the /tmp directory, but after restarting I can't find them. Now all 3 nodes have the same data and the cluster runs normally.

I'm sorry for losing the backup data. Please close this issue. I will reopen it if I encounter similar problems or find more useful information.

@gyuho
Contributor

gyuho commented Feb 3, 2018

@learnMachining Yeah, there seem to be too many components... but the revision mismatch should not happen. Please keep all the logs, or at least keep a record of every operation you run against the etcd cluster, in case this happens again. And use the latest etcd, if you can.

Let's close this for now, and reopen when it happens again.

Thanks.

gyuho closed this as completed Feb 3, 2018