
One of three nodes didn't delete an object that had been deleted by Kubernetes #9251

Closed
clouduol opened this issue Jan 30, 2018 · 11 comments

@clouduol

clouduol commented Jan 30, 2018

I have a three-node etcd cluster that serves as the data store for my Kubernetes cluster. Today I found that the cluster is inconsistent. The state of my etcd cluster is:

https://192.168.139.50:2379, 6c66db9dc17b0eb5, 3.2.7, 136 MB, false, 54529, 17002106
https://192.168.139.51:2379, 5227f18ffa089aec, 3.2.7, 136 MB, true, 54529, 17002949
https://192.168.139.52:2379, e448a37828aa0925, 3.2.7, 240 MB, false, 54529, 17003472
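
The status output above was presumably produced with a command along these lines (a sketch; the endpoint list and certificate flags are assumptions copied from the get commands shown below):

ETCDCTL_API=3 etcdctl --cacert="/etc/kubernetes/ssl/ca.pem" --cert="/etc/flanneld/ssl/flanneld.pem" --key="/etc/flanneld/ssl/flanneld-key.pem" \
  --endpoints=https://192.168.139.50:2379,https://192.168.139.51:2379,https://192.168.139.52:2379 endpoint status
# columns: endpoint, member ID, version, db size, is-leader, raft term, raft index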

I fetched all keys from each of the three nodes: node 192.168.139.50 and node 192.168.139.51 are identical, but node 192.168.139.52 still contains objects that I had already deleted with kubectl.

To investigate the inconsistency, I ran kubectl create ns etcd-test and then fetched the object from each of the three nodes. The commands and results are:

ETCDCTL_API=3 etcdctl --cacert="/etc/kubernetes/ssl/ca.pem" --cert="/etc/flanneld/ssl/flanneld.pem" --key="/etc/flanneld/ssl/flanneld-key.pem" --endpoints=https://192.168.139.50:2379 get /registry/namespaces/etcd-test -w json | jq
{
  "header": {
    "cluster_id": 17573681554831794000,
    "member_id": 7811172074323120000,
    "revision": 7183747,
    "raft_term": 54529
  },
  "kvs": [
    {
      "key": "L3JlZ2lzdHJ5L25hbWVzcGFjZXMvZXRjZC10ZXN0",
      "create_revision": 7177967,
      "mod_revision": 7177967,
      "version": 1,
      "value": "azhzAAoPCgJ2MRIJTmFtZXNwYWNlEmEKRwoJZXRjZC10ZXN0EgAaACIAKiRjODQ4Y2EzMy0wNWJmLTExZTgtOWVhYy0wMDBjMjlhZWNjMTYyADgAQggIreDB0wUQAHoAEgwKCmt1YmVybmV0ZXMaCAoGQWN0aXZlGgAiAA=="
    }
  ],
  "count": 1
}
ETCDCTL_API=3 etcdctl --cacert="/etc/kubernetes/ssl/ca.pem" --cert="/etc/flanneld/ssl/flanneld.pem" --key="/etc/flanneld/ssl/flanneld-key.pem" --endpoints=https://192.168.139.51:2379 get /registry/namespaces/etcd-test -w json | jq
{
  "header": {
    "cluster_id": 17573681554831794000,
    "member_id": 5919965835879292000,
    "revision": 7183284,
    "raft_term": 54529
  },
  "kvs": [
    {
      "key": "L3JlZ2lzdHJ5L25hbWVzcGFjZXMvZXRjZC10ZXN0",
      "create_revision": 7177967,
      "mod_revision": 7177967,
      "version": 1,
      "value": "azhzAAoPCgJ2MRIJTmFtZXNwYWNlEmEKRwoJZXRjZC10ZXN0EgAaACIAKiRjODQ4Y2EzMy0wNWJmLTExZTgtOWVhYy0wMDBjMjlhZWNjMTYyADgAQggIreDB0wUQAHoAEgwKCmt1YmVybmV0ZXMaCAoGQWN0aXZlGgAiAA=="
    }
  ],
  "count": 1
}
 ETCDCTL_API=3 etcdctl --cacert="/etc/kubernetes/ssl/ca.pem" --cert="/etc/flanneld/ssl/flanneld.pem" --key="/etc/flanneld/ssl/flanneld-key.pem" --endpoints=https://192.168.139.52:2379 get /registry/namespaces/etcd-test -w json | jq
{
  "header": {
    "cluster_id": 17573681554831794000,
    "member_id": 16449577375444371000,
    "revision": 2613905,
    "raft_term": 54529
  },
  "kvs": [
    {
      "key": "L3JlZ2lzdHJ5L25hbWVzcGFjZXMvZXRjZC10ZXN0",
      "create_revision": 2611610,
      "mod_revision": 2611610,
      "version": 1,
      "value": "azhzAAoPCgJ2MRIJTmFtZXNwYWNlEmEKRwoJZXRjZC10ZXN0EgAaACIAKiRjODQ4Y2EzMy0wNWJmLTExZTgtOWVhYy0wMDBjMjlhZWNjMTYyADgAQggIreDB0wUQAHoAEgwKCmt1YmVybmV0ZXMaCAoGQWN0aXZlGgAiAA=="
    }
  ],
  "count": 1
}

Later I deleted this namespace with kubectl delete ns etcd-test and ran the same get commands. The results are:

{
  "header": {
    "cluster_id": 17573681554831794000,
    "member_id": 7811172074323120000,
    "revision": 7208175,
    "raft_term": 54529
  }
}
{
  "header": {
    "cluster_id": 17573681554831794000,
    "member_id": 5919965835879292000,
    "revision": 7212676,
    "raft_term": 54529
  }
}
{
  "header": {
    "cluster_id": 17573681554831794000,
    "member_id": 16449577375444371000,
    "revision": 2630145,
    "raft_term": 54529
  },
  "kvs": [
    {
      "key": "L3JlZ2lzdHJ5L25hbWVzcGFjZXMvZXRjZC10ZXN0",
      "create_revision": 2611610,
      "mod_revision": 2611610,
      "version": 1,
      "value": "azhzAAoPCgJ2MRIJTmFtZXNwYWNlEmEKRwoJZXRjZC10ZXN0EgAaACIAKiRjODQ4Y2EzMy0wNWJmLTExZTgtOWVhYy0wMDBjMjlhZWNjMTYyADgAQggIreDB0wUQAHoAEgwKCmt1YmVybmV0ZXMaCAoGQWN0aXZlGgAiAA=="
    }
  ],
  "count": 1
}

It is obvious that node 192.168.139.52 didn't delete the object at all. I know etcd uses a quorum mechanism, but I am not sure whether this behavior is expected.

What's more, according to https://coreos.com/etcd/docs/latest/faq.html, all requests to etcd cluster followers are forwarded to the leader. Why did I get different results from different endpoints?
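
As a point of comparison, etcdctl's get command also has a serializable read mode (--consistency=s) that is answered from the queried member's local store without a quorum check, while the default --consistency=l is linearizable. Checking both modes against each endpoint is one way to surface member-local divergence (a sketch reusing the certificate and key paths from the commands above):

ETCDCTL_API=3 etcdctl --cacert="/etc/kubernetes/ssl/ca.pem" --cert="/etc/flanneld/ssl/flanneld.pem" --key="/etc/flanneld/ssl/flanneld-key.pem" \
  --endpoints=https://192.168.139.52:2379 get /registry/namespaces/etcd-test --consistency=s -w json | jq '.count'
# repeat with --consistency=l (the default) and against each endpoint, then compare counts and revisions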

[configuration]
node 192.168.139.50

ExecStart=/root/local/bin/etcd \
  --name=etcd-host0 \
  --cert-file=/etc/etcd/ssl/etcd.pem \
  --key-file=/etc/etcd/ssl/etcd-key.pem \
  --peer-cert-file=/etc/etcd/ssl/etcd.pem \
  --peer-key-file=/etc/etcd/ssl/etcd-key.pem \
  --trusted-ca-file=/etc/kubernetes/ssl/ca.pem \
  --peer-trusted-ca-file=/etc/kubernetes/ssl/ca.pem \
  --initial-advertise-peer-urls=https://192.168.139.50:2380 \
  --listen-peer-urls=https://192.168.139.50:2380 \
  --listen-client-urls=https://192.168.139.50:2379,http://127.0.0.1:2379 \
  --advertise-client-urls=https://192.168.139.50:2379 \
  --initial-cluster-token=etcd-cluster-0 \
  --initial-cluster=etcd-host0=https://192.168.139.50:2380,etcd-host1=https://192.168.139.51:2380,etcd-host2=https://192.168.139.52:2380 \
  --initial-cluster-state=new \
  --data-dir=/var/lib/etcd

node 192.168.139.51

ExecStart=/root/local/bin/etcd \
  --name=etcd-host1 \
  --cert-file=/etc/etcd/ssl/etcd.pem \
  --key-file=/etc/etcd/ssl/etcd-key.pem \
  --peer-cert-file=/etc/etcd/ssl/etcd.pem \
  --peer-key-file=/etc/etcd/ssl/etcd-key.pem \
  --trusted-ca-file=/etc/kubernetes/ssl/ca.pem \
  --peer-trusted-ca-file=/etc/kubernetes/ssl/ca.pem \
  --initial-advertise-peer-urls=https://192.168.139.51:2380 \
  --listen-peer-urls=https://192.168.139.51:2380 \
  --listen-client-urls=https://192.168.139.51:2379,http://127.0.0.1:2379 \
  --advertise-client-urls=https://192.168.139.51:2379 \
  --initial-cluster-token=etcd-cluster-0 \
  --initial-cluster=etcd-host0=https://192.168.139.50:2380,etcd-host1=https://192.168.139.51:2380,etcd-host2=https://192.168.139.52:2380 \
  --initial-cluster-state=new \
  --data-dir=/var/lib/etcd

node 192.168.139.52

ExecStart=/root/local/bin/etcd \
  --name=etcd-host2 \
  --cert-file=/etc/etcd/ssl/etcd.pem \
  --key-file=/etc/etcd/ssl/etcd-key.pem \
  --peer-cert-file=/etc/etcd/ssl/etcd.pem \
  --peer-key-file=/etc/etcd/ssl/etcd-key.pem \
  --trusted-ca-file=/etc/kubernetes/ssl/ca.pem \
  --peer-trusted-ca-file=/etc/kubernetes/ssl/ca.pem \
  --initial-advertise-peer-urls=https://192.168.139.52:2380 \
  --listen-peer-urls=https://192.168.139.52:2380 \
  --listen-client-urls=https://192.168.139.52:2379,http://127.0.0.1:2379 \
  --advertise-client-urls=https://192.168.139.52:2379 \
  --initial-cluster-token=etcd-cluster-0 \
  --initial-cluster=etcd-host0=https://192.168.139.50:2380,etcd-host1=https://192.168.139.51:2380,etcd-host2=https://192.168.139.52:2380 \
  --initial-cluster-state=new \
  --data-dir=/var/lib/etcd

[version]
etcd

etcd Version: 3.2.7
Git SHA: bb66589
Go Version: go1.8.3
Go OS/Arch: linux/amd64

kubernetes

Client Version: version.Info{Major:"1", Minor:"8", GitVersion:"v1.8.5", GitCommit:"cce11c6a185279d037023e02ac5249e14daa22bf", GitTreeState:"clean", BuildDate:"2017-12-07T16:16:03Z", GoVersion:"go1.8.3", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"8", GitVersion:"v1.8.5", GitCommit:"cce11c6a185279d037023e02ac5249e14daa22bf", GitTreeState:"clean", BuildDate:"2017-12-07T16:05:18Z", GoVersion:"go1.8.3", Compiler:"gc", Platform:"linux/amd64"}
@hexfusion
Contributor

https://192.168.139.50:2379, 6c66db9dc17b0eb5, 3.2.7, 136 MB, false, 54529, 17002106
https://192.168.139.51:2379, 5227f18ffa089aec, 3.2.7, 136 MB, true, 54529, 17002949
https://192.168.139.52:2379, e448a37828aa0925, 3.2.7, 240 MB, false, 54529, 17003472

@learnMachining, thanks for the details. The last node has a larger DB; I believe you are hitting #8009 (ref: #7116).

I will try to reproduce this. Could you post the log file for e448a37828aa0925?

@clouduol
Author

@hexfusion Thanks for the reply. The log of e448a37828aa0925 is:

Jan 31 08:41:16 fedora24_k8s_node2 etcd[1084]: ready to serve client requests
Jan 31 08:41:16 fedora24_k8s_node2 etcd[1084]: serving client requests on 192.168.139.52:2379
Jan 31 08:41:16 fedora24_k8s_node2 etcd[1084]: ready to serve client requests
Jan 31 08:41:16 fedora24_k8s_node2 etcd[1084]: serving insecure client requests on 127.0.0.1:2379, this is strongly discouraged!
Jan 31 09:30:14 fedora24_k8s_node2 etcd[1084]: apply entries took too long [4.996481033s for 1 entries]
Jan 31 09:30:14 fedora24_k8s_node2 etcd[1084]: avoid queries with large range/delete range!
Jan 31 09:30:32 fedora24_k8s_node2 etcd[1084]: apply entries took too long [418.491146ms for 1 entries]
Jan 31 09:30:32 fedora24_k8s_node2 etcd[1084]: avoid queries with large range/delete range!
Jan 31 09:37:42 fedora24_k8s_node2 etcd[1084]: apply entries took too long [22.425247918s for 2 entries]
Jan 31 09:37:42 fedora24_k8s_node2 etcd[1084]: avoid queries with large range/delete range!
Jan 31 09:42:30 fedora24_k8s_node2 etcd[1084]: start to snapshot (applied: 21700269, lastsnap: 21600266)
Jan 31 09:42:30 fedora24_k8s_node2 etcd[1084]: saved snapshot at index 21700269
Jan 31 09:42:30 fedora24_k8s_node2 etcd[1084]: compacted raft log at 21695269
Jan 31 09:42:46 fedora24_k8s_node2 etcd[1084]: purged file /var/lib/etcd/member/snap/000000000000d567-0000000001437d86.snap successfully
Jan 31 09:44:43 fedora24_k8s_node2 etcd[1084]: start to snapshot (applied: 21800271, lastsnap: 21700269)
Jan 31 09:44:43 fedora24_k8s_node2 etcd[1084]: saved snapshot at index 21800271
Jan 31 09:44:43 fedora24_k8s_node2 etcd[1084]: compacted raft log at 21795271
Jan 31 09:44:46 fedora24_k8s_node2 etcd[1084]: purged file /var/lib/etcd/member/snap/000000000000d578-0000000001450427.snap successfully
Jan 31 09:46:56 fedora24_k8s_node2 etcd[1084]: start to snapshot (applied: 21900272, lastsnap: 21800271)
Jan 31 09:46:56 fedora24_k8s_node2 etcd[1084]: saved snapshot at index 21900272
Jan 31 09:46:56 fedora24_k8s_node2 etcd[1084]: compacted raft log at 21895272
Jan 31 09:47:16 fedora24_k8s_node2 etcd[1084]: purged file /var/lib/etcd/member/snap/000000000000d594-0000000001468ac8.snap successfully
Jan 31 10:04:04 fedora24_k8s_node2 etcd[1084]: store.index: compact 4043767
Jan 31 10:04:05 fedora24_k8s_node2 etcd[1084]: apply entries took too long [1.532524171s for 1 entries]
Jan 31 10:04:05 fedora24_k8s_node2 etcd[1084]: avoid queries with large range/delete range!
Jan 31 10:04:06 fedora24_k8s_node2 etcd[1084]: apply entries took too long [126.742086ms for 1 entries]
Jan 31 10:04:06 fedora24_k8s_node2 etcd[1084]: avoid queries with large range/delete range!
Jan 31 10:04:08 fedora24_k8s_node2 etcd[1084]: apply entries took too long [199.255851ms for 1 entries]
Jan 31 10:04:08 fedora24_k8s_node2 etcd[1084]: avoid queries with large range/delete range!
Jan 31 10:04:10 fedora24_k8s_node2 etcd[1084]: apply entries took too long [110.257772ms for 1 entries]
Jan 31 10:04:10 fedora24_k8s_node2 etcd[1084]: avoid queries with large range/delete range!
Jan 31 10:04:12 fedora24_k8s_node2 etcd[1084]: apply entries took too long [141.880276ms for 1 entries]
Jan 31 10:04:12 fedora24_k8s_node2 etcd[1084]: avoid queries with large range/delete range!
Jan 31 10:04:14 fedora24_k8s_node2 etcd[1084]: starting to defragment the storage backend...
Jan 31 10:04:25 fedora24_k8s_node2 etcd[1084]: starting to defragment the storage backend...
Jan 31 10:04:54 fedora24_k8s_node2 etcd[1084]: finished defragmenting the storage backend
Jan 31 10:05:24 fedora24_k8s_node2 etcd[1084]: finished defragmenting the storage backend
Jan 31 10:05:24 fedora24_k8s_node2 etcd[1084]: apply entries took too long [1m8.202936241s for 1 entries]
Jan 31 10:05:24 fedora24_k8s_node2 etcd[1084]: avoid queries with large range/delete range!
Jan 31 10:05:38 fedora24_k8s_node2 etcd[1084]: apply entries took too long [153.017874ms for 1 entries]
Jan 31 10:05:38 fedora24_k8s_node2 etcd[1084]: avoid queries with large range/delete range!
Jan 31 10:05:47 fedora24_k8s_node2 etcd[1084]: finished scheduled compaction at 4043767 (took 1m42.185732708s)
Jan 31 10:07:28 fedora24_k8s_node2 etcd[1084]: store.index: compact 4043768
Jan 31 10:07:28 fedora24_k8s_node2 etcd[1084]: apply entries took too long [197.173551ms for 1 entries]
Jan 31 10:07:28 fedora24_k8s_node2 etcd[1084]: avoid queries with large range/delete range!
Jan 31 10:07:30 fedora24_k8s_node2 etcd[1084]: finished scheduled compaction at 4043768 (took 2.389642113s)
Jan 31 10:07:35 fedora24_k8s_node2 etcd[1084]: starting to defragment the storage backend...
Jan 31 10:07:37 fedora24_k8s_node2 etcd[1084]: finished defragmenting the storage backend
Jan 31 10:07:37 fedora24_k8s_node2 etcd[1084]: apply entries took too long [2.414080868s for 1 entries]
Jan 31 10:07:37 fedora24_k8s_node2 etcd[1084]: avoid queries with large range/delete range!
Jan 31 10:10:16 fedora24_k8s_node2 etcd[1084]: apply entries took too long [400.616855ms for 1 entries]
Jan 31 10:10:16 fedora24_k8s_node2 etcd[1084]: avoid queries with large range/delete range!
Jan 31 10:20:49 fedora24_k8s_node2 etcd[1084]: apply entries took too long [1.660932211s for 1 entries]
Jan 31 10:20:49 fedora24_k8s_node2 etcd[1084]: avoid queries with large range/delete range!
Jan 31 10:21:09 fedora24_k8s_node2 etcd[1084]: apply entries took too long [658.732222ms for 1 entries]
Jan 31 10:21:09 fedora24_k8s_node2 etcd[1084]: avoid queries with large range/delete range!
Jan 31 10:24:41 fedora24_k8s_node2 etcd[1084]: start to snapshot (applied: 22000273, lastsnap: 21900272)
Jan 31 10:24:42 fedora24_k8s_node2 etcd[1084]: saved snapshot at index 22000273
Jan 31 10:24:42 fedora24_k8s_node2 etcd[1084]: compacted raft log at 21995273
Jan 31 10:24:46 fedora24_k8s_node2 etcd[1084]: purged file /var/lib/etcd/member/snap/000000000000d59d-0000000001481169.snap successfully
Jan 31 10:25:14 fedora24_k8s_node2 etcd[1084]: apply entries took too long [524.967678ms for 2 entries]
Jan 31 10:25:14 fedora24_k8s_node2 etcd[1084]: avoid queries with large range/delete range!

@clouduol
Author

clouduol commented Jan 31, 2018

I have read issues #8009 and #7116, and I don't think the problem is the same. Node e448a37828aa0925 is larger than the other two because it didn't delete any of the objects Kubernetes had deleted.

To gather more information, I created an object with

ETCDCTL_API=3 etcdctl --cacert="/etc/kubernetes/ssl/ca.pem" --cert="/etc/flanneld/ssl/flanneld.pem" --key="/etc/flanneld/ssl/flanneld-key.pem" --endpoints=https://192.168.139.50:2379 put /registry/atest

and then fetched this object from each of the three nodes; that worked normally.

Then I deleted this object with etcdctl and fetched it from all three nodes again. This time no node returned a value, so all of them had deleted the object (a sketch of this check is below).
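
A minimal sketch of that delete-and-verify round trip, reusing the certificate paths and key name from the commands above (the loop itself is illustrative):

ETCDCTL_API=3 etcdctl --cacert="/etc/kubernetes/ssl/ca.pem" --cert="/etc/flanneld/ssl/flanneld.pem" --key="/etc/flanneld/ssl/flanneld-key.pem" \
  --endpoints=https://192.168.139.50:2379 del /registry/atest
for ep in https://192.168.139.50:2379 https://192.168.139.51:2379 https://192.168.139.52:2379; do
  # a count of 0 (no "kvs" in the response) means this member has applied the delete
  ETCDCTL_API=3 etcdctl --cacert="/etc/kubernetes/ssl/ca.pem" --cert="/etc/flanneld/ssl/flanneld.pem" --key="/etc/flanneld/ssl/flanneld-key.pem" \
    --endpoints=$ep get /registry/atest -w json | jq '.count // 0'
done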

What's more, I investigated the logs of the three nodes and found something strange:

  1. Nodes 6c66db9dc17b0eb5 and 5227f18ffa089aec compact every five minutes, but node e448a37828aa0925 does not (see the log-grep sketch below)
  2. Node e448a37828aa0925 takes snapshots from time to time
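
One way to compare compaction activity across the members is to grep each node's journal for the compaction messages that appear in the log above (a sketch; the systemd unit name etcd is an assumption, since the unit file shown earlier doesn't include its name):

# run on each node and compare how often scheduled compactions show up
journalctl -u etcd --since "2 hours ago" | grep -E "store.index: compact|finished scheduled compaction"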

From the above, I suspect the problem is related to Kubernetes. Maybe this is normal behavior for an etcd cluster backing Kubernetes? But how could Kubernetes delete objects from only two of the three nodes? I am very confused.

@clouduol
Author

clouduol commented Jan 31, 2018

Now my etcd cluster runs normally.

I stopped all nodes and backed up /var/lib/etcd on node e448a37828aa0925, then deleted all data in /var/lib/etcd on that node and started all three nodes again.
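
Roughly, a sketch of those steps on node e448a37828aa0925 (192.168.139.52), assuming etcd runs as a systemd unit named etcd and with an illustrative backup destination:

systemctl stop etcd                      # stop etcd (done on all three nodes)
cp -a /var/lib/etcd /var/lib/etcd.bak    # keep a copy of the inconsistent data (backup path is illustrative)
rm -rf /var/lib/etcd/*                   # wipe this member's data directory
systemctl start etcd                     # start all three nodes again; per the report above, this member then re-synced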

After the node synced, I redid the create-namespace test above, and this time all three nodes deleted the namespace created by kubectl.

All three nodes now compact every five minutes. It seems normal.

I don't actually know what happened, but it has been working well so far.

@gyuho
Contributor

gyuho commented Jan 31, 2018

@learnMachining If you look at the header in the get responses, the revision on the 3rd node is quite different, which we need to figure out. How did you operate this etcd cluster? Any maintenance work?

@clouduol
Author

clouduol commented Feb 1, 2018

@gyuho I did nothing to this cluster; everything is default. K8s uses the etcd v3 API, while Calico and Flannel use the v2 API. After installation, etcd, k8s, Calico, and Flannel each worked in their own way, and I found the inconsistency yesterday.

@gyuho
Contributor

gyuho commented Feb 1, 2018

@learnMachining Do you have all the server logs from the beginning (from revision 1) that you can share?

@clouduol
Author

clouduol commented Feb 2, 2018

@gyuho I'm afraid not. I have cleared the logs a few times. But this is my own experimental environment, so I can give you all the logs I still have. Hope they are helpful.
etcd-node0-6c66db9dc17b0eb5.log.tar.gz
etcd-node1-5227f18ffa089aec.log.tar.gz
etcd-node2-e448a37828aa0925.log.tar.gz

@gyuho
Contributor

gyuho commented Feb 2, 2018

Could you also share the db files from all 3 nodes?

@clouduol
Author

clouduol commented Feb 3, 2018

@gyuho I backed up the db files of node e448a37828aa0925 to the /tmp directory, but after restarting I can't find them. Now all 3 nodes have the same data and the cluster runs normally.

I'm sorry for losing the backup data. Please close this issue. I will reopen it if I encounter similar problems or find more useful information.

@gyuho
Contributor

gyuho commented Feb 3, 2018

@learnMachining Yeah, there seem to be too many components... but the revision mismatch should not happen. Please keep all the logs, or at least keep a record of every operation you run against the etcd cluster, in case this happens again. And use the latest etcd, if you can.

Let's close this for now, and reopen when it happens again.

Thanks.

gyuho closed this as completed Feb 3, 2018