etcdserver: mvcc: database space exceeded #11947

Closed
schlitzered opened this issue May 26, 2020 · 4 comments

@schlitzered
Contributor

schlitzered commented May 26, 2020

Hi, I am running a 3-node etcd v3.4.9 cluster.

After running the benchmark tool with the parameters below, I can no longer put new key/value pairs:

benchmark --endpoints=https://$(hostname):2379 --conns=1000 --clients=1000 put --key-size=256 --key-space-size=100000 --sequential-keys --total=10000000 --val-size=512 --user user:password

I always get this error:

./etcdctl put test blub
{"level":"warn","ts":"2020-05-26T06:52:20.826-0400","caller":"clientv3/retry_interceptor.go:61","msg":"retrying of unary invoker failed","target":"endpoint://client-6ae0ff7d-58c6-463c-a70f-ca9e403cecc2/etcd-1.example.com:12379","attempt":0,"error":"rpc error: code = ResourceExhausted desc = etcdserver: mvcc: database space exceeded"}
Error: etcdserver: mvcc: database space exceeded

This is what I see in the etcd logs:

May 26 06:59:12 etcd-1.example.com etcd[95994]: authorized user, token is duYvARamyDkyQjoF.3095447
May 26 06:59:12 etcd-1.example.com etcd[95994]: failed to apply request "header:<ID:9866105099223188490 username:\"user\" auth_revision:27 > put:<key:\"test\" value_size:4 >" with response "" took (4.311µs) to execute, err is etcdserver: no space

But there is plenty of disk space available. I also tried "defrag", with no success.
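For readers hitting the same error: free disk space is not what this message checks. It usually means the backend database has reached etcd's storage quota (2 GiB by default unless `--quota-backend-bytes` is set), which raises a cluster-wide NOSPACE alarm that rejects writes. The benchmark above would write up to roughly 10,000,000 × (256 + 512) bytes ≈ 7.7 GB of payload across all revisions, so that seems the likely cause here. Below is a minimal sketch for checking this. The endpoint is assumed from the unit file in this report, the function names are hypothetical, and auth/TLS flags such as `--user` are omitted.

```shell
# Assumption: client endpoint taken from the unit file in this report; adjust as needed.
ENDPOINT="${ENDPOINT:-https://192.168.0.11:12379}"

# List active alarms. A NOSPACE alarm blocks writes until it is disarmed.
check_alarms() {
  etcdctl --endpoints="$ENDPOINT" alarm list
}

# Show the DB size of the queried member, to compare against the backend quota.
check_db_size() {
  etcdctl --endpoints="$ENDPOINT" endpoint status --write-out=table
}
```

Calling `check_alarms` and `check_db_size` against each member should show whether the alarm is raised and how close the DB size is to the quota.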

Here is the systemd unit file used to start the etcd cluster:

[Unit]
Description=etcd
Documentation=https://github.com/coreos/etcd

[Service]
Type=notify
Restart=always
RestartSec=5s
LimitNOFILE=40000
TimeoutStartSec=0

ExecStart=/home/etcd_current/etcd --name etcd-1.example.com \
  --data-dir /home/etcd_data \
  --listen-client-urls https://192.168.0.11:12379 \
  --advertise-client-urls https://192.168.0.12:12379 \
  --listen-peer-urls https://192.168.0.13:2380 \
  --initial-advertise-peer-urls https://192.168.0.11:2380 \
  --initial-cluster etcd-1.example.com=https://192.168.0.11:2380,etcd-2.example.com=https://192.168.0.12:2380,etcd-3.example.com=https://192.168.0.13:2380 \
  --initial-cluster-token tkn \
  --initial-cluster-state new \
  --cert-file /home/etcd_data/ssl.crt \
  --enable-pprof --metrics extensive \
  --key-file /home/etcd_data/ssl.key \
  --peer-auto-tls --log-level=debug --bcrypt-cost 4

[Install]
WantedBy=multi-user.target

Here is a current metrics file:

metrics.txt

@tangcong
Contributor

tangcong commented May 27, 2020

It seems that you are not compacting the key revision history. Please see the docs and try it.

@schlitzered
Contributor Author

I have now restarted etcd with the following options added: "--auto-compaction-retention=10 --auto-compaction-mode=revision"

But etcd is still not recovering.

The compaction attempt by etcd yields this log message:

failed to apply request "header:<ID:1338820876570589190 > compaction:<revision:2508912 > " with response "" took (14.085µs) to execute, err is mvcc: required revision has been compacted

When trying to put something into etcd, I also still get this error message:

failed to apply request "header:<ID:1338820876570589191 username:\"root\" auth_revision:27 > put:<key:\"test\" value_size:4 >" with response "" took (7.153µs) to execute, err is etcdserver: no space
Jun 04 04:29:49 etcd-1.prod.nj.dc.linux.factset.com etcd[116769]: start time = 2020-06-04 04:29:49.783073556 -0400 EDT m=+361.670802313, time spent = 2.738957ms, remote = 172.21.58.126:34100, response type = /etcdserverpb.KV/Put, request count = 1, request size = 12, response count = 0, response size = 0, request content = key:"test" value_size:4

@schlitzered
Contributor Author

This was the piece I was missing: "etcdctl alarm disarm".
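Putting the thread together, the recovery appears to need three steps in order: compact old revisions, defragment to release the freed space, then disarm the alarm. Compaction and defragmentation alone do not clear a NOSPACE alarm once it is raised. A minimal sketch under those assumptions follows. The endpoint and function names are hypothetical, auth/TLS flags are omitted, and `defrag` only acts on the endpoints it is given, so it would need to be run against each member.

```shell
# Assumption: client endpoint taken from the unit file in this report; adjust as needed.
ENDPOINT="${ENDPOINT:-https://192.168.0.11:12379}"

# Extract the current store revision from the JSON status output.
current_revision() {
  etcdctl --endpoints="$ENDPOINT" endpoint status --write-out=json \
    | grep -o '"revision":[0-9]*' | grep -o '[0-9][0-9]*$' | head -n 1
}

# Compact away old revisions, defragment, then clear the alarm.
recover_from_nospace() {
  rev=$(current_revision)
  if [ -z "$rev" ]; then
    echo "could not determine current revision" >&2
    return 1
  fi
  etcdctl --endpoints="$ENDPOINT" compact "$rev" &&
    etcdctl --endpoints="$ENDPOINT" defrag &&
    etcdctl --endpoints="$ENDPOINT" alarm disarm
}
```

After `recover_from_nospace` succeeds, a test write such as `etcdctl put test blub` should be accepted again. Keeping auto-compaction enabled, as in the options above, should help prevent the quota from being hit again.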

@kvaps

kvaps commented Oct 25, 2021

Hi, I just wanted to link this issue with another solution that helped me in this case:
kubernetes/kops#4005 (comment)
