check perf is increasing the db size drastically #9326

Closed
raoofm opened this issue Feb 15, 2018 · 17 comments

Comments

@raoofm
Contributor

raoofm commented Feb 15, 2018

before perf check

$ etcdctl --endpoints=https://node01.qa.etcd.net:2379,https://node02.qa.etcd.net:2379,https://node03.qa.etcd.net:2379,https://node04.qa.etcd.net:2379,https://node05.qa.etcd.net:2379 --user=root endpoint status -w table
Failed to get the status of endpoint https://node03.qa.etcd.net:2379 (context deadline exceeded)
+------------------------------------------------+------------------+---------+---------+-----------+-----------+------------+
|                    ENDPOINT                    |        ID        | VERSION | DB SIZE | IS LEADER | RAFT TERM | RAFT INDEX |
+------------------------------------------------+------------------+---------+---------+-----------+-----------+------------+
| https://node01.qa.etcd.net:2379 | 318e4ede0816b8ed |  3.1.10 |  1.2 MB |     false |      5478 |   99151302 |
| https://node02.qa.etcd.net:2379 | de98d7a88277647a |  3.1.10 |  1.2 MB |      true |      5478 |   99151303 |
| https://node04.qa.etcd.net:2379 | eaf02d9d1857d9d6 |  3.1.10 |  1.2 MB |     false |      5478 |   99151305 |
| https://node05.qa.etcd.net:2379 | 91eb8d5ebe44b736 |  3.1.10 |  1.2 MB |     false |      5478 |   99151306 |
+------------------------------------------------+------------------+---------+---------+-----------+-----------+------------+

1st perf check

$ etcdctl --endpoints=https://node01.qa.etcd.net:2379,https://node02.qa.etcd.net:2379,https://node03.qa.etcd.net:2379,https://node04.qa.etcd.net:2379,https://node05.qa.etcd.net:2379 --user=root check perf
 60 / 60 Booooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooo! 100.00%1m0s
PASS: Throughput is 150 writes/s
Slowest request took too long: 0.509933s
PASS: Stddev is 0.032255s
FAIL

after 1st perf check

$ etcdctl --endpoints=https://node01.qa.etcd.net:2379,https://node02.qa.etcd.net:2379,https://node03.qa.etcd.net:2379,https://node04.qa.etcd.net:2379,https://node05.qa.etcd.net:2379 --user=root endpoint status -w table
Failed to get the status of endpoint https://node03.qa.etcd.net:2379 (context deadline exceeded)
+------------------------------------------------+------------------+---------+---------+-----------+-----------+------------+
|                    ENDPOINT                    |        ID        | VERSION | DB SIZE | IS LEADER | RAFT TERM | RAFT INDEX |
+------------------------------------------------+------------------+---------+---------+-----------+-----------+------------+
| https://node01.qa.etcd.net:2379 | 318e4ede0816b8ed |  3.1.10 |   22 MB |     false |      5478 |   99160359 |
| https://node02.qa.etcd.net:2379 | de98d7a88277647a |  3.1.10 |   22 MB |      true |      5478 |   99160360 |
| https://node04.qa.etcd.net:2379 | eaf02d9d1857d9d6 |  3.1.10 |   22 MB |     false |      5478 |   99160362 |
| https://node05.qa.etcd.net:2379 | 91eb8d5ebe44b736 |  3.1.10 |   22 MB |     false |      5478 |   99160363 |
+------------------------------------------------+------------------+---------+---------+-----------+-----------+------------+

2nd perf check

$ etcdctl --endpoints=https://node01.qa.etcd.net:2379,https://node02.qa.etcd.net:2379,https://node03.qa.etcd.net:2379,https://node04.qa.etcd.net:2379,https://node05.qa.etcd.net:2379 --user=root check perf
 60 / 60 Booooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooo! 100.00%1m0s
PASS: Throughput is 150 writes/s
PASS: Slowest request took 0.301787s
PASS: Stddev is 0.021030s
PASS

after 2nd perf check

$ etcdctl --endpoints=https://node01.qa.etcd.net:2379,https://node02.qa.etcd.net:2379,https://node03.qa.etcd.net:2379,https://node04.qa.etcd.net:2379,https://node05.qa.etcd.net:2379 --user=root endpoint status -w table
Failed to get the status of endpoint https://node03.qa.etcd.net:2379 (context deadline exceeded)
+------------------------------------------------+------------------+---------+---------+-----------+-----------+------------+
|                    ENDPOINT                    |        ID        | VERSION | DB SIZE | IS LEADER | RAFT TERM | RAFT INDEX |
+------------------------------------------------+------------------+---------+---------+-----------+-----------+------------+
| https://node01.qa.etcd.net:2379 | 318e4ede0816b8ed |  3.1.10 |   44 MB |     false |      5478 |   99169452 |
| https://node02.qa.etcd.net:2379 | de98d7a88277647a |  3.1.10 |   44 MB |      true |      5478 |   99169453 |
| https://node04.qa.etcd.net:2379 | eaf02d9d1857d9d6 |  3.1.10 |   44 MB |     false |      5478 |   99169455 |
| https://node05.qa.etcd.net:2379 | 91eb8d5ebe44b736 |  3.1.10 |   44 MB |     false |      5478 |   99169456 |
+------------------------------------------------+------------------+---------+---------+-----------+-----------+------------+

@xiang90 Please help me understand: is it something I'm doing wrong? I ran defrag, but as I expected, it didn't fix this. Should I run compaction? I'm not sure why the DB should grow by about 22 MB on each run.

How can I reduce it back?

@raoofm
Contributor Author

raoofm commented Feb 15, 2018

You can ignore node03; the machine has been taken offline and we are replacing it.

@gyuho
Contributor

gyuho commented Feb 15, 2018

Probably due to freelist sync in bolt db #8009?

@raoofm
Contributor Author

raoofm commented Feb 15, 2018

@gyuho thanks, I looked at that earlier but couldn't relate what I'm doing to what was described in those related issues.

What is the suggested workaround to bring the db size back to normal?

@gyuho
Contributor

gyuho commented Feb 15, 2018

All etcd releases before 3.3 sync the freelist to disk. We disabled freelist sync in 3.3. Upgrading to 3.3 should resolve this issue. Could you try the same test with 3.3? Thanks!

@raoofm
Contributor Author

raoofm commented Feb 15, 2018

Doesn't seem like they are related, but I'll give it a try

@raoofm
Contributor Author

raoofm commented Feb 15, 2018

@xiang90 @gyuho I can replicate with 3.3.1 too

$ etcdctl endpoint status
127.0.0.1:2379, 8e9e05c52164694d, 3.3.1, 25 kB, true, 4, 8
$ etcdctl check perf
 60 / 60 Booooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooo! 100.00%1m0s
PASS: Throughput is 150 writes/s
PASS: Slowest request took 0.024217s
PASS: Stddev is 0.003386s
PASS
$ etcdctl endpoint status
127.0.0.1:2379, 8e9e05c52164694d, 3.3.1, 22 MB, true, 4, 9009
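
A rough sanity check on the numbers above: the Raft index moved from 8 to 9009, which matches about 150 writes/s for 60 s, while the DB grew from 25 kB to 22 MB. The sketch below computes the average growth per Raft entry. The helper name `growth_per_entry` is hypothetical, and the byte counts assume etcdctl prints SI units (1 MB = 1,000,000 bytes).

```shell
# Hypothetical helper: average DB growth in bytes per Raft entry.
# Arguments: size_before size_after index_before index_after (sizes in bytes).
growth_per_entry() {
  echo $(( ($2 - $1) / ($4 - $3) ))
}

# Values from the 3.3.1 run above, assuming 25 kB = 25000 and 22 MB = 22000000 bytes.
growth_per_entry 25000 22000000 8 9009
```

This prints 2441, so roughly 2.4 kB is retained per entry. That is what one would expect if every revision written by the test stays in the backend until it is compacted, although the figure also includes storage overhead and is only an approximation.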

@xiang90
Contributor

xiang90 commented Feb 15, 2018

It should not be related. Defrag should be able to reduce the size, provided the keys are all deleted and compacted after the test.
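
The order described here, compact first and then defragment, can be sketched as below. This is a minimal sketch under stated assumptions: `reclaim_space` is a hypothetical helper name, the revision is supplied by the caller, and `ETCDCTL_API=3` is set on the assumption that this etcdctl release still defaults to the v2 API.

```shell
# Hypothetical helper: drop history up to revision $1, then shrink the DB file.
# Assumes etcdctl can already reach the cluster (endpoints and credentials
# are configured by the caller).
reclaim_space() {
  rev="$1"
  # Compaction discards superseded and deleted revisions up to $rev.
  ETCDCTL_API=3 etcdctl compact "$rev" || return 1
  # Defragmentation releases the freed pages back to the filesystem.
  # It acts only on the endpoints given to etcdctl, so run it for each member.
  ETCDCTL_API=3 etcdctl defrag
}
```

For example, `reclaim_space 9009` would compact up to revision 9009 (an illustrative value) and then defragment. Running defrag alone, before any compaction, leaves the old revisions in place, so the file cannot shrink.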

@raoofm
Contributor Author

raoofm commented Feb 15, 2018

@xiang90 thanks. Is the user expected to delete the keys manually? Shouldn't the test do it at the end? Which prefix should the keys be deleted from?

@raoofm
Contributor Author

raoofm commented Feb 15, 2018

@xiang90 I see from the code that /etcdctl-check-perf/ is the prefix. I'll update.

@raoofm
Contributor Author

raoofm commented Feb 15, 2018

@xiang90 @gyuho not sure where I'm going wrong
$ etcdctl del --prefix /etcdctl-check-perf/
0
$ etcdctl del --prefix etcdctl-check-perf
0

@gyuho
Contributor

gyuho commented Feb 15, 2018

> Is the user expected to delete the keys manually?

@raoofm We delete all written keys when the check perf command exits.

@raoofm
Contributor Author

raoofm commented Feb 15, 2018

Then why is the DB size not coming down? It remains at 44 MB instead of 25 kB.

@gyuho
Contributor

gyuho commented Feb 15, 2018

@raoofm Oh, just found out we do not compact... Will fix. Thanks for reporting.

@raoofm
Contributor Author

raoofm commented Feb 15, 2018

@gyuho thanks.
I compacted on my local instance and the size came back to normal. A quick question, though: how do you determine the compaction revision? If I forget the last revision I used for compaction, my guesses fail with either "etcdserver: mvcc: required revision has been compacted" or "etcdserver: mvcc: required revision is a future revision".

Is there a recommended way to get the last compacted revision and the current revision? Is there a document to read to automate compaction and set the revision?

@raoofm
Contributor Author

raoofm commented Feb 15, 2018

How do we usually choose compaction revision?

@raoofm
Contributor Author

raoofm commented Feb 16, 2018

@gyuho gentle ping

@gyuho
Contributor

gyuho commented Feb 16, 2018

@raoofm As #9330 does, you can get the latest revision from the API response header. #9330 compacts all revisions after deleting all keys under the check-perf prefix.
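
One way to read the latest revision from the response header on the command line, as a minimal sketch. It assumes the JSON output of `etcdctl endpoint status` carries a `"revision":N` field in its header; the helper names `extract_revision` and `current_revision` are hypothetical.

```shell
# Print the first "revision" value found in an etcd JSON response on stdin.
extract_revision() {
  grep -Eo '"revision":[0-9]+' | head -n 1 | grep -Eo '[0-9]+'
}

# Hypothetical helper: query endpoint status and print the current store revision.
# Assumes etcdctl can already reach the cluster.
current_revision() {
  ETCDCTL_API=3 etcdctl endpoint status --write-out=json | extract_revision
}
```

The printed value can then be passed to `etcdctl compact`. Compacting at the current revision keeps only the latest version of each key, and because the revision is read from the server, it is neither already compacted nor in the future.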

gyuho closed this as completed Feb 16, 2018