Add fio job file to test disk performance. #10577

Closed
matte21 opened this issue Mar 22, 2019 · 13 comments

Comments

@matte21

matte21 commented Mar 22, 2019

Disk performance is paramount to etcd. https://github.com/etcd-io/etcd/blob/master/Documentation/op-guide/hardware.md suggests measuring it with fio. But disk I/O can happen in many different ways, and fio is complex to use. For a user who is not experienced with etcd disk I/O and/or fio but needs to assess whether their storage meets etcd's requirements, writing a meaningful fio job file that does I/O the same way etcd does is hard.

I think having such a file or at least some guidelines on how to write such a file would be extremely beneficial for the users. There are different disk metrics which are crucial to Etcd (WAL f(data)sync duration, backend commit time). Maybe one file for each metric is needed? Maybe the cli parameters in #10414 (comment) are good candidates? @hexfusion what do you think? In the comment you wrote you wanted to add something similar to the repo.

@hexfusion
Contributor

@matte21 are you asking for an example of fio usage? Here is an incantation I have used in the past. I will add it to the docs unless you would like to, and if you have a better version, feel free to improve mine.

fio --randrepeat=1 \
  --ioengine=libaio \
  --direct=1 \
  --gtod_reduce=1 \
  --name=etcd-disk-io-test \
  --filename=etcd_read_write.io \
  --bs=4k --iodepth=64 --size=4G \
  --readwrite=randrw --rwmixread=75

Does this answer your question?

@hexfusion
Contributor

In general I agree. I should have done it a while ago, thanks for the reminder.

@matte21
Author

matte21 commented Mar 22, 2019

@hexfusion I am measuring etcd performance (we're using SSDs) and seeing that both backend commit and WAL f(data)sync durations are above the recommended thresholds reported at https://github.com/etcd-io/etcd/blob/master/Documentation/faq.md#what-does-the-etcd-warning-apply-entries-took-too-long-mean and https://github.com/etcd-io/etcd/blob/master/Documentation/faq.md#what-does-the-etcd-warning-failed-to-send-out-heartbeat-on-time-mean, so I was looking for a fio job file to benchmark those. But the point of the issue was to abstract from my personal use case and get some fio job files added to the docs. I would have done it myself if I were able to; unfortunately I am very inexperienced with fio, disk I/O, and etcd.

@hexfusion
Contributor

No problem at all, I will add this now.

@MikeSpreitzer

@hexfusion : we wrote up something like what we think is needed. See https://www.ibm.com/blogs/bluemix/2019/04/using-fio-to-tell-whether-your-storage-is-fast-enough-for-etcd/

@hexfusion
Contributor

@MikeSpreitzer thanks for doing this, I am excited to read it over the weekend. I will think about where best to link this from the docs, but if you have a vision, please open a PR and we can add it.

@MikeSpreitzer

Any news here?

@matte21
Author

matte21 commented Apr 27, 2019

I opened a PR: #10685

matte21 closed this as completed May 3, 2019

@cgwalters

> @hexfusion : we wrote up something like what we think is needed. See https://www.ibm.com/blogs/bluemix/2019/04/using-fio-to-tell-whether-your-storage-is-fast-enough-for-etcd/

This link seems to be broken now.

@MikeSpreitzer

See https://www.ibm.com/cloud/blog/using-fio-to-tell-whether-your-storage-is-fast-enough-for-etcd

@cgwalters

Thanks. So...there's one huge discrepancy between #10577 (comment) and that blog entry, which is --direct=1 in the former and not the latter. Does etcd really use O_DIRECT? It doesn't look like it to me. Using O_DIRECT (or not) has a lot of implications.

@matte21
Author

matte21 commented Jul 10, 2020

> Does etcd really use O_DIRECT?

I don't remember for sure.
But the fio parameters in the blog entry produce disk I/O that is much more similar to etcd's than the fio parameters in #10577 (comment) (at least that was the case when we wrote the blog post).
The fio parameters in the blog post were derived by comparing the system call traces of fio and etcd and making them as similar as possible in the parts that affect disk I/O. I clearly remember that with #10577 (comment), the portion of the system call trace describing disk I/O was significantly different from etcd's.
So I'd say that if --direct=1 is missing from the blog entry, you should not use it (re-added warning: this was true some time ago).
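
For readers who came here looking for an actual job file, here is a minimal sketch of the kind of workload described above: sequential writes issued through plain write(2), an fdatasync after every write, and no direct=1. The job name, target directory, file size, and block size are assumptions chosen for illustration and are not taken from this thread, so adjust them to your own setup before trusting the numbers.

```ini
; Sketch of a fio job file approximating WAL-style I/O.
; All values below are illustrative assumptions.
[global]
; Use ordinary write(2) calls; O_DIRECT is deliberately not enabled.
ioengine=sync
; Sequential writes, as an append-only log would produce.
rw=write
; Issue fdatasync(2) after every single write.
fdatasync=1

[etcd-wal-sketch]
; Directory on the disk under test; it must exist before fio runs.
directory=test-data
; Total amount of data written by the job.
size=22m
; Small block size to mimic small log records.
bs=2300
```

Saved under a hypothetical name such as etcd-wal-sketch.fio, this can be run with `fio etcd-wal-sketch.fio`. Recent fio versions report fsync/fdatasync latency percentiles in their output, which can be compared against the thresholds in the etcd FAQ linked earlier in this thread.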

@ml0renz0

> See https://www.ibm.com/cloud/blog/using-fio-to-tell-whether-your-storage-is-fast-enough-for-etcd

This link is also broken. I'll leave an archive.org link here in case someone else comes looking for it like I did: https://web.archive.org/web/20210527090640/https://www.ibm.com/cloud/blog/using-fio-to-tell-whether-your-storage-is-fast-enough-for-etcd
