Why does it take about 8s to time out after a publish failure? Is there any way to shorten the time? #8977

Closed
langkeer opened this issue Dec 5, 2017 · 10 comments

@langkeer

langkeer commented Dec 5, 2017

We often see the first publish attempt fail, and from the etcd logs it seems that etcd does not publish again until a timer (about 8s) expires. This delays the etcd service by about 8s. We would like to know:

  1. Why doesn't etcd publish again immediately after the previous publish fails?
  2. Is there any way to shorten the duration of the timer?

Thanks!

Part of the etcd logs is attached:
Nov 10 15:06:15 etcd[1627]: peer 75af8d23d13d9b91 became active
Nov 10 15:06:15 etcd[1627]: established a TCP streaming connection with peer 75af8d23d13d9b91 (stream Message reader)
Nov 10 15:06:15 etcd[1627]: established a TCP streaming connection with peer 75af8d23d13d9b91 (stream MsgApp v2 reader)
Nov 10 15:06:15 etcd[1627]: established a TCP streaming connection with peer 75af8d23d13d9b91 (stream Message writer)
Nov 10 15:06:15 etcd[1627]: established a TCP streaming connection with peer 75af8d23d13d9b91 (stream MsgApp v2 writer)
Nov 10 15:06:15 etcd[1627]: established a TCP streaming connection with peer 5e07cc8337c0395c (stream MsgApp v2 writer)
Nov 10 15:06:15 etcd[1627]: established a TCP streaming connection with peer 5e07cc8337c0395c (stream Message writer)
Nov 10 15:06:23 etcd[1627]: publish error: etcdserver: request timed out, possibly due to previous leader failure
Nov 10 15:06:23 etcd[1627]: ready to serve client requests
Nov 10 15:06:23 etcd[1627]: published {Name:mn-0 ClientURLs:[http://192.168.1.6:2379]} to cluster d89ecb0dff10fcd4

@langkeer
Author

langkeer commented Dec 5, 2017

etcd logs of publish error.txt
The log file is attached. Thanks!

@langkeer
Author

langkeer commented Dec 6, 2017

Can anyone help answer this?

@absolute8511
Contributor

This may be caused by a leader change, same as #8975.

@langkeer
Author

langkeer commented Dec 7, 2017

OK. So will you change the code to make it fail fast? Is there any other way to shorten the duration of the timer?

@langkeer
Author

Can anyone help answer my questions? Thanks!

@hexfusion
Contributor

> anyone can help to answer my questions? thanks!

@langkeer your logs show: Nov 10 15:06:15 etcd[1627]: etcd Version: 3.1.4. Have you tested the most recent version, i.e. 3.1.11?

Can you consolidate the environment variables and flags you are passing for the cluster restart that I see in the logs? Thanks!

@langkeer
Author

langkeer commented Apr 9, 2018

Need-More-Investigation? Do you need more investigation of the problem? What can we do to support you? Also, we don't think that #9067 and #9137 can solve it, because our problem happens when etcd starts up after a node restart with no local database (we deploy it on a ramdisk, so everything is lost when the node restarts).

@langkeer
Author

langkeer commented Apr 9, 2018

Our problem is that, after a publish fails, it takes about 8s to time out and publish again. Only after the publish succeeds does etcd become ready in systemd. For now we just want to shorten the timeout.

@langkeer
Author

@gyuho
Any comments?

@gyuho
Contributor

gyuho commented May 2, 2018

@langkeer

Sorry for the delay.

> etcd won't publish again until the timer (duration is 8s) expires

The easiest way is to just restart (this can easily be done with a systemd service file).

Since publish only happens at the beginning, we won't adjust the timeouts.
