Why does it take about 8s to time out after publishing fails? Is there any way to shorten the time? #8977

langkeer · 2017-12-05T10:02:48Z

We often see publishing fail on the first attempt. From the etcd logs, it seems that etcd won't publish again until a timer (duration 8s) expires, which delays the etcd service by about 8s. So we want to know:

Why doesn't etcd publish again right after the previous publish fails?
Is there any way to shorten the duration of the timer?
Thanks!

Part of the etcd logs:
Nov 10 15:06:15 etcd[1627]: peer 75af8d23d13d9b91 became active
Nov 10 15:06:15 etcd[1627]: established a TCP streaming connection with peer 75af8d23d13d9b91 (stream Message reader)
Nov 10 15:06:15 etcd[1627]: established a TCP streaming connection with peer 75af8d23d13d9b91 (stream MsgApp v2 reader)
Nov 10 15:06:15 etcd[1627]: established a TCP streaming connection with peer 75af8d23d13d9b91 (stream Message writer)
Nov 10 15:06:15 etcd[1627]: established a TCP streaming connection with peer 75af8d23d13d9b91 (stream MsgApp v2 writer)
Nov 10 15:06:15 etcd[1627]: established a TCP streaming connection with peer 5e07cc8337c0395c (stream MsgApp v2 writer)
Nov 10 15:06:15 etcd[1627]: established a TCP streaming connection with peer 5e07cc8337c0395c (stream Message writer)
Nov 10 15:06:23 etcd[1627]: publish error: etcdserver: request timed out, possibly due to previous leader failure
Nov 10 15:06:23 etcd[1627]: ready to serve client requests
Nov 10 15:06:23 etcd[1627]: published {Name:mn-0 ClientURLs:[http://192.168.1.6:2379]} to cluster d89ecb0dff10fcd4

langkeer · 2017-12-05T10:07:41Z

etcd logs of publish error.txt
Attached the log file. Thanks!

langkeer · 2017-12-06T03:11:43Z

Can anyone help answer this?

absolute8511 · 2017-12-06T04:41:42Z

This may be caused by a leader change, the same as #8975.

langkeer · 2017-12-07T01:53:42Z

OK. So you will make it fail fast with a code fix? Is there any other way to shorten the duration of the timer?

langkeer · 2017-12-13T05:43:08Z

Can anyone help answer my questions? Thanks!

hexfusion · 2017-12-13T10:39:48Z

> anyone can help to answer my questions? thanks!

@langkeer your logs show "Nov 10 15:06:15 etcd[1627]: etcd Version: 3.1.4". Have you tested the most recent version, i.e. 3.1.11?

Can you consolidate the ENV vars and flags you are passing for the cluster restart I see in the logs? Thanks!

langkeer · 2018-04-09T07:55:15Z

Need-More-Investigation? Do you need to investigate the problem further? What can we do to help? Also, we don't think #9067 and #9137 will solve it, because our problem happens during etcd startup after a node restart, with no local database at all (we deploy etcd on a ramdisk, so everything is lost when the node restarts).

langkeer · 2018-04-09T07:57:51Z

And our problem is that, after a publish fails, it takes about 8s to time out and publish again. etcd only becomes ready in systemd after publishing succeeds, so now we just want to shorten the timeout.

langkeer · 2018-04-13T01:50:16Z

@gyuho
any comments?

gyuho · 2018-05-02T22:19:28Z

@langkeer

Sorry for the delay.

> etcd won't publish again until the timer (duration is 8s) expires

The easiest way is just to restart (this can easily be done with a systemd service file).

Since publishing only happens at startup, we won't adjust the timeouts.

absolute8511 mentioned this issue Dec 25, 2017

raft: let raft step return error when proposal is dropped to allow fail-fast #9067

Merged

absolute8511 mentioned this issue Jan 12, 2018

raft: Propose in raft node wait the proposal result so we can fail fast while dropping proposal #9137

Merged

gyuho added the stage/investigating label Feb 25, 2018

gyuho closed this as completed May 2, 2018

gyuho added type/question and removed stage/investigating labels May 2, 2018
