Consul allows stale reads during start-up #2644
Comments
Hi @j-pnd, thanks for opening an issue. It looks like we need to make sure we are the leader and that the initial barrier has been cleared before we serve any consistent results.
There's likely a race (related to #2644) where the catalog update might be in but the leader tracking doesn't report a leader, so this blocks forever and then times out. As a workaround we can lower the query wait time to always allow for a few retries.
@j-pnd could you run your script again to see if you are able to reproduce the stale read on 0.8.3? I've tried it with up to 30 iterations and it hasn't failed with a stale read yet.
Hi @preetapan, unfortunately the bug still seems to be present, though it takes much longer to reproduce. I saw it after 74 iterations. Iteration 70
@j-pnd thanks for the confirmation. I saw it fail after 105 iterations. We will look into it.
Fix stale reads on server startup. Consistent reads will now wait for up to config.RPCHoldTimeout for the server to get past its raft log before returning an error. Servers that are starting up will eventually catch up. This fixes issue #2644.
Fixed in #3154.
@slackpad @preetapan this issue was fixed for "consistent" reads at https://github.com/hashicorp/consul/pull/3154/files#diff-cefe66c3774a83b3ec294243f1550944R441, which are not generally recommended (https://www.consul.io/api/features/consistency.html#consistent). It seems like it would be simple to also wait for whether the server is ready for consistent reads for the "default" mode without taking the cost of checking the peers on each such request via something like
(adding the else if !queryOpts.AllowStale logic).
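The snippet this comment referred to did not survive on this page. As a rough illustration of the suggestion only, the sketch below shows how the choice of check could branch on the query options. It is not the code from #3154 or from the comment: `QueryOptions` mirrors just the two fields under discussion, and `readCheck`, its constants, and `requiredCheck` are hypothetical names.

```go
package main

// QueryOptions mirrors only the two fields discussed in this thread.
// It is a stand-in for illustration, not Consul's real struct.
type QueryOptions struct {
	RequireConsistent bool
	AllowStale        bool
}

// readCheck names the amount of verification done before a read is served.
type readCheck int

const (
	checkNone           readCheck = iota // stale mode: serve immediately
	checkReadyOnly                       // default mode: wait for local readiness only
	checkLeaderAndReady                  // consistent mode: verify with peers, then wait
)

// requiredCheck picks the check for a request. The else-if branch is the
// proposed addition: default-mode reads wait until the server is ready
// but skip the per-request cost of checking the peers.
func requiredCheck(opts QueryOptions) readCheck {
	if opts.RequireConsistent {
		return checkLeaderAndReady
	} else if !opts.AllowStale {
		return checkReadyOnly
	}
	return checkNone
}
```

Under this sketch, only requests that explicitly allow stale results would bypass the readiness wait.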
consul version for both Client and Server
Client: (HTTP API)
Server: Tried with v0.6.0 and v0.7.0
consul info for both Client and Server
Server:
Operating system and Environment details
Ubuntu 14.04. Client and server are on the same host.
Description of the Issue (and unexpected/desired result)
If consul is queried during start-up, it can return stale versions of data.
Reproduction steps
Run the following bash script with consul on your path. (Warning: this script will issue pkill consul.)
The test writes a value to a test key, restarts consul, then reads the value back and checks that it matches. It then increments the value and repeats in a loop.
The issue is present with or without the consistent read flag, and with either a clean shutdown or kill -9. The test exits as soon as it finds a stale read. If you query the test key from consul afterwards, you will see that it eventually has the correct value. It seems that reads are being serviced during log replay.
sample run:
jpound:~/src/consul-bug> ./run.sh
./run.sh: line 21: 9907 Killed $consul_cmd > "$consul_log" 2>&1
./run.sh: line 21: 10046 Killed $consul_cmd > "$consul_log" 2>&1
./run.sh: line 21: 10248 Killed $consul_cmd > "$consul_log" 2>&1
./run.sh: line 21: 10442 Killed $consul_cmd > "$consul_log" 2>&1
./run.sh: line 29: 10620 Killed $consul_cmd > "$consul_log" 2>&1
Incorrect value read, expected 4 found 1
Script:
Log Fragments
Log for last iteration (which produced the stale read)