
Consul allows stale reads during start-up #2644

Closed
j-pnd opened this issue Jan 10, 2017 · 6 comments
Labels
theme/internal-cleanup (Used to identify tech debt, testing improvements, code refactoring, and non-impactful optimization)
type/bug (Feature does not function as expected)

Comments

@j-pnd

j-pnd commented Jan 10, 2017

consul version for both Client and Server

Client: (HTTP API)
Server: Tried with v0.6.0 and v0.7.0

consul info for both Client and Server

Server:

agent:
	check_monitors = 0
	check_ttls = 0
	checks = 0
	services = 1
build:
	prerelease = 
	revision = 46499d6e
	version = 0.6.0
consul:
	bootstrap = true
	known_datacenters = 1
	leader = true
	server = true
raft:
	applied_index = 61
	commit_index = 61
	fsm_pending = 0
	last_contact = never
	last_log_index = 61
	last_log_term = 9
	last_snapshot_index = 0
	last_snapshot_term = 0
	num_peers = 0
	state = Leader
	term = 9
runtime:
	arch = amd64
	cpu_count = 4
	goroutines = 61
	max_procs = 4
	os = linux
	version = go1.5.1
serf_lan:
	encrypted = false
	event_queue = 1
	event_time = 10
	failed = 0
	intent_queue = 0
	left = 0
	member_time = 1
	members = 1
	query_queue = 0
	query_time = 1
serf_wan:
	encrypted = false
	event_queue = 0
	event_time = 1
	failed = 0
	intent_queue = 0
	left = 0
	member_time = 1
	members = 1
	query_queue = 0
	query_time = 1

Operating system and Environment details

Ubuntu 14.04; client and server on the same host.

Description of the Issue (and unexpected/desired result)

If Consul is queried during start-up, it can return stale versions of data.

Reproduction steps

Run the following bash script with consul on your PATH. (Warning: this script will issue pkill consul.)
The test writes a value to a test key, restarts consul, then reads the value back and checks that it matches. It then increments the value and continues in a loop.

The issue is present with or without the consistent read flag, and with both a clean shutdown and kill -9. After the test finds a stale read it exits. If you query the test key from consul afterwards, you will see that it eventually has the correct value. It seems that reads are being serviced during log replay.

sample run:
jpound:~/src/consul-bug> ./run.sh
./run.sh: line 21: 9907 Killed $consul_cmd > "$consul_log" 2>&1
./run.sh: line 21: 10046 Killed $consul_cmd > "$consul_log" 2>&1
./run.sh: line 21: 10248 Killed $consul_cmd > "$consul_log" 2>&1
./run.sh: line 21: 10442 Killed $consul_cmd > "$consul_log" 2>&1
./run.sh: line 29: 10620 Killed $consul_cmd > "$consul_log" 2>&1
Incorrect value read, expected 4 found 1

Script:

#!/bin/bash

consul_dir="consul.foo"
consul_cmd="consul agent -data-dir consul.foo -client 127.0.0.1 -bind 127.0.0.1 -server -bootstrap-expect 1"
consul_log="consul.out"

# ensure consul isn't running
pkill consul

# clean persistence and start consul
rm -rf "$consul_dir"
$consul_cmd  > "$consul_log"  2>&1 &
pid=$!
sleep 2

# Run test:
# Write a value to a test key, then read it out and check if it matches. Restart
# consul in between the write and read.
i="0"
while true; do
    # Write the value of 'i' to the test key
    ret="$(curl -s -X PUT -d "$i" http://127.0.0.1:8500/v1/kv/test)"
    if [ "$ret" != "true" ]; then
        echo "Failed to write key, try increasing the sleep time after the initial consul start ($ret)"
        exit 1
    fi

    # Restart consul
    kill -9 $pid
    wait
    $consul_cmd  > "$consul_log"  2>&1 &
    pid=$!

    # Read the value of 'i' from the test key, wait for consul to return a valid response
    val=""
    while [ "$val" == "No cluster leader" ] || [ "$val" == "" ]; do
        val="$(curl -s -X GET 'http://127.0.0.1:8500/v1/kv/test?raw&consistent')"
    done

    # ensure read value matches what we wrote
    if (( val != i )); then
        echo "Incorrect value read, expected $i found $val"
        exit 1
    fi

    # increment 'i'
    i=$((i + 1))
    if (( i % 10 == 0 )); then
        echo "Iteration $i"
    fi
done

Log Fragments

Log for last iteration (which produced the stale read)

==> WARNING: BootstrapExpect Mode is specified as 1; this is the same as Bootstrap mode.
==> WARNING: Bootstrap mode enabled! Do not enable unless necessary
==> Starting Consul agent...
==> Starting Consul agent RPC...
==> Consul agent running!
         Node name: 'jpound-vb'
        Datacenter: 'dc1'
            Server: true (bootstrap: true)
       Client Addr: 127.0.0.1 (HTTP: 8500, HTTPS: -1, DNS: 8600, RPC: 8400)
      Cluster Addr: 127.0.0.1 (LAN: 8301, WAN: 8302)
    Gossip encrypt: false, RPC-TLS: false, TLS-Incoming: false
             Atlas: <disabled>

==> Log data will now stream in as it occurs:

    2017/01/10 08:59:34 [INFO] serf: EventMemberJoin: jpound-vb 127.0.0.1
    2017/01/10 08:59:34 [INFO] raft: Node at 127.0.0.1:8300 [Follower] entering Follower state
    2017/01/10 08:59:34 [WARN] serf: Failed to re-join any previously known node
    2017/01/10 08:59:34 [INFO] serf: EventMemberJoin: jpound-vb.dc1 127.0.0.1
    2017/01/10 08:59:34 [WARN] serf: Failed to re-join any previously known node
    2017/01/10 08:59:34 [INFO] consul: adding WAN server jpound-vb.dc1 (Addr: 127.0.0.1:8300) (DC: dc1)
    2017/01/10 08:59:34 [ERR] agent: failed to sync remote state: No cluster leader
    2017/01/10 08:59:34 [INFO] consul: adding LAN server jpound-vb (Addr: 127.0.0.1:8300) (DC: dc1)
    2017/01/10 08:59:34 [ERR] http: Request GET /v1/kv/test?raw&consistent, error: No cluster leader

    (... above message repeated as client loops ...)

    2017/01/10 08:59:35 [WARN] raft: Heartbeat timeout reached, starting election
    2017/01/10 08:59:35 [INFO] raft: Node at 127.0.0.1:8300 [Candidate] entering Candidate state
    2017/01/10 08:59:35 [INFO] raft: Election won. Tally: 1
    2017/01/10 08:59:35 [INFO] raft: Node at 127.0.0.1:8300 [Leader] entering Leader state
    2017/01/10 08:59:35 [INFO] consul: cluster leadership acquired
    2017/01/10 08:59:35 [INFO] consul: New leader elected: jpound-vb
    2017/01/10 08:59:35 [INFO] raft: Disabling EnableSingleNode (bootstrap)

@slackpad
Contributor

Hi @j-pnd, thanks for opening an issue. Looks like we need to make sure we are the leader and that the initial barrier has been cleared before we serve any consistent results.
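
A minimal sketch of that approach, assuming hashicorp/raft's Barrier API; the Server type, the readyForConsistentReads flag, and the 10s timeout here are illustrative, not Consul's actual code:

package server

import (
	"sync/atomic"
	"time"

	"github.com/hashicorp/raft"
)

// Server is an illustrative stand-in for Consul's server type.
type Server struct {
	raft                    *raft.Raft
	readyForConsistentReads int32
}

// establishLeadership runs after winning an election. Barrier blocks until
// every log entry written before it has been applied to the FSM, so log
// replay is finished before we advertise consistent reads.
func (s *Server) establishLeadership() error {
	if err := s.raft.Barrier(10 * time.Second).Error(); err != nil {
		return err
	}
	atomic.StoreInt32(&s.readyForConsistentReads, 1)
	return nil
}

// isReadyForConsistentReads is checked by read handlers before answering.
func (s *Server) isReadyForConsistentReads() bool {
	return atomic.LoadInt32(&s.readyForConsistentReads) == 1
}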

@slackpad slackpad added the type/bug Feature does not function as expected label Jan 10, 2017
@slackpad slackpad added this to the 0.7.4 milestone Jan 10, 2017
slackpad added a commit that referenced this issue Feb 6, 2017
There's likely a race (related to #2644) where the catalog update might be in but the leader tracking doesn't report a leader, so this blocks forever and then times out. As a workaround we can lower the query wait time to always allow for a few retries.
@slackpad slackpad modified the milestones: 0.8.0, Triaged Feb 8, 2017
@slackpad slackpad modified the milestones: 0.8.0, 0.8.1 Mar 30, 2017
@slackpad slackpad modified the milestones: 0.8.1, 0.8.2 Apr 12, 2017
@slackpad slackpad removed this from the 0.8.2 milestone Apr 25, 2017
@slackpad slackpad added the theme/internal-cleanup Used to identify tech debt, testing improvements, code refactoring, and non-impactful optimization label May 25, 2017
@preetapan
Contributor

@j-pnd could you run your script again to see if you are able to reproduce the stale read on 0.8.3? I've tried it with up to 30 iterations and it hasn't failed with a stale read yet.

@j-pnd
Author

j-pnd commented Jun 9, 2017

Hi @preetapan, unfortunately the bug still seems to be present, though it takes much longer to reproduce. For me it showed up after 74 iterations.

Iteration 70
./run.sh: line 30: 5324 Killed $consul_cmd > "$consul_log" 2>&1
./run.sh: line 30: 5346 Killed $consul_cmd > "$consul_log" 2>&1
./run.sh: line 30: 5381 Killed $consul_cmd > "$consul_log" 2>&1
./run.sh: line 30: 5400 Killed $consul_cmd > "$consul_log" 2>&1
Incorrect value read, expected 73 found 55

@preetapan
Contributor

@j-pnd thanks for the confirmation. I saw it fail after 105 iterations; we will look into it.

preetapan pushed a commit that referenced this issue Jun 21, 2017
Fix stale reads on server startup. Consistent reads will now wait for up to config.RPCHoldTimeout for the server to get past its raft log before returning an error. Servers that are starting up will eventually catch up.
This fixes issue #2644
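
Roughly, that fix amounts to a bounded retry loop. A sketch extending the illustrative Server above (assuming it also gains a config field and an errors import; the poll interval, error value, and config struct are assumptions here, only RPCHoldTimeout is real Consul configuration):

// config carries the one knob this sketch needs; Consul's real config has more.
type config struct {
	RPCHoldTimeout time.Duration
}

var errNotReadyForConsistentReads = errors.New("not ready to serve consistent reads")

// consistentRead fails fast if we are not the leader, then polls the
// readiness flag for up to RPCHoldTimeout so a server that is still
// replaying its raft log can catch up instead of serving a stale answer.
func (s *Server) consistentRead() error {
	if err := s.raft.VerifyLeader().Error(); err != nil {
		return err
	}
	deadline := time.Now().Add(s.config.RPCHoldTimeout)
	for {
		if s.isReadyForConsistentReads() {
			return nil
		}
		if time.Now().After(deadline) {
			return errNotReadyForConsistentReads
		}
		time.Sleep(50 * time.Millisecond) // illustrative poll interval
	}
}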
@slackpad
Contributor

Fixed in #3154.

@adias408

adias408 commented Mar 22, 2020

@slackpad @preetapan this issue was fixed for "consistent" reads at https://github.com/hashicorp/consul/pull/3154/files#diff-cefe66c3774a83b3ec294243f1550944R441, but consistent reads are not generally recommended (https://www.consul.io/api/features/consistency.html#consistent). It seems like it would be simple to also wait until the server is ready for consistent reads in the "default" mode, without paying the cost of verifying leadership with the peers on each such request, via something like

	if queryOpts.RequireConsistent {
		// Consistent mode: verify leadership and wait for readiness.
		if err := s.consistentRead(); err != nil {
			return err
		}
	} else if !queryOpts.AllowStale {
		// Default mode: skip the peer round-trip, but still refuse to
		// answer until the startup barrier has been applied.
		if !s.isReadyForConsistentReads() {
			return structs.ErrNotReadyForConsistentReads
		}
	}

(adding the else if !queryOpts.AllowStale logic).
Is there any downside to doing something like this? We ran into this situation when issuing queries while a system was restarting after all of its nodes had been stopped. We wouldn't want to use consistent queries in general because of the cost, but we would want to avoid getting stale values because the raft.db log has not yet been fully replayed on startup.
