Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

kvserver: use background context when applying snapshots #73484

Merged
merged 1 commit into from
Dec 7, 2021

Conversation

erikgrinaker
Copy link
Contributor

@erikgrinaker erikgrinaker commented Dec 5, 2021

Following a3fd4fb, context errors are now propagated back up the
stack when receiving snapshots. However, this triggered a
maybeFatalOnRaftReadyErr assertion which crashed the node.

handleRaftReadyRaftMuLocked (which is called directly when applying
snapshots) does not appear prepared to safely handle arbitrary context
cancellation, as it is typically called with the Raft scheduler's
background context. Furthermore, by the time we call it we have already
received the entire snapshot, so it does not seem useful to abort
snapshot application just because the caller goes away.

This patch therefore uses a new background context for applying
snapshots, disconnected from the client's context, once the entire
snapshot has been received.

Resolves #73371.
Resolves #73469.
Resolves #73482.

Release note: None

@erikgrinaker erikgrinaker requested a review from a team as a code owner December 5, 2021 13:13
@erikgrinaker erikgrinaker self-assigned this Dec 5, 2021
@cockroach-teamcity
Copy link
Member

This change is Reviewable

Copy link
Member

@nvanbenschoten nvanbenschoten left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for looking into this Erik!

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @erikgrinaker and @tbg)


pkg/kv/kvserver/store_raft.go, line 341 at r1 (raw file):

		var err error
		stats, expl, err = r.handleRaftReadyRaftMuLocked(ctx, inSnap)
		if errors.IsAny(err, context.Canceled, context.DeadlineExceeded) {

This is an interesting approach. It seems like this would work, but it also begs the question of what state the replica is left in if one of these errors is returned. Can the range recover from a partial snapshot application? Are the state transitions in handleRaftReadyRaftMuLocked all atomic and independent?

Given the location that we currently observe context cancellation within handleRaftReadyRaftMuLocked, things should be fine today. However, I don't know if it's worth exposing ourselves to future bugs that could be masked or obfuscated by ignoring a context cancellation here. For instance, if we began detecting context cancellation between appends to a replica's durable log and updates to its in-memory view of the log (e.g. r.mu.lastIndex), we could run into very subtle coherence bugs.

The alternative to this approach would be being honest about handleRaftReadyRaftMuLocked not being built with cancellation tolerance in mind. All other callers of the state machine loop use the non-cancellable raft scheduler context. We don't go through the Raft scheduler during snapshots as a convenience, but maybe we should still disconnect the context used for this call from the context of the snapshot RPC stream.

Copy link
Member

@tbg tbg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @erikgrinaker and @nvanbenschoten)


pkg/kv/kvserver/store_raft.go, line 341 at r1 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

This is an interesting approach. It seems like this would work, but it also begs the question of what state the replica is left in if one of these errors is returned. Can the range recover from a partial snapshot application? Are the state transitions in handleRaftReadyRaftMuLocked all atomic and independent?

Given the location that we currently observe context cancellation within handleRaftReadyRaftMuLocked, things should be fine today. However, I don't know if it's worth exposing ourselves to future bugs that could be masked or obfuscated by ignoring a context cancellation here. For instance, if we began detecting context cancellation between appends to a replica's durable log and updates to its in-memory view of the log (e.g. r.mu.lastIndex), we could run into very subtle coherence bugs.

The alternative to this approach would be being honest about handleRaftReadyRaftMuLocked not being built with cancellation tolerance in mind. All other callers of the state machine loop use the non-cancellable raft scheduler context. We don't go through the Raft scheduler during snapshots as a convenience, but maybe we should still disconnect the context used for this call from the context of the snapshot RPC stream.

Agreed with Nathan's reservations, we need to be explicit about which context cancellations we make graceful or we make it harder to keep the system correct. It also seems more correct to detach the snapshot application's ctx from the caller; whether the caller goes away should have no say in whether the snapshot is going to be fully applied.

We may want to add a sanity check in handleRaftReadyRaftMuLocked that ctx.Done() is nil and when it isn't, return an error (which will in turn shut down the server, so this is in effect an assertion), to prevent similar bugs. (The alternative, making the ctx uncancelable, is unidiomatic, I went back and forth over this with Andrei a while back, don't remember the PR right now but can find it if there's interest).

Copy link
Contributor Author

@erikgrinaker erikgrinaker left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @knz and @nvanbenschoten)


pkg/kv/kvserver/store_raft.go, line 341 at r1 (raw file):
Yeah, it's clearly better to use a background context when applying the snapshot. I somehow missed that we'd received the entire snapshot by the time we called into this, and figured we'd already have to be tolerant to clients going away -- don't know why I was thinking that, since holding onto raftMu while streaming would clearly be horrible. And as @tbg points out, once we've received the snapshot there's no point in abandoning it just because the client happens to go away. Updated the PR.

We may want to add a sanity check in handleRaftReadyRaftMuLocked that ctx.Done() is nil and when it isn't, return an error (which will in turn shut down the server, so this is in effect an assertion), to prevent similar bugs. (The alternative, making the ctx uncancelable, is unidiomatic, I went back and forth over this with Andrei a while back, don't remember the PR right now but can find it if there's interest).

Added on a commit for this. @knz has mentioned before that we may want to somehow mark contexts that are tied to clients, so that we can assert against them in code paths like these, but I can't remember where we discussed this.

@knz
Copy link
Contributor

knz commented Dec 6, 2021

we may want to somehow mark contexts that are tied to clients

can you remind me the context of the convo? there's multiple things I had in mind and I'm not sure which one you're referring to.

@erikgrinaker
Copy link
Contributor Author

we may want to somehow mark contexts that are tied to clients

can you remind me the context of the convo? there's multiple things I had in mind and I'm not sure which one you're referring to.

I think it was that we kicked off async cleanup tasks that inherited the client's context, possibly txn record cleanup (which would then never run in cases where the client disconnected in the middle of a txn).

@erikgrinaker erikgrinaker changed the title kvserver: handle context cancellation when receiving snapshots kvserver: use background context when applying snapshots Dec 6, 2021
@knz
Copy link
Contributor

knz commented Dec 6, 2021

Oh yes right, we're already assuming that client context cancellation propagates.

However, today, the client context cancellation and the context cancellation that results from e.g a network disconnect, a server shutdown or other places where we use cancellation too are not distinguishable from each other - they are currently all reflected in the context interface as a ctx.Done() closure and a context.Canceled in ctx.Err().

A way forward could be to link client context cancellation to a specific error wrapper returned by ctx.Err(). This would be infra work which I don't think we've thought through yet. Would be interesting though.

@tbg
Copy link
Member

tbg commented Dec 6, 2021

A way forward could be to link client context cancellation to a specific error wrapper returned by ctx.Err(). This would be infra work which I don't think we've thought through yet. Would be interesting though.

Haven't looked at it in a while but I think that's what I wanted in #66884

Copy link
Member

@tbg tbg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 1 of 2 files at r4, 2 of 2 files at r6, 1 of 1 files at r8, all commit messages.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @erikgrinaker and @nvanbenschoten)


pkg/kv/kvserver/replica_raft.go, line 522 at r8 (raw file):

		// handleRaftReadyRaftMuLocked is not prepared to handle context
		// cancellation, so assert that it's never canceled.
		if err := ctx.Err(); err != nil {

I thought we would check ctx.Done() == nil, i.e. check that the context cannot be canceled, which is an always-on dynamic check. Any reason you didn't do that?

Copy link
Contributor Author

@erikgrinaker erikgrinaker left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @nvanbenschoten and @tbg)


pkg/kv/kvserver/replica_raft.go, line 522 at r8 (raw file):
Ah, wasn't aware of that trick -- thought you meant doing a non-blocking channel read. The docs only say that it may be nil though:

Done may return nil if this context can never be canceled.

Sure we want to rely on this?

Copy link
Member

@tbg tbg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @erikgrinaker and @nvanbenschoten)


pkg/kv/kvserver/replica_raft.go, line 522 at r8 (raw file):

Previously, erikgrinaker (Erik Grinaker) wrote…

Ah, wasn't aware of that trick -- thought you meant doing a non-blocking channel read. The docs only say that it may be nil though:

Done may return nil if this context can never be canceled.

Sure we want to rely on this?

Hm, that's mildly awkward, but it's definitely true in practice. I think this assertion isn't super useful if it only catches the case in which the ctx is actually canceled, so given that we know exactly who the callers here are and that they don't go through too many custom contexts (grpc and our own), I would still employ the nil check.

Copy link
Contributor Author

@erikgrinaker erikgrinaker left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @nvanbenschoten and @tbg)


pkg/kv/kvserver/replica_raft.go, line 522 at r8 (raw file):

Previously, tbg (Tobias Grieger) wrote…

Hm, that's mildly awkward, but it's definitely true in practice. I think this assertion isn't super useful if it only catches the case in which the ctx is actually canceled, so given that we know exactly who the callers here are and that they don't go through too many custom contexts (grpc and our own), I would still employ the nil check.

Sure, probably fine -- done.

Copy link
Contributor Author

@erikgrinaker erikgrinaker left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @andreimatei, @nvanbenschoten, and @tbg)


pkg/kv/kvserver/replica_raft.go, line 522 at r8 (raw file):

Previously, erikgrinaker (Erik Grinaker) wrote…

Sure, probably fine -- done.

Turns out this blew up, because we're actually using a cancellable context for the Raft scheduler. It's rooted in the CLI context from here:

cockroach/pkg/cli/start.go

Lines 407 to 408 in c3e8d85

ctx, cancel := context.WithCancel(context.Background())
defer cancel()

I suppose we'll need to set up a new background context in raftScheduler.Start() and populate it with necessary info from the passed context? Would've been nice to have @andreimatei's contextutil.WithoutCancel() from #66387 here.

Copy link
Contributor Author

@erikgrinaker erikgrinaker left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @andreimatei, @nvanbenschoten, and @tbg)


pkg/kv/kvserver/replica_raft.go, line 522 at r8 (raw file):

Previously, erikgrinaker (Erik Grinaker) wrote…

Turns out this blew up, because we're actually using a cancellable context for the Raft scheduler. It's rooted in the CLI context from here:

cockroach/pkg/cli/start.go

Lines 407 to 408 in c3e8d85

ctx, cancel := context.WithCancel(context.Background())
defer cancel()

I suppose we'll need to set up a new background context in raftScheduler.Start() and populate it with necessary info from the passed context? Would've been nice to have @andreimatei's contextutil.WithoutCancel() from #66387 here.

In the interest of getting the tests fixed, I suggest we split off this assertion and the context handling fallout into a separate PR and deal with it there. If I can get a ✅ on the first commit I'll move things around and get this merged.

@tbg
Copy link
Member

tbg commented Dec 7, 2021

LGTM for first commit.

Following a3fd4fb, context errors are now propagated back up the
stack when receiving snapshots. However, this triggered a
`maybeFatalOnRaftReadyErr` assertion which crashed the node.

`handleRaftReadyRaftMuLocked` (which is called directly when applying
snapshots) does not appear prepared to safely handle arbitrary context
cancellation, as it is typically called with the Raft scheduler's
background context. Furthermore, by the time we call it we have already
received the entire snapshot, so it does not seem useful to abort
snapshot application just because the caller goes away.

This patch therefore uses a new background context for applying
snapshots, disconnected from the client's context, once the entire
snapshot has been received.

Release note: None
Copy link
Contributor Author

@erikgrinaker erikgrinaker left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @andreimatei, @nvanbenschoten, and @tbg)


pkg/kv/kvserver/replica_raft.go, line 522 at r8 (raw file):

Previously, erikgrinaker (Erik Grinaker) wrote…

In the interest of getting the tests fixed, I suggest we split off this assertion and the context handling fallout into a separate PR and deal with it there. If I can get a ✅ on the first commit I'll move things around and get this merged.

Split out to #73554.

Copy link
Member

@nvanbenschoten nvanbenschoten left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

Reviewed 1 of 2 files at r4, 1 of 2 files at r6, 1 of 1 files at r10, all commit messages.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @andreimatei)

@erikgrinaker
Copy link
Contributor Author

bors r=nvanbenschoten,tbg

@craig
Copy link
Contributor

craig bot commented Dec 7, 2021

Build succeeded:

@craig craig bot merged commit 67662d8 into cockroachdb:master Dec 7, 2021
@erikgrinaker erikgrinaker deleted the raft-snapshot-context-error branch December 8, 2021 09:26
craig bot pushed a commit that referenced this pull request Dec 9, 2021
73554: kvserver: use and assert non-cancellable Raft scheduler context r=tbg a=erikgrinaker

`handleRaftReadyRaftMuLocked` is not prepared to handle context
cancellation. It is typically called via the Raft scheduler, which uses
a background context, but can be called via other paths as well (e.g.
snapshot application).

This patch adds an assertion that the given context is not cancellable,
and creates a new background context for the main scheduler code path
instead of using the CLI's cancellable context.

Release note: None

---

Split off from #73484, see previous discussion there.

Turns out that this fails because the Raft scheduler context is in fact cancellable. It's rooted at the CLI context:

https://github.com/cockroachdb/cockroach/blob/c3e8d8568467809c50a8eb8911fd120fe22661bb/pkg/cli/start.go#L407-L408

There's a few different options here, including:

1. Fixing `handleRaftReady` to to handle context cancellation safely.
2. Using a new background context for the Raft scheduler and populating it with necessary data from the passed context.
3. Getting @andreimatei's `contextutil.WithoutCancel()` from #66387 merged, and use it.

Co-authored-by: Erik Grinaker <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
5 participants