rfc: pgwire-compatible query cancellation #75870
Conversation
Force-pushed from 0a76f92 to 6df50d1.
Reviewed 1 of 1 files at r1, all commit messages.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @ajwerner, @petermattis, and @rafiss)
docs/RFCS/20220202_pgwire_compatible_cancel.md, line 16 at r1 (raw file):
## Summary

The Postgres query cancel protocol is hard to implement since it only uses 64
I think the summary needs to start with a sentence that says what the query cancel protocol is and why it's important.
docs/RFCS/20220202_pgwire_compatible_cancel.md, line 19 at r1 (raw file):
bits of data as an identifier, and is sent over a separate (unauthenticated) connection, different from the SQL connection. For dedicated clusters, these 64 bits of data need to identify a node and query to cancel. We can use 32 bits to
node and session. The query is implicit (current query).
Ben says we should think about the query, but I disagree. This is explained here:
docs/RFCS/20220202_pgwire_compatible_cancel.md, line 91 at r1 (raw file):
#### SQL Node Changes

The connExecutor will be updated to generate a random 32-bit integer (secretID)
Technically, we only need a secretID per connExec that's associated with a client SQL session, not for those used inside InternalExecutor.
docs/RFCS/20220202_pgwire_compatible_cancel.md, line 97 at r1 (raw file):
BackendKeyData that is sent back to the client. The status server will have a new endpoint named PGWireCancelQuery, analogous to
I'd call it CancelQueryByKey
, no need to say "pg" in there.
docs/RFCS/20220202_pgwire_compatible_cancel.md, line 108 at r1 (raw file):
cancellation requests. If a cancel request fails, which is likely to only happen if an attacker is spamming requests, it will be penalized by holding onto the semaphore for an extra second. If we set the concurrency of the semaphore to 256
You have not explained what this semaphore is yet at this point.
Also, as explained in the PR review already, I think the semaphore should be at the ingress, on the first sql server that receives the cancel request over pgwire, not in the target handler of the RPC; otherwise an attacker can flood the pod-to-pod network with bogus requests.
docs/RFCS/20220202_pgwire_compatible_cancel.md, line 138 at r1 (raw file):
When a proxy sees a CancelRequest, it first extracts the IP address component of the message. If needed, it will forward the request to the proxy with that address using an RPC that is only for proxy-proxy communication. This RPC
This proxy-to-proxy communication is new. The RFC should highlight this, and contain a sub-section that explains how proxies will authenticate themselves to each other.
I skimmed this RFC and it matches up with my understanding of what was discussed elsewhere. I'll leave the detailed review to others.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @ajwerner and @rafiss)
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @rafiss)
docs/RFCS/20220202_pgwire_compatible_cancel.md, line 94 at r1 (raw file):
when it is initialized, and register it with the server’s [SessionRegistry](https://github.com/cockroachdb/cockroach/blob/a434c8418c36dbeb64e73588bcd4dd5b248c3238/pkg/sql/conn_executor.go#L1692). The 32-bit SQLInstanceID and the random 32 bit secretID will be used as the
In a multi-tenant setting, SQLInstanceID should never be more than 15 bits. We need this to be true for unique_rowid to remain sound. We have an outstanding problem whereby we use the node ID as the instance ID indiscriminately. We potentially ought to change the dedicated servers to utilize the same leasing protocol used in multi-tenant settings for unique_rowid(), and potentially this as well. If we do so, we can have a bunch more bits for the secret, which will make the argument more compelling. I suggest we leave that as future work, because it's a lot to stake on this project right now, but it doesn't seem too far off.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @knz)
docs/RFCS/20220202_pgwire_compatible_cancel.md, line 16 at r1 (raw file):
Previously, knz (kena) wrote…
I think the summary needs to start with a sentence that says what the query cancel protocol is and why it's important.
good point
docs/RFCS/20220202_pgwire_compatible_cancel.md, line 19 at r1 (raw file):
Previously, knz (kena) wrote…
node and session. The query is implicit (current query).
Ben says we should think about the query, but I disagree. This is explained here:
nice catch
docs/RFCS/20220202_pgwire_compatible_cancel.md, line 91 at r1 (raw file):
Previously, knz (kena) wrote…
Technically, we only need a secretID per connExec that's associated with a client SQL session, not for those used inside InternalExecutor.
true
docs/RFCS/20220202_pgwire_compatible_cancel.md, line 94 at r1 (raw file):
Previously, ajwerner wrote…
In a multi-tenant setting, SQLInstanceID should never be more than 15 bits. We need this to be true for unique_rowid to remain sound. We have an outstanding problem whereby we use the node ID as the instance ID indiscriminately. We potentially ought to change the dedicated servers to utilize the same leasing protocol used in multi-tenant settings for unique_rowid(), and potentially this as well. If we do so, we can have a bunch more bits for the secret, which will make the argument more compelling. I suggest we leave that as future work, because it's a lot to stake on this project right now, but it doesn't seem too far off.
i think we can still use more bits in basically all cases, without needing to do that future work. Thanks to @knz for the idea back in his initial review of my PR from 6 months ago
I'm including something like this in the next revision of my original PR
```go
const (
	lower32BitsMask uint64 = 0x00000000FFFFFFFF
	lower52BitsMask uint64 = 0x000FFFFFFFFFFFFF
	leadingBitMask  uint64 = 1 << 63
)

func MakeBackendKeyData(rng *rand.Rand, sqlInstanceID base.SQLInstanceID) BackendKeyData {
	ret := rng.Uint64()
	if sqlInstanceID < 1<<11 {
		// Only keep the lower 52 bits.
		ret = ret & lower52BitsMask
		// Set bits 52-62 from the sqlInstanceID; the leading bit stays 0.
		ret = ret | (uint64(sqlInstanceID) << 52)
	} else {
		// Only keep the lower 32 bits.
		ret = ret & lower32BitsMask
		// Set the leading bit.
		ret = ret | leadingBitMask
		// Set the other upper 31 bits from the sqlInstanceID.
		ret = ret | (uint64(sqlInstanceID) << 32)
	}
	return BackendKeyData(ret)
}
```
docs/RFCS/20220202_pgwire_compatible_cancel.md, line 97 at r1 (raw file):
Previously, knz (kena) wrote…
I'd call it CancelQueryByKey, no need to say "pg" in there.
done
docs/RFCS/20220202_pgwire_compatible_cancel.md, line 108 at r1 (raw file):
Previously, knz (kena) wrote…
You have not explained what this semaphore is yet at this point.
Also, as explained in the PR review already, I think the semaphore should be at the ingress, on the first sql server that receives the cancel request over pgwire, not in the target handler of the RPC; otherwise an attacker can flood the pod-to-pod network with bogus requests.
definitely agree about adding the semaphore at the ingress. i'm already incorporating that in the next revision of my original PR, which i'll publish soon
will update RFC
docs/RFCS/20220202_pgwire_compatible_cancel.md, line 138 at r1 (raw file):
Previously, knz (kena) wrote…
This proxy-to-proxy communication is new. The RFC should highlight this, and contain a sub-section that explains how proxies will authenticate themselves to each other.
will add a proxy-proxy communication highlight.
i'm not sure how proxies will authenticate themselves to each other. would it be sufficient to say that they don't need to if they are all in the same private subnet? (proxies aren't exposed to the internet, since they are behind a cloud load balancer)
Force-pushed from 6df50d1 to bcfaf5f.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @knz and @rafiss)
docs/RFCS/20220202_pgwire_compatible_cancel.md, line 138 at r1 (raw file):
would it be sufficient to say that they don't need to if they are all in the same private subnet?
heh, no. Defense in depth, remember? If private subnets were sufficient, we wouldn't care about using TLS throughout the CC infrastructure. And yet we do.
Force-pushed from bcfaf5f to 3078823.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @knz)
docs/RFCS/20220202_pgwire_compatible_cancel.md, line 138 at r1 (raw file):
Previously, knz (kena) wrote…
would it be sufficient to say that they don't need to if they are all in the same private subnet?
heh, no. Defense in depth, remember? If private subnets were sufficient, we wouldn't care about using TLS throughout the CC infrastructure. And yet we do.
got it, added more details now
things here make sense, must've been a lot of discussions :)
i think a few diagrams here would make it easier to digest.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @knz and @rafiss)
docs/RFCS/20220202_pgwire_compatible_cancel.md, line 11 at r3 (raw file):
The proposal here is entirely dependent on many thoughtful discussions and prior work with Jordan Lewis, knz, Andrew Werner, Andy Kimball, Peter Mattis, Ben
haha that's not a full name :')
docs/RFCS/20220202_pgwire_compatible_cancel.md, line 48 at r3 (raw file):
Nearly all Postgres drivers support the Postgres query cancellation protocol. For example, the PGJDBC [setQueryTimeout](https://jdbc.postgresql.org/documentation/publicapi/org/postgresql/jdbc/PgStatement.html#setQueryTimeout-int-)
nit: if you want to avoid these long lines you could use the markdown thing where you put links down at the bottom.
docs/RFCS/20220202_pgwire_compatible_cancel.md, line 101 at r3 (raw file):
set to the SQLInstanceID. Note that SQLInstanceIDs are always positive, so it's safe to use the leading bit in this way. This BackendKeyData is sent back to the client.
it would be great if we could have a visualisation of how the 64 bit space is used (but i am a very visual person so up to you)
docs/RFCS/20220202_pgwire_compatible_cancel.md, line 113 at r3 (raw file):
Since this endpoint is unauthenticated, before performing any business logic, it will use a semaphore to guard the number of concurrent cancellation requests. If
are we penalised if the query just succeeded before the CANCEL QUERY request?
not saying it's too important, but i can imagine a crazy-ish case where say there's a blocked query, we issue 3000 cancels to all existing queries, blocked query gets cancelled, everything else succeeds but we still gotta wait 3000/concurrency seconds. very hypothetical and not important, just curious.
docs/RFCS/20220202_pgwire_compatible_cancel.md, line 138 at r3 (raw file):
to the server. When a proxy sees a BackendKeyData, it will generate a new random 32-bit secret
it would be great to get a sequence diagram of the protocol here
with nit about clarifying semaphore semantics.
I'll let oliver finish iterating with you.
Reviewed 1 of 1 files at r3, all commit messages.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @knz, @otan, and @rafiss)
docs/RFCS/20220202_pgwire_compatible_cancel.md, line 113 at r3 (raw file):
Previously, otan (Oliver Tan) wrote…
are we penalised if the query just succeeded before the CANCEL QUERY request?
not saying it's too important, but i can imagine a crazy-ish case where say there's a blocked query, we issue 3000 cancels to all existing queries, blocked query gets cancelled, everything else succeeds but we still gotta wait 3000/concurrency seconds. very hypothetical and not important, just curious.
good point. I see in the code it's a TryAcquire, not Acquire, but that's worth explaining here.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @knz and @otan)
docs/RFCS/20220202_pgwire_compatible_cancel.md, line 11 at r3 (raw file):
Previously, otan (Oliver Tan) wrote…
haha that's not a full name :')
:D
docs/RFCS/20220202_pgwire_compatible_cancel.md, line 48 at r3 (raw file):
Previously, otan (Oliver Tan) wrote…
nit: if you want to avoid these long lines you could use the markdown thing where you put links down at the bottom.
eh i'm not a huge fan of that actually
docs/RFCS/20220202_pgwire_compatible_cancel.md, line 101 at r3 (raw file):
Previously, otan (Oliver Tan) wrote…
it would be great if we could have a visualisation of how the 64 bit space is used (but i am a very visual person so up to you)
nice idea
docs/RFCS/20220202_pgwire_compatible_cancel.md, line 113 at r3 (raw file):
Previously, knz (kena) wrote…
good point. I see in the code it's a TryAcquire, not Acquire, but that's worth explaining here.
adding more details
docs/RFCS/20220202_pgwire_compatible_cancel.md, line 138 at r3 (raw file):
Previously, otan (Oliver Tan) wrote…
it would be great to get a sequence diagram of the protocol here
will try to add something. let's see how my text diagram skills are
Release note: None
Force-pushed from 3078823 to bfb9b8b.
thanks for the reviews all. I've added a few diagrams as suggested. bors r=otan,knz
Build succeeded.
67501: sql/pgwire: implement pgwire query cancellation r=knz a=rafiss

closes #41335

See individual commits for more details. This adds a BackendKeyData struct which identifies a session and the SQLInstance running that session. Clients send it to cancel a request, and it's handled by a new CancelQueryByBackendKeyData endpoint in the status server. The cancellation gets forwarded to the correct node, and there is a per-node rate limit to prevent a DoS attack. Also, refer to the RFC: #75870

This PR only adds support for non-multitenant clusters. Refer to the RFC for additional work needed for serverless/multitenant support.

Co-authored-by: Rafi Shamim <[email protected]>
refs #41335
Release note: None