rfc: pgwire-compatible query cancellation

Release note: None
cockroachdb · Feb 6, 2022 · bcfaf5f
1 parent a434c84
commit bcfaf5f
Showing 1 changed file with 224 additions and 0 deletions.
diff --git a/docs/RFCS/20220202_pgwire_compatible_cancel.md b/docs/RFCS/20220202_pgwire_compatible_cancel.md
@@ -0,0 +1,224 @@
+- Feature Name: Postgres-Compatible Cancel Protocol
+- Status: draft
+- Start Date: 2022-02-02
+- Authors: Rafi Shamim
+- RFC PR: https://github.com/cockroachdb/cockroach/pull/75870
+- Cockroach Issue: https://github.com/cockroachdb/cockroach/issues/41335
+
+## Dedication
+
+The proposal here is entirely dependent on many thoughtful discussions and prior
+work with Jordan Lewis, knz, Andrew Werner, Andy Kimball, Peter Mattis, Ben
+Darnell, and several others going back to 2019 all the way until now.
+
+## Summary
+
+The Postgres (pgwire) query cancel protocol provides a way to cancel a query
+running in a SQL session. Many database drivers use this protocol, but
+currently, CockroachDB just ignores any pgwire cancel request. The protocol is
+hard to implement since it only uses 64 bits of data as an identifier, and is
+sent over a separate (unauthenticated) connection, different from the SQL
+connection. For dedicated clusters, these 64 bits of data need to identify a
+node and session to cancel. We need at most 32 bits to identify which node is
+running the query, so that when any node receives a cancellation request, it can
+forward the request to the correct node in a cluster. We use the other bits to
+identify a session running on that node. Finally, we add a semaphore to guard
+the cancel logic so that an attacker cannot spam guesses for session IDs.
+
+In CockroachDB Serverless, a SQL proxy instance also needs to identify which
+tenant to send the cancel request to. To solve this, each SQL proxy will save
+the 64-bit cancel keys returned by the SQL node, then send a different 64-bit
+key back to the client. This proxy-client key will contain the IP address of the
+proxy that is able to handle this key. (Or, if the proxies are all on the same
+subnet, fewer bits can be used to identify the proxy.) The remaining bits are
+random. When any proxy receives a cancel request, it can look at the key to
+figure out which other proxy to forward the request to. Then when the correct
+proxy receives the request, it checks that the random bits of the key are in its
+in-memory map of cancel keys, and forwards the original cancel key to the tenant
+where it came from.
+
+Serverless support can be implemented entirely separately from
+dedicated/self-hosted support.
+
+
+## Motivation
+
+Nearly all Postgres drivers support the Postgres query cancellation protocol.
+For example, the PGJDBC
+[setQueryTimeout](https://jdbc.postgresql.org/documentation/publicapi/org/postgresql/jdbc/PgStatement.html#setQueryTimeout-int-)
+setting
+[uses](https://github.com/pgjdbc/pgjdbc/blob/3a54d28e0b416a84353d85e73a23180a6719435e/pgjdbc/src/main/java/org/postgresql/core/QueryExecutorBase.java#L171)
+it. Currently, when a client sends a cancellation request using this protocol,
+CockroachDB simply ignores it. Implementing it would mean that applications
+using drivers like this would immediately benefit. Specifically, it would allow
+CockroachDB to stop executing queries that the client is no longer waiting for,
+thereby reducing load on the cluster.
+
+This protocol is the top unimplemented feature in our telemetry data. According
+to our [Looker dashboard](https://cockroachlabs.looker.com/looks/47), 3,430
+long-running clusters have attempted to use it (compared to 456 clusters for the
+next unimplemented feature).
+
+
+## Background
+
+The [Postgres
+documentation](https://www.postgresql.org/docs/14/protocol-flow.html#id-1.10.5.7.9)
+describes how the protocol works. During connection startup, the server returns
+a BackendKeyData message to the client. This is a 64-bit value; in Postgres 32
+bits are used for a process ID, and 32 bits are used for a random secret that is
+generated when the connection starts.
+
+To issue a cancel request, the client opens a new unencrypted connection to the
+server and sends a CancelRequest message. For security reasons, the server never
+replies to this message. If the data in the request matches the BackendKeyData
+that was generated earlier, then the query is cancelled.
+
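For reference, the CancelRequest described above is a fixed 16-byte message in the Postgres wire protocol: a length field of 16, the request code 80877102, and the 8 bytes of key material from BackendKeyData. The Go sketch below encodes and decodes it, treating the process ID and secret as a single 64-bit key. The function names are hypothetical and are not taken from the CockroachDB code base.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// cancelRequestCode is the request code of a Postgres CancelRequest
// (1234 in the high 16 bits, 5678 in the low 16 bits).
const cancelRequestCode uint32 = 80877102

// encodeCancelRequest builds the 16-byte CancelRequest for a 64-bit key.
// The high 32 bits sit where Postgres puts the process ID and the low
// 32 bits where it puts the secret.
func encodeCancelRequest(key uint64) []byte {
	buf := make([]byte, 16)
	binary.BigEndian.PutUint32(buf[0:4], 16) // message length, including itself
	binary.BigEndian.PutUint32(buf[4:8], cancelRequestCode)
	binary.BigEndian.PutUint64(buf[8:16], key)
	return buf
}

// decodeCancelRequest returns the 64-bit key, or false if msg is not a
// well-formed CancelRequest.
func decodeCancelRequest(msg []byte) (uint64, bool) {
	if len(msg) != 16 ||
		binary.BigEndian.Uint32(msg[0:4]) != 16 ||
		binary.BigEndian.Uint32(msg[4:8]) != cancelRequestCode {
		return 0, false
	}
	return binary.BigEndian.Uint64(msg[8:16]), true
}

func main() {
	msg := encodeCancelRequest(0x0123456789abcdef)
	key, ok := decodeCancelRequest(msg)
	fmt.Printf("% x\n%#x %v\n", msg, key, ok)
}
```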
+The Postgres documentation also mentions that this protocol is best-effort, and
+specifically says, "Issuing a cancel simply improves the odds that the current
+query will finish soon, and improves the odds that it will fail with an error
+message instead of succeeding."
+
+There have been internal discussions about this in [April
+2020](https://github.com/cockroachdb/cockroach/pull/34520#discussion_r407414290),
+[July 2021](https://github.com/cockroachdb/cockroach/pull/67501), and [January
+2022](https://cockroachlabs.slack.com/archives/CGA9F858R/p1643382222564939).
+
+
+## Technical Design
+
+
+#### SQL Node Changes
+
+The connExecutor will be updated to generate a random 64-bit integer
+(BackendKeyData) when it is initialized, and register it with the server’s
+[SessionRegistry](https://github.com/cockroachdb/cockroach/blob/a434c8418c36dbeb64e73588bcd4dd5b248c3238/pkg/sql/conn_executor.go#L1692).
+If the SQL node's 32-bit SQLInstanceID fits in 11 bits, then the leading bit of
+the BackendKeyData is set to 0, and the following 11 bits are set to the
+SQLInstanceID. Otherwise, the leading bit is set to 1, and the next 31 bits are
+set to the SQLInstanceID. Note that SQLInstanceIDs are always positive, so it's
+safe to use the leading bit in this way. This BackendKeyData is sent back to the
+client.
+
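The two key layouts described above can be sketched as follows. This is an illustration under stated assumptions, not the proposed implementation: the names `makeBackendKeyData` and `instanceIDOf` are hypothetical, the "leading bit" is taken to be the most significant bit of the 64-bit value, and the random bits are drawn from `crypto/rand`.

```go
package main

import (
	"crypto/rand"
	"encoding/binary"
	"fmt"
)

// makeBackendKeyData packs a SQLInstanceID and random bits into a 64-bit key.
// Assumes instanceID is positive, as the RFC notes.
func makeBackendKeyData(instanceID int32, random uint64) uint64 {
	if instanceID < 1<<11 {
		// Short form: 0 | 11-bit instance ID | 52 random bits.
		return uint64(instanceID)<<52 | random&(1<<52-1)
	}
	// Long form: 1 | 31-bit instance ID | 32 random bits.
	return 1<<63 | uint64(instanceID)<<32 | random&(1<<32-1)
}

// instanceIDOf recovers the SQLInstanceID so that any node receiving a
// cancel request can tell which node to forward it to.
func instanceIDOf(key uint64) int32 {
	if key>>63 == 0 {
		return int32(key >> 52)
	}
	return int32((key >> 32) & (1<<31 - 1))
}

func main() {
	var b [8]byte
	if _, err := rand.Read(b[:]); err != nil {
		panic(err)
	}
	random := binary.BigEndian.Uint64(b[:])
	for _, id := range []int32{7, 5000} {
		key := makeBackendKeyData(id, random)
		fmt.Printf("id=%d key=%#016x decoded=%d\n", id, key, instanceIDOf(key))
	}
}
```

The short form leaves 52 bits for the session secret and the long form leaves 32, which are the two cases the brute-force estimate in this section considers.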
+The status server will have a new endpoint named CancelQueryByBackendKeyData,
+analogous to the existing CancelQuery endpoint. The main difference is that it
+only has a 64-bit BackendKeyData in the request body. The endpoint will extract
+the SQLInstanceID from the BackendKeyData and will forward the request to the
+correct node. This endpoint will only be called by the pgwire/server code that
+handles a CancelRequest. It is not meant to be called directly by a client, so
+it is not exposed over HTTP. The endpoint will call the SessionRegistry's
+cancellation function using the BackendKeyData.
+
+Since this endpoint is unauthenticated, before performing any business logic, it
+will use a semaphore to limit the number of concurrent cancellation requests. If
+a cancel request fails, which is likely to happen only if an attacker is
+spamming requests, the request is penalized by holding its semaphore slot for
+an extra second. This semaphore prevents an attacker from being able to cause
+excess inter-node network traffic, and from being able to brute force a
+successful cancel request. If we set the concurrency of the semaphore to 256
+(2^8), then at most 2^8 failed guesses complete per second, so it would take an
+attacker 2^32 / 2^8 = 2^24 seconds to guess all possible 32-bit secrets and be
+guaranteed to cancel something. If we suppose there are 256 concurrent queries
+running on the node, then on average it would take 2^16 seconds (about 18
+hours) to cancel any one of the queries. In the more common case, where we use
+52 bits of randomness in the BackendKeyData, the time-to-expected-cancel
+increases to 2^36 seconds (roughly 2,180 years).
+
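The estimate above can be reproduced with a few lines of Go. This is only a back-of-the-envelope sketch: `expectedSeconds` is a hypothetical helper, and it assumes every guess fails and therefore holds one of the semaphore slots for one second, so the guess rate equals the semaphore concurrency.

```go
package main

import (
	"fmt"
	"math"
)

// expectedSeconds estimates the time needed to hit one of `targets` live
// keys when the key space has 2^secretBits values and `concurrency` failed
// guesses complete per second.
func expectedSeconds(secretBits int, concurrency, targets float64) float64 {
	return math.Ldexp(1, secretBits) / concurrency / targets
}

func main() {
	const hour = 3600.0
	const year = 365 * 24 * hour // 365-day year

	all32 := expectedSeconds(32, 256, 1)
	one32 := expectedSeconds(32, 256, 256)
	one52 := expectedSeconds(52, 256, 256)

	fmt.Printf("32-bit secret, exhaust the space: %.0f s\n", all32)
	fmt.Printf("32-bit secret, 256 live queries:  %.0f s (%.1f hours)\n", one32, one32/hour)
	fmt.Printf("52-bit secret, 256 live queries:  %.0f s (%.0f years)\n", one52, one52/year)
}
```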
+An attacker could still spam cancel requests and use up the entire quota of the
+semaphore, and therefore starve legitimate cancel requests from being handled.
+We consider this risk acceptable in the short term, since the protocol is
+best-effort, and this behavior is no worse than the status quo.
+
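One way to picture the semaphore and its penalty is a buffered channel used as a counting semaphore. The sketch below is not the proposed implementation. The type and function names are hypothetical, and it assumes that a request arriving when no slot is free is dropped rather than queued, which the RFC does not specify.

```go
package main

import (
	"fmt"
	"time"
)

// defaultPenalty is the extra time a failed request keeps its slot.
const defaultPenalty = time.Second

type cancelLimiter struct {
	slots   chan struct{} // buffered channel used as a counting semaphore
	penalty time.Duration
}

func newCancelLimiter(concurrency int, penalty time.Duration) *cancelLimiter {
	return &cancelLimiter{slots: make(chan struct{}, concurrency), penalty: penalty}
}

// handle runs tryCancel while holding a slot and reports whether the request
// was admitted. Assumption: requests are dropped when no slot is free.
func (l *cancelLimiter) handle(tryCancel func() bool) bool {
	select {
	case l.slots <- struct{}{}: // acquired a slot
	default:
		return false // all slots busy: drop the request
	}
	if tryCancel() {
		<-l.slots // success: release immediately
		return true
	}
	// Failure: keep the slot for the penalty period, then release it.
	time.AfterFunc(l.penalty, func() { <-l.slots })
	return true
}

func main() {
	l := newCancelLimiter(1, defaultPenalty)
	miss := func() bool { return false } // a guess that matches no session
	fmt.Println("first guess admitted: ", l.handle(miss))
	fmt.Println("second guess admitted:", l.handle(miss))
}
```

With a concurrency of 1, the second guess is dropped because the first failed guess still holds the only slot. This is also the starvation risk described above: a legitimate request arriving during that second would be dropped too.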
+
+#### SQL Proxy Changes
+
+The proxy code will be updated to intercept BackendKeyData messages that are
+sent to the client as well as CancelRequest messages that are sent by the client
+to the server.
+
+When a proxy sees a BackendKeyData, it will generate a new random 32-bit secret
+named proxySecretID. (NB: This proxySecretID could be larger depending on how
+many bits are needed to identify a proxy instance.) The proxySecretID will be
+used to key a map whose values are structs containing (1) the original
+BackendKeyData, (2) the address of the SQL node, and (3) the remote client
+address that initiated this connection. The proxy will then return a new
+BackendKeyData consisting of 32 bits for the proxy’s IP address, and 32 bits for
+the proxySecretID. If the proxies are all on the same subnet, then fewer bits
+are needed for the IP address.
+
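The proxy-client key and the map entry described above might look like the following sketch. It assumes the full 32-bit IPv4 layout (no subnet shortening) with the address in the high 32 bits. `cancelEntry`, `packProxyKey`, and `unpackProxyKey` are hypothetical names, and the addresses are placeholders.

```go
package main

import (
	"crypto/rand"
	"encoding/binary"
	"errors"
	"fmt"
	"net"
)

// cancelEntry holds the three fields the text lists for each proxySecretID.
type cancelEntry struct {
	origKey    uint64 // BackendKeyData issued by the SQL node
	sqlAddr    string // address of the SQL node
	clientAddr string // remote client address that opened the connection
}

// packProxyKey builds the key sent to the client:
// 32 bits of proxy IPv4 address followed by the 32-bit proxySecretID.
func packProxyKey(proxyIP string, secretID uint32) (uint64, error) {
	ip4 := net.ParseIP(proxyIP).To4()
	if ip4 == nil {
		return 0, errors.New("proxy address is not a valid IPv4 address")
	}
	return uint64(binary.BigEndian.Uint32(ip4))<<32 | uint64(secretID), nil
}

// unpackProxyKey splits a key back into the proxy address and proxySecretID.
func unpackProxyKey(key uint64) (string, uint32) {
	ip := make(net.IP, 4)
	binary.BigEndian.PutUint32(ip, uint32(key>>32))
	return ip.String(), uint32(key)
}

func main() {
	var b [4]byte
	if _, err := rand.Read(b[:]); err != nil {
		panic(err)
	}
	secretID := binary.BigEndian.Uint32(b[:])

	// Register the mapping before handing the new key to the client.
	entries := map[uint32]cancelEntry{
		secretID: {origKey: 0x1234, sqlAddr: "10.1.0.7:26257", clientAddr: "203.0.113.9"},
	}
	key, err := packProxyKey("10.0.0.5", secretID)
	if err != nil {
		panic(err)
	}
	ip, id := unpackProxyKey(key)
	fmt.Println(ip, id == secretID, len(entries))
}
```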
+When a proxy sees a CancelRequest, it first extracts the IP address component of
+the message. If needed, it will forward the request to the proxy with that
+address using an RPC that is only for proxy-proxy communication. This RPC
+request will include the remote address of the client that sent the
+CancelRequest. If the proxy is the intended recipient, then it will extract the
+proxySecretID and check if it exists in the BackendKeyData map. If so, it will
+check that the remote client address in the map matches the address that sent
+the CancelRequest. If that matches, then the proxy will send a CancelRequest
+with the original BackendKeyData to the SQL node using the address that is
+stored in the map.
+
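The decision a proxy makes for an incoming CancelRequest can be summarized in a small routing function. The sketch below only returns the decision and leaves the proxy-to-proxy RPC and the outgoing CancelRequest out. All names are hypothetical, and it assumes the client address is compared as an IP without a port, since the cancel arrives on a separate connection.

```go
package main

import "fmt"

type cancelEntry struct {
	origKey    uint64 // BackendKeyData issued by the SQL node
	sqlAddr    string // address of the SQL node
	clientAddr string // client IP that opened the SQL connection
}

type action int

const (
	drop             action = iota // unknown secret or wrong client: ignore
	forwardToProxy                 // key belongs to another proxy
	forwardToSQLNode               // send the original key to the SQL node
)

type proxy struct {
	selfIP  uint32                 // this proxy's IPv4 address as an integer
	entries map[uint32]cancelEntry // keyed by proxySecretID
}

// route decides what to do with a cancel key received from clientAddr.
// When forwarding to another proxy, clientAddr must travel with the RPC so
// the owning proxy can perform the address check.
func (p *proxy) route(key uint64, clientAddr string) (action, cancelEntry) {
	if uint32(key>>32) != p.selfIP {
		return forwardToProxy, cancelEntry{}
	}
	e, ok := p.entries[uint32(key)]
	if !ok || e.clientAddr != clientAddr {
		return drop, cancelEntry{}
	}
	return forwardToSQLNode, e
}

func main() {
	p := &proxy{
		selfIP: 0x0a000005, // 10.0.0.5
		entries: map[uint32]cancelEntry{
			42: {origKey: 0x1234, sqlAddr: "10.1.0.7:26257", clientAddr: "203.0.113.9"},
		},
	}
	act, e := p.route(0x0a0000050000002a, "203.0.113.9")
	fmt.Println(act == forwardToSQLNode, e.sqlAddr)
}
```

A session migration would then amount to overwriting `origKey` and `sqlAddr` in the existing entry, so the key already held by the client stays valid.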
+If the proxy migrates a session from one SQL node to another, that will make the
+BackendKeyData it had previously saved obsolete. When the migration occurs, the
+proxy will need to update the contents of its BackendKeyData map and replace the
+old data with the new BackendKeyData provided by the session on the new SQL
+node.
+
+If the proxy crashes unexpectedly, then all the BackendKeyData entries will be
+lost. From the clients’ perspective, the connection will be broken when the
+proxy crashes, and they will not be able to interact with any in-flight queries,
+so it’s not a huge problem that the queries can no longer be cancelled.
+
+#### Proxy-to-Proxy Communication
+
+The previous section mentioned that proxies will start making RPCs to other
+proxies. Proxy to proxy communication does not currently exist anywhere else in
+the architecture. When it's added, we need to ensure that the RPC is only
+exposed to other proxy instances. One way of achieving this is to make sure the
+proxies are in a private subnet that is not exposed to the internet.
+
+
+### Alternatives
+
+
+#### Make SQL nodes verify the remote address
+
+Instead of using a semaphore, the SQL nodes could check that the remote client
+address that received the BackendKeyData matches the remote address that sent
+the CancelRequest. This would work well in dedicated clusters. But in
+serverless clusters, this would mean that the SQL proxy needs to propagate the
+remote address of the client while sending the CancelRequest. For normal SQL
+sessions, this is done using the **crdb:remote_addr** StartupMessage parameter,
+but the cancel protocol does not have a similar way of sending data like this.
+
+
+#### Obfuscate the SQL proxy identifiers
+
+Using 32 bits of the proxy-client BackendKeyData for the proxy IP address means
+that an attacker could more easily guess an address and spam it. Instead, we
+could obfuscate the address by hashing it along with a salt that is shared by
+all the proxy instances. This would require additional secret management at the
+proxy layer. To avoid this complexity, the proposal instead is to validate the
+address sending the CancelRequest. This still allows an attacker to cause
+additional network traffic, but prevents them from doing any functional damage.
+
+### Possible Future Extensions
+
+#### Custom protocol for SQL node to SQL proxy communication
+
+The SQL node to SQL proxy part of the protocol described above could be changed
+to be entirely custom. This would allow us to use more bits to identify the
+session to cancel, and would eliminate the need for a semaphore. However, this
+custom protocol could only be used in deployments where there is a SQL proxy in
+front of the SQL nodes.
+
+#### IP-based rate limiting
+
+Another way of preventing an attacker from spamming cancel requests is to rate
+limit these requests by IP. This would also eliminate the need for a semaphore,
+and additionally, would prevent an attacker from causing extra network traffic
+and starving legitimate cancel requests. However, we cannot add an IP-based rate
+limiter at the SQL node layer, unless we also develop a custom protocol for
+propagating the remote client address from the proxy to the SQL node.
+
+It is possible that in the future, _all_ clusters might live behind a SQL proxy
+process (possibly an embedded process). If we wait until that is the case, then
+we can add IP-based rate limiting at the proxy layer exclusively.