Skip to content

Commit

Permalink
rfc: pgwire-compatible query cancellation
Browse files Browse the repository at this point in the history
Release note: None
  • Loading branch information
rafiss committed Feb 6, 2022
1 parent a434c84 commit bcfaf5f
Showing 1 changed file with 224 additions and 0 deletions.
224 changes: 224 additions & 0 deletions docs/RFCS/20220202_pgwire_compatible_cancel.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,224 @@
- Feature Name: Postgres-Compatible Cancel Protocol
- Status: draft
- Start Date: 2022-02-02
- Authors: Rafi Shamim
- RFC PR: https://github.com/cockroachdb/cockroach/pull/75870
- Cockroach Issue: https://github.com/cockroachdb/cockroach/issues/41335

## Dedication

The proposal here is entirely dependent on many thoughtful discussions and prior
work with Jordan Lewis, knz, Andrew Werner, Andy Kimball, Peter Mattis, Ben
Darnell, and several others going back to 2019 all the way until now.

## Summary

The Postgres (pgwire) query cancel protocol provides a way to cancel a query
running in a SQL session. Many database drivers use this protocol, but
currently, CockroachDB just ignores any pgwire cancel request. The protocol is
hard to implement since it only uses 64 bits of data as an identifier, and is
sent over a separate (unauthenticated) connection, different from the SQL
connection. For dedicated clusters, these 64 bits of data need to identify a
node and session to cancel. We need at most 32 bits to identify which node is
running the query, so that when any node receives a cancellation request, it can
forward the request to the correct node in a cluster. We use the other bits to
identify a session running on that node. Finally, we add a semaphore to guard
the cancel logic so that an attacker cannot spam guesses for session IDs.

In CockroachDB Serverless, a SQL proxy instance also needs to identify which
tenant to send the cancel request to. To solve this, each SQL proxy will save
the 64-bit cancel keys returned by the SQL node, then send a different 64-bit
key back to the client. This proxy-client key will contain the IP address of the
proxy that is able to handle this key. (Or, if the proxies are all on the same
subnet, fewer bits can be used to identify the proxy.) The remaining bits are
random. When any proxy receives a cancel request, it can look at the key to
figure out which other proxy to forward the request to. Then when the correct
proxy receives the request, it checks that the random bits of the key are in its
in-memory map of cancel keys, and forwards the original cancel key to the tenant
where it came from.

Serverless support can be implemented entirely separately from
dedicated/self-hosted support.


## Motivation

Nearly all Postgres drivers support the Postgres query cancellation protocol.
For example, the PGJDBC
[setQueryTimeout](https://jdbc.postgresql.org/documentation/publicapi/org/postgresql/jdbc/PgStatement.html#setQueryTimeout-int-)
setting
[uses](https://github.com/pgjdbc/pgjdbc/blob/3a54d28e0b416a84353d85e73a23180a6719435e/pgjdbc/src/main/java/org/postgresql/core/QueryExecutorBase.java#L171)
it. Currently, when a client sends a cancellation request using this protocol,
CockroachDB simply ignores it. Implementing it would mean that applications
using drivers like this would immediately benefit. Specifically, it would allow
CockroachDB to stop executing queries that the client is no longer waiting for,
thereby reducing load on the cluster.

This protocol is the top unimplemented feature in our telemetry data. According
to our [Looker dashboard](https://cockroachlabs.looker.com/looks/47), 3,430
long-running clusters have attempted to use it (compared to 456 clusters for the
next unimplemented feature).


## Background

The [Postgres
documentation](https://www.postgresql.org/docs/14/protocol-flow.html#id-1.10.5.7.9)
describes how the protocol works. During connection startup, the server returns
a BackendKeyData message to the client. This is a 64-bit value; in Postgres 32
bits are used for a process ID, and 32 bits are used for a random secret that is
generated when the connection starts.

To issue a cancel request, the client opens a new unencrypted connection to the
server and sends a CancelRequest message. For security reasons, the server never
replies to this message. If the data in the request matches the BackendKeyData
that was generated earlier, then the query is cancelled.

The Postgres documentation also mentions that this protocol is best-effort, and
specifically says, "Issuing a cancel simply improves the odds that the current
query will finish soon, and improves the odds that it will fail with an error
message instead of succeeding."

There have been internal discussions about this in [April
2020](https://github.com/cockroachdb/cockroach/pull/34520#discussion_r407414290),
[July 2021](https://github.com/cockroachdb/cockroach/pull/67501), and [January
2022](https://cockroachlabs.slack.com/archives/CGA9F858R/p1643382222564939).


## Technical Design


#### SQL Node Changes

The connExecutor will be updated to generate a random 64-bit integer
(BackendKeyData) when it is initialized, and register it with the server’s
[SessionRegistry](https://github.com/cockroachdb/cockroach/blob/a434c8418c36dbeb64e73588bcd4dd5b248c3238/pkg/sql/conn_executor.go#L1692).
If the SQL node's 32-bit SQLInstanceID fits in 11 bits, then the leading bit of
the BackendKeyData is set to 0, and the following 11 bits are set to the
SQLInstanceID. Otherwise, the leading bit is set to 1, and the next 31 bits are
set to the SQLInstanceID. Note that SQLInstanceIDs are always positive, so it's
safe to use the leading bit in this way. This BackendKeyData is sent back to the
client.

The status server will have a new endpoint named CancelQueryByBackendKeyData,
analogous to the existing CancelQuery endpoint. The main difference is that it
only has a 64-bit BackendKeyData in the request body. The endpoint will extract
the SQLInstanceID from the BackendKeyData and will forward the request to the
correct node. This endpoint will only be called by the pgwire/server code that
handles a CancelRequest – the endpoint is not meant to be called directly by a
client, so therefore it is not exposed on HTTP. The endpoint will call the
SessionRegistry's cancellation function using the BackendKeyData.

Since this endpoint is unauthenticated, before performing any business logic, it
will use a semaphore to guard the number of concurrent cancellation requests. If
a cancel request fails, which is likely to only happen if an attacker is
spamming requests, it will be penalized by holding onto the semaphore for an
extra second. This semaphore prevents an attacker from being able to cause
excess inter-node network traffic, and from being able to brute force a
successful cancel request. If we set the concurrency of the semaphore to 256
(2^8), then that means it would take an attacker 2^24 seconds to guess all
possible 32-bit secrets and be guaranteed to cancel something. If we suppose
there are 256 concurrent queries running on the node, then on average it would
take 2^16 seconds (18 hours) to cancel any one of the queries. In the more
common case, where we use 52-bits of randomness in the BackendKeyData, the
time-to-expected-cancel increases to 2^36 seconds (2117 years).

An attacker could still spam cancel requests and use up the entire quota of the
semaphore, and therefore starve legitimate cancel requests from being handled.
We consider this risk acceptable in the short-term, since the protocol is
best-effort, and this behavior is no worse than the status quo.


#### SQL Proxy Changes

The proxy code will be updated to intercept BackendKeyData messages that are
sent to the client as well as CancelRequest messages that are sent by the client
to the server.

When a proxy sees a BackendKeyData, it will generate a new random 32-bit secret
named proxySecretID. (NB: This proxySecretID could be larger depending on how
many bits are needed to identify a proxy instance.) The proxySecretID will be
used to key a map whose values are structs containing (1) the original
BackendKeyData, (2) the address of the SQL node, and (3) the remote client
address that initiated this connection. The proxy will then return a new
BackendKeyData consisting of 32 bits for the proxy’s IP address, and 32 bits for
the proxySecretID. If the proxies are all on a subnet, then fewer bits are
needed for the IP address.

When a proxy sees a CancelRequest, it first extracts the IP address component of
the message. If needed, it will forward the request to the proxy with that
address using an RPC that is only for proxy-proxy communication. This RPC
request will include the remote address of the client that sent the
CancelRequest. If the proxy is the intended recipient, then it will extract the
proxySecretID and check if it exists in the BackendKeyData map. If so, it will
check that the remote client address in the map matches the address that sent
the CancelRequest. If that matches, then the proxy will send a CancelRequest
with the original BackendKeyData to the SQL node using the address that is
stored in the map.

If the proxy migrates a session from one SQL node to another, that will make the
BackendKeyData it had previously saved obsolete. When the migration occurs, the
proxy will need to update the contents of its BackendKeyData map and replace the
old data with the new BackendKeyData provided by the session on the new SQL
node.

If the proxy crashes unexpectedly, then all the BackendKeyData entries will be
lost. From the clients’ perspective, the connection will be broken when the
proxy crashes, and they will not be able to interact with any in-flight queries,
so it’s not a huge problem that the queries can no longer be cancelled.

#### Proxy to Proxy communication

The previous section mentioned that proxies will start making RPCs to other
proxies. Proxy to proxy communication does not currently exist anywhere else in
the architecture. When it's added, we need to ensure that the RPC is only
exposed to other proxy instances. One way of achieving this is to make sure the
proxies are in a private subnet that is not exposed to the internet.


### Alternatives


#### Make SQL nodes verify the remote address

Instead of using a semaphore, the SQL nodes could also make sure the remote
client address that received the BackendKeyData matches the remote address that
sent the CancelRequest. This would work well in dedicated clusters. But in
serverless clusters, this would mean that the SQL proxy needs to propagate the
remote address of the client while sending the CancelRequest. For normal SQL
sessions, this is done using the **crdb:remote_addr** StartupMessage, but the
cancel protocol does not have a similar way of sending data like this.


#### Obfuscate the SQL proxy identifiers

Using 32 bits of the proxy-client BackendKeyData for the proxy IP address means
that an attacker could more easily guess an address and spam it. Instead, we
could obfuscate the address by hashing it along with a salt that is shared by
all the proxy instances. This would require additional secret management at the
proxy layer. To avoid this complexity, the proposal instead is to validate the
address sending the CancelRequest. This still allows an attacker to cause
additional network traffic, but prevents them from doing any functional damage.

### Possible Future Extensions

#### Custom protocol for SQL node to SQL proxy communication

The SQL node to SQL proxy part of the protocol described above could be changed
to be entirely custom. This would allow us to use more bits to identify the
session to cancel, and would eliminate the need for a semaphore. However, this
custom protocol could only be used in deployments where there is a SQL proxy in
front of the SQL nodes.

#### IP-based rate limiting

Another way of preventing an attacker from spamming cancel requests is to rate
limit these requests by IP. This also would eliminate the need for a semaphore,
and additionally, would prevent an attacker from causing extra network traffic
and starving legitimate cancel requests. However, we cannot add an IP-based rate
limiter at the SQL node layer, unless we also develop a custom protocol for
propagating the remote client address from the proxy to the SQL node.

It is possible that in the future, _all_ clusters might live behind a SQL proxy
process (possibly an embedded process). If we wait until that is the case, then
we can add IP-based rate limiting at the proxy layer exclusively.

0 comments on commit bcfaf5f

Please sign in to comment.