Feature request: traffic mirroring for load testing #302

NikolayS · 2023-01-29T01:38:36Z

@levkk Thanks for the very interesting tool, it's impressive how fast it's getting new features!

Is your feature request related to a problem? Please describe.
I wonder whether it would make sense, and not be too difficult, to implement mirroring of queries coming to a node, to allow testing similar to what is described in https://heapanalytics.com/blog/engineering/testing-database-changes-right-way.

Describe the solution you'd like
To be able to perform load testing using the mirroring feature on the one hand, and to reduce risks on the other, the following would be needed, I guess:

If mirroring is configured for a node, all SQL queries coming to it are repeated to some other node that is being tested (to compare its behavior / resource utilization against the node actually in use)
Such queries are sent asynchronously, without blocking anything, and the result (the response from Postgres) is ignored
The latency overhead from using this feature should be very low (<1 ms per query), so that it does not affect the "real" queries
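The non-blocking requirement above can be illustrated with a small sketch. It is written in Python for brevity and is not how pgcat does it; `MirrorHandoff`, `offer`, and `max_pending` are hypothetical names, and dropping mirrored queries when the buffer is full is an assumption made here so that the "real" path never waits.

```python
import asyncio


class MirrorHandoff:
    """Hypothetical helper: hands query copies to a mirror without ever waiting."""

    def __init__(self, max_pending=1024):
        # Bounded buffer so a slow mirror cannot grow memory without limit.
        self.queue = asyncio.Queue(maxsize=max_pending)
        self.dropped = 0

    def offer(self, query):
        """Enqueue a copy for the mirror. Returns False if it had to be dropped."""
        try:
            self.queue.put_nowait(query)  # never blocks the caller
            return True
        except asyncio.QueueFull:
            # Assumption: losing mirrored traffic is preferable to delaying real traffic.
            self.dropped += 1
            return False


def handle_client_query(query, handoff):
    """Serve the 'real' query; the mirror copy is fire-and-forget."""
    handoff.offer(query)
    # In a real pooler, this is where the query would be sent to the primary.
    return "primary handled: " + query
```

A separate consumer would drain `handoff.queue` and send each query to the node under test, ignoring the responses.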

Describe alternatives you've considered
Mirroring of this kind would enable a very good way to perform load testing when doing Postgres major upgrades and similar activities. Usually, building a good benchmarking framework is a difficult task – synthetic benchmarks like pgbench, sysbench, replaying logs, traffic simulation – all of these approaches have big cons, the main one being that the resulting workload is very different from what production actually has. The mirroring approach provides exactly the same workload, so the results of the load testing are the most reliable.

Additional context
Speaking of context, of course, synchronizing the state of the clone being tested is an interesting task, but it's solvable – we can start by building a physical replica for a node, then convert it to a logical one (there is a way to do it very fast, even for very large databases), then perform the change we need (such as a PG major upgrade, OS upgrade, etc.), then catch up in terms of the lag – and start mirroring + benchmarking.

levkk · 2023-01-29T18:32:21Z

Thank you!

I agree, this is a great feature to have. I'll put it in our backlog.

Thinking out loud, we could have something like this in our config to support this:

servers = [
   [ "127.0.0.1", 5432, "primary" ],
   [ "127.0.0.1", 6432, "mirror" ],
]

and for each message we send to "primary" we send the same to "mirror". I think we would still need to manage the "mirror" connection, to make sure it's in a state where it can receive messages and is actually processing queries. We could also add some metrics there, like latency, to show how fast the mirror is compared to the primary.
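As a rough illustration of the latency metric idea, here is a minimal Python sketch. The function name `latency_ratio` and the choice of comparing mean latencies are assumptions made for illustration; nothing here reflects metrics pgcat actually exposes.

```python
from statistics import mean


def latency_ratio(primary_ms, mirror_ms):
    """Return mean mirror latency divided by mean primary latency.

    A value above 1.0 means the mirror was slower than the primary
    for the sampled queries. Both arguments are lists of latencies in ms.
    """
    if not primary_ms or not mirror_ms:
        raise ValueError("need at least one sample for each server")
    return mean(mirror_ms) / mean(primary_ms)
```

In practice, percentiles are usually more informative than means for query latency, so this is only a starting point.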

I agree that if we do this in pgcat, the latency impact will be minimal, but it will use twice as much CPU during mirroring. As long as the host is provisioned with enough CPU cores, we should be OK.

drdrsh · 2023-01-29T22:46:42Z

I am wondering if we can make it more generic to allow mirroring to multiple servers

servers = [
   [ "127.0.0.1", 5432, "primary" ],
   [ "127.0.0.1", 6432, "replica" ],
]
mirrors = [
  ["1.8.8.8", 5432, 0], # mirrors instance 0
  ["2.8.8.8", 5432, 0], # mirrors instance 0
  ["3.8.8.8", 5432, 1], # mirrors instance 1 
]
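To make the proposed layout concrete, here is a small Python sketch of how the third field of each `mirrors` entry could be resolved against `servers`. It assumes that field is a zero-based index into `servers`, as the comments in the proposal suggest; `mirrors_by_server` is a hypothetical helper, not part of pgcat.

```python
from collections import defaultdict

# Same values as in the proposed config above.
servers = [
    ("127.0.0.1", 5432, "primary"),
    ("127.0.0.1", 6432, "replica"),
]
mirrors = [
    ("1.8.8.8", 5432, 0),  # mirrors instance 0
    ("2.8.8.8", 5432, 0),  # mirrors instance 0
    ("3.8.8.8", 5432, 1),  # mirrors instance 1
]


def mirrors_by_server(servers, mirrors):
    """Map each server index to the (host, port) pairs that mirror it."""
    grouped = defaultdict(list)
    for host, port, server_index in mirrors:
        if not 0 <= server_index < len(servers):
            raise ValueError(
                f"mirror {host}:{port} references unknown server {server_index}"
            )
        grouped[server_index].append((host, port))
    # Include servers that have no mirrors, mapped to an empty list.
    return {i: grouped.get(i, []) for i in range(len(servers))}
```

Validating the index at load time catches a mirror that points at a server that does not exist, rather than silently never receiving traffic.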

Keeping the replicas in sync is indeed an interesting problem.

NikolayS · 2023-01-30T22:06:09Z

@levkk thanks for the quick response! Just in case, I could help benchmark the implementation to ensure that everything is as expected in terms of overhead, etc.

@drdrsh

> I am wondering if we can make it more generic to allow mirroring to multiple servers

This would enable interesting use cases, like building a "lab" that has different servers (different Postgres major versions, different OSes, different hardware), sits next to production servers, and runs exactly the same workload on all of them at the same time, performing load testing with "real traffic" and comparing. Of course, as mentioned, this works only if there are enough CPU resources on the server with pgcat (and if the network allows – in some cases, it can be a problem too).

This is an implementation of query mirroring in PgCat (outlined in #302). In the config, we match mirror hosts with the servers handling the traffic. A mirror host receives the same protocol messages as the main server it is matched with. This is done by creating an async task for each mirror server; the task communicates with the main server through two channels, one for the protocol messages and one for the exit signal. The mirror server handler sends the protocol packets to the underlying PostgreSQL server. We read from the underlying PostgreSQL server as soon as data is available and immediately discard it. We use bb8 to manage the life cycle of the connection, not for pooling, since each mirror server handler is more or less single-threaded; there is no connection pooling in the mirrors. Matching each mirror connection to an actual server connection guarantees that we will never have more connections to any of the mirrors than the parent pool would allow.
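The task-plus-two-channels design described above can be sketched as follows. This is a Python `asyncio` stand-in for illustration only, not the code merged in #341: `FakeBackend`, `mirror_task`, and `demo` are hypothetical names, an `asyncio.Queue` plays the role of the message channel, and an `asyncio.Event` plays the role of the exit-signal channel.

```python
import asyncio


class FakeBackend:
    """Stand-in for a PostgreSQL connection (hypothetical, illustration only)."""

    def __init__(self):
        self.received = []

    async def send(self, message):
        self.received.append(message)

    async def read_available(self):
        return b"response to " + self.received[-1]


async def mirror_task(backend, messages, exit_signal):
    """Forward protocol messages to the mirror until the exit signal is set."""
    exit_wait = asyncio.ensure_future(exit_signal.wait())
    next_message = None
    try:
        while True:
            next_message = asyncio.ensure_future(messages.get())
            done, _ = await asyncio.wait(
                {next_message, exit_wait}, return_when=asyncio.FIRST_COMPLETED
            )
            if next_message not in done:
                return  # exit signal arrived and no message is ready
            await backend.send(next_message.result())
            await backend.read_available()  # response is read and discarded
    finally:
        # Cancelling an already finished future is a no-op.
        exit_wait.cancel()
        if next_message is not None:
            next_message.cancel()


async def demo():
    backend = FakeBackend()
    messages = asyncio.Queue()   # channel 1: protocol messages
    exit_signal = asyncio.Event()  # channel 2: exit signal
    task = asyncio.ensure_future(mirror_task(backend, messages, exit_signal))
    await messages.put(b"Q1")
    await messages.put(b"Q2")
    await asyncio.sleep(0.05)  # give the mirror task time to forward both
    exit_signal.set()
    await asyncio.wait_for(task, timeout=1)
    return backend.received
```

Running `asyncio.run(demo())` forwards both messages to the fake backend and then stops the task via the exit signal. Because there is one task per mirror, the number of mirror connections is tied to the number of main-server connections, which is the property the description relies on.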

drdrsh · 2023-03-10T12:26:02Z

Added in #341

drdrsh · 2023-03-10T16:36:28Z

@NikolayS FYI, the feature has been merged. I still have a follow-up to expose stats from mirror traffic in SHOW * commands.

NikolayS changed the title ~~Possible feature: traffic mirroring for load testing~~ Feature request: traffic mirroring for load testing Jan 29, 2023

levkk added the enhancement New feature or request label Jan 29, 2023

drdrsh mentioned this issue Mar 4, 2023

PgCat Query Mirroring #341

Merged

drdrsh closed this as completed Mar 10, 2023
