Feature request: traffic mirroring for load testing #302

Closed
NikolayS opened this issue Jan 29, 2023 · 5 comments
Labels
enhancement New feature or request

Comments

@NikolayS

NikolayS commented Jan 29, 2023

@levkk Thanks for the very interesting tool, it's impressive how fast it's getting new features!

Is your feature request related to a problem? Please describe.
I wonder if it would make sense and if it wouldn't be too difficult to implement mirroring of queries coming to a node to allow testing similar to what is described in https://heapanalytics.com/blog/engineering/testing-database-changes-right-way.

Describe the solution you'd like
To be able to perform load testing using the mirroring feature on the one hand, and to reduce risks on the other, the following would be needed, I guess:

  • If mirroring is configured for a node, all SQL queries coming to it are repeated to another node that is being tested (to compare its behavior / resource utilization against the node actually in use)
  • Such queries are sent asynchronously, not blocking anything, and the result (response from Postgres) is ignored
  • The latency overhead from using this feature should be very low (<1 ms per query), not to affect the "real" queries
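A minimal sketch of the "never block the real query" requirement above, not pgcat's implementation. The `MirrorBuffer` class and its drop-on-overflow policy are assumptions made for illustration: the client-facing path hands each message to a bounded buffer with a non-blocking put, and if the mirror falls behind, the copy is dropped instead of delaying production traffic.

```python
import queue


class MirrorBuffer:
    """Hypothetical bounded hand-off between the client path and a mirror worker."""

    def __init__(self, capacity: int) -> None:
        self._queue: "queue.Queue[bytes]" = queue.Queue(maxsize=capacity)
        self.dropped = 0

    def offer(self, message: bytes) -> bool:
        """Enqueue a copy for the mirror without ever blocking the caller."""
        try:
            self._queue.put_nowait(message)
            return True
        except queue.Full:
            # Assumed policy: the mirror is falling behind, so drop the copy
            # rather than slow down the real query.
            self.dropped += 1
            return False

    def pending(self) -> int:
        """Number of messages waiting to be sent to the mirror."""
        return self._queue.qsize()


# Usage: with room for two messages, the third copy is dropped, not awaited.
buffer = MirrorBuffer(capacity=2)
accepted = [buffer.offer(m) for m in (b"SELECT 1", b"SELECT 2", b"SELECT 3")]
```

Dropping on overflow trades fidelity of the mirrored workload for a hard bound on the overhead added to the primary path; a real implementation might prefer to count and expose the drops as a metric.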

Describe alternatives you've considered
Mirroring of this kind would enable a very good way to perform load testing for Postgres major upgrades and similar activities. Usually, building a good benchmarking framework is a difficult task: synthetic benchmarks like pgbench or sysbench, replaying logs, and traffic simulation all have big cons, the main one being that the resulting workload is very different from what production actually has. The mirroring approach provides exactly the same workload, so the results of the load testing are the most reliable.

Additional context
Speaking of context, synchronizing the state of the clone being tested is, of course, an interesting task, but it's solvable: we can start by building a physical replica for a node, then convert it to a logical one (there is a way to do this very fast, even for very large databases), then perform the change we need (such as a PG major upgrade, OS upgrade, etc.), then catch up on the lag, and start mirroring + benchmarking.

@NikolayS NikolayS changed the title Possible feature: traffic mirroring for load testing Feature request: traffic mirroring for load testing Jan 29, 2023
@levkk
Contributor

levkk commented Jan 29, 2023

Thank you!

I agree, this is a great feature to have. I'll put it in our backlog.

Thinking out loud, we could have something like this in our config to support this:

servers = [
   [ "127.0.0.1", 5432, "primary" ],
   [ "127.0.0.1", 6432, "mirror" ],
]

and for each message we send to "primary" we send the same to "mirror". I think we would still need to manage the "mirror" connection to make sure it is in a state where it can receive messages and is actually processing queries. We could also add some metrics there, like latency, to show how fast the mirror is compared to the primary.

I agree that if we do this in pgcat, the latency impact will be minimal, but it will use twice as much CPU during mirroring. As long as the host is provisioned with enough CPU cores, we should be OK.

@levkk levkk added the enhancement New feature or request label Jan 29, 2023
@drdrsh
Collaborator

drdrsh commented Jan 29, 2023

I am wondering if we can make it more generic to allow mirroring to multiple servers

servers = [
   [ "127.0.0.1", 5432, "primary" ],
   [ "127.0.0.1", 6432, "replica" ],
]
mirrors = [
  ["1.8.8.8", 5432, 0], # mirrors instance 0
  ["2.8.8.8", 5432, 0], # mirrors instance 0
  ["3.8.8.8", 5432, 1], # mirrors instance 1 
]
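To make the index-based mapping in the proposed config concrete, here is a small sketch in Python (not pgcat's Rust code or its actual config loader). The `group_mirrors` helper is hypothetical; it only shows how the third field of each `mirrors` entry could be resolved to a position in `servers`, and how an out-of-range index could be rejected.

```python
from collections import defaultdict

# The proposed config, written as Python literals for illustration.
servers = [
    ("127.0.0.1", 5432, "primary"),
    ("127.0.0.1", 6432, "replica"),
]
mirrors = [
    ("1.8.8.8", 5432, 0),  # mirrors instance 0
    ("2.8.8.8", 5432, 0),  # mirrors instance 0
    ("3.8.8.8", 5432, 1),  # mirrors instance 1
]


def group_mirrors(servers, mirrors):
    """Map each server index to the list of (host, port) mirrors attached to it."""
    grouped = defaultdict(list)
    for host, port, server_index in mirrors:
        if not 0 <= server_index < len(servers):
            raise ValueError(f"mirror {host}:{port} points at unknown server {server_index}")
        grouped[server_index].append((host, port))
    # Servers without mirrors get an empty list so callers can index uniformly.
    return {i: grouped.get(i, []) for i in range(len(servers))}


mirror_map = group_mirrors(servers, mirrors)
```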

Keeping the replicas in sync is indeed an interesting problem.

@NikolayS
Author

@levkk thanks for the quick response! Just in case, I could help benchmark the implementation to ensure that everything is as expected in terms of overhead, etc.

@drdrsh

> I am wondering if we can make it more generic to allow mirroring to multiple servers

This would enable interesting use cases, like building a "lab" that has different servers (different Postgres major versions, different OSes, different hardware), sits next to production servers, and runs exactly the same workload on all of them at the same time, performing load testing with "real traffic" and comparing the results. Of course, as mentioned, this works only if there are enough CPU resources on the server running pgcat, and if the network allows it, which can also be a problem in some cases.

drdrsh added a commit that referenced this issue Mar 10, 2023
This is an implementation of Query mirroring in PgCat (outlined here #302)

In configs, we match mirror hosts with the servers handling the traffic. A mirror host will receive the same protocol messages as the main server it was matched with.

This is done by creating an async task for each mirror server. The task communicates with the main server through two channels: one for the protocol messages and one for the exit signal. The mirror server sends the protocol packets to the underlying PostgreSQL server. We receive data from the underlying PostgreSQL server as soon as it is available and immediately discard it. We use bb8 to manage the life cycle of the connection, not for pooling, since each mirror server handler is more or less single-threaded.
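As a rough illustration of the two-channel design described above, here is a sketch in Python's asyncio rather than the project's Rust, and it is not the code from the commit. The names `mirror_task`, `fake_send`, and `demo` are hypothetical: one queue carries protocol messages, one event carries the exit signal, and whatever the mirror replies is read and thrown away.

```python
import asyncio


async def mirror_task(messages: asyncio.Queue, exit_signal: asyncio.Event, send) -> None:
    """Forward queued messages to the mirror until the exit signal fires."""
    while True:
        get_msg = asyncio.ensure_future(messages.get())
        wait_exit = asyncio.ensure_future(exit_signal.wait())
        done, pending = await asyncio.wait(
            {get_msg, wait_exit}, return_when=asyncio.FIRST_COMPLETED
        )
        for task in pending:
            task.cancel()
        if get_msg in done:
            await send(get_msg.result())  # the mirror's response is discarded
            messages.task_done()
        if exit_signal.is_set():
            return


async def demo() -> list:
    sent = []

    async def fake_send(message: bytes) -> bytes:
        # Stand-in for writing to the mirror's PostgreSQL connection.
        sent.append(message)
        return b"response to " + message

    messages: asyncio.Queue = asyncio.Queue()
    exit_signal = asyncio.Event()
    worker = asyncio.create_task(mirror_task(messages, exit_signal, fake_send))

    for message in (b"BEGIN", b"SELECT 1", b"COMMIT"):
        messages.put_nowait(message)  # the main server never waits on the mirror

    await messages.join()  # demo only: let the mirror drain before stopping it
    exit_signal.set()
    await worker
    return sent


sent = asyncio.run(demo())
```

Keeping the exit signal separate from the message channel lets the main server stop a mirror without having to enqueue a special sentinel message behind pending traffic.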

We don't have any connection pooling in the mirrors. Matching each mirror connection to an actual server connection guarantees that we will not have more connections to any of the mirrors than the parent pool would allow.
@drdrsh
Collaborator

drdrsh commented Mar 10, 2023

Added in #341

@drdrsh drdrsh closed this as completed Mar 10, 2023
@drdrsh
Collaborator

drdrsh commented Mar 10, 2023

@NikolayS FYI: the feature has been merged. I still have a follow-up to expose stats from mirror traffic in SHOW * commands.
