Feature request: traffic mirroring for load testing #302
Comments
Thank you! I agree, this is a great feature to have. I'll put it in our backlog. Thinking out loud, we could have something like this in our config to support this:
servers = [
    [ "127.0.0.1", 5432, "primary" ],
    [ "127.0.0.1", 6432, "mirror" ],
]
and for each message we send to "primary" we send the same to "mirror". We might still need to manage the "mirror" connection, I think, to make sure it's in a state where it can receive messages and is actually processing queries. We could also add some metrics there, like latency, to show how fast the mirror is compared to the primary. I agree that if we do this in pgcat, the latency impact will be minimal, but it will use twice as much CPU during mirroring. As long as the host is provisioned with enough CPU cores, we should be ok.
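To make the idea above concrete, here is a minimal Python sketch of the fan-out with per-role latency tracking. It is only an illustration, not pgcat code: `SERVERS`, `fan_out`, and the `senders` callables are hypothetical names, and the sends are synchronous for brevity, whereas a real proxy would presumably keep mirror traffic off the client's critical path.

```python
import time
from typing import Callable, Dict, List, Tuple

# Hypothetical stand-in for the proposed config: (host, port, role) triples.
SERVERS: List[Tuple[str, int, str]] = [
    ("127.0.0.1", 5432, "primary"),
    ("127.0.0.1", 6432, "mirror"),
]

# A sender takes a protocol message and returns the server's reply.
Sender = Callable[[bytes], bytes]


def fan_out(message: bytes, senders: Dict[str, Sender]) -> Tuple[bytes, Dict[str, float]]:
    """Send `message` to every configured role.

    Returns the primary's reply plus the latency (in seconds) seen per role.
    """
    latencies: Dict[str, float] = {}
    primary_reply = b""
    for _host, _port, role in SERVERS:
        start = time.perf_counter()
        reply = senders[role](message)
        latencies[role] = time.perf_counter() - start
        if role == "primary":
            # Only the primary's reply would go back to the client;
            # the mirror's reply is dropped.
            primary_reply = reply
    return primary_reply, latencies
```

Comparing `latencies["mirror"]` with `latencies["primary"]` over many messages would give the kind of "how fast is the mirror" metric mentioned above.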
I am wondering if we can make it more generic to allow mirroring to multiple servers.
Keeping the replicas in sync is indeed an interesting problem.
@levkk thanks for the quick response! Just in case, I could help benchmark the implementation to ensure that everything is as expected in terms of overhead, etc.
This would enable interesting use cases, like building a "lab" with different servers (different Postgres major versions, different OSes, different hardware) that sits next to the production servers and runs exactly the same workload on all of them at the same time, performing load testing with "real traffic" and comparing the results. Of course, as mentioned, this works only if there are enough CPU resources on the server running pgcat (and if the network allows; in some cases, it can be a problem too).
This is an implementation of query mirroring in PgCat (outlined in #302). In the config, we match mirror hosts with the servers handling the traffic. A mirror host receives the same protocol messages as the main server it was matched with. This is done by creating an async task for each mirror server; the task communicates with the main server through two channels, one for the protocol messages and one for the exit signal. The mirror server sends the protocol packets to the underlying PostgreSQL server. We receive from the underlying PostgreSQL server as soon as data is available and immediately discard it. We use bb8 to manage the life cycle of the connection, not for pooling, since each mirror server handler is more or less single-threaded; there is no connection pooling in the mirrors. Matching each mirror connection to an actual server connection guarantees that we never have more connections to any mirror than the parent pool would allow.
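The two-channel design described above can be sketched roughly as follows. This is a Python `asyncio` illustration of the pattern only, not the actual PgCat (Rust) code: `FakeBackend`, `mirror_task`, and `demo` are hypothetical names, the queue stands in for the message channel, and the event stands in for the exit-signal channel. For simplicity the sketch reads one reply per packet, rather than draining data whenever it becomes available.

```python
import asyncio
from typing import List


class FakeBackend:
    """Hypothetical stand-in for the connection to the mirror's PostgreSQL server."""

    def __init__(self) -> None:
        self.received: List[bytes] = []

    async def send(self, packet: bytes) -> None:
        self.received.append(packet)

    async def recv(self) -> bytes:
        return b"response"


async def mirror_task(backend: FakeBackend, messages: asyncio.Queue,
                      exit_signal: asyncio.Event) -> None:
    """Forward packets from `messages` to `backend` until `exit_signal` is set."""
    while True:
        get_msg = asyncio.ensure_future(messages.get())
        wait_exit = asyncio.ensure_future(exit_signal.wait())
        done, pending = await asyncio.wait(
            {get_msg, wait_exit}, return_when=asyncio.FIRST_COMPLETED
        )
        # Clean up whichever waiter did not fire.
        for task in pending:
            task.cancel()
        await asyncio.gather(*pending, return_exceptions=True)

        if get_msg in done:
            await backend.send(get_msg.result())
            await backend.recv()  # read the reply and discard it
            messages.task_done()
        if wait_exit in done:
            return


async def demo() -> List[bytes]:
    backend = FakeBackend()
    messages: asyncio.Queue = asyncio.Queue()
    exit_signal = asyncio.Event()
    task = asyncio.ensure_future(mirror_task(backend, messages, exit_signal))
    for packet in (b"Q1", b"Q2"):
        await messages.put(packet)
    await messages.join()   # wait until the mirror has handled both packets
    exit_signal.set()       # then ask the mirror task to stop
    await task
    return backend.received
```

Running `asyncio.run(demo())` forwards both packets to the fake backend in order and then shuts the mirror task down via the exit signal.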
Added in #341
@NikolayS FYI. The feature has been merged. I still have a follow up to expose stats from mirror traffic in |
@levkk Thanks for the very interesting tool, it's impressive how fast it's getting new features!
Is your feature request related to a problem? Please describe.
I wonder if it would make sense and if it wouldn't be too difficult to implement mirroring of queries coming to a node to allow testing similar to what is described in https://heapanalytics.com/blog/engineering/testing-database-changes-right-way.
Describe the solution you'd like
To be able to perform load testing using the mirroring feature on the one hand, and to reduce risks on the other, the following would be needed, I guess:
Describe alternatives you've considered
Mirroring of this kind would enable a very good way to perform load testing for Postgres major upgrades and similar activities. Usually, building a good benchmarking framework is a difficult task: synthetic benchmarks like pgbench or sysbench, replaying logs, and traffic simulation all have big cons, the main one being that the resulting workload is very different from what production actually has. The mirroring approach provides exactly the same workload, so the results of the load testing are the most reliable.
Additional context
Speaking of context, synchronizing the state of the clone being tested is of course an interesting task, but it's solvable: we can start by building a physical replica for a node, then convert it to a logical one (there is a way to do it very fast, even for very large databases), then perform the change we need (such as a PG major upgrade, OS upgrade, etc.), then catch up on the lag, and start mirroring + benchmarking.