Remove COM_MULTI batching #946
@bgrainger that's sad to hear; I was hoping the new ADO batching API would be beneficial in the MySQL case too...
The new ADO batching API is not entirely new. COM_MULTI was a misunderstanding; basically the same thing can be achieved with protocol pipelining, entirely in the client, by sending multiple commands and reading multiple responses, without changing anything in the server.
If the server supports it. Aurora doesn't (#486 (comment)), but there's no server capability flag (AFAIK) that indicates whether it would be safe or not. With the wide variety of back-ends out there that speak some variation of the MySQL text protocol, it would be risky to assume it's supported. It could be an opt-in behaviour with a connection string option, of course. (This is just a quick reply; I need to spend some more time thinking about if/how …)
Of course there is no capability flag that would say "my server will read, and silently drop, client data." What Aurora implements is called a bug. What MariaDB JDBC implements is pipelining by default, with a flag to disable it, for Aurora only.
I know nothing about MySQL/MariaDB/Aurora... But the first time Npgsql connects to a server (host/port combination), it sends out a special query to learn certain things about the server, which may affect later behavior. You could do something similar to detect the various MySQL variants, and then assume e.g. that Aurora doesn't support batching.

Another way to approach this would be for MySqlBatch to simply implement the pipelining behavior; using that API could mean you're opting into pipelining. Semicolon batching could still be supported inside a single MySqlCommand, or you could have some sort of flag on MySqlBatch itself to control this, or whatever.

One small note about pipelining vs. batching - the two are indeed very similar but not exactly identical... For example, in PG, a failure in an earlier batched command causes later ones to be skipped. Similarly, the batched commands run in an implicit transaction (unless an explicit transaction is already in progress). All this isn't mandatory or anything - just discussing other aspects of the feature.

If we're really discussing pure pipelining, i.e. pushing more commands into the pipe before earlier ones have completed, then that could also be implemented simply by allowing MySqlConnector to call DbCommand.ExecuteReader before a previous reader has completed. MySqlConnector would have to internally track the ordering of commands (via a queue), which may be a bit more complex than a simple batching implementation; but it would also be slightly more powerful, since users don't have to gather all outgoing commands before sending them all out (i.e. they can be sent out gradually).
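The queue-based ordering described above can be sketched as follows. This is illustrative Python, not MySqlConnector's actual implementation; `transport` is a hypothetical object with `send`/`recv` methods:

```python
from collections import deque


class PipelinedConnection:
    """Sketch of pipelining: commands are written to the wire immediately,
    and a FIFO queue tracks the order in which results must be read back.
    (Hypothetical API, for illustration only.)"""

    def __init__(self, transport):
        self.transport = transport   # hypothetical send/recv object
        self.pending = deque()       # commands awaiting a response

    def execute(self, sql):
        """Send a command without waiting for earlier results."""
        self.transport.send(sql)
        self.pending.append(sql)
        return len(self.pending)     # depth of the pipeline so far

    def next_result(self):
        """Read the next response; responses arrive in send order."""
        sql = self.pending.popleft()
        return sql, self.transport.recv()
```

Because responses arrive in the same order commands were sent, a plain queue is enough to match each result to its command.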
Alright, the MySQL protocol is such that it is the server that sends some information already when the client connects - the version string, capability flags and such. From what I found on the internet, Aurora might be detectable via the version string, too (https://docs.aws.amazon.com/AmazonRDS/latest/AuroraUserGuide/AuroraMySQL.Updates.Versions.html#AuroraMySQL.Updates.EngineVersions). I did not check it myself; no AWS access right now.

There is no implicit transaction for the semicolon batch, but yes, an error would stop batch execution, just like for PG.

One more pipelining peculiarity is that if there are large-ish result sets, the client would still need to read them in parallel somehow, before it finishes sending the full batch. Otherwise, when the client only sends and does not read results, the TCP buffers will fill up, the server will get stuck not being able to send more data to the client, and then the client will get stuck not being able to send more data to the server. Perhaps, if the client detects there is anything on the socket while it is still writing the batch, it could stop writing for a while and start reading results.
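The interleaving idea at the end of that comment — pause writing when the server has started sending results, so neither side's TCP buffer fills up — could look roughly like this. This is an illustrative Python sketch using `select`, not connector code; `handle_result` is a hypothetical callback:

```python
import select


def send_batch(sock, commands, handle_result):
    """Write a batch to the socket, but drain any results the server has
    already started sending, to avoid the mutual-blocking scenario where
    both TCP buffers fill up. (Illustrative sketch only.)"""
    outgoing = b"".join(commands)
    while outgoing:
        readable, writable, _ = select.select([sock], [sock], [])
        if readable:
            # Server already has results for us: read before writing more.
            handle_result(sock.recv(4096))
        if writable:
            sent = sock.send(outgoing[:4096])
            outgoing = outgoing[sent:]
    # Batch fully written; remaining results are read by the caller.
```

A real driver would do this with async I/O rather than `select`, but the principle — never block on writes while ignoring pending reads — is the same.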
Right, but there's nothing like this with "pipeline batching", right? The PG protocol has a special Sync message whose meaning is "when there's an error, discard all messages up to this one". This provides the skip-on-error behavior.
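For illustration, the skip-on-error batch semantics described for PG could be modeled like this (hedged Python sketch; `execute` is a hypothetical per-statement callable, not a real driver API):

```python
def run_batch(statements, execute):
    """Sketch of skip-on-error batch semantics: once one statement fails,
    the remaining statements are skipped instead of executed.
    (Illustration only; `execute` is a hypothetical callable.)"""
    results = []
    failure = None
    for sql in statements:
        if failure is not None:
            results.append(("skipped", sql))
            continue
        try:
            results.append(("ok", execute(sql)))
        except Exception as exc:
            failure = exc
            results.append(("error", sql))
    return results, failure
```
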
FWIW I'm personally against this form of optimization - the driver should not be in the business of optimizing the user's SQL, that's the user's responsibility (why not have the driver analyze and change your SQL to do more efficient JOINs too...). Parsing SQL at the driver also has an overhead which slows everything down. AFAIK this kind of rewriting is mostly relevant for bulk insert (as in your example); but if you're doing this and really care about perf, a separate bulk insert protocol would probably be more appropriate anyway (assuming MySQL has one).
The exact same deadlock situation exists in PostgreSQL - you cannot first finish writing, and only then allow the user to start reading. The Npgsql details are a bit complex, but the tl;dr is that writing of a batch is done asynchronously and in parallel, in the background, as the user is reading results from earlier queries in the batch.
We don't have a special batching marker. The batching worked just out of the box, but yes, "batch start" and "batch end" commands would be interesting.

FWIW, I'm all for "whatever works best" with respect to optimization. For example, prepared statements in MySQL, and by extension in MariaDB, are suboptimally implemented and are not much more than a memory hog on the server (a parsed string, more or less, once per connection, and the optimizer does not reuse the plans even for the same connection). Thus, most connectors implement client-side prepare, i.e. replacing question marks with actual, correctly escaped data. This also turns out to be helpful when building multi-insert strings for batching.
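The client-side prepare idea — substituting escaped literals for `?` placeholders and folding rows into one multi-row INSERT — can be sketched as below. This is illustrative Python only; a real connector must also skip placeholders inside string literals and comments, handle many more data types, and do charset-aware escaping:

```python
def escape_literal(value):
    """Minimal client-side escaping sketch (real connectors do far more)."""
    if value is None:
        return "NULL"
    if isinstance(value, (int, float)):
        return str(value)
    return "'" + str(value).replace("\\", "\\\\").replace("'", "\\'") + "'"


def client_side_prepare(sql, params):
    """Replace each ? placeholder with its escaped parameter value.
    Naive: does not skip ? inside string literals or comments."""
    parts = sql.split("?")
    assert len(parts) == len(params) + 1, "placeholder/parameter mismatch"
    out = [parts[0]]
    for part, value in zip(parts[1:], params):
        out.append(escape_literal(value))
        out.append(part)
    return "".join(out)


def build_multi_insert(table, columns, rows):
    """Fold many parameter rows into a single multi-row INSERT statement."""
    values = ", ".join(
        "(" + ", ".join(escape_literal(v) for v in row) + ")" for row in rows
    )
    return f"INSERT INTO {table} ({', '.join(columns)}) VALUES {values}"
```

This is why client-side prepare and multi-row INSERT rewriting pair naturally: once the driver can inline parameters as text, collapsing many rows into one statement is just string assembly.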
Batching was added in #650.
An efficient implementation (using COM_MULTI) is only supported by MariaDB Server, and that support is being removed in 10.6.0. The implementation complicates the client code and will soon serve little purpose (as users upgrade away from the MariaDB Server versions that support it). It also has some bugs that are only exposed on certain code paths: #937.

The open question is whether the current MySqlBatch code is retained (which will essentially string-concatenate statements behind the scenes), or whether it is removed (for simplicity) and only supported via the generic ADO.NET support in .NET 6. CC @roji.