Remove COM_MULTI batching #946

Closed
bgrainger opened this issue Feb 14, 2021 · 8 comments
@bgrainger
Member

Batching was added in #650.

An efficient implementation (using COM_MULTI) is only supported by MariaDB server, and that support is being removed in 10.6.0.

The implementation complicates the client code and will soon serve little purpose (as users upgrade away from the MariaDB server versions that support it). It also has some bugs that are only exposed on certain code paths: #937.

The open question is whether the current MySqlBatch code should be retained (it essentially string-concatenates statements behind the scenes), or removed (for simplicity), with batching only exposed via the generic ADO.NET batching support in .NET 6. CC @roji.
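For illustration, this is roughly the caller-facing shape of MySqlBatch today (a minimal sketch; the connection string and table are placeholders):

```csharp
using System.Threading.Tasks;
using MySqlConnector;

// Minimal sketch of the existing MySqlBatch API; if COM_MULTI support is
// removed, the library could simply join these statements with semicolons
// behind the scenes. Connection string and table are placeholders.
using var connection = new MySqlConnection("Server=localhost;Database=test;UID=root");
await connection.OpenAsync();

using var batch = new MySqlBatch(connection);
batch.BatchCommands.Add(new MySqlBatchCommand("INSERT INTO widgets(name) VALUES('a')"));
batch.BatchCommands.Add(new MySqlBatchCommand("INSERT INTO widgets(name) VALUES('b')"));
await batch.ExecuteNonQueryAsync();
```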

@roji

roji commented Feb 18, 2021

@bgrainger that's sad to hear; I was hoping the new ADO batching API would be beneficial in the MySQL case too...

@vaintroub
Contributor

vaintroub commented May 5, 2021

The new ADO batching API is not entirely new.
JDBC's Statement.addBatch/Statement.executeBatch have efficient implementations in some MySQL drivers, and probably the most sophisticated is in MariaDB's JDBC driver, whose strategies range from simple semicolon batching, to protocol pipelining, to converting multiple INSERT statements into a single multi-row INSERT, to using the COM_BULK MariaDB protocol command.

COM_MULTI was a misunderstanding; basically the same thing can be achieved with protocol pipelining, entirely in the client, by sending multiple commands and then reading multiple responses, without changing anything in the server.
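To make the idea concrete, a conceptual sketch (the two helpers are hypothetical stand-ins for a driver's protocol layer, not real MySqlConnector APIs):

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Threading.Tasks;

static class PipeliningSketch
{
    // Send every command before reading any response, then read the
    // responses back in the same order the commands were written.
    public static async Task ExecuteAsync(Stream stream, IReadOnlyList<string> commands)
    {
        foreach (var command in commands)
            await WriteCommandAsync(stream, command);
        for (var i = 0; i < commands.Count; i++)
            await ReadResponseAsync(stream);
    }

    // Hypothetical protocol-layer helpers (packet framing elided).
    static Task WriteCommandAsync(Stream stream, string command) => throw new NotImplementedException();
    static Task ReadResponseAsync(Stream stream) => throw new NotImplementedException();
}
```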

@bgrainger
Member Author

> protocol pipelining, entirely in the client

If the server supports it. Aurora doesn't (#486 (comment)), but there's no server capability flag (AFAIK) that indicates whether it would be safe or not. With the wide variety of back-ends out there that speak some variation of the MySQL text protocol, it would be risky to assume it's supported. It could be an opt-in behaviour with a connection string option, of course.

(This is just a quick reply; I need to spend some more time thinking about if/how MySqlBatch could be implemented with a pipelined approach, for servers that do support it.)

@vaintroub
Contributor

vaintroub commented May 6, 2021

Of course there is no capability flag that would say "my server will read, and silently drop, client data". What Aurora implements is called a bug. What MariaDB JDBC implements is pipelining by default, with a flag to disable it, for Aurora only.

@roji

roji commented May 6, 2021

I know nothing about MySQL/MariaDB/Aurora... But the first time Npgsql connects to a server (host/port combination), it sends out a special query to learn certain things about the server, which may affect later behavior. You could do something similar to detect the various MySQL variants, and then assume e.g. that Aurora doesn't support batching.

Another way to approach this would be for MySqlBatch to simply implement the pipelining behavior; using that API would mean you're opting into pipelining. Semicolon batching could still be supported inside a single MySqlCommand, or you could have some sort of flag on MySqlBatch itself to control this or whatever.

One small note about pipelining vs. batching - the two are indeed very similar but not exactly identical... For example, in PG, failure in an earlier batched command causes skipping later ones. Similarly, the batched commands run in an implicit transaction (unless an explicit transaction is already in progress). All this isn't mandatory or anything - just discussing other aspects of the feature.

If we're really discussing pure pipelining, i.e. pushing more commands into the pipe before earlier ones have completed, then that could also be implemented simply by allowing MySqlConnector to call DbCommand.ExecuteReader before a previous reader has completed. MySqlConnector would have to internally track ordering of commands (via a queue), which may be a bit more complex than a simple batching implementation; but it would also be slightly more powerful, since users don't have to gather all outgoing commands before sending them all out (i.e. they can be sent out gradually).
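In sketch form, the internal bookkeeping that implies could be a FIFO of in-flight commands (a rough sketch, not MySqlConnector's actual internals):

```csharp
using System.Collections.Concurrent;
using System.Threading.Tasks;

// Rough sketch: each pipelined command enqueues a completion source, and
// responses are matched to commands strictly in FIFO order as they arrive
// off the wire.
class InFlightCommandQueue
{
    private readonly ConcurrentQueue<TaskCompletionSource<object>> _pending = new();

    public Task<object> EnqueueCommand()
    {
        var tcs = new TaskCompletionSource<object>(TaskCreationOptions.RunContinuationsAsynchronously);
        _pending.Enqueue(tcs);
        return tcs.Task; // completes when this command's response arrives
    }

    public void OnResponseReceived(object result)
    {
        // Responses arrive in the same order the commands were sent.
        if (_pending.TryDequeue(out var tcs))
            tcs.SetResult(result);
    }
}
```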

@vaintroub
Contributor

vaintroub commented May 6, 2021

Alright, the MySQL protocol is such that the server already sends some information when the client connects - the version string, capability flags, and such. From what I found on the internet, Aurora might be detectable via the version string, too (https://docs.aws.amazon.com/AmazonRDS/latest/AuroraUserGuide/AuroraMySQL.Updates.Versions.html#AuroraMySQL.Updates.EngineVersions). I did not check it myself; I have no AWS access right now.
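I haven't verified this either, but if the handshake version string turns out not to be distinctive enough, a one-off probe along these lines could work (assuming, per the AWS docs, an Aurora-specific aurora_version system variable that plain MySQL/MariaDB servers reject):

```csharp
using System.Threading.Tasks;
using MySqlConnector;

static class AuroraDetection
{
    // One-time probe, loosely modeled on Npgsql's startup query. Assumption:
    // Aurora exposes @@aurora_version (per AWS docs), while plain MySQL and
    // MariaDB raise an "unknown system variable" error for it.
    public static async Task<bool> IsAuroraAsync(MySqlConnection connection)
    {
        using var command = new MySqlCommand("SELECT @@aurora_version;", connection);
        try
        {
            return await command.ExecuteScalarAsync() is not null;
        }
        catch (MySqlException)
        {
            return false; // unknown system variable => not Aurora
        }
    }
}
```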

There is no implicit transaction for the semicolon batch, but yes, an error would stop batch execution, just like for PG.
In general, pipelining does not provide benefits as big as "smart" query rewriting: a prepared statement INSERT INTO t VALUES(?,?,?), batched with thousands of rows, would in JDBC result in a single multi-row insert, which is orders of magnitude faster than inserting rows one by one, with or without pipelining. This is known as query rewriting.
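To illustrate the rewrite, a simplified sketch (real drivers must also respect all literal types, quoting rules, and max_allowed_packet):

```csharp
using System.Collections.Generic;
using System.Linq;

static class RewriteSketch
{
    // Turns N bound parameter sets for "INSERT INTO t VALUES(?,?,?)" into
    // one multi-row INSERT. Escaping here is deliberately naive.
    public static string BuildMultiRowInsert(IReadOnlyList<object?[]> rows)
    {
        var valueLists = rows.Select(row =>
            "(" + string.Join(",", row.Select(FormatLiteral)) + ")");
        return "INSERT INTO t VALUES " + string.Join(",", valueLists);
    }

    static string FormatLiteral(object? value) => value switch
    {
        null => "NULL",
        string s => "'" + s.Replace("\\", "\\\\").Replace("'", "\\'") + "'",
        _ => value.ToString()!,
    };
}
```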

One more pipelining peculiarity is that if there are large-ish result sets, the client would still need to read them in parallel somehow, before it has finished sending the full batch. Otherwise, if the client only sends and does not read results, the TCP buffers will fill up, the server will get stuck unable to send more data to the client, and then the client will get stuck unable to send more data to the server. Perhaps, if the client detects there is anything on the socket while it is still writing the batch, it could stop writing for a while and start reading results.
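That last idea could look roughly like this (a sketch; the helpers are hypothetical, and NetworkStream.DataAvailable serves as the "anything on the socket" check):

```csharp
using System;
using System.Collections.Generic;
using System.Net.Sockets;
using System.Threading.Tasks;

static class DrainWhileWritingSketch
{
    // Between writes, check whether the server has already started sending
    // results; if so, drain some of them before continuing, so neither
    // side's TCP buffer fills up and stalls the other.
    public static async Task WriteBatchAsync(NetworkStream stream, IReadOnlyList<string> commands)
    {
        foreach (var command in commands)
        {
            await WriteCommandAsync(stream, command);
            while (stream.DataAvailable)
                await ReadSomeResultsAsync(stream);
        }
    }

    // Hypothetical protocol-layer helpers.
    static Task WriteCommandAsync(NetworkStream stream, string command) => throw new NotImplementedException();
    static Task ReadSomeResultsAsync(NetworkStream stream) => throw new NotImplementedException();
}
```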

@roji

roji commented May 6, 2021

> There is no implicit transaction for the semicolon batch, but yes, an error would stop batch execution, just like for PG.

Right, but there's nothing like this with "pipeline batching", right? The PG protocol has a special Sync message whose meaning is "when there's an error, discard all messages up to this one". This provides the skip-on-error behavior.

> In general, pipelining does not provide benefits as big as "smart" query rewriting: a prepared statement INSERT INTO t VALUES(?,?,?), batched with thousands of rows, would in JDBC result in a single multi-row insert, which is orders of magnitude faster than inserting rows one by one, with or without pipelining. This is known as query rewriting.

FWIW I'm personally against this form of optimization - the driver should not be in the business of optimizing the user's SQL; that's the user's responsibility (why not have the driver analyze and change your SQL to do more efficient JOINs too...). Parsing SQL at the driver also has an overhead, which slows everything down. AFAIK this kind of rewriting is mostly relevant for bulk insert (as in your example); but if you're doing this and really care about perf, a separate bulk insert protocol would probably be more appropriate anyway (assuming MySQL has one).

> One more pipelining peculiarity is that if there are large-ish result sets, the client would still need to read them in parallel somehow,

The exact same deadlock situation exists in PostgreSQL - you cannot first finish writing, and only then allow the user to start reading. The Npgsql details are a bit complex, but the tl;dr is that writing of a batch is done asynchronously and in parallel, in the background as the user is reading results from earlier queries in the batch.
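In sketch form, the pattern is essentially this (hypothetical helpers; not Npgsql's actual internals):

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Threading.Tasks;

static class BackgroundWriteSketch
{
    // Start the batch write as a background task, consume results
    // concurrently, and only observe the write's outcome at the end.
    public static async Task ExecuteBatchAsync(Stream stream, IReadOnlyList<string> commands)
    {
        var writeTask = WriteAllCommandsAsync(stream, commands);

        for (var i = 0; i < commands.Count; i++)
            await ReadResultAsync(stream); // reading frees the server to keep sending

        await writeTask; // surface any write-side failure
    }

    // Hypothetical protocol-layer helpers.
    static Task WriteAllCommandsAsync(Stream stream, IReadOnlyList<string> commands) => throw new NotImplementedException();
    static Task ReadResultAsync(Stream stream) => throw new NotImplementedException();
}
```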

@vaintroub
Contributor

vaintroub commented May 6, 2021

We don't have a special batching marker; batching worked just out of the box. But yes, "batch start" and "batch end" commands would be interesting.

FWIW, I'm all for "whatever works best" with regard to optimization. For example, prepared statements in MySQL, and by extension in MariaDB, are suboptimally implemented and are not much more than a memory hog on the server (a parsed string, more or less, once per connection; the optimizer does not reuse plans even for the same connection). Thus, most connectors implement client-side prepare, i.e. replacing question marks with actual, correctly escaped data. This also turns out to be helpful when building multi-insert strings for batching.
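Client-side prepare, in its naive sketch form (this version ignores '?' inside string literals and comments, which a real implementation must handle):

```csharp
using System.Text;

static class ClientSidePrepareSketch
{
    // Substitute each '?' placeholder with a correctly escaped literal.
    // Deliberately naive: does not skip '?' inside strings or comments.
    public static string BindParameters(string sql, object?[] values)
    {
        var result = new StringBuilder();
        var parameterIndex = 0;
        foreach (var ch in sql)
            result.Append(ch == '?' ? EscapeLiteral(values[parameterIndex++]) : ch.ToString());
        return result.ToString();
    }

    static string EscapeLiteral(object? value) => value switch
    {
        null => "NULL",
        string s => "'" + s.Replace("\\", "\\\\").Replace("'", "\\'") + "'",
        _ => value.ToString()!,
    };
}
```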
Having said that, MariaDB (but not MySQL) also has a COM_BULK protocol command for prepared statements, where you can bind multiple rows for a single server-side prepared statement. This is not really suitable for terabytes of data, since the server will load the full payload into memory, but it spares some logic on the client, and works with UPDATE too.
