
Query: Relational: Use SequentialAccess when reading data? #885

Open · Tracked by #22953

anpete opened this issue Oct 15, 2014 · 28 comments

@anpete (Contributor) commented Oct 15, 2014

Possibly use it if it's faster.

@smitpatel (Member):

If we sort primary key properties to appear first in GetProperties, then the new materializer would access columns in sequence only.

@AndriySvyryd changed the title from "Query: Relational: Sync: Use SequentialAccess when reading data?" to "Query: Relational: Use SequentialAccess when reading data?" on Aug 22, 2019
@AndriySvyryd (Member):

Compare sync and async and make this configurable.

@roji (Member) commented Aug 23, 2019

It's not clear to me that there's a real perf advantage here. Sequential mode is important mainly for big rows (e.g. blobs); smaller rows are likely to be buffered in memory by the ADO driver, and sequential access would have no advantage (in Npgsql the default buffer is 8KB).

In fact, at least in Npgsql there's a good chance that sequential access is slightly slower, since when reading values in non-sequential mode we can assume data is already in memory (and for example avoid code paths that involve async state machines).
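For reference, a minimal sketch of what this knob means at the plain ADO.NET level (the helper name is made up; the APIs are standard DbCommand/DbDataReader):

```csharp
using System.Data;
using System.Data.Common;
using System.Threading.Tasks;

public static class SequentialReadSketch
{
    // Reads rows with SequentialAccess: column ordinals must be strictly
    // increasing, and each column can be read at most once per row.
    public static async Task ReadAsync(DbCommand command)
    {
        await using var reader = await command.ExecuteReaderAsync(CommandBehavior.SequentialAccess);
        while (await reader.ReadAsync())
        {
            var id = reader.GetFieldValue<int>(0);      // ordinal 0 first...
            var blob = reader.GetFieldValue<byte[]>(1); // ...then ordinal 1; no going back
        }
    }
}
```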

@AndriySvyryd (Member):

@roji It varies from provider to provider and with the data being read, but in EF6 with SqlClient we've seen significant differences in throughput and allocations making this a useful knob to have.

@roji (Member) commented Aug 23, 2019

Agreed that it's a useful knob, probably controlled on a per-provider basis. This probably indicates a perf issue in SqlClient more than anything, although that's another problem.

@roji (Member) commented Oct 4, 2019

See #18221 which raises this as a perf issue, and especially #18221 (comment) and dotnet/SqlClient#245. tl;dr it seems that an odd "behavior" makes async reading of large fields slow regardless of command behavior.

One last note: CommandBehavior.Sequential makes sense for large values. On SQL Server the provider can reasonably "guess" that a property will contain large values based on its store type: VARCHAR(MAX) would probably be a good candidate, whereas VARCHAR(255) obviously would not be. This doesn't necessarily hold for other providers: in PostgreSQL users are encouraged to just use the TEXT store type for all strings, regardless of length, so it would probably be a bad idea to always access it with SequentialAccess. Maybe an additional user hint makes sense here.
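A purely illustrative, SQL Server-flavored sketch of that heuristic (this is not an EF API; the helper name and the type list are made up):

```csharp
using System;
using System.Collections.Generic;
using System.Data;
using System.Linq;

public static class SequentialAccessHeuristic
{
    // Illustrative only: opt into SequentialAccess when the projection
    // includes a store type that can hold arbitrarily large values.
    private static readonly string[] UnboundedTypes =
        { "varchar(max)", "nvarchar(max)", "varbinary(max)" };

    public static CommandBehavior Choose(IEnumerable<string> storeTypes)
        => storeTypes.Any(t => UnboundedTypes.Contains(t, StringComparer.OrdinalIgnoreCase))
            ? CommandBehavior.SequentialAccess
            : CommandBehavior.Default;
}
```

As noted above, this mapping would not transfer to PostgreSQL, where TEXT is used for strings of any length; a per-provider default plus a user hint would fit better.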

@tomasbruckner:

@roji Hi, it looks like SqlClient has fixed this issue in version 1.1.0. Will it be possible to upgrade the SqlClient package for SqlServer 2.x? It is a big problem in our project right now.

@roji (Member) commented Nov 26, 2019

@tomasbruckner you can take a dependency on the new version in your csproj by adding an explicit <PackageReference> to it - everything should work.
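For example, something like the following in the csproj (1.1.0 being the release mentioned above):

```xml
<ItemGroup>
  <PackageReference Include="Microsoft.Data.SqlClient" Version="1.1.0" />
</ItemGroup>
```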

@roji (Member) commented Nov 26, 2019

FYI the version has already been updated in our master branch.

@AndriySvyryd (Member):

2.x doesn't depend on Microsoft.Data.SqlClient, so upgrading it will have no effect.

@roji (Member) commented Nov 26, 2019

Oops, missed that this was about 2.x. For now you'll have to upgrade to 3.x to use Microsoft.Data.SqlClient (any version).

@tomasbruckner:

Oh, my bad. I saw SqlClient in the dependencies on NuGet, but it is System.Data.SqlClient rather than Microsoft.Data.SqlClient. Thanks for clarifying.

@AndriySvyryd (Member):

Note to implementor: Compare LOH allocations

@panoskj commented Mar 1, 2021

In some cases, async with SequentialAccess is much faster, even with the most recent releases of EFCore and SqlClient. See #24290 as an example.

It appears that the shaper/materializer already accesses columns sequentially, but it (needlessly) reads some of them twice (the primary keys). Edit: this isn't true for complex queries.

So I tried to make a wrapper for DbDataReader, which simply caches the most recently read value, and wrapped the actual reader using an interceptor, like I did in #24290. The results: SequentialAccess worked without errors and its performance matched the default sync version.
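A rough reconstruction of that workaround (all names here are mine and error handling is omitted; the actual code from #24290 may differ): a delegating reader that remembers the last column read, so a repeated read of a key column is served from the cache instead of violating SequentialAccess, plus an interceptor that suppresses EF's default execution and substitutes a sequential reader.

```csharp
using System;
using System.Collections;
using System.Data;
using System.Data.Common;
using System.Threading;
using System.Threading.Tasks;
using Microsoft.EntityFrameworkCore.Diagnostics;

// Caches the most recently read column so that a second read of the same
// ordinal does not violate SequentialAccess.
public sealed class CachingSequentialReader : DbDataReader
{
    private readonly DbDataReader _inner;
    private int _cachedOrdinal = -1;
    private object? _cachedValue;

    public CachingSequentialReader(DbDataReader inner) => _inner = inner;

    public override T GetFieldValue<T>(int ordinal)
    {
        if (ordinal != _cachedOrdinal)
        {
            _cachedValue = _inner.GetFieldValue<T>(ordinal);
            _cachedOrdinal = ordinal;
        }
        return (T)_cachedValue!;
    }

    // Moving to the next row invalidates the cache.
    public override bool Read() { _cachedOrdinal = -1; return _inner.Read(); }
    public override Task<bool> ReadAsync(CancellationToken ct) { _cachedOrdinal = -1; return _inner.ReadAsync(ct); }

    // Typed getters funnel through the cache.
    public override bool GetBoolean(int i) => GetFieldValue<bool>(i);
    public override byte GetByte(int i) => GetFieldValue<byte>(i);
    public override char GetChar(int i) => GetFieldValue<char>(i);
    public override DateTime GetDateTime(int i) => GetFieldValue<DateTime>(i);
    public override decimal GetDecimal(int i) => GetFieldValue<decimal>(i);
    public override double GetDouble(int i) => GetFieldValue<double>(i);
    public override float GetFloat(int i) => GetFieldValue<float>(i);
    public override Guid GetGuid(int i) => GetFieldValue<Guid>(i);
    public override short GetInt16(int i) => GetFieldValue<short>(i);
    public override int GetInt32(int i) => GetFieldValue<int>(i);
    public override long GetInt64(int i) => GetFieldValue<long>(i);
    public override string GetString(int i) => GetFieldValue<string>(i);
    public override object GetValue(int i) => GetFieldValue<object>(i);
    public override object this[int i] => GetFieldValue<object>(i);
    public override object this[string name] => GetFieldValue<object>(GetOrdinal(name));
    public override bool IsDBNull(int i) => i == _cachedOrdinal ? _cachedValue is DBNull : _inner.IsDBNull(i);

    // The rest simply delegates.
    public override long GetBytes(int i, long dataOffset, byte[]? buffer, int bufferOffset, int length)
        => _inner.GetBytes(i, dataOffset, buffer, bufferOffset, length);
    public override long GetChars(int i, long dataOffset, char[]? buffer, int bufferOffset, int length)
        => _inner.GetChars(i, dataOffset, buffer, bufferOffset, length);
    public override int GetValues(object[] values) => _inner.GetValues(values);
    public override string GetName(int i) => _inner.GetName(i);
    public override int GetOrdinal(string name) => _inner.GetOrdinal(name);
    public override string GetDataTypeName(int i) => _inner.GetDataTypeName(i);
    public override Type GetFieldType(int i) => _inner.GetFieldType(i);
    public override IEnumerator GetEnumerator() => _inner.GetEnumerator();
    public override bool NextResult() => _inner.NextResult();
    public override int Depth => _inner.Depth;
    public override int FieldCount => _inner.FieldCount;
    public override bool HasRows => _inner.HasRows;
    public override bool IsClosed => _inner.IsClosed;
    public override int RecordsAffected => _inner.RecordsAffected;
    public override void Close() => _inner.Close();
}

// Suppresses EF's normal execution and substitutes a SequentialAccess reader.
public sealed class SequentialAccessInterceptor : DbCommandInterceptor
{
    public override InterceptionResult<DbDataReader> ReaderExecuting(
        DbCommand command, CommandEventData eventData, InterceptionResult<DbDataReader> result)
        => InterceptionResult<DbDataReader>.SuppressWithResult(
            new CachingSequentialReader(command.ExecuteReader(CommandBehavior.SequentialAccess)));

    public override async ValueTask<InterceptionResult<DbDataReader>> ReaderExecutingAsync(
        DbCommand command, CommandEventData eventData, InterceptionResult<DbDataReader> result,
        CancellationToken cancellationToken = default)
        => InterceptionResult<DbDataReader>.SuppressWithResult(
            new CachingSequentialReader(
                await command.ExecuteReaderAsync(CommandBehavior.SequentialAccess, cancellationToken)));
}
```

The interceptor would be registered via options.AddInterceptors(new SequentialAccessInterceptor()) on the DbContextOptionsBuilder.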

As a side note, this bug affected the performance of my queries regardless of the actual size of the data. Even with a completely empty varchar(max) column, there was a large difference between the default sync and async versions.

@OzBob commented Jun 5, 2024

@anpete or @ajcvickers - update the description of the issue to reflect the major performance impact when the target column is nvarchar(max) as reported by @panoskj in #24290

@roji (Member) commented Jun 5, 2024

@OzBob we wouldn't be implementing SequentialAccess in order to work around the SqlClient performance issue (dotnet/SqlClient#593), which is basically a sort of bug. That needs to be fixed at that level.

@panoskj commented Jun 6, 2024

@roji @OzBob It turns out the performance boost is because EFCore will actually execute the sync code of SqlClient instead of the async code when sequential access is enabled.

A more detailed explanation would be that while EFCore calls SqlDataReader.ReadAsync for each row, it doesn't call GetFieldValueAsync to access the columns but GetFieldValue instead. If I remember correctly, when sequential access is not enabled, ReadAsync ends up loading all columns first. But when sequential access is enabled, ReadAsync doesn't load any columns at all and the next call to any variant of GetFieldValue will have to load its results.

To sum up, if EFCore used GetFieldValueAsync instead of GetFieldValue, then using sequential access would not improve the situation with SqlClient's async performance.

This doesn't mean EFCore doesn't have room for improvement. For example, sequential access should help reduce allocations for some specific use cases where large strings are loaded from the database, because SqlClient will not have to buffer the whole row but just one column at a time. Of course without measuring, this is just a guess. I also find it weird that GetFieldValueAsync is not used in the async path and that the same columns may be read multiple times as mentioned in #24290.

EDIT: I just noticed that @roji said in another comment:

> when reading values in non-sequential mode we can assume data is already in memory (and for example avoid code paths that involve async state machines).

So this is probably the explanation for why GetFieldValue is used instead of GetFieldValueAsync. Though I'm not sure whether this behavior holds in all cases and for all supported providers, and it is very hard to notice if GetFieldValue ever blocks in there, as happened in my case. Is there any documentation indicating that the whole row has to be loaded in memory when DbDataReader.Read returns in non-sequential mode? I'm asking because this behavior could change in order to optimize performance, unless it is documented.

@roji (Member) commented Jun 6, 2024

@panoskj if memory serves, one main reason why EF doesn't use async I/O when reading columns is that the code reading columns (the materializer) is code-generated with LINQ expression trees, which do not support async. I haven't personally looked into this extensively, but AFAIK we're basically blocked here by a .NET limitation.

> For example, sequential access should help reduce allocations for some specific use cases where large strings are loaded from the database, because SqlClient will not have to buffer the whole row but just one column at a time.

That's possible, depending on exactly how SqlClient implements things internally.

Note that where sequential access really shines is e.g. when huge binary data is coming back from the database, and a streaming DbDataReader API is used to read it - that indeed allows never having the entire huge column in memory at the same time. But in EF's case, EF really does need to read the entire data as a byte[], since that's what it needs to inject into the user's .NET type (it's an ORM). So that doesn't apply here AFAICT.
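For contrast, a minimal sketch of that streaming pattern using the standard GetStream API (helper name and the assumption that column 0 is the blob are mine) - exactly the kind of consumption an ORM that must hand back a complete byte[] can't take advantage of:

```csharp
using System.Data;
using System.Data.Common;
using System.IO;
using System.Threading.Tasks;

public static class BlobStreamingSketch
{
    // Copies a large column to a destination stream in chunks; with
    // SequentialAccess the driver never needs the whole value in memory.
    public static async Task CopyBlobAsync(DbCommand command, Stream destination)
    {
        await using var reader = await command.ExecuteReaderAsync(CommandBehavior.SequentialAccess);
        while (await reader.ReadAsync())
        {
            await using var blob = reader.GetStream(0); // column 0 assumed to be the blob
            await blob.CopyToAsync(destination);
        }
    }
}
```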

> Though I'm not sure whether this behavior holds in all cases and for all supported providers, and it is very hard to notice if GetFieldValue ever blocks in there, as happened in my case.

Are you saying you know for a fact that SqlClient can perform I/O when reading a column, even when the reader is in non-sequential mode? Can you point to code/docs/whatever that show this?

> Is there any documentation indicating that the whole row has to be loaded in memory when DbDataReader.Read returns in non-sequential mode?

This behavior isn't standardized/documented as far as I'm aware, no.

@panoskj commented Jun 6, 2024

> Are you saying you know for a fact that SqlClient can perform I/O when reading a column, even when the reader is in non-sequential mode?

Oh no, I meant that when I used it in sequential mode, it was quite hard to figure out there was synchronous I/O (because it happens inside the materializer). So I can imagine that if this behavior changed in some implementation, the issue might not be noticed. Though it sounds very unlikely.

> Note that where sequential access really shines is e.g. when huge binary data is coming back from the database, and a streaming DbDataReader API is used to read it - that indeed allows never having the entire huge column in memory at the same time. But in EF's case, EF really does need to read the entire data as a byte[], since that's what it needs to inject into the user's .NET type (it's an ORM). So that doesn't apply here AFAICT.

But the DbDataReader has to give a copy of the byte array when in non-sequential mode, because the same column can be read again and the array given initially can be modified. I think there has to be at least twice the memory usage of the theoretical optimum, plus unnecessary copying. I'm not sure whether the same would apply to strings, actually (because they are immutable).

@roji (Member) commented Jun 6, 2024

> But the DbDataReader has to give a copy of the byte array when in non-sequential mode, because the same column can be read again and the array given initially can be modified.

It's very likely that DbDataReader has to give a copy in any case - the database driver typically has some sort of raw byte buffer internally, which contains data as it came in via the network. When the consumer requests a byte array by calling DbDataReader.GetFieldValue<byte[]>(), the driver needs to copy that data from the internal buffer to a new array that gets returned to the user. Note that I have no concrete knowledge on exactly what SqlClient does here, but this is e.g. how things work in Npgsql and it's hard to imagine it working very differently.

Note that for all other data types (strings, ints) the data must also be parsed/decoded from the internal raw buffered data to the actual CLR type that needs to get returned to the user (.NET string, int). For byte[] in theory it's possible to return a view directly into the driver's internal buffer (e.g. a ReadOnlyMemory<byte>), but in practice that gets very complicated very quickly: the internal buffer can no longer be reused or returned to the pool, since there's a user-given view that references it.

Does that all make sense?

@panoskj commented Jun 6, 2024

Yes it makes sense. I wasn't thinking about ReadOnlyMemory or anything related, I know it would be complicated to use that instead.

What I was thinking is, when each packet is parsed in sequential mode, you don't have to keep "some sort of raw byte buffer", you can simply parse the packets and return the results without buffering anything. Of course I know it's easier said than done.

Also, I don't know how Npgsql works, but having looked at SqlClient I can say it has a pretty complicated parsing logic. For example, a byte array is not simply the same bytes as received from the network, it is split into packets (logical packets, if that makes sense, not network packets) with headers, so it has to be parsed and saved in a new buffer before it can be returned to the user.

@roji (Member) commented Jun 7, 2024

> What I was thinking is, when each packet is parsed in sequential mode, you don't have to keep "some sort of raw byte buffer", you can simply parse the packets and return the results without buffering anything.

That's not really how network I/O works - there's always a raw memory buffer somewhere inside the driver/client application, into which network data is read. For example, you can read from the network via NetworkStream.ReadAsync(), which, like all stream methods, accepts a byte[] into which data is written (other methods for reading I/O are conceptually similar).

I think what you're trying to say is that buffered data (raw or otherwise) can be released earlier with SequentialAccess, since once the user has read a column, they can no longer read it again (unlike without SequentialAccess, where one can go back and forth and read the same column multiple times). That's true, and could in theory help memory pressure, but as long as the row is generally consumed in one go and then released (by calling ReadAsync() to move to the next row) - which is what EF generally does - I can't see any actual impact here.

In any case, to summarize, at this point I'm not sure exactly what benefits SequentialAccess would bring to EF specifically (it can definitely help for various non-EF scenarios). There may be some specific driver bugs/issues where using SequentialAccess may help, but we generally try to not introduce a lot of complexity into EF for working around lower-level bugs - these should be fixed at their level instead.

@panoskj commented Jun 7, 2024

> I think what you're trying to say is that buffered data (raw or otherwise) can be released earlier with SequentialAccess

Yes, but there is more to it. Suppose you are trying to read a 10MB byte array column. Based on what you say, my understanding is in Npgsql you would read the whole row into your raw network buffer (10MB total memory usage), create a copy of the column (now 20MB total memory usage), give the copy back to the user and finally release your own buffer (back to 10MB memory usage). Is this correct?

The alternative would be that when you receive the first packet indicating that you are parsing the large column, instead of saving it into your raw network buffer, you save it directly into a new buffer that can be returned to the user. This way you can stay at 10MB max memory usage at all times. Does this example make sense? The conclusion is you don't simply release memory earlier this way, but you allocate 50% less memory.

Let me explain how SqlClient handles this as another example:

SqlClient by default uses the native SNI library for networking. The SNI library has its own thread, receiving packets (e.g. 8KB) from the network. This is the first buffering layer. SqlClient will occasionally try to read a packet from the network, providing a buffer long enough for the packet to fit, and SNI will copy the next packet into it while releasing it from its own buffer. By the way, this doesn't block most of the time (because SNI tries to stay ahead of SqlClient).

So at this point, SqlClient has a buffer with the packet. Note that SqlClient does not copy packets into contiguous buffers like you would do - each packet has its own independent buffer. In synchronous mode, SqlClient doesn't even need to keep the previous packets buffered; it only needs the packet being processed. That's because when SqlClient successfully parses a column, it will save the result into another buffer, which contains the CLR data type - I'll call this the CLR buffer. That's why it can discard the packet buffer and the user can still read previous columns of the same row. This also means the packet buffer can be reused, reducing garbage collections (it looks like they optimized the sync path, but when trying to reuse the same code for async they ended up doing all kinds of shenanigans to make it work without radical changes).

You may think in the case of byte arrays there is no need to copy data from the packet buffer into the CLR buffer, but this isn't the case. The TDS protocol has something called PLP data types. This basically means that when transmitting a byte array through the network, the byte array can be split into smaller chunks, each chunk having its own header. So you can't simply copy the data and give it to the user, you have to parse it as well. By storing the parsed array in a new buffer, SqlClient doesn't have to parse the array again if you try to read it multiple times.
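A loose sketch of that framing, based on the public MS-TDS description of PLP ("partially length-prefixed") values - the constants and helper names are illustrative, not SqlClient internals:

```csharp
using System.IO;

public static class PlpSketch
{
    private const ulong UnknownPlpLength = 0xFFFFFFFFFFFFFFFE;

    // Reassembles a PLP value: an 8-byte total length (or the "unknown"
    // sentinel), then chunks each prefixed with a 4-byte length, terminated
    // by a zero-length chunk. This is why the raw network bytes can't simply
    // be handed to the user as-is.
    public static byte[] ReadPlpValue(BinaryReader tds)
    {
        var declared = tds.ReadUInt64();
        using var value = new MemoryStream(
            declared == UnknownPlpLength ? 0 : checked((int)declared));
        while (true)
        {
            var chunkLength = tds.ReadUInt32();
            if (chunkLength == 0) break;                      // PLP terminator
            value.Write(tds.ReadBytes(checked((int)chunkLength)));
        }
        return value.ToArray();
    }
}
```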

If I remember correctly, there is one more reason why the raw network stream can't be used directly: each packet has its own header and metadata which has to be checked and removed before its data can be used. It's more like a stream inside a stream.

I know SqlClient's implementation could be improved, but at the same time I think specifying SequentialAccess would allow DbDataReader implementations to perform more optimizations. I'm not saying SequentialAccess should be enabled to work around driver specific issues, I'm just saying it would be nice to have as an option that the user can set on a case-by-case basis with an appropriate default depending on the driver. I also believe any driver can be made faster (or less memory consuming) in sequential than non-sequential mode, if it isn't already.

@roji (Member) commented Jun 7, 2024

> The alternative would be that when you receive the first packet indicating that you are parsing the large column, instead of saving it into your raw network buffer, you save it directly into a new buffer that can be returned to the user. This way you can stay at 10MB max memory usage at all times. Does this example make sense? The conclusion is you don't simply release memory earlier this way, but you allocate 50% less memory.

It's worth clarifying exactly what you mean by "packet" - TCP or some protocol framing. In any case, at the low-level networking level, you don't receive a packet indicating that large data is about to come; you're just streaming in low-level PG protocol data, which includes both control headers and data interleaved together. All that is read into an internal buffer, and from that the driver needs to copy the column data into a byte[] for that data, to be provided to the user. In other words, reads from the network don't arrive structured in a way that matches PG protocol data (e.g. columns).

I'm vaguely familiar with how TDS works, and only quite vaguely with how SqlClient works; I suspect there's at least some copying going on inside SqlClient that's probably quite inefficient. Without going into those details though, regardless of concerns such as protocol framing and layering, at the very basic level you read TCP data (via e.g. NetworkStream) into a raw buffer somewhere; at that point you haven't yet parsed any protocol framing or anything like that. When the user requests a value, you should then be able to parse it out of that low-level buffer, without any additional intermediate copying, e.g. into "packet buffers". In other words, if you have anything more than a single "copy" from the raw binary data as it comes out of NetworkStream and into the actual CLR type you're going to hand off to the user, you're probably doing something inefficient.

Note that I wrote "copy" in quotes because for almost all data types, this is in fact parsing/decoding. For example, PG typically sends strings in UTF8, and so Npgsql decodes that directly from its internal raw data buffer to a string which it gives to the user. In that sense, there are actually zero copies inside the driver - just a single decoding from the internal buffer. byte[] can seem a bit confusing since the data doesn't require parsing/decoding, but as we discussed above lifecycle/mutability issues prevent returning a view into the internal buffer.

To summarize, I'd recommend carefully distinguishing here between SqlClient's internal implementation and possible performance issues that come out of that, and using SequentialAccess at the ADO.NET level. It may very well be that in the SqlClient case, doing SequentialAccess in EF could help in some way (I have no idea), but if it is, that's likely to be more indicative of a SqlClient perf issue than a general thing.

@panoskj commented Jun 7, 2024

> It's worth clarifying exactly what you mean by "packet"

Fair question. What I meant there is the chunk of bytes you read from the TCP socket/stream. Though you didn't answer my own question:

> Based on what you say, my understanding is in Npgsql you would read the whole row into your raw network buffer (10MB total memory usage), create a copy of the column (now 20MB total memory usage), give the copy back to the user and finally release your own buffer (back to 10MB memory usage). Is this correct?

The whole question boils down to this: how much memory do you need to allocate in order to read a single row, containing columns of N total size in bytes and return the result to the user?

In sequential mode, you should ideally need something like N + (some fixed-size buffer such as 8KB). For example, you start by reading 8KB from the TCP socket/stream and parsing them. At some point inside the first 8KB you will probably find the first column's data, so you start copying that data into a new buffer that will be returned to the user. When you finish with this packet, you read the next 8KB from the socket and repeat. Therefore the total memory allocated can be as low as N + (some fixed-size buffer such as 8KB) in sequential mode. In non-sequential mode the theoretical minimum memory usage is more than 2N. Just because you release that extra memory right after returning the results, it doesn't mean there was no cost. You literally allocated twice the memory needed and then released half of it. Edit: I noticed this shouldn't be such a big problem if multiple rows are read and you reuse the "extra memory". Still, it's not optimal.

By the way, the TCP socket can also have its own buffering (fetching next packets before needed, all sockets I have used do this by default). But the buffer size is not dependent on N.

PS: I know SqlClient's implementation is bad. I'm debating about what the theoretically best possible implementation is with sequential vs non-sequential mode.

@roji (Member) commented Jun 7, 2024

@panoskj you're right, I got somewhat sidetracked/confused by the additional copying/SqlClient question above. As you say, at the end of the day, the driver's internal buffering requirements can be fixed (and small) when using SequentialAccess, and when SequentialAccess isn't in use, are generally as large as the largest row in the resultset. As you wrote, recycling the buffer (or pooling) helps considerably, but at the end of the day non-sequential mode requires more memory internally in the driver.

This is indeed exactly how Npgsql operates today (with an 8K internal read buffer by default).

It's worth mentioning that there may be some other aspects to it though... If we assume that non-SequentialAccess prebuffers the row (as it does in Npgsql, at least), that means that any column reads after the row's ReadAsync() never perform I/O, since columns are guaranteed to be buffered in memory. This allows using the sync APIs (e.g. GetFieldValue<int> rather than GetFieldValueAsync<int>) to read columns, which, if the driver's implementation is highly optimized, can be more efficient (less async state machines involved). In a world where there are many small columns (as opposed to huge columns, which are rather the exception), this can have an impact, and possibly tip the scales in favor of non-SequentialAccess. On the other hand, there may be an advantage in allowing earlier columns to be read before later columns in the row have even been delivered to the client, rather than sitting around and waiting for the entire row to arrive in ReadAsync().
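Sketching the access pattern being described here (plain ADO.NET; the helper name is made up) - the fast path relies on the driver's guarantee, where it exists, that ReadAsync prebuffers the row:

```csharp
using System.Data.Common;
using System.Threading.Tasks;

public static class BufferedRowSketch
{
    // Non-sequential (default) behavior: ReadAsync is the only call that
    // performs I/O; the per-column reads can then safely use the sync APIs,
    // avoiding extra async state machines. Note that this prebuffering
    // behavior is not standardized across providers.
    public static async Task ConsumeAsync(DbCommand command)
    {
        await using var reader = await command.ExecuteReaderAsync();
        while (await reader.ReadAsync())                // I/O happens here
        {
            var id = reader.GetFieldValue<int>(0);      // in-memory, no I/O
            var name = reader.GetFieldValue<string>(1); // in-memory, no I/O
        }
    }
}
```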

Note that I haven't actually benchmarked the above - the point is only that this is a complex subject, and the answer is likely to be one choice of trade-offs over another, rather than a clear and definite answer.

@panoskj commented Jun 7, 2024

I agree there is a tradeoff here, which is why I think it should be an option for the end user.

By the way, in sync mode, there seems to be no downside to reading sequentially by default (except for the current materializer implementation reading non-sequentially). But I haven't benchmarked any of these theories either.

As for using GetFieldValue instead of GetFieldValueAsync to reduce async state machines, I'm not really sure about the impact. I get your point and it may be true for some or most cases, but it's something I would test if I were to make this choice. In addition, I suspect whatever optimization you can do to reduce async calls in non-sequential mode, you can probably apply in sequential mode as well. Things like checking if the data is already available, calling the synchronous path and returning Task.FromResult from the driver come to mind. It still involves more checks, but I have no idea how they compare in the greater picture. Note the caller is already running an async state machine (there is a call to ReadAsync prior to running the materializer).

But as you said, System.Linq.Expressions doesn't support async methods. Though it supports declaring variables (programmatically), so you could make the materializer read the columns sequentially with relatively low effort - see the sketch below. I mention variables because apparently some columns are read multiple times by the current implementation.
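For what it's worth, a toy illustration of that point (not EF's actual materializer): expression trees can't await, but they can assign each column to a variable in ordinal order, so nothing is read twice and the access pattern is SequentialAccess-compatible.

```csharp
using System;
using System.Data.Common;
using System.Linq.Expressions;

public static class SequentialMaterializerSketch
{
    // Compiles: reader => { var id = reader.GetFieldValue<int>(0);
    //                       var name = reader.GetFieldValue<string>(1);
    //                       return (id, name); }
    public static Func<DbDataReader, (int Id, string Name)> Build()
    {
        var reader = Expression.Parameter(typeof(DbDataReader), "reader");
        var id = Expression.Variable(typeof(int), "id");
        var name = Expression.Variable(typeof(string), "name");

        var getFieldValue = typeof(DbDataReader).GetMethod(nameof(DbDataReader.GetFieldValue))!;

        Expression ReadColumn(ParameterExpression target, Type type, int ordinal) =>
            Expression.Assign(target, Expression.Call(
                reader, getFieldValue.MakeGenericMethod(type), Expression.Constant(ordinal)));

        var body = Expression.Block(
            new[] { id, name },
            ReadColumn(id, typeof(int), 0),      // column 0 first,
            ReadColumn(name, typeof(string), 1), // then column 1 - strictly in order
            Expression.New(
                typeof(ValueTuple<int, string>).GetConstructor(new[] { typeof(int), typeof(string) })!,
                id, name));

        return Expression.Lambda<Func<DbDataReader, (int Id, string Name)>>(body, reader).Compile();
    }
}
```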

@roji (Member) commented Jun 7, 2024

> In addition, I suspect whatever optimization you can do to reduce async calls in non-sequential mode, you can probably apply in sequential mode as well.

That's at least partially true for sure, and inside Npgsql we indeed make a lot of effort to avoid async state machines where we know the data is already buffered, even when the async API is used.

> Note the caller is already running an async state machine (there is a call to ReadAsync prior to running the materializer).

Of course, but it's a matter of reducing the number of such state machines (the caller's state machine is totally orthogonal to any additional state machine we'd introduce - or not - inside the driver).

> But as you said, System.Linq.Expressions doesn't support async methods.

Yeah, at the end of the day this is the crux of it. If we didn't have this hurdle, there's a good chance we'd have already looked into this and possibly even made the switch; but this makes it considerably more complex to do (and possibly also to do in an efficient way).
