createQueryStream loads a big set of data into memory before streaming #1073
Comments
@koenvanzuijlen This is the case because `createQueryStream()` is a streamified version of the `query()` method, which retrieves pages of results using `job.getQueryResults()`.
@steffnay Is there an option available to stream query results directly without loading a lot of results into memory first?
No, unfortunately, we only have the option of streamifying the `query()` method.
Is there a way to limit the size of this chunk of data? It is causing an OOM exception for us on some tables. Can we use …
The actual …
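One way to bound the memory used per chunk, sketched here under assumptions rather than taken from the library's documented streaming path, is to drive the paging yourself with an async generator that pulls one bounded page at a time. `fetchPage` below is a hypothetical callback; with `@google-cloud/bigquery` it could wrap something like `job.getQueryResults({ maxResults, pageToken, autoPaginate: false })`, which returns rows plus a token for the next page.

```javascript
// Pulls pages lazily: at most one page (maxResults rows) is held in
// memory at a time, and the next page is not requested until the
// previous page's rows have all been consumed.
async function* pagedRows(fetchPage, maxResults) {
  let pageToken;
  do {
    const { rows, nextPageToken } = await fetchPage({ maxResults, pageToken });
    yield* rows; // hand rows downstream one by one
    pageToken = nextPageToken;
  } while (pageToken);
}
```

A consumer would then iterate with `for await (const row of pagedRows(fetchPage, 1000)) { … }`, trading the convenience of `createQueryStream()` for an explicit memory ceiling per page.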
Also having the same problem. Loading all the results into memory isn't exactly what one would expect from streaming.
The `bigQuery.createQueryStream` method seems to load an entire set of data into memory before the stream starts actually piping data into the next streams.

Environment details
`@google-cloud/bigquery` version: 5.10.0

Steps to reproduce
Using this test script, I can see over 300 MB of data loaded into memory before the stream starts piping to the next streams. And I am selecting only one column, so this represents a lot of records.
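The test script itself is not included in this thread. A minimal way to observe the effect it describes is to sample heap usage around the point where rows accumulate; the sketch below uses only the Node stdlib, with a synthetic array standing in for the buffered query rows (it shows the measurement approach, not the BigQuery call).

```javascript
// Report heap usage in megabytes.
function heapUsedMB() {
  return process.memoryUsage().heapUsed / (1024 * 1024);
}

const before = heapUsedMB();

// Stand-in for rows the client buffers before streaming begins.
const bufferedRows = Array.from({ length: 100000 }, (_, i) => ({ value: `row-${i}` }));

const after = heapUsedMB();
console.log(`heap grew by ~${(after - before).toFixed(1)} MB for ${bufferedRows.length} rows`);
```

In a real reproduction, the sampling calls would bracket the creation and consumption of the query stream instead of the synthetic allocation.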
If I log each entry in the transform stream, the rows also seem to arrive in batches: the stream pauses for a while and then suddenly starts piping again. This makes me think a whole page is loaded into memory internally and then piped to the readable stream, but that might not be the issue.